Yes, it's possible. Check this solution:
http://stackoverflow.com/questions/11353911/extending-hadoops-tableinputformat-to-scan-with-a-prefix-used-for-distribution
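
For the map/reduce side, here is a rough sketch of how the per-bucket scan ranges described in the quoted message below could be computed. It is plain Python with no HBase calls and only illustrates the key arithmetic. `stop_time`, `bucket_scan_ranges` and `scan` are hypothetical helper names. The HH:MM keys are assumed not to cross midnight, and the stop key is taken as the window length plus one minute so that it matches the "10:15" bound used for T=7. In a real job each range would presumably become one input split, which is roughly what extending TableInputFormat as in the link above is meant to achieve.

```python
from datetime import datetime, timedelta


def stop_time(start_hhmm, window_minutes):
    """Exclusive stop key (HH:MM) for a window starting at start_hhmm.

    Assumption: the window does not cross midnight; real row keys would
    more likely carry a full timestamp.
    """
    start = datetime.strptime(start_hhmm, "%H:%M")
    stop = start + timedelta(minutes=window_minutes + 1)
    return stop.strftime("%H:%M")


def bucket_scan_ranges(event_id, event_time, window_minutes, num_buckets):
    """One [start, stop) row-key range per bucket prefix (1..num_buckets)."""
    stop = stop_time(event_time, window_minutes)
    return [
        (f"{b}_{event_time}:{event_id}", f"{b}_{stop}")
        for b in range(1, num_buckets + 1)
    ]


def scan(rows, start, stop):
    """Stand-in for a range scan: start inclusive, stop exclusive,
    keys compared lexicographically (same order as bytes for ASCII keys)."""
    return [r for r in sorted(rows) if start <= r < stop]


# Sample rows of the bucketed secondary table from the quoted message.
rows = ["1_10:07:event1", "2_10:10:event2", "3_10:12:event3", "2_10:20:event4"]

ranges = bucket_scan_ranges("event1", "10:07", 7, 3)

# Combine the per-bucket results and order them by time (the part after "N_").
hits = sorted(
    (r for start, stop in ranges for r in scan(rows, start, stop)),
    key=lambda r: r.split("_", 1)[1],
)
print(hits)
```

With the four sample rows this selects event1, event2 and event3 and leaves out event4 (10:20), which falls outside the 7-minute window.
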

On Mon, Jan 28, 2013 at 2:07 PM, Oleg Ruchovets <[email protected]> wrote:

> Yes.
> This is a very interesting approach.
>
> Is it possible to read from the main key and scan from another using
> map/reduce? I don't want to read from a single client. I use HBase
> version 0.94.2.21.
>
> Thanks
> Oleg.
>
>
> On Mon, Jan 28, 2013 at 6:27 PM, Rodrigo Ribeiro <[email protected]> wrote:
>
> > In the approach that I mentioned, you would need a table to retrieve
> > the time of a certain event (if this information can be retrieved in
> > another way, you may ignore this table). It would be like you posted:
> >
> > event_id | time
> > =============
> > event1 | 10:07
> > event2 | 10:10
> > event3 | 10:12
> > event4 | 10:20
> >
> > And a secondary table would be like:
> >
> > rowkey
> > ===========
> > 10:07:event1
> > 10:10:event2
> > 10:12:event3
> > 10:20:event4
> >
> > That way, for your first example, you first retrieve the time of
> > "event1" from the main table, and then scan starting from its position
> > in the secondary table ("10:07:event1") until the end of the window.
> > In this case (T=7) the scan will range over ["10:07:event1", "10:15").
> >
> > As Michel Segel mentioned, there is a hotspot problem on insertion
> > when using this approach alone.
> > Using multiple buckets (the bucket could be a hash of the eventId)
> > would distribute it better, but requires scanning all buckets of the
> > second table to get all events in the time window.
> >
> > Assuming you use 3 buckets, it would look like:
> >
> > rowkey
> > ===========
> > *1_*10:07:event1
> > *2_*10:10:event2
> > *3_*10:12:event3
> > *2_*10:20:event4
> >
> > The scans would be: ["*1*_10:07:event1", "1_10:15"),
> > ["*2*_10:07:event1", "2_10:15"), and ["*3*_10:07:event1", "3_10:15").
> > You can then combine the results.
> >
> > Hope it helps.
> >
> > On Mon, Jan 28, 2013 at 12:49 PM, Oleg Ruchovets <[email protected]> wrote:
> >
> > > Hi Rodrigo.
> > > Can you please explain your solution in more detail? You said that
> > > I will have another table. How many tables will I have? Will I have
> > > 2 tables? What will be the schema of the tables?
> > >
> > > I'll try to explain what I am trying to achieve:
> > > I have ~50 million records like {time|event}. I want to put the
> > > data in HBase in such a way that I can get the events at time X and
> > > all events that came after event X within T minutes (for example,
> > > within 7 minutes).
> > > So I will be able to scan the whole table and get groups like:
> > >
> > > {event1:10:02} corresponds to events {event2:10:03}, {event3:10:05},
> > > {event4:10:06}
> > > {event2:10:30} corresponds to events {events5:10:32}, {event3:10:33},
> > > {event3:10:36}.
> > >
> > > Thanks
> > > Oleg.
> > >
> > >
> > > On Mon, Jan 28, 2013 at 5:17 PM, Rodrigo Ribeiro <[email protected]> wrote:
> > >
> > > > You can use another table as an index, using a rowkey like
> > > > '{time}:{event_id}', and then scan in the range ["10:07", "10:15").
> > > >
> > > > On Mon, Jan 28, 2013 at 10:06 AM, Oleg Ruchovets <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have the following row data structure:
> > > > >
> > > > > event_id | time
> > > > > =============
> > > > > event1 | 10:07
> > > > > event2 | 10:10
> > > > > event3 | 10:12
> > > > >
> > > > > event4 | 10:20
> > > > > event5 | 10:23
> > > > > event6 | 10:25
> > > > >
> > > > >
> > > > > The number of records is 50-100 million.
> > > > >
> > > > >
> > > > > Question:
> > > > >
> > > > > I need to find the group of events starting from eventX that
> > > > > fall into the time window bucket = T.
> > > > >
> > > > >
> > > > > For example, if T=7 minutes:
> > > > > Starting from event1, {event1, event2, event3} were detected
> > > > > during 7 minutes.
> > > > >
> > > > > Starting from event2, {event2, event3} were detected during 7
> > > > > minutes.
> > > > >
> > > > > Starting from event4, {event4, event5, event6} were detected
> > > > > during 7 minutes.
> > > > > Is there a way to model the data in HBase to get this?
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > > >
> > > > --
> > > > *Rodrigo Pereira Ribeiro*
> > > > Software Developer
> > > > www.jusbrasil.com.br
> > > >
> > >
> >
> >
> > --
> > *Rodrigo Pereira Ribeiro*
> > Software Developer
> > T (71) 3033-6371
> > C (71) 8612-5847
> > [email protected]
> > www.jusbrasil.com.br
> >
>

--
*Rodrigo Pereira Ribeiro*
Software Developer
www.jusbrasil.com.br
