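A suitable data structure on the client side can avoid the sort. For example, if each result from the reverse scan is inserted at the head of a LinkedList, the list ends up in ascending row key order: https://docs.oracle.com/javase/7/docs/api/java/util/LinkedList.html#addFirst(E)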
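
A minimal sketch of that idea, without any HBase dependency: the reverse scan is simulated by a plain list of row keys in descending order, and the names ReverseScanCollector and lastN are made up for this illustration. With a real scan you would loop over the scanner results instead.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class ReverseScanCollector {
    /**
     * Takes keys that arrive in descending order (as a reversed scan
     * would deliver them), keeps at most `limit` of them, and returns
     * them in ascending order without a separate sort step.
     */
    static List<String> lastN(Iterable<String> descendingKeys, int limit) {
        LinkedList<String> out = new LinkedList<>();
        for (String key : descendingKeys) {
            if (out.size() >= limit) {
                break; // enough rows collected, skip the rest
            }
            out.addFirst(key); // newest-first input ends up oldest-first
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the rows a reversed scan would return.
        List<String> reversed = Arrays.asList(
                "A_9811111111_108", "A_9811111111_107",
                "A_9811111111_106", "A_9811111111_105");
        System.out.println(lastN(reversed, 3));
        // prints [A_9811111111_106, A_9811111111_107, A_9811111111_108]
    }
}
```

addFirst on a LinkedList is constant time, so the cost stays linear in the number of rows kept.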
> On Aug 25, 2016, at 2:45 AM, ramkrishna vasudevan
> <[email protected]> wrote:
>
> And reading through the mail chain: as Ted suggested, if you call
> setReversed(true) and swap your start and stop rows, you can just keep a
> count in your row filter until 10k is reached and then skip all the other
> results.
>
> In the other logic that I described, you may have to do a sort before
> returning the collected result. In the reverse scan case too, if you need
> the result in lexicographical order you may need to sort it on the client
> side.
>
> Regards
> Ram
>
> On Thu, Aug 25, 2016 at 3:11 PM, ramkrishna vasudevan <
> [email protected]> wrote:
>
>> Hi Manjeet
>>
>> For your first question, regarding fetching the last 1000 records:
>>
>> First, in your scan, set the start row to the bytes corresponding to
>> (A_9811111111_) and let the stop row be the byte representation of
>> A_9811111111_ + 1. I mean add +1 to the last byte of what comes out of
>> (A_9811111111_). This will ensure you scan only the rows corresponding
>> to (A_9811111111_).
>>
>> Just thinking aloud, the first thing I can see is that it may be easier
>> to do this with CPs than with Filters, because filters deal with one
>> cell or one row at a time. Accumulating the results and maintaining the
>> last 10k records may be difficult. I have to look in detail to see if it
>> is possible.
>>
>> Do you know the number of columns you have? If there are multiple columns
>> then it is quite tricky. But if you have only one column per row, or you
>> want only the row keys, then:
>>
>> You can implement a user coprocessor and in it implement
>> preStoreScannerOpen(). Say, for example, you have only one family. In
>> that case, in your preStoreScannerOpen() you create your own StoreScanner,
>> and in StoreScanner.next() you can just skip all KeyValues and, during
>> that process, keep collecting your cells. Ensure you keep collecting the
>> cells row-wise by adding to a list.
>> You will have to keep only the latest 10000 cells in the list at any
>> time.
>>
>> Every time, keep checking whether the row has reached the stopRow that is
>> set in the scan (so maybe it moves to A_9811111112_).
>> Once you see this condition you may have to replace the list given by the
>> StoreScanner.next() call with the list that you have collected and send it
>> to the client.
>> I have not yet tried it, but it can give you an idea of how to use CPs.
>>
>> With filters I am not sure, as I said; I need to read the flow and see if
>> there are any such APIs to mimic the above.
>>
>> PS. Don't take this as a working algo. There may be reasons why it may not
>> work, but you can read about CPs to see if something like the above can
>> work out.
>>
>> Regards
>> Ram
>>
>> On Thu, Aug 25, 2016 at 2:16 PM, Manjeet Singh <[email protected]>
>> wrote:
>>
>>> Hi All
>>>
>>> I have another question for the same case.
>>>
>>> Below is my sample HBase data. As we all know, HBase stores data sorted
>>> by row key. The row keys below are IPs. As we can see, 2.168.129.81_1
>>> comes last, whereas I am expecting it to come just after 1.168.129.81_2.
>>>
>>> 1.168.129.81_0    column=c2:D_com.stackoverflow/questions/4, timestamp=1472104396288, value=4
>>> 1.168.129.81_1    column=c2:D_com.stackoverflow/questions/1, timestamp=1472104396288, value=1
>>> 1.168.129.81_1    column=c2:D_com.stackoverflow/questions/2, timestamp=1472104396288, value=2
>>> 1.168.129.81_2    column=c2:D_com.stackoverflow/questions/0, timestamp=1472104396288, value=0
>>> 192.168.129.81_1  column=c2:D_com.stackoverflow/questions/2, timestamp=1472104386671, value=2
>>> 192.168.129.81_1  column=c2:D_com.stackoverflow/questions/4, timestamp=1472104386671, value=4
>>> 192.168.129.81_2  column=c2:D_com.stackoverflow/questions/1, timestamp=1472104386671, value=1
>>> 192.168.129.81_3  column=c2:D_com.stackoverflow/questions/0, timestamp=1472104386671, value=0
>>> 192.168.129.81_3  column=c2:D_com.stackoverflow/questions/3, timestamp=1472104386671, value=3
>>> 2.168.129.81_1    column=c2:D_com.stackoverflow/questions/0, timestamp=1472104404609, value=0
>>> 2.168.129.81_1    column=c2:D_com.stackoverflow/questions/1, timestamp=1472104404609, value=1
>>> 2.168.129.81_1    column=c2:D_com.stackoverflow/questions/2, timestamp=1472104404609, value=2
>>> 2.168.129.81_3    column=c2:D_com.stackoverflow/questions/4, timestamp=1472104404609, value=4
>>>
>>> On Thu, Aug 25, 2016 at 12:36 PM, Manjeet Singh <
>>> [email protected]>
>>> wrote:
>>>
>>>> I am using a logical salt. Say I have a mobile number in my row key; I
>>>> use some algo to map this mobile number to an ASCII char.
>>>> So each time I know what the salt will be, and it will never change
>>>> the order.
>>>> Example:
>>>> if based on my algo I get A for 9811111111,
>>>> then each time it will always return A for 9811111111.
>>>> So if I have row keys like
>>>> A_9811111111_101
>>>> A_9811111111_102
>>>> A_9811111111_103
>>>> A_9811111111_104
>>>> A_9811111111_105
>>>> A_9811111111_106
>>>> A_9811111111_107
>>>> A_9811111111_108
>>>>
>>>> it will sort my row keys in the same manner as shown above. Now, there
>>>> are millions of records and I want to get the last 10000 records.
>>>> Is there any way to get them? My concern is to perform all calculation
>>>> on the server side, not the client side.
>>>>
>>>> Thanks
>>>> Manjeet
>>>>
>>>> On Thu, Aug 25, 2016 at 1:06 AM, Esteban Gutierrez <
>>>> [email protected]>
>>>> wrote:
>>>>
>>>>> As long as new rows are added to the latest region that "might" work.
>>>>> But if the table is using hashed keys or rows are added randomly to
>>>>> the table, then retrieving the last million will be trickier and you
>>>>> will have to scan based on timestamp (if not modified) and then filter
>>>>> one more time.
>>>>>
>>>>> esteban.
>>>>>
>>>>> --
>>>>> Cloudera, Inc.
>>>>>
>>>>>> On Wed, Aug 24, 2016 at 12:31 PM, Ted Yu <[email protected]> wrote:
>>>>>>
>>>>>> The following API should help in your case:
>>>>>>
>>>>>> public Scan setReversed(boolean reversed) {
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On Wed, Aug 24, 2016 at 12:05 PM, Manjeet Singh <
>>>>>> [email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all
>>>>>>>
>>>>>>> HBase doesn't provide sorting on columns, but row keys are stored in
>>>>>>> sorted form, with the smallest value first and the greatest value
>>>>>>> last.
>>>>>>>
>>>>>>> Example:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> 5
>>>>>>> 6
>>>>>>> 7
>>>>>>> and so on
>>>>>>>
>>>>>>> Assume I have 1 million records but I want to look at the last 1000
>>>>>>> records only.
>>>>>>> Is there any way to do this? I don't want to perform any calculation
>>>>>>> on the client side, so maybe a filter can help with it?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Manjeet
>>>>>>>
>>>>>>> --
>>>>>>> luv all
>>>>
>>>> --
>>>> luv all
>>>
>>> --
>>> luv all
>>
>>
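
For reference, the start/stop row suggestion quoted above (add +1 to the last byte of the prefix) can be sketched in plain Java as below. This is only an illustration under stated assumptions: PrefixRange and stopRowForPrefix are made-up names, not HBase API, and the handling of trailing 0xFF bytes is an addition that the thread does not discuss.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PrefixRange {
    /**
     * Returns the smallest byte string that sorts after every key
     * starting with `prefix`: increment the last byte that is not 0xFF
     * and drop everything after it. An empty array means the prefix was
     * all 0xFF bytes, so there is no finite upper bound.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++; // the "+1 to the last byte" step
                return stop;
            }
        }
        return new byte[0];
    }

    public static void main(String[] args) {
        byte[] start = "A_9811111111_".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(start);
        // '_' is 0x5F, so the last byte becomes 0x60, which is '`'
        System.out.println(new String(stop, StandardCharsets.US_ASCII));
        // prints A_9811111111`
    }
}
```

The two arrays would then serve as the start row (inclusive) and stop row (exclusive) of the scan; for a reversed scan they are swapped, as noted in the thread.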
