Hi Mohammad,

You are most welcome to join the discussion. I have never used PageFilter, so I don't really have concrete input. I had a look at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html but I could not understand why it goes to multiple region servers in parallel. Why can't it guarantee that the number of results is <= the page size? (My guess: because the scan runs on multiple region servers.) If you have used it, maybe you can explain the behaviour?
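To make the page-size caveat concrete, here is a small HBase-free Java simulation. It assumes, consistent with the caveat in the PageFilter javadoc, that each region scan keeps its own filter state, so the page limit is only enforced per region and a scan over N regions can return up to N x pageSize rows. The class and method names are hypothetical. The client-side fix shown (trim to pageSize, then restart the next scan just after the last returned key by appending a zero byte) is a commonly used pattern, sketched here with Strings standing in for row-key byte arrays.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, HBase-free simulation: regions are sorted lists of row keys,
 * and each region scan keeps its own page counter, mimicking a filter whose
 * state is local to one region.
 */
public class PageFilterSimulation {

    /** Scans all regions in key order from startRow; the limit is enforced per region only. */
    public static List<String> scanWithPerRegionLimit(List<List<String>> regions,
                                                      String startRow, int pageSize) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            int passed = 0; // fresh counter for every region
            for (String row : region) {
                if (row.compareTo(startRow) < 0) continue; // before the start row
                if (passed >= pageSize) break;             // local limit reached
                out.add(row);
                passed++;
            }
        }
        return out;
    }

    /** Smallest key strictly greater than lastRow: append a zero character. */
    public static String nextStartRow(String lastRow) {
        return lastRow + '\0';
    }

    /** Client-side hard limit: keep only the first pageSize rows of the scan. */
    public static List<String> fetchPage(List<List<String>> regions,
                                         String startRow, int pageSize) {
        List<String> rows = scanWithPerRegionLimit(regions, startRow, pageSize);
        return new ArrayList<>(rows.subList(0, Math.min(pageSize, rows.size())));
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
                List.of("a", "b", "c"),
                List.of("d", "e", "f"),
                List.of("g", "h"));
        // Page size 2, but each of the three regions contributes up to 2 rows.
        System.out.println(scanWithPerRegionLimit(regions, "", 2)); // [a, b, d, e, g, h]
        List<String> page1 = fetchPage(regions, "", 2);
        List<String> page2 = fetchPage(regions, nextStartRow(page1.get(page1.size() - 1)), 2);
        System.out.println(page1 + " " + page2); // [a, b] [c, d]
    }
}
```

The per-region filter still saves work (no region returns more than pageSize rows), but the hard limit has to be applied on the client side.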
Thanks,
Anil

On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq wrote:
> I'm kinda hesitant to put my leg in between the pros ;) But does it sound
> sane to use PageFilter for both rows and columns, with some additional
> logic to handle the 'nth' page? It'll help us with both kinds of paging.
>
> On Wednesday, January 30, 2013, Jean-Marc Spaggiari wrote:
> > Hi Anil,
> >
> > I think it really depends on the way you want to use the pagination.
> >
> > Do you need to be able to jump to page X? Are you OK if you miss a
> > line or two? Is your data growing quickly, or slowly? Is it OK if your
> > page indexes are a day old? Do you need to paginate over 300 columns,
> > or just 1? Do you always need to have exactly the same number of
> > entries in each page?
> >
> > For my use case I need to be able to jump to page X and I don't
> > have any content. I have hundreds of millions of rows. Only the row key
> > matters for me, and I'm fine if sometimes 50 entries are displayed
> > and sometimes only 45. So I'm thinking about calculating which row is
> > the first one of each page and storing that separately. Then I just
> > need to run the MR job daily.
> >
> > It's not a perfect solution, I agree, but it might do the job for me.
> > I'm totally open to any other ideas that might do the job too.
> >
> > JM
> >
> > 2013/1/29, anil gupta wrote:
> >> Yes, your suggested solution only works for row-key-based pagination.
> >> It will fail when you start filtering on the basis of columns.
> >>
> >> Still, I would say it's comparatively easier to maintain this at the
> >> application level rather than creating tables for pagination.
> >>
> >> What if you have 300 columns in your schema? Will you create 300 tables?
> >> What about handling pagination when filtering is done on multiple
> >> columns ("and" and "or" conditions)?
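Jean-Marc's Jan 30 plan (compute the first row of each page offline, store those keys separately, and accept pages of slightly different sizes) can be sketched in a few lines, which also covers Mohammad's "jump to the nth page" case. This is a hypothetical in-memory sketch: a `TreeSet` stands in for the main table's sorted row keys, and `buildIndex` stands in for the daily MR job. Reading page n as every current row between stored start keys n and n + 1 is an assumption about how the "sometimes 50, sometimes 45" behaviour would arise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Hypothetical sketch of a "first row of each page" index. A TreeSet stands in
 * for the main table's sorted row keys; buildIndex stands in for the daily job.
 */
public class PageStartIndex {

    /** Records every pageSize-th row key as the start of a page. */
    public static List<String> buildIndex(NavigableSet<String> rows, int pageSize) {
        List<String> starts = new ArrayList<>();
        int i = 0;
        for (String row : rows) {
            if (i % pageSize == 0) {
                starts.add(row);
            }
            i++;
        }
        return starts;
    }

    /** Page n (0-based) is every current row in [starts[n], starts[n + 1]). */
    public static List<String> page(NavigableSet<String> rows, List<String> starts, int n) {
        String from = starts.get(n);
        NavigableSet<String> range = (n + 1 < starts.size())
                ? rows.subSet(from, true, starts.get(n + 1), false)
                : rows.tailSet(from, true);
        return new ArrayList<>(range);
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
                List.of("r01", "r02", "r03", "r04", "r05", "r06", "r07"));
        List<String> starts = buildIndex(rows, 3); // [r01, r04, r07]
        // The index is now "a day old": rows change before it is rebuilt.
        rows.add("r035");
        rows.remove("r05");
        System.out.println(page(rows, starts, 0)); // [r01, r02, r03, r035]
        System.out.println(page(rows, starts, 1)); // [r04, r06]
    }
}
```

Because each page is a key range rather than a row count, a stale index never skips or repeats a row between pages; it only makes page sizes uneven. Rows inserted before the first stored start key would still be missed until the next rebuild.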
> >>
> >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari wrote:
> >>
> >>> No, no killer solution here ;)
> >>>
> >>> But I'm still thinking about it because I might have to implement
> >>> some pagination options soon...
> >>>
> >>> As you are saying, it only works on the row key, but if you want
> >>> to do the same thing on non-row-key columns, you might have to create
> >>> a secondary index table...
> >>>
> >>> JM
> >>>
> >>> 2013/1/27, anil gupta wrote:
> >>> > That's alright... I thought you had come up with a killer solution,
> >>> > so I got curious to hear your ideas. ;)
> >>> > It seems like your solution below will not work when filtering on
> >>> > non-row-key columns, since when you are deciding the page numbers
> >>> > you are only considering the row key.
> >>> >
> >>> > Thanks,
> >>> > Anil
> >>> >
> >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari wrote:
> >>> >
> >>> >> Hi Anil,
> >>> >>
> >>> >> I don't have a solution. I never thought about that ;) But I was
> >>> >> thinking about something like this: you create a second table where
> >>> >> you place the row number (4 bytes) and then the row key. To go
> >>> >> directly to a specific page, you query by the number, find the key,
> >>> >> and then you know where to start your scan in the main table.
> >>> >>
> >>> >> The issue is probably the number for each row, since with an MR job
> >>> >> you don't know where you are from the beginning. But you can build
> >>> >> something where you store the row number from the beginning of the
> >>> >> region, then when all regions are parsed you can reconstruct the
> >>> >> total numbering... That should work...
> >>> >>
> >>> >> JM
> >>> >>
> >>> >> 2013/1/25, anil gupta wrote:
> >>> >> > Inline...
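The reconstruction step in Jean-Marc's Jan 25 message (number rows from the start of each region, then combine once all regions are parsed) amounts to an exclusive prefix sum over per-region row counts. The sketch below is hypothetical: method names are illustrative, and the per-region counts are made-up inputs rather than output of a real MR job. It also shows one way to build the 4-byte row-number key he mentions, using big-endian encoding so that unsigned byte-wise ordering of the keys matches numeric order for non-negative values.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical sketch of global row numbering built from per-region counts,
 * plus a 4-byte key for a "row number -> row key" index table.
 */
public class GlobalRowNumbering {

    /** offsets[i] = total rows in regions 0..i-1 (an exclusive prefix sum). */
    public static long[] regionOffsets(long[] rowsPerRegion) {
        long[] offsets = new long[rowsPerRegion.length];
        long running = 0;
        for (int i = 0; i < rowsPerRegion.length; i++) {
            offsets[i] = running;
            running += rowsPerRegion[i];
        }
        return offsets;
    }

    /** Global number = region offset + 0-based number within the region. */
    public static long globalRowNumber(long[] offsets, int region, long localNumber) {
        return offsets[region] + localNumber;
    }

    /**
     * 4-byte big-endian key. For non-negative ints, unsigned byte-by-byte
     * comparison (how HBase orders row keys) matches numeric order.
     */
    public static byte[] rowNumberKey(int rowNumber) {
        if (rowNumber < 0) {
            throw new IllegalArgumentException("row number must be non-negative");
        }
        return ByteBuffer.allocate(4).putInt(rowNumber).array();
    }

    public static void main(String[] args) {
        long[] counts = {120, 80, 200};          // rows counted per region (made-up)
        long[] offsets = regionOffsets(counts);  // [0, 120, 200]
        long n = globalRowNumber(offsets, 2, 5); // sixth row of the third region
        System.out.println(n);                   // 205
        System.out.println(Arrays.toString(rowNumberKey((int) n)));
    }
}
```

A signed 4-byte number caps the index at about 2.1 billion rows, which leaves room for the "hundreds of millions" mentioned earlier in the thread.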
> >>> >> >
> >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari wrote:
> >>> >> >
> >>> >> >> Hi Anil,
> >>> >> >>
> >>> >> >> The issue is that all the other subsequent page starts would have
> >>> >> >> to be moved too...
> >>> >> >>
> >>> >> > Yes, this is a possibility. Hence the developer has to take care of
> >>> >> > this case. It might also be possible that pageSize is not a hard
> >>> >> > limit on the number of results (more like a hint or suggestion on
> >>> >> > size). I would say it varies by use case.
> >>> >> >
> >>> >> >>
> >>> >> >> So if you want to jump directly to page n, you might be totally
> >>> >> >> shifted because of all the data inserted in the meantime...
> >>> >> >>
> >>> >> >> If you want a truly complete pagination feature, you might want to
> >>> >> >> have a coprocessor or an MR job updating another table that refers
> >>> >> >> to the pages...
> >>> >> >>
> >>> >> > Well, the solution depends on the use case. I will be doing
> >>> >> > pagination
>
> --
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com

--
Thanks & Regards,
Anil Gupta
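Jean-Marc's point that "all the other subsequent page starts would have to be moved" can be checked with a tiny hypothetical simulation: define page starts as every pageSize-th row key, insert a single row, and count how many stored start keys no longer match a fresh rebuild. Names and data are illustrative only; a coprocessor or MR job keeping the index current, as suggested in the thread, is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical illustration: when pages are defined as "every pageSize-th row",
 * one insert near the front moves every later page start.
 */
public class PageStartShift {

    /** Every pageSize-th row key, in sorted order. */
    public static List<String> pageStarts(TreeSet<String> rows, int pageSize) {
        List<String> starts = new ArrayList<>();
        int i = 0;
        for (String row : rows) {
            if (i++ % pageSize == 0) {
                starts.add(row);
            }
        }
        return starts;
    }

    /** Counts positions where two index builds disagree (including added or removed pages). */
    public static int changedStarts(List<String> before, List<String> after) {
        int changed = 0;
        for (int i = 0; i < Math.max(before.size(), after.size()); i++) {
            String b = i < before.size() ? before.get(i) : null;
            String a = i < after.size() ? after.get(i) : null;
            if (b == null || !b.equals(a)) {
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>();
        for (int i = 1; i <= 9; i++) {
            rows.add(String.format("r%02d", i)); // r01 .. r09
        }
        List<String> before = pageStarts(rows, 3); // [r01, r04, r07]
        rows.add("r005");                          // sorts before r01
        List<String> after = pageStarts(rows, 3);  // [r005, r03, r06, r09]
        System.out.println(changedStarts(before, after)); // 4
    }
}
```

Appending a row after the last key, by contrast, only adds one new page at the end, so how quickly such an index goes stale depends on where writes land in the key space.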
