I don't know what the shape of the page state data is deep inside the Java driver. I've actually tried to dig into that in the past to see if I could reproduce it as a general-purpose any-query kind of thing. I gave up before I fully understood it, but I think it's actually a handle to in-memory state maintained by the coordinator, which is only kept for the lifetime of the statement (i.e. it's not stateless paging). That would make it a bad candidate for stateless paging scenarios such as REST requests, where a typical setup would load balance across HTTP hosts, never mind across coordinators.
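A stateless alternative for that REST scenario is to make the page state self-contained: serialize the last-seen clustering key values into an opaque token that the client echoes back with its next request, so any HTTP host (and any coordinator) can resume the query. A minimal sketch; the function names here are hypothetical, not part of any driver API:

```python
import base64
import json

def encode_page_state(last_row_keys):
    """Encode the last-seen clustering key values into an opaque token.

    Unlike the driver's coordinator-bound paging state, this token is
    self-contained, so any server behind the load balancer can use it.
    """
    payload = json.dumps(last_row_keys, sort_keys=True)
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")

def decode_page_state(token):
    """Recover the clustering key values from an opaque token."""
    payload = base64.urlsafe_b64decode(token.encode("ascii"))
    return json.loads(payload)

# A REST handler would attach this token to the response; the client
# echoes it back on the next request to resume where it left off.
token = encode_page_state({"ck1": 1, "ck2": 2})
print(decode_page_state(token))  # {'ck1': 1, 'ck2': 2}
```

In a real deployment you would probably also sign or encrypt the token so clients can't forge arbitrary query restrictions, but the round trip above is the whole idea.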
It shouldn't be too much work to abstract this basic idea for manual paging into a general-purpose class that takes List[ClusteringKeyDef[T, O <: Ordering]] and can produce a connection-agnostic PageState from a ResultSet or Row, or accept a PageState to produce a WHERE CQL fragment.

Also, re: possibly needing multiple queries to satisfy a page: yes, that's unfortunate. Since you're on 2.0.11, see Ondřej's answer to avoid it.

On Thu, Feb 12, 2015 at 8:13 AM, Ajay <ajay.ga...@gmail.com> wrote:

> Thanks Eric. I figured out the same but didn't get time to put it on the
> mail. Thanks.
>
> But it is highly tied to how data is stored internally in Cassandra:
> basically, how partition keys are used to distribute data (less likely to
> change, and we are not directly dependent on the partitioning algorithm)
> and how clustering keys are used to sort the data within a partition
> (multi-level sorting, and hence the restrictions on the ORDER BY clause),
> which I think could well change down the line in Cassandra 3.x or 4.x in
> favour of some better storage or retrieval scheme.
>
> That said, I am hesitant to implement this client-side logic for
> pagination because: a) pages 2+ might need more than one query to
> Cassandra; b) the implementation is tied to Cassandra's internal storage
> details, which can change (though not often); c) in our case, we are
> building REST APIs which will be deployed on Tomcat clusters, so whatever
> we cache to support pagination needs to be cached in a distributed way
> for failover support.
>
> It (pagination support) is best done on the server side, like ROWNUM in
> SQL, or better, in the Java driver, to hide the internal details; it can
> also be optimized better, since the server sends the paging state to the
> driver.
>
> Thanks
> Ajay
>
> On Feb 12, 2015 8:22 PM, "Eric Stevens" <migh...@gmail.com> wrote:
>
>> Your page state then needs to track the last ck1 and last ck2 you saw.
>> Pages 2+ will end up needing up to two queries each if the first query
>> doesn't fill the page size.
>>
>> CREATE TABLE foo (
>>   partitionkey int,
>>   ck1 int,
>>   ck2 int,
>>   col1 int,
>>   col2 int,
>>   PRIMARY KEY ((partitionkey), ck1, ck2)
>> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
>>
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);
>>
>> If you're pulling the whole of partition 1 and your page size is 2, your
>> first page looks like:
>>
>> *PAGE 1*
>>
>> SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
>>
>>  partitionkey | ck1 | ck2 | col1 | col2
>> --------------+-----+-----+------+------
>>             1 |   1 |   3 |    3 |    3
>>             1 |   1 |   2 |    2 |    2
>>
>> You got enough rows to satisfy the page. Your page state is taken from
>> the last row: (ck1 = 1, ck2 = 2).
>>
>> *PAGE 2*
>>
>> Notice that you have a page state, and add some limiting clauses to the
>> statement:
>>
>> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;
>>
>>  partitionkey | ck1 | ck2 | col1 | col2
>> --------------+-----+-----+------+------
>>             1 |   1 |   1 |    1 |    1
>>
>> Oops, we didn't get enough rows to satisfy the page limit, so we need to
>> continue on; we just need one more:
>>
>> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 1 LIMIT 1;
>>
>>  partitionkey | ck1 | ck2 | col1 | col2
>> --------------+-----+-----+------+------
>>             1 |   2 |   3 |    6 |    6
>>
>> We have enough to satisfy page 2 now; our new page state: (ck1 = 2,
>> ck2 = 3).
>>
>> *PAGE 3*
>>
>> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 3 LIMIT 2;
>>
>>  partitionkey | ck1 | ck2 | col1 | col2
>> --------------+-----+-----+------+------
>>             1 |   2 |   2 |    5 |    5
>>             1 |   2 |   1 |    4 |    4
>>
>> Great, we satisfied this page with only one query. Page state: (ck1 = 2,
>> ck2 = 1).
>>
>> *PAGE 4*
>>
>> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 1 LIMIT 2;
>>
>> (0 rows)
>>
>> Oops, our initial query was on the boundary of ck1, but this looks like
>> any other time the initial query returns < pageSize rows; we just move
>> on to the next ck1:
>>
>> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 2 LIMIT 2;
>>
>> (0 rows)
>>
>> Aha, we've exhausted ck1 as well, so there are no more pages. Page 3
>> actually pulled the last possible value; page 4 is empty, and we're all
>> done. Generally speaking, you know you're done when your first
>> clustering key is the only non-equality operator in the statement and
>> you get no rows back.
>>
>> On Wed, Feb 11, 2015 at 10:55 AM, Ajay <ajay.ga...@gmail.com> wrote:
>>
>>> Basically I am trying different queries with your approach.
>>>
>>> One such query is like:
>>>
>>> SELECT * FROM mycf WHERE <condition on partition key>
>>> ORDER BY ck1 ASC, ck2 DESC
>>>
>>> where ck1 and ck2 are the clustering keys, in that order.
>>>
>>> Here, how do we achieve pagination support?
>>>
>>> Thanks
>>> Ajay
>>>
>>> On Feb 11, 2015 11:16 PM, "Ajay" <ajay.ga...@gmail.com> wrote:
>>>
>>>> Hi Eric,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> I am using Cassandra 2.0.11, and there I cannot append a condition
>>>> like <last clustering key column> > <value from the last row of the
>>>> previous page>: it fails with "Preceding column is either not
>>>> restricted or by a non-EQ relation", which means I need to specify an
>>>> equality condition on all preceding clustering key columns. With that
>>>> restriction, I cannot get the pagination right.
>>>>
>>>> Thanks
>>>> Ajay
>>>>
>>>> > I can't believe that everyone reads & processes all rows at once
>>>> > (without pagination).
>>>>
>>>> Probably not too many people try to read all rows in a table as a
>>>> single rolling operation with a standard client driver. But those who
>>>> do would use token() to keep track of where they are, and would be
>>>> able to resume with that as well.
>>>>
>>>> But it sounds like you're talking about paginating a subset of data:
>>>> larger than you want to process as a unit, but prefiltered by some
>>>> other criteria which prevents you from being able to rely on token().
>>>> For this there is no general-purpose solution, but it typically
>>>> involves maintaining your own paging state, usually keeping track of
>>>> the last partition and clustering keys seen, and using those to
>>>> construct your next query.
>>>>
>>>> For example, we have client queries which can span several partition
>>>> keys. We make sure that the list of partition keys generated by a
>>>> given client query, List(Pq), is deterministic; then our paging state
>>>> is the index offset of the final Pq in the response, plus the value of
>>>> the final clustering column. A query coming in with a paging state
>>>> attached starts the next set of queries from the provided Pq offset,
>>>> where clusteringKey > the provided value.
>>>>
>>>> So if you just track the partition key offset (if spanning multiple
>>>> partitions) and the clustering key offset, you can construct your next
>>>> query from those instead.
>>>>
>>>> On Tue, Feb 10, 2015 at 6:58 PM, Ajay <ajay.ga...@gmail.com> wrote:
>>>>
>>>>> Thanks Alex.
>>>>>
>>>>> But is there any workaround possible? I can't believe that everyone
>>>>> reads & processes all rows at once (without pagination).
>>>>>
>>>>> Thanks
>>>>> Ajay
>>>>>
>>>>> On Feb 10, 2015 11:46 PM, "Alex Popescu" <al...@datastax.com> wrote:
>>>>>
>>>>>> On Tue, Feb 10, 2015 at 4:59 AM, Ajay <ajay.ga...@gmail.com> wrote:
>>>>>>
>>>>>>> 1) The Java driver implicitly supports pagination in the ResultSet
>>>>>>> (using Iterator), which can be controlled through the FetchSize.
>>>>>>> But it is limited in that we cannot skip pages or go back, and the
>>>>>>> FetchState is not exposed.
>>>>>>
>>>>>> Cassandra doesn't support skipping, so this is not really a
>>>>>> limitation of the driver.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> [:>-a)
>>>>>>
>>>>>> Alex Popescu
>>>>>> Sen. Product Manager @ DataStax
>>>>>> @al3xandru
>>>>>>
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to java-driver-user+unsubscr...@lists.datastax.com.
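For anyone following along, Eric's page-by-page walk-through above condenses into a small simulation. The sketch below stands in for Cassandra with an in-memory list held in clustering order (ck1 ASC, ck2 DESC); select() is a hypothetical stand-in for the real CQL queries, not a driver API:

```python
# Rows of partition 1 from the example above, pre-sorted in clustering
# order (ck1 ASC, ck2 DESC); each tuple is (ck1, ck2, col1, col2).
ROWS = [
    (1, 3, 3, 3),
    (1, 2, 2, 2),
    (1, 1, 1, 1),
    (2, 3, 6, 6),
    (2, 2, 5, 5),
    (2, 1, 4, 4),
]

def select(predicate, limit):
    """Stand-in for SELECT * FROM foo WHERE partitionkey = 1 AND ... LIMIT n."""
    matches = [row for row in ROWS if predicate(row)]
    return matches[:limit]

def fetch_page(page_state, page_size=2):
    """Fetch one page; page_state is None for page 1, else (last_ck1, last_ck2).

    Returns (rows, new_page_state). Empty rows means we're done: the query
    whose only non-equality restriction is on the first clustering key
    came back with nothing.
    """
    if page_state is None:
        rows = select(lambda r: True, page_size)
    else:
        last_ck1, last_ck2 = page_state
        # Query 1: finish the current ck1; ck2 is DESC, hence '<'.
        rows = select(lambda r: r[0] == last_ck1 and r[1] < last_ck2, page_size)
        if len(rows) < page_size:
            # Query 2: the page wasn't filled, so move on to the next ck1.
            rows += select(lambda r: r[0] > last_ck1, page_size - len(rows))
    if not rows:
        return [], None
    last = rows[-1]
    return rows, (last[0], last[1])

# Walk every page, exactly as in the example: pages 1-3 carry data,
# page 4 comes back empty and ends the iteration.
state, page_no = None, 0
while True:
    rows, state = fetch_page(state)
    if not rows:
        break
    page_no += 1
    print("page", page_no, rows)
```

Running this prints page 1 as [(1, 3, 3, 3), (1, 2, 2, 2)], page 2 as [(1, 1, 1, 1), (2, 3, 6, 6)], and page 3 as [(2, 2, 5, 5), (2, 1, 4, 4)], matching the quoted walk-through, with page 2 being the one that needed both queries.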