I had the same issue and overcame it by querying for primary keys over all subsets of the token range/ring.
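For reference, here is a minimal sketch of what "querying over subsets of the token range" can look like. The splitting math targets Murmur3Partitioner, whose tokens lie in [-2^63, 2^63-1] (the minimum token is reserved by the partitioner, so a strict `>` on the lower bound is safe). The table and key names in `cqlFor` are hypothetical; substitute your own column family and full partition key.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenRangeSplitter {
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** One slice (start, end] of the token ring. */
    public static final class Range {
        public final long start; // exclusive
        public final long end;   // inclusive
        Range(long start, long end) { this.start = start; this.end = end; }
    }

    /** Split the Murmur3 ring [-2^63, 2^63-1] into n contiguous slices.
     *  BigInteger avoids overflow: the ring spans 2^64 - 1 tokens above MIN. */
    public static List<Range> split(int n) {
        BigInteger span = MAX.subtract(MIN);
        List<Range> out = new ArrayList<>();
        BigInteger prev = MIN;
        for (int i = 1; i <= n; i++) {
            BigInteger end = MIN.add(
                    span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            out.add(new Range(prev.longValue(), end.longValue()));
            prev = end;
        }
        return out;
    }

    /** Hypothetical CQL for one slice; pk1, pk2 stand in for the full partition key. */
    public static String cqlFor(Range r) {
        return "SELECT DISTINCT pk1, pk2 FROM my_cf"
                + " WHERE token(pk1, pk2) > " + r.start
                + " AND token(pk1, pk2) <= " + r.end;
    }
}
```

Each per-slice query touches only a fraction of the 800K partitions, so it finishes well inside the server's range timeout instead of scanning the whole ring in one request.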
———
Jens Rantil
Backend Engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

On Thu, Jan 29, 2015 at 10:32 PM, Ravi Agrawal <ragra...@clearpoolgroup.com> wrote:

> Select distinct keys from column family; hits a timeout exception.
> pk1, pk2, …pkn are 800K in total.
>
> From: Mohammed Guller [mailto:moham...@glassbeam.com]
> Sent: Friday, January 23, 2015 3:24 PM
> To: user@cassandra.apache.org
> Subject: RE: Retrieving all row keys of a CF
>
> No wonder the client is timing out. Even though C* supports up to 2B columns,
> it is recommended not to have more than 100K CQL rows in a partition.
> It has been a long time since I used Astyanax, so I don't remember whether
> the AllRowsReader reads all CQL rows or storage rows. If it is reading all
> CQL rows, then essentially it is trying to read 800K * 200K rows. That will
> be 160B rows!
>
> Did you try "SELECT DISTINCT …" from cqlsh?
>
> Mohammed
>
> From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
> Sent: Thursday, January 22, 2015 11:12 PM
> To: user@cassandra.apache.org
> Subject: RE: Retrieving all row keys of a CF
>
> In each partition, the number of CQL rows is 200K on average; the max is 3M.
> 800K is the number of Cassandra partitions.
>
> From: Mohammed Guller [mailto:moham...@glassbeam.com]
> Sent: Thursday, January 22, 2015 7:43 PM
> To: user@cassandra.apache.org
> Subject: RE: Retrieving all row keys of a CF
>
> What is the average and max # of CQL rows in each partition? Is 800,000 the
> number of CQL rows or Cassandra partitions (storage engine rows)?
>
> Another option you could try is a CQL statement to fetch all partition keys.
> You could first try this in cqlsh:
>
>     SELECT DISTINCT pk1, pk2, …pkn FROM CF
>
> You will need to specify all the composite columns if you are using a
> composite partition key.
> Mohammed
>
> From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
> Sent: Thursday, January 22, 2015 1:57 PM
> To: user@cassandra.apache.org
> Subject: RE: Retrieving all row keys of a CF
>
> Hi,
> I increased the range and read timeouts, first to 50 seconds and then to 500
> seconds, with the Astyanax client at 60 and 550 seconds respectively. I still
> get a timeout exception.
> I see the logic in the .withCheckpointManager() code; is that the only way it
> could work?
>
> From: Eric Stevens [mailto:migh...@gmail.com]
> Sent: Saturday, January 17, 2015 9:55 AM
> To: user@cassandra.apache.org
> Subject: Re: Retrieving all row keys of a CF
>
> If you're getting partial data back, then failing eventually, try setting
> .withCheckpointManager() - this will let you keep track of the token ranges
> you've successfully processed, and not attempt to reprocess them. This will
> also let you set up tasks on bigger data sets that take hours or days to run,
> and reasonably safely interrupt them at any time without losing progress.
>
> This is some *very* old code, but I dug it out of a git history. We don't use
> Astyanax any longer, but maybe an example implementation will help you. It is
> Scala instead of Java, but hopefully you can get the gist:
> https://gist.github.com/MightyE/83a79b74f3a69cfa3c4e
>
> If you're timing out talking to your cluster, then I don't recommend using the
> cluster to track your checkpoints, but some other data store (maybe just a
> flat file). Again, this is just to give you a sense of what's involved.
>
> On Fri, Jan 16, 2015 at 6:31 PM, Mohammed Guller <moham...@glassbeam.com> wrote:
>
> Both total system memory and heap size can't be 8GB?
> The timeout on the Astyanax client should be greater than the timeouts on the
> C* nodes; otherwise your client will time out prematurely.
> Also, have you tried increasing the timeout for the range queries to a higher
> number?
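Eric's point about keeping checkpoints outside the cluster can be sketched without Astyanax at all: a flat file of completed token ranges is enough bookkeeping to resume an interrupted job. The class and file format below are hypothetical illustrations, not Astyanax's CheckpointManager API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

/** Flat-file checkpoint log: one completed token range per line, as "start:end". */
public class FlatFileCheckpoints {
    private final Path file;
    private final Set<String> done = new HashSet<>();

    public FlatFileCheckpoints(Path file) throws IOException {
        this.file = file;
        // Reload prior progress so a restarted job skips finished ranges.
        if (Files.exists(file)) {
            done.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
    }

    /** True if this range was completed in this run or a previous one. */
    public boolean isDone(long start, long end) {
        return done.contains(start + ":" + end);
    }

    /** Append the range to the log once all its rows have been handled. */
    public void markDone(long start, long end) throws IOException {
        String key = start + ":" + end;
        Files.write(file, (key + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        done.add(key);
    }
}
```

The driving loop would call `isDone` before querying each range and `markDone` after processing it, so a job killed hours in can be restarted without redoing completed ranges, and the checkpoint store stays reachable even when the cluster itself is timing out.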
> It is not recommended to set them very high, because a lot of other problems
> may start happening, but then reading 800,000 partitions is not a normal
> operation. Just as an experiment, can you set the range timeout to 45 seconds
> on each node and the timeout on the Astyanax client to 50 seconds? Restart the
> nodes after increasing the timeout and try again.
>
> Mohammed
>
> From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
> Sent: Friday, January 16, 2015 5:11 PM
> To: user@cassandra.apache.org
> Subject: RE: Retrieving all row keys of a CF
>
> 1) What is the heap size and total memory on each node? 8GB, 8GB
> 2) How big is the cluster? 4 nodes
> 3) What are the read and range timeouts (in cassandra.yaml) on the C* nodes?
>    10 secs, 10 secs
> 4) What are the timeouts for the Astyanax client? 2 secs
> 5) Do you see GC pressure on the C* nodes? How long does GC for new gen and
>    old gen take? GC occurs every 5 secs; we don't see huge GC pressure, <50 ms
> 6) Does any node crash with an OOM error when you try AllRowsReader? No
>
> From: Mohammed Guller [mailto:moham...@glassbeam.com]
> Sent: Friday, January 16, 2015 7:30 PM
> To: user@cassandra.apache.org
> Subject: RE: Retrieving all row keys of a CF
>
> A few questions:
> 1) What is the heap size and total memory on each node?
> 2) How big is the cluster?
> 3) What are the read and range timeouts (in cassandra.yaml) on the C* nodes?
> 4) What are the timeouts for the Astyanax client?
> 5) Do you see GC pressure on the C* nodes? How long does GC for new gen and
>    old gen take?
> 6) Does any node crash with OOM error when you try AllRowsReader?
> Mohammed
>
> From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
> Sent: Friday, January 16, 2015 4:14 PM
> To: user@cassandra.apache.org
> Subject: Re: Retrieving all row keys of a CF
>
> Hi,
> Ruchir and I tried the query using the AllRowsReader recipe but had no luck.
> We are seeing a PoolTimeoutException:
>
> SEVERE: [Thread_1] Error reading RowKeys
> com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException:
> PoolTimeoutException: [host=servername, latency=2003(2003), attempts=4]
> Timed out waiting for connection
>     at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
>     at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
>     at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
>     at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
>     at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
>     at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$2.execute(ThriftColumnFamilyQueryImpl.java:397)
>     at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:447)
>     at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:419)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> We did receive a portion of the data, which changes on every try. We used the
> following method:
> boolean result = new AllRowsReader.Builder<String, String>(keyspace, CF_STANDARD1)
>     .withColumnRange(null, null, false, 0)
>     .withPartitioner(null) // this will use keyspace's partitioner
>     .forEachRow(new Function<Row<String, String>, Boolean>() {
>         @Override
>         public Boolean apply(@Nullable Row<String, String> row) {
>             // Process the row here ...
>             return true;
>         }
>     })
>     .build()
>     .call();
>
> Tried setting the concurrency level as mentioned in this post
> (https://github.com/Netflix/astyanax/issues/411) as well, on both Astyanax
> 1.56.49 and 2.0.0. Still nothing.
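Independent of Astyanax, the shape of the concurrency-level idea is worth pinning down: run one task per token range, but cap how many range queries are in flight so a small connection pool (the source of the PoolTimeoutException above) is never oversubscribed. A sketch with a plain `ExecutorService`, where `fetchKeys` is a hypothetical stand-in for whatever per-range query is actually issued:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class RangeFanOut {
    /**
     * Run one task per token range (each range given as {start, end}) with at
     * most `concurrency` tasks in flight, collecting the keys each returns.
     */
    public static List<String> readAllKeys(List<long[]> ranges,
                                           int concurrency,
                                           Function<long[], List<String>> fetchKeys)
            throws InterruptedException, ExecutionException {
        // The fixed pool size is the effective concurrency level: no more than
        // `concurrency` range queries ever hit the cluster at once.
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (long[] range : ranges) {
                futures.add(pool.submit(() -> fetchKeys.apply(range)));
            }
            List<String> keys = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                keys.addAll(f.get()); // propagates the first per-range failure
            }
            return keys;
        } finally {
            pool.shutdown();
        }
    }
}
```

Sizing `concurrency` at or below the client's connections-per-host limit is the design point here: with it above the pool size, surplus tasks block waiting to borrow a connection, which is exactly the "Timed out waiting for connection" failure mode in the stack trace.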