Re: Iterating through large numbers of rows with JDBC
Another thing to keep in mind when doing this with CQL is whether or not you are using an ordered partitioner. If you are, watch for the case where a single partition key has more rows than your query limit: you can end up stuck in a loop.

On Tue, May 14, 2013 at 1:39 PM, aaron morton wrote:
> You can iterate over them, just make sure to set a sensible row count to
> chunk things up. See
> http://www.datastax.com/docs/1.2/cql_cli/using/paging#non-ordered-partitioner-paging
>
> You can also break up the processing so only one worker reads the token
> ranges for a node. That allows you to process the rows in parallel and
> avoid workers processing the same rows.
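One way to read the loop warning above: if a page ends in the middle of a partition and the next query restarts from that partition's token, the same rows can come back forever. Here is a rough sketch of one way around that, assuming the table has a clustering column to resume from. `PartitionAwarePager`, `Source` and `Row` are hypothetical names, and the in-memory source only stands in for the two CQL queries named in the comments; no driver call is made.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: page by token, but drain a cut-off partition first. */
final class PartitionAwarePager {

    /** One CQL row, reduced to its partition token and clustering value. */
    static final class Row {
        final long token;
        final int clustering;
        Row(long token, int clustering) { this.token = token; this.clustering = clustering; }
    }

    /** Stand-in for the two queries a real client would send (assumed shapes). */
    interface Source {
        // e.g. SELECT ... WHERE token(key) > ? LIMIT ?
        List<Row> afterToken(long token, int limit);
        // e.g. SELECT ... WHERE key = ? AND clustering > ? LIMIT ?
        List<Row> restOfPartition(long token, int clustering, int limit);
    }

    /** Visits every row once, even when a partition holds more rows than the limit. */
    static List<Row> scan(Source src, long minToken, int limit) {
        List<Row> out = new ArrayList<>();
        List<Row> page = src.afterToken(minToken, limit);
        while (!page.isEmpty()) {
            out.addAll(page);
            Row last = page.get(page.size() - 1);
            // The page may have stopped mid-partition: finish that partition
            // by clustering value before moving on to the next token.
            List<Row> rest = src.restOfPartition(last.token, last.clustering, limit);
            while (!rest.isEmpty()) {
                out.addAll(rest);
                last = rest.get(rest.size() - 1);
                rest = src.restOfPartition(last.token, last.clustering, limit);
            }
            page = src.afterToken(last.token, limit);
        }
        return out;
    }

    /** In-memory Source over rows already sorted by (token, clustering). */
    static Source inMemory(final List<Row> sorted) {
        return new Source() {
            public List<Row> afterToken(long token, int limit) {
                List<Row> r = new ArrayList<>();
                for (Row row : sorted) {
                    if (row.token > token && r.size() < limit) r.add(row);
                }
                return r;
            }
            public List<Row> restOfPartition(long token, int clustering, int limit) {
                List<Row> r = new ArrayList<>();
                for (Row row : sorted) {
                    if (row.token == token && row.clustering > clustering && r.size() < limit) r.add(row);
                }
                return r;
            }
        };
    }
}
```

The extra `restOfPartition` call after every page costs one query, but it means the outer loop always advances with a strict `>` on the token and never re-reads or skips the tail of a wide partition.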
Re: Iterating through large numbers of rows with JDBC
You can iterate over them, just make sure to set a sensible row count to chunk things up. See http://www.datastax.com/docs/1.2/cql_cli/using/paging#non-ordered-partitioner-paging

You can also break up the processing so only one worker reads the token ranges for a node. That allows you to process the rows in parallel and avoid workers processing the same rows.

Cheers

Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/05/2013, at 2:51 AM, Robert Wille wrote:
> Iterating through lots of records is not a primary use of my data.
> However, there are a number of scenarios where scanning the entire contents
> of a column family is an interesting and useful exercise. Here are a few:
> removal of orphaned records, checking the integrity of a data set, and
> analytics.
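The chunking and worker split suggested above might be sketched roughly as follows. This is an illustration under assumptions, not the approach from the linked page: it cuts the whole token range into equal slices instead of following each node's actual ranges, and it assumes Murmur3Partitioner's signed 64-bit tokens for the default bounds. `TokenRanges`, `Slice`, `split` and `pageQuery` are hypothetical names, and the table and key passed to `pageQuery` are whatever your schema uses.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one contiguous token slice per worker. */
final class TokenRanges {

    /** Slice (start, end]: rows with start < token <= end. */
    static final class Slice {
        final BigInteger start;
        final BigInteger end;
        Slice(BigInteger start, BigInteger end) { this.start = start; this.end = end; }
    }

    // Assumption: Murmur3Partitioner, whose tokens are signed 64-bit values.
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** Splits (min, max] into `workers` slices that touch but do not overlap. */
    static List<Slice> split(BigInteger min, BigInteger max, int workers) {
        if (workers < 1) throw new IllegalArgumentException("workers must be >= 1");
        BigInteger width = max.subtract(min);
        List<Slice> slices = new ArrayList<>();
        BigInteger start = min;
        for (int i = 1; i <= workers; i++) {
            // The last slice ends exactly at max so rounding never drops tokens.
            BigInteger end = (i == workers)
                    ? max
                    : min.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(workers)));
            slices.add(new Slice(start, end));
            start = end;
        }
        return slices;
    }

    /** Builds the CQL text for one page of a slice; `limit` is the row count per chunk. */
    static String pageQuery(String table, String key, Slice s, int limit) {
        return "SELECT * FROM " + table
                + " WHERE token(" + key + ") > " + s.start
                + " AND token(" + key + ") <= " + s.end
                + " LIMIT " + limit;
    }
}
```

Each worker would run `pageQuery` for its own slice, then build a new `Slice` whose `start` is the token of the last row it received and repeat until a page comes back empty. Because the lower bound is exclusive, a row whose token equals the very first `start` is not returned, so pass a bound below the smallest token that can occur.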
Re: Iterating through large numbers of rows with JDBC
Iterating through lots of records is not a primary use of my data. However, there are a number of scenarios where scanning the entire contents of a column family is an interesting and useful exercise. Here are a few: removal of orphaned records, checking the integrity of a data set, and analytics.

On 5/12/13 3:41 AM, "Oleg Dulin" wrote:
> If you feel that you need to iterate through a large number of rows
> then you are probably not using a correct data model.
>
> Can you describe your use case?
Re: Iterating through large numbers of rows with JDBC
On 2013-05-11 14:42:32 +, Robert Wille said:
> I'm using the JDBC driver to access Cassandra. I'm wondering if it's
> possible to iterate through a large number of records (e.g. to perform
> maintenance on a large column family). I tried calling
> Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
> ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
> cursors aren't supported. Is there another way to do this, or do I need
> to use a different API?
>
> Thanks in advance
>
> Robert

If you feel that you need to iterate through a large number of rows then you are probably not using a correct data model.

Can you describe your use case?

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
Iterating through large numbers of rows with JDBC
I'm using the JDBC driver to access Cassandra. I'm wondering if it's possible to iterate through a large number of records (e.g. to perform maintenance on a large column family). I tried calling Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that cursors aren't supported. Is there another way to do this, or do I need to use a different API?

Thanks in advance

Robert