Re: Iterating through large numbers of rows with JDBC

2013-05-14 Thread David McNelis
Another thing to keep in mind when doing this with CQL is to take into
account whether you are using an order-preserving partitioner.  If you
are, be aware that if a single partition key has more rows than your
query limit, you can end up stuck in a loop, fetching the same page over
and over.
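To make the pitfall concrete, here is a minimal sketch of a key-based paging
loop. Everything in it is illustrative: the table name, the JDBC URL, and the
assumption of an order-preserving partitioner (which is what allows range
queries on the key itself). The guard at the bottom bails out instead of
spinning when a page fails to advance.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class KeyPagingSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical URL; substitute your own host, port and keyspace.
            Connection conn =
                DriverManager.getConnection("jdbc:cassandra://localhost:9160/demo");
            Statement stmt = conn.createStatement();

            int limit = 100;
            String lastKey = null;
            while (true) {
                // Strict '>' guarantees forward progress. Paging with '>='
                // (to avoid skipping rows) is exactly where a key owning more
                // rows than 'limit' pins every page to the same key.
                String cql = (lastKey == null)
                    ? "SELECT key FROM mytable LIMIT " + limit
                    : "SELECT key FROM mytable WHERE key > '" + lastKey
                        + "' LIMIT " + limit;
                ResultSet rs = stmt.executeQuery(cql);

                String pageEnd = null;
                int rows = 0;
                while (rs.next()) {
                    pageEnd = rs.getString("key");
                    rows++;
                    // ... process the row ...
                }
                if (rows < limit) break;             // short page: scan complete
                if (pageEnd.equals(lastKey)) break;  // no progress: stop, don't spin
                lastKey = pageEnd;
            }
            conn.close();
        }
    }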


On Tue, May 14, 2013 at 1:39 PM, aaron morton wrote:

> You can iterate over them, just make sure to set a sensible row count to
> chunk things up.
> See
> http://www.datastax.com/docs/1.2/cql_cli/using/paging#non-ordered-partitioner-paging
>
> You can also break up the processing so only one worker reads the token
> ranges for a node. That allows you to
> process the rows in parallel and avoid workers processing the same rows.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/05/2013, at 2:51 AM, Robert Wille  wrote:
>
> Iterating through lots of records is not a primary use of my data.
> However, there are a number of scenarios where scanning the entire contents
> of a column family is an interesting and useful exercise. Here are a few:
> removal of orphaned records, checking the integrity of a data set, and
> analytics.
>
> On 5/12/13 3:41 AM, "Oleg Dulin"  wrote:
>
> On 2013-05-11 14:42:32 +, Robert Wille said:
>
> I'm using the JDBC driver to access Cassandra. I'm wondering if it's
> possible to iterate through a large number of records (e.g. to perform
> maintenance on a large column family). I tried calling
> Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
> ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
> cursors aren't supported. Is there another way to do this, or do I need to
> use a different API?
>
> Thanks in advance
>
> Robert
>
>
> If you feel that you need to iterate through a large number of rows
> then you are probably not using a correct data model.
>
> Can you describe your use case?
>
> --
> Regards,
> Oleg Dulin
> NYC Java Big Data Engineer
> http://www.olegdulin.com/


Re: Iterating through large numbers of rows with JDBC

2013-05-14 Thread aaron morton
You can iterate over them, just make sure to set a sensible row count to chunk 
things up.
See 
http://www.datastax.com/docs/1.2/cql_cli/using/paging#non-ordered-partitioner-paging

You can also break up the processing so only one worker reads the token ranges 
for a node. That allows you to 
process the rows in parallel and avoid workers processing the same rows. 
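
As a rough illustration of that idea, the following sketch splits the token
space into fixed slices and hands each worker one slice, so no two workers
scan the same rows. The table name, connection details, and the two
hard-coded slices are all assumptions; real code would derive one range per
node from the ring layout.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TokenRangeWorker implements Runnable {
        private final long start;  // exclusive lower bound of this worker's slice
        private final long end;    // inclusive upper bound

        TokenRangeWorker(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public void run() {
            // Hypothetical URL; substitute your own host, port and keyspace.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:cassandra://localhost:9160/demo")) {
                Statement stmt = conn.createStatement();
                long last = start;
                while (true) {
                    // Page by token within this slice only; the bounds keep
                    // the workers disjoint.
                    ResultSet rs = stmt.executeQuery(
                        "SELECT key, token(key) FROM mytable"
                        + " WHERE token(key) > " + last
                        + " AND token(key) <= " + end + " LIMIT 1000");
                    int rows = 0;
                    while (rs.next()) {
                        last = rs.getLong(2);  // resume point: token of the last row
                        rows++;
                        // ... process the row ...
                    }
                    if (rows < 1000) break;  // short page: slice exhausted
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) {
            // Two equal halves of a signed 64-bit token space (as with
            // Murmur3Partitioner), purely for illustration.
            new Thread(new TokenRangeWorker(Long.MIN_VALUE, 0L)).start();
            new Thread(new TokenRangeWorker(0L, Long.MAX_VALUE)).start();
        }
    }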

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/05/2013, at 2:51 AM, Robert Wille  wrote:

> Iterating through lots of records is not a primary use of my data.
> However, there are a number of scenarios where scanning the entire contents
> of a column family is an interesting and useful exercise. Here are a few:
> removal of orphaned records, checking the integrity of a data set, and
> analytics.
> 
> On 5/12/13 3:41 AM, "Oleg Dulin"  wrote:
> 
>> On 2013-05-11 14:42:32 +, Robert Wille said:
>> 
>>> I'm using the JDBC driver to access Cassandra. I'm wondering if it's
>>> possible to iterate through a large number of records (e.g. to perform
>>> maintenance on a large column family). I tried calling
>>> Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
>>> ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
>>> cursors aren't supported. Is there another way to do this, or do I need to
>>> use a different API?
>>> 
>>> Thanks in advance
>>> 
>>> Robert
>> 
>> If you feel that you need to iterate through a large number of rows
>> then you are probably not using a correct data model.
>> 
>> Can you describe your use case?
>> 
>> -- 
>> Regards,
>> Oleg Dulin
>> NYC Java Big Data Engineer
>> http://www.olegdulin.com/



Re: Iterating through large numbers of rows with JDBC

2013-05-12 Thread Robert Wille
Iterating through lots of records is not a primary use of my data.
However, there are a number of scenarios where scanning the entire contents
of a column family is an interesting and useful exercise. Here are a few:
removal of orphaned records, checking the integrity of a data set, and
analytics.

On 5/12/13 3:41 AM, "Oleg Dulin"  wrote:

>On 2013-05-11 14:42:32 +, Robert Wille said:
>
>> I'm using the JDBC driver to access Cassandra. I'm wondering if it's
>> possible to iterate through a large number of records (e.g. to perform
>> maintenance on a large column family). I tried calling
>> Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
>> ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
>> cursors aren't supported. Is there another way to do this, or do I need to
>> use a different API?
>> 
>> Thanks in advance
>> 
>> Robert
>
>If you feel that you need to iterate through a large number of rows
>then you are probably not using a correct data model.
>
>Can you describe your use case?
>
>-- 
>Regards,
>Oleg Dulin
>NYC Java Big Data Engineer
>http://www.olegdulin.com/




Re: Iterating through large numbers of rows with JDBC

2013-05-12 Thread Oleg Dulin

On 2013-05-11 14:42:32 +, Robert Wille said:


I'm using the JDBC driver to access Cassandra. I'm wondering if it's
possible to iterate through a large number of records (e.g. to perform
maintenance on a large column family). I tried calling
Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
cursors aren't supported. Is there another way to do this, or do I need to
use a different API?

Thanks in advance

Robert


If you feel that you need to iterate through a large number of rows 
then you are probably not using a correct data model.


Can you describe your use case?

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/




Iterating through large numbers of rows with JDBC

2013-05-11 Thread Robert Wille
I'm using the JDBC driver to access Cassandra. I'm wondering if it's
possible to iterate through a large number of records (e.g. to perform
maintenance on a large column family). I tried calling
Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
cursors aren't supported. Is there another way to do this, or do I need to
use a different API?

Thanks in advance

Robert
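
The workaround the thread converges on is to emulate a cursor with successive
LIMIT queries ordered by token, as described in the DataStax paging page
linked above. Here is a minimal sketch of that pattern over plain JDBC; the
table name and connection URL are illustrative.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class FullScanSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical URL; substitute your own host, port and keyspace.
            Connection conn =
                DriverManager.getConnection("jdbc:cassandra://localhost:9160/demo");
            Statement stmt = conn.createStatement();

            int pageSize = 1000;
            String lastKey = null;
            while (true) {
                // The first page starts at the beginning of the ring; later
                // pages resume after the token of the last key seen.
                String cql = (lastKey == null)
                    ? "SELECT key FROM mytable LIMIT " + pageSize
                    : "SELECT key FROM mytable WHERE token(key) > token('"
                        + lastKey + "') LIMIT " + pageSize;
                ResultSet rs = stmt.executeQuery(cql);

                int rows = 0;
                while (rs.next()) {
                    lastKey = rs.getString("key");
                    rows++;
                    // ... perform maintenance on the row ...
                }
                if (rows < pageSize) break;  // short page: the scan is done
            }
            conn.close();
        }
    }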