No reduction in disk space after delete

2015-03-17 Thread Ravi Agrawal
Hi,
I configured the parameters as follows:
gc_grace_seconds = 1 hour
tombstone_threshold = 1% (0.01)
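In CQL these correspond to table properties; a minimal sketch of applying them (the keyspace and table names here are assumptions):

ALTER TABLE my_keyspace.my_cf
  WITH gc_grace_seconds = 3600
  AND compaction = {'class': 'SizeTieredCompactionStrategy',
                    'tombstone_threshold': '0.01'};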


1.   I deleted 33% of the existing data, but I don't see any change in disk
space the next day (24 hrs). The column family had 24,000 rows, and the number
of partition keys per row is about 1 million. Is there something in the
Cassandra log that could help me understand what's going on?

2.   Also, out of curiosity, why do I see multiple entries for the same
rowKey when I run select * from table_name?

Thanks in advance.



Smart column searching for a particular rowKey

2015-02-03 Thread Ravi Agrawal
Hi Guys,
Need help with this.
My rowKey is stockName like GOOGLE, APPLE.
Columns are sorted by timestamp, and they include a set of data fields like
price and size. So the data would look like:
1. 9:31:00, $520, 100 shares
2. 9:35:09, $530, 1000 shares
3. 9:45:39, $520, 500 shares
I want to search this column family by timestamp.
For a rowKey, if I search for data at 9:33:00, which does not actually exist
in the columns, I want to return the last value where data was present - in
this case 9:31:00, $520, 100 shares, since the next timestamp, 9:35:09, is
greater than the input value entered.
One obvious way would be iterating through the columns and storing the last
data seen; if a new timestamp is greater than the given timestamp, return the
last data stored.
Is there an optimized way to achieve the same, since the columns are already sorted?
Thanks




RE: Smart column searching for a particular rowKey

2015-02-03 Thread Ravi Agrawal
I cannot find anything corresponding to a WHERE clause there.

From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
Sent: Tuesday, February 03, 2015 2:44 PM
To: user@cassandra.apache.org
Subject: RE: Smart column searching for a particular rowKey

Thanks, it does.
How about in Astyanax?

From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Tuesday, February 03, 2015 1:49 PM
To: user@cassandra.apache.org
Subject: Re: Smart column searching for a particular rowKey

WHERE + ORDER BY DESC + LIMIT should be able to accomplish that.
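A minimal CQL sketch of that approach, assuming a hypothetical table trades
with PRIMARY KEY (stock, ts) and columns price and size (the date below is
arbitrary):

SELECT ts, price, size
FROM trades
WHERE stock = 'GOOGLE'
  AND ts <= '2015-02-03 09:33:00'
ORDER BY ts DESC
LIMIT 1;
-- returns the newest row at or before the requested time,
-- i.e. 9:31:00, $520, 100 shares in the example above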

On Tue, Feb 3, 2015 at 11:28 AM, Ravi Agrawal 
ragra...@clearpoolgroup.com wrote:
Hi Guys,
Need help with this.
My rowKey is stockName like GOOGLE, APPLE.
Columns are sorted by timestamp, and they include a set of data fields like
price and size. So the data would look like:
1. 9:31:00, $520, 100 shares
2. 9:35:09, $530, 1000 shares
3. 9:45:39, $520, 500 shares
I want to search this column family by timestamp.
For a rowKey, if I search for data at 9:33:00, which does not actually exist
in the columns, I want to return the last value where data was present - in
this case 9:31:00, $520, 100 shares, since the next timestamp, 9:35:09, is
greater than the input value entered.
One obvious way would be iterating through the columns and storing the last
data seen; if a new timestamp is greater than the given timestamp, return the
last data stored.
Is there an optimized way to achieve the same, since the columns are already sorted?
Thanks





RE: Smart column searching for a particular rowKey

2015-02-03 Thread Ravi Agrawal
Thanks, it does.
How about in Astyanax?

From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Tuesday, February 03, 2015 1:49 PM
To: user@cassandra.apache.org
Subject: Re: Smart column searching for a particular rowKey

WHERE + ORDER BY DESC + LIMIT should be able to accomplish that.

On Tue, Feb 3, 2015 at 11:28 AM, Ravi Agrawal 
ragra...@clearpoolgroup.com wrote:
Hi Guys,
Need help with this.
My rowKey is stockName like GOOGLE, APPLE.
Columns are sorted by timestamp, and they include a set of data fields like
price and size. So the data would look like:
1. 9:31:00, $520, 100 shares
2. 9:35:09, $530, 1000 shares
3. 9:45:39, $520, 500 shares
I want to search this column family by timestamp.
For a rowKey, if I search for data at 9:33:00, which does not actually exist
in the columns, I want to return the last value where data was present - in
this case 9:31:00, $520, 100 shares, since the next timestamp, 9:35:09, is
greater than the input value entered.
One obvious way would be iterating through the columns and storing the last
data seen; if a new timestamp is greater than the given timestamp, return the
last data stored.
Is there an optimized way to achieve the same, since the columns are already sorted?
Thanks





RE: Tombstone gc after gc grace seconds

2015-01-30 Thread Ravi Agrawal
I did a small test: I wrote data to 4 different column families - 30MB of data,
256 rowkeys, and 100K columns on average - and then deleted all the data from
all of them.


1.   md_normal - created using the default compaction parameters, with
gc_grace_seconds of 5 seconds. Data was written and then deleted. Compaction
was run using nodetool compact keyspace columnfamily. I see the full data on
disk, but I cannot query columns (consistent behavior, since the data was
deleted) and cannot query rows in cqlsh - it hits a timeout.

2.   md_test - created using the compaction parameters
compaction={'tombstone_threshold': '0.01', 'class':
'SizeTieredCompactionStrategy'}, with gc_grace_seconds of 5 seconds. Disk size
is reduced, and I am able to query rows, which return 0 results.

3.   md_test2 - created using the compaction parameters
compaction={'tombstone_threshold': '0.0', 'class':
'SizeTieredCompactionStrategy'}. Disk size is reduced, but I am not able to
query rows using cqlsh - it hits a timeout.

4.   md_forcecompact - created using the compaction parameters
compaction={'unchecked_tombstone_compaction': 'true', 'class':
'SizeTieredCompactionStrategy'}, with gc_grace_seconds of 5 seconds. Data was
written and then deleted. I see the full data on disk, but I cannot query any
data using mddbreader and cannot query rows in cqlsh - it hits a timeout.

The next day the sizes were:
30M   ./md_forcecompact
4.0K  ./md_test
304K  ./md_test2
30M   ./md_normal

A feel for the data we have: 8000 rowkeys per day, with columns added
throughout the day - 300K columns on average per rowKey.



From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Friday, January 30, 2015 4:26 AM
To: user@cassandra.apache.org
Subject: Re: Tombstone gc after gc grace seconds

The point is that all the parts or fragments of the row need to be in the
SSTables involved in the compaction for C* to be able to evict the row
effectively.

My understanding of those parameters is that they will trigger a compaction on
any SSTable that exceeds this ratio. This will work properly if you never
update a row (by modifying a value or adding a column). If your workflow is
write-once per partition key, this parameter will do the job.

If you have fragments, you might trigger this compaction for nothing. In the
case of frequently updated rows (as with wide rows / time series), your only
way to get rid of tombstones is a major compaction.

That's how I understand this.

Hope this helps,

C*heers,

Alain

2015-01-30 1:29 GMT+01:00 Mohammed Guller 
moham...@glassbeam.com:
Ravi -

It may help.

What version are you running? Do you know if minor compaction is getting
triggered at all? One way to check would be to see how many SSTables the data
directory has.
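For example (the keyspace, table, and data-directory path below are
assumptions based on the default layout), either of these would show the
SSTable count:

$ nodetool cfstats my_keyspace.my_cf | grep 'SSTable count'
$ ls /var/lib/cassandra/data/my_keyspace/my_cf/*-Data.db | wc -l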

Mohammed

From: Ravi Agrawal 
[mailto:ragra...@clearpoolgroup.com]
Sent: Thursday, January 29, 2015 1:29 PM
To: user@cassandra.apache.org
Subject: RE: Tombstone gc after gc grace seconds

Hi,
I saw there are two more interesting parameters -

a.   tombstone_threshold - A ratio of garbage-collectable tombstones to all
contained columns which, if exceeded by the SSTable, triggers compaction (with
no other SSTables) for the purpose of purging the tombstones. Default value:
0.2

b.  unchecked_tombstone_compaction - True enables more-aggressive-than-normal
tombstone compactions: a single-SSTable tombstone compaction runs without
checking the likelihood of success. Cassandra 2.0.9 and later.
Could I use these to get what I want?
The problem I am encountering is that even long after gc_grace_seconds, I see
no reduction in disk space until I run compaction manually. I was thinking of
setting tombstone_threshold close to 0 and unchecked_tombstone_compaction to true.
Also, we are not running nodetool repair on a weekly basis as of now.

From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Monday, January 26, 2015 12:11 PM
To: user@cassandra.apache.org
Subject: Re: Tombstone gc after gc grace seconds

My understanding is consistent with Alain's: there's no way to force a
tombstone-only compaction; your only option is a major compaction.  If you're
using size-tiered, that comes with its own drawbacks.

I wonder if there's a technical limitation that prevents introducing a
shadowed-data cleanup style operation (overwritten data, including deletes,
plus tombstones past their gc grace period), or maybe even coupling it directly
with cleanup, since most of the work (rewriting old SSTables) would be
identical.  I can't think of anything off the top of my head, but it would be
so useful that it seems like there's got to be something I'm missing.

On Mon, Jan 26, 2015 at 4:15 AM, Alain RODRIGUEZ 
arodr...@gmail.com wrote:
I don't think that such a thing exists as SSTables are immutable. You compact 
it entirely or you don't. Minor compaction will eventually evict tombstones

RE: Tombstone gc after gc grace seconds

2015-01-29 Thread Ravi Agrawal
Hi,
I saw there are two more interesting parameters -

a.   tombstone_threshold - A ratio of garbage-collectable tombstones to all
contained columns which, if exceeded by the SSTable, triggers compaction (with
no other SSTables) for the purpose of purging the tombstones. Default value:
0.2

b.  unchecked_tombstone_compaction - True enables more-aggressive-than-normal
tombstone compactions: a single-SSTable tombstone compaction runs without
checking the likelihood of success. Cassandra 2.0.9 and later.
Could I use these to get what I want?
The problem I am encountering is that even long after gc_grace_seconds, I see
no reduction in disk space until I run compaction manually. I was thinking of
setting tombstone_threshold close to 0 and unchecked_tombstone_compaction to true.
Also, we are not running nodetool repair on a weekly basis as of now.
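For example, both settings could be combined in a single table alteration; a
sketch (the keyspace and table names are assumptions):

ALTER TABLE my_keyspace.my_cf
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'tombstone_threshold': '0.01',
                     'unchecked_tombstone_compaction': 'true'};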

From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Monday, January 26, 2015 12:11 PM
To: user@cassandra.apache.org
Subject: Re: Tombstone gc after gc grace seconds

My understanding is consistent with Alain's: there's no way to force a
tombstone-only compaction; your only option is a major compaction.  If you're
using size-tiered, that comes with its own drawbacks.

I wonder if there's a technical limitation that prevents introducing a
shadowed-data cleanup style operation (overwritten data, including deletes,
plus tombstones past their gc grace period), or maybe even coupling it directly
with cleanup, since most of the work (rewriting old SSTables) would be
identical.  I can't think of anything off the top of my head, but it would be
so useful that it seems like there's got to be something I'm missing.

On Mon, Jan 26, 2015 at 4:15 AM, Alain RODRIGUEZ 
arodr...@gmail.com wrote:
I don't think that such a thing exists, as SSTables are immutable. You compact
it entirely or you don't. Minor compaction will eventually evict tombstones. If
it is too slow, AFAIK, the best solution is a major compaction.

C*heers,

Alain

2015-01-23 0:00 GMT+01:00 Ravi Agrawal 
ragra...@clearpoolgroup.com:
Hi,
I want to trigger just a tombstone compaction after gc grace seconds has
completed, not nodetool compact keyspace columnfamily.
Is there any way I can do that?

Thanks






RE: Retrieving all row keys of a CF

2015-01-29 Thread Ravi Agrawal
Running select distinct on the keys of the column family hits a timeout exception.
pk1, pk2, …pkn are 800K in total.

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Friday, January 23, 2015 3:24 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

No wonder the client is timing out. Even though C* supports up to 2B columns
per partition, it is recommended not to have more than 100k CQL rows in a
partition.

It has been a long time since I used Astyanax, so I don’t remember whether the 
AllRowsReader reads all CQL rows or storage rows. If it is reading all CQL 
rows, then essentially it is trying to read 800k*200k rows. That will be 160B 
rows!

Did you try “SELECT DISTINCT …” from cqlsh?

Mohammed

From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
Sent: Thursday, January 22, 2015 11:12 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

In each partition the number of CQL rows is 200K on average; the max is 3M.
800K is the number of Cassandra partitions.


From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Thursday, January 22, 2015 7:43 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

What is the average and max # of CQL rows in each partition? Is 800,000 the 
number of CQL rows or Cassandra partitions (storage engine rows)?

Another option you could try is a CQL statement to fetch all partition keys. 
You could first try this in the cqlsh:

“SELECT DISTINCT pk1, pk2…pkn FROM CF”

You will need to specify all the component columns if you are using a composite
partition key.
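For instance, with a hypothetical table defined with
PRIMARY KEY ((stock, trade_date), ts), the query must name both partition key
components:

SELECT DISTINCT stock, trade_date FROM my_cf;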

Mohammed

From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
Sent: Thursday, January 22, 2015 1:57 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

Hi,
I increased the range timeout and read timeout, first to 50 secs and then to
500 secs, with the Astyanax client at 60 and 550 secs respectively. I still get
a timeout exception.
I see the logic with the .withCheckpointManager() code; is that the only way it
could work?


From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Saturday, January 17, 2015 9:55 AM
To: user@cassandra.apache.org
Subject: Re: Retrieving all row keys of a CF

If you're getting partial data back, then failing eventually, try setting 
.withCheckpointManager() - this will let you keep track of the token ranges 
you've successfully processed, and not attempt to reprocess them.  This will 
also let you set up tasks on bigger data sets that take hours or days to run, 
and reasonably safely interrupt it at any time without losing progress.

This is some *very* old code, but I dug this out of a git history.  We don't 
use Astyanax any longer, but maybe an example implementation will help you.  
This is Scala instead of Java, but hopefully you can get the gist.

https://gist.github.com/MightyE/83a79b74f3a69cfa3c4e

If you're timing out talking to your cluster, then I don't recommend using the 
cluster to track your checkpoints, but some other data store (maybe just a 
flatfile).  Again, this is just to give you a sense of what's involved.

On Fri, Jan 16, 2015 at 6:31 PM, Mohammed Guller 
moham...@glassbeam.com wrote:
Both total system memory and heap size can’t be 8GB?

The timeout on the Astyanax client should be greater than the timeouts on the
C* nodes; otherwise your client will time out prematurely.

Also, have you tried increasing the timeout for the range queries to a higher 
number? It is not recommended to set them very high, because a lot of other 
problems may start happening, but then reading 800,000 partitions is not a 
normal operation.

Just as an experiment, can you set the range timeout to 45 seconds on each
node and the timeout on the Astyanax client to 50 seconds? Restart the nodes
after increasing the timeouts and try again.

Mohammed

From: Ravi Agrawal 
[mailto:ragra...@clearpoolgroup.com]
Sent: Friday, January 16, 2015 5:11 PM

To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF


1)  What is the heap size and total memory on each node? 8GB, 8GB
2)  How big is the cluster? 4 nodes
3)  What are the read and range timeouts (in cassandra.yaml) on the C* nodes? 10 secs, 10 secs
4)  What are the timeouts for the Astyanax client? 2 secs
5)  Do you see GC pressure on the C* nodes? How long does GC for new gen and old gen take? GC occurs every 5 secs; we don't see huge GC pressure; around 50ms
6)  Does any node crash with an OOM error when you try AllRowsReader? No
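For reference, the node-side timeouts from 3) live in cassandra.yaml; with the
10-sec values above, the relevant entries would look like this (per node, with
a restart required after changing them):

read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 10000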

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Friday, January 16, 2015 7:30 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

A few questions:


1)  What is the heap size and total memory on each node?

2)  How

RE: Retrieving all row keys of a CF

2015-01-22 Thread Ravi Agrawal
Hi,
I increased the range timeout and read timeout, first to 50 secs and then to
500 secs, with the Astyanax client at 60 and 550 secs respectively. I still get
a timeout exception.
I see the logic with the .withCheckpointManager() code; is that the only way it
could work?


From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Saturday, January 17, 2015 9:55 AM
To: user@cassandra.apache.org
Subject: Re: Retrieving all row keys of a CF

If you're getting partial data back, then failing eventually, try setting 
.withCheckpointManager() - this will let you keep track of the token ranges 
you've successfully processed, and not attempt to reprocess them.  This will 
also let you set up tasks on bigger data sets that take hours or days to run, 
and reasonably safely interrupt it at any time without losing progress.

This is some *very* old code, but I dug this out of a git history.  We don't 
use Astyanax any longer, but maybe an example implementation will help you.  
This is Scala instead of Java, but hopefully you can get the gist.

https://gist.github.com/MightyE/83a79b74f3a69cfa3c4e

If you're timing out talking to your cluster, then I don't recommend using the 
cluster to track your checkpoints, but some other data store (maybe just a 
flatfile).  Again, this is just to give you a sense of what's involved.

On Fri, Jan 16, 2015 at 6:31 PM, Mohammed Guller 
moham...@glassbeam.com wrote:
Both total system memory and heap size can’t be 8GB?

The timeout on the Astyanax client should be greater than the timeouts on the
C* nodes; otherwise your client will time out prematurely.

Also, have you tried increasing the timeout for the range queries to a higher 
number? It is not recommended to set them very high, because a lot of other 
problems may start happening, but then reading 800,000 partitions is not a 
normal operation.

Just as an experiment, can you set the range timeout to 45 seconds on each
node and the timeout on the Astyanax client to 50 seconds? Restart the nodes
after increasing the timeouts and try again.

Mohammed

From: Ravi Agrawal 
[mailto:ragra...@clearpoolgroup.com]
Sent: Friday, January 16, 2015 5:11 PM

To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF


1)  What is the heap size and total memory on each node? 8GB, 8GB
2)  How big is the cluster? 4 nodes
3)  What are the read and range timeouts (in cassandra.yaml) on the C* nodes? 10 secs, 10 secs
4)  What are the timeouts for the Astyanax client? 2 secs
5)  Do you see GC pressure on the C* nodes? How long does GC for new gen and old gen take? GC occurs every 5 secs; we don't see huge GC pressure; around 50ms
6)  Does any node crash with an OOM error when you try AllRowsReader? No

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Friday, January 16, 2015 7:30 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

A few questions:


1)  What is the heap size and total memory on each node?

2)  How big is the cluster?

3)  What are the read and range timeouts (in cassandra.yaml) on the C* 
nodes?

4)  What are the timeouts for the Astyanax client?

5)  Do you see GC pressure on the C* nodes? How long does GC for new gen 
and old gen take?

6)  Does any node crash with OOM error when you try AllRowsReader?

Mohammed

From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
Sent: Friday, January 16, 2015 4:14 PM
To: user@cassandra.apache.org
Subject: Re: Retrieving all row keys of a CF

Hi,
Ruchir and I tried querying using the AllRowsReader recipe but had no luck. We
are seeing a PoolTimeoutException.
SEVERE: [Thread_1] Error reading RowKeys
com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: 
PoolTimeoutException: [host=servername, latency=2003(2003), attempts=4]Timed 
out waiting for connection
   at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
   at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
   at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
   at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
   at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
   at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$2.execute(ThriftColumnFamilyQueryImpl.java:397)
   at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:447)
   at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:419

Tombstone gc after gc grace seconds

2015-01-22 Thread Ravi Agrawal
Hi,
I want to trigger just a tombstone compaction after gc grace seconds has
completed, not nodetool compact keyspace columnfamily.
Is there any way I can do that?

Thanks




RE: Retrieving all row keys of a CF

2015-01-22 Thread Ravi Agrawal
In each partition the number of CQL rows is 200K on average; the max is 3M.
800K is the number of Cassandra partitions.


From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Thursday, January 22, 2015 7:43 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

What is the average and max # of CQL rows in each partition? Is 800,000 the 
number of CQL rows or Cassandra partitions (storage engine rows)?

Another option you could try is a CQL statement to fetch all partition keys. 
You could first try this in the cqlsh:

“SELECT DISTINCT pk1, pk2…pkn FROM CF”

You will need to specify all the component columns if you are using a composite
partition key.

Mohammed

From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
Sent: Thursday, January 22, 2015 1:57 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

Hi,
I increased the range timeout and read timeout, first to 50 secs and then to
500 secs, with the Astyanax client at 60 and 550 secs respectively. I still get
a timeout exception.
I see the logic with the .withCheckpointManager() code; is that the only way it
could work?


From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Saturday, January 17, 2015 9:55 AM
To: user@cassandra.apache.org
Subject: Re: Retrieving all row keys of a CF

If you're getting partial data back, then failing eventually, try setting 
.withCheckpointManager() - this will let you keep track of the token ranges 
you've successfully processed, and not attempt to reprocess them.  This will 
also let you set up tasks on bigger data sets that take hours or days to run, 
and reasonably safely interrupt it at any time without losing progress.

This is some *very* old code, but I dug this out of a git history.  We don't 
use Astyanax any longer, but maybe an example implementation will help you.  
This is Scala instead of Java, but hopefully you can get the gist.

https://gist.github.com/MightyE/83a79b74f3a69cfa3c4e

If you're timing out talking to your cluster, then I don't recommend using the 
cluster to track your checkpoints, but some other data store (maybe just a 
flatfile).  Again, this is just to give you a sense of what's involved.

On Fri, Jan 16, 2015 at 6:31 PM, Mohammed Guller 
moham...@glassbeam.com wrote:
Both total system memory and heap size can’t be 8GB?

The timeout on the Astyanax client should be greater than the timeouts on the
C* nodes; otherwise your client will time out prematurely.

Also, have you tried increasing the timeout for the range queries to a higher 
number? It is not recommended to set them very high, because a lot of other 
problems may start happening, but then reading 800,000 partitions is not a 
normal operation.

Just as an experiment, can you set the range timeout to 45 seconds on each
node and the timeout on the Astyanax client to 50 seconds? Restart the nodes
after increasing the timeouts and try again.

Mohammed

From: Ravi Agrawal 
[mailto:ragra...@clearpoolgroup.com]
Sent: Friday, January 16, 2015 5:11 PM

To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF


1)  What is the heap size and total memory on each node? 8GB, 8GB
2)  How big is the cluster? 4 nodes
3)  What are the read and range timeouts (in cassandra.yaml) on the C* nodes? 10 secs, 10 secs
4)  What are the timeouts for the Astyanax client? 2 secs
5)  Do you see GC pressure on the C* nodes? How long does GC for new gen and old gen take? GC occurs every 5 secs; we don't see huge GC pressure; around 50ms
6)  Does any node crash with an OOM error when you try AllRowsReader? No

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Friday, January 16, 2015 7:30 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

A few questions:


1)  What is the heap size and total memory on each node?

2)  How big is the cluster?

3)  What are the read and range timeouts (in cassandra.yaml) on the C* 
nodes?

4)  What are the timeouts for the Astyanax client?

5)  Do you see GC pressure on the C* nodes? How long does GC for new gen 
and old gen take?

6)  Does any node crash with OOM error when you try AllRowsReader?

Mohammed

From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
Sent: Friday, January 16, 2015 4:14 PM
To: user@cassandra.apache.org
Subject: Re: Retrieving all row keys of a CF

Hi,
Ruchir and I tried querying using the AllRowsReader recipe but had no luck. We
are seeing a PoolTimeoutException.
SEVERE: [Thread_1] Error reading RowKeys
com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: 
PoolTimeoutException: [host=servername, latency=2003(2003), attempts=4]Timed 
out waiting for connection

Re: Retrieving all row keys of a CF

2015-01-16 Thread Ravi Agrawal
Hi,
Ruchir and I tried querying using the AllRowsReader recipe but had no luck. We
are seeing a PoolTimeoutException.
SEVERE: [Thread_1] Error reading RowKeys
com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: 
PoolTimeoutException: [host=servername, latency=2003(2003), attempts=4]Timed 
out waiting for connection
   at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
   at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
   at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
   at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
   at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
   at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$2.execute(ThriftColumnFamilyQueryImpl.java:397)
   at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:447)
   at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:419)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)

We did receive a portion of the data, which changes on every try. We used the
following method.
boolean result = new AllRowsReader.Builder<String, String>(keyspace, CF_STANDARD1)
.withColumnRange(null, null, false, 0) // fetch 0 columns per row, i.e. row keys only
.withPartitioner(null) // this will use keyspace's partitioner
.forEachRow(new Function<Row<String, String>, Boolean>() {
@Override
public Boolean apply(@Nullable Row<String, String> row) {
// Process the row here ...
return true;
}
})
.build()
.call();

We tried setting the concurrency level as mentioned in this post
(https://github.com/Netflix/astyanax/issues/411) as well, on both astyanax
1.56.49 and 2.0.0. Still nothing.


RE: Retrieving all row keys of a CF

2015-01-16 Thread Ravi Agrawal

1)  What is the heap size and total memory on each node? 8GB, 8GB
2)  How big is the cluster? 4 nodes
3)  What are the read and range timeouts (in cassandra.yaml) on the C* nodes? 10 secs, 10 secs
4)  What are the timeouts for the Astyanax client? 2 secs
5)  Do you see GC pressure on the C* nodes? How long does GC for new gen and old gen take? GC occurs every 5 secs; we don't see huge GC pressure; around 50ms
6)  Does any node crash with an OOM error when you try AllRowsReader? No

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Friday, January 16, 2015 7:30 PM
To: user@cassandra.apache.org
Subject: RE: Retrieving all row keys of a CF

A few questions:


1)  What is the heap size and total memory on each node?

2)  How big is the cluster?

3)  What are the read and range timeouts (in cassandra.yaml) on the C* 
nodes?

4)  What are the timeouts for the Astyanax client?

5)  Do you see GC pressure on the C* nodes? How long does GC for new gen 
and old gen take?

6)  Does any node crash with OOM error when you try AllRowsReader?

Mohammed

From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
Sent: Friday, January 16, 2015 4:14 PM
To: user@cassandra.apache.org
Subject: Re: Retrieving all row keys of a CF

Hi,
Ruchir and I tried querying using the AllRowsReader recipe but had no luck. We
are seeing a PoolTimeoutException.
SEVERE: [Thread_1] Error reading RowKeys
com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: 
PoolTimeoutException: [host=servername, latency=2003(2003), attempts=4]Timed 
out waiting for connection
   at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
   at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
   at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
   at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
   at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
   at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$2.execute(ThriftColumnFamilyQueryImpl.java:397)
   at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:447)
   at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:419)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)

We did receive a portion of the data, which changes on every try. We used the
following method.
boolean result = new AllRowsReader.Builder<String, String>(keyspace, CF_STANDARD1)
.withColumnRange(null, null, false, 0) // fetch 0 columns per row, i.e. row keys only
.withPartitioner(null) // this will use keyspace's partitioner
.forEachRow(new Function<Row<String, String>, Boolean>() {
@Override
public Boolean apply(@Nullable Row<String, String> row) {
// Process the row here ...
return true;
}
})
.build()
.call();

We tried setting the concurrency level as mentioned in this post
(https://github.com/Netflix/astyanax/issues/411) as well, on both astyanax
1.56.49 and 2.0.0. Still nothing.