Re: Cassandra 5.0 Beta1 - vector searching results

2024-03-27 Thread Caleb Rackliffe
> For your #1 - if there are going to be 100+ million vectors, wouldn't I
> want the search to go across nodes?

If you have a replication factor of 3 and 3 nodes, every node will have a
complete copy of the data, so you'd only need to talk to one node. If your
replication factor is 1, you'd have to talk to all three nodes.
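For reference, the replication factor is a per-keyspace setting, so this trade-off is fixed in the schema. A minimal sketch (the datacenter name and the factor here are only illustrative, not taken from your cluster):

CREATE KEYSPACE doc
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

With 3 nodes and a factor of 3 every node holds a full copy, while with a factor of 1 each node holds roughly a third of the data, which is why an unrestricted search has to fan out to all of them.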

On Wed, Mar 27, 2024 at 9:06 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Thank you all for the details on this.
> For your #1 - if there are going to be 100+ million vectors, wouldn't I
> want the search to go across nodes?
>
> Right now, we're running both weaviate (8 node cluster), our main
> cassandra 4 cluster (12 nodes), and a test 3 node cassandra 5 cluster.
> Weaviate does some interesting things like product quantization to reduce
> size and improve search speed.  They get amazing speed, but the drawback
> is, from what I can tell, they load the entire index into RAM.  We've been
> having a recurring issue where once it runs out of RAM, it doesn't get
> slow; it just stops working.  Weaviate enables some powerful
> vector+boolean+range queries.  I would love to only have one database!
>
> I'll look into how to do profiling - the terms you use are things I'm not
> familiar with, but I've got chatGPT and google... :)
>
> -Joe
> On 3/21/2024 10:51 PM, Caleb Rackliffe wrote:
>
> To expand on Jonathan’s response, the best way to get SAI to perform on
> the read side is to use it as a tool for large-partition search. In other
> words, if you can model your data such that your queries will be restricted
> to a single partition, two things will happen…
>
> 1.) With all queries (not just ANN queries), you will only hit as many
> nodes as your read consistency level and replication factor require. For
> vector searches, that means you should only hit one node, and it should be
> the coordinating node w/ a properly configured, token-aware client.
>
> 2.) You can use LCS (or UCS configured to mimic LCS) instead of STCS as
> your table compaction strategy. This will essentially guarantee your
> (partition-restricted) SAI query hits a small number of SSTable-attached
> indexes. (It’ll hit Memtable-attached indexes as well for any recently
> added data, so if you’re seeing latencies shoot up, it’s possible there
> could be contention on the Memtable-attached index that supports ANN
> queries. I haven’t done a deep dive on it. You can always flush Memtables
> directly before queries to factor that out.)
>
> If you can do all of the above, the simple performance of the local index
> query and its post-filtering reads is probably the place to explore
> further. If you manage to collect any profiling data (JFR, flamegraphs via
> async-profiler, etc) I’d be happy to dig into it with you.
>
> Thanks for kicking the tires!
>
> On Mar 21, 2024, at 8:20 PM, Brebner, Paul via user
>   wrote:
>
> 
>
> Hi Joe,
>
>
>
> Have you considered submitting something for Community Over Code NA 2024?
> The CFP is still open for a few more weeks, options could be my Performance
> Engineering track or the Cassandra track – or both 
>
>
>
>
> https://www.linkedin.com/pulse/cfp-community-over-code-na-denver-2024-performance-track-paul-brebner-nagmc/?trackingId=PlmmMjMeQby0Mozq8cnIpA%3D%3D
>
>
>
> Regards, Paul Brebner
>
>
>
>
>
>
>
> *From: *Joe Obernberger 
> 
> *Date: *Friday, 22 March 2024 at 3:19 am
> *To: *user@cassandra.apache.org 
> 
> *Subject: *Cassandra 5.0 Beta1 - vector searching results
>
> EXTERNAL EMAIL - USE CAUTION when clicking links or attachments
>
>
>
>
> Hi All - I'd like to share some initial results for the vector search on
> Cassandra 5.0 beta1.  3-node cluster running in Kubernetes; fast NetApp
> storage.
>
> Have a table (doc.embeddings_googleflant5large) with definition:
>
> CREATE TABLE doc.embeddings_googleflant5large (
>  uuid text,
>  type text,
>  fieldname text,
>  offset int,
>  sourceurl text,
>  textdata text,
>  creationdate timestamp,
>  embeddings vector,
>  metadata boolean,
>  source text,
>  PRIMARY KEY ((uuid, type), fieldname, offset, sourceurl, textdata)
> ) WITH CLUSTERING ORDER BY (fieldname ASC, offset ASC, sourceurl ASC,
> textdata ASC)
>  AND additional_write_policy = '99p'
>  AND allow_auto_snapshot = true
>  AND bloom_filter_fp_chance = 0.01
>  AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>  AND cdc = false
>  AND comment = ''
>  AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
>  AND compression = {'chunk_length_in_k

Re: Cassandra 5.0 Beta1 - vector searching results

2024-03-21 Thread Caleb Rackliffe
To expand on Jonathan’s response, the best way to get SAI to perform on the read side is to use it as a tool for large-partition search. In other words, if you can model your data such that your queries will be restricted to a single partition, two things will happen…

1.) With all queries (not just ANN queries), you will only hit as many nodes as your read consistency level and replication factor require. For vector searches, that means you should only hit one node, and it should be the coordinating node w/ a properly configured, token-aware client.

2.) You can use LCS (or UCS configured to mimic LCS) instead of STCS as your table compaction strategy. This will essentially guarantee your (partition-restricted) SAI query hits a small number of SSTable-attached indexes. (It’ll hit Memtable-attached indexes as well for any recently added data, so if you’re seeing latencies shoot up, it’s possible there could be contention on the Memtable-attached index that supports ANN queries. I haven’t done a deep dive on it. You can always flush Memtables directly before queries to factor that out.)

If you can do all of the above, the simple performance of the local index query and its post-filtering reads is probably the place to explore further. If you manage to collect any profiling data (JFR, flamegraphs via async-profiler, etc) I’d be happy to dig into it with you.

Thanks for kicking the tires!

On Mar 21, 2024, at 8:20 PM, Brebner, Paul via user wrote:







Hi Joe,

Have you considered submitting something for Community Over Code NA 2024? The CFP is still open for a few more weeks, options could be my Performance Engineering track or the Cassandra track – or both

https://www.linkedin.com/pulse/cfp-community-over-code-na-denver-2024-performance-track-paul-brebner-nagmc/?trackingId=PlmmMjMeQby0Mozq8cnIpA%3D%3D

Regards, Paul Brebner



From:
Joe Obernberger 
Date: Friday, 22 March 2024 at 3:19 am
To: user@cassandra.apache.org 
Subject: Cassandra 5.0 Beta1 - vector searching results


EXTERNAL EMAIL - USE CAUTION when clicking links or attachments




Hi All - I'd like to share some initial results for the vector search on
Cassandra 5.0 beta1.  3-node cluster running in Kubernetes; fast NetApp
storage.

Have a table (doc.embeddings_googleflant5large) with definition:

CREATE TABLE doc.embeddings_googleflant5large (
 uuid text,
 type text,
 fieldname text,
 offset int,
 sourceurl text,
 textdata text,
 creationdate timestamp,
 embeddings vector,
 metadata boolean,
 source text,
 PRIMARY KEY ((uuid, type), fieldname, offset, sourceurl, textdata)
) WITH CLUSTERING ORDER BY (fieldname ASC, offset ASC, sourceurl ASC,
textdata ASC)
 AND additional_write_policy = '99p'
 AND allow_auto_snapshot = true
 AND bloom_filter_fp_chance = 0.01
 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
 AND cdc = false
 AND comment = ''
 AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
 AND compression = {'chunk_length_in_kb': '16', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND memtable = 'default'
 AND crc_check_chance = 1.0
 AND default_time_to_live = 0
 AND extensions = {}
 AND gc_grace_seconds = 864000
 AND incremental_backups = true
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND min_index_interval = 128
 AND read_repair = 'BLOCKING'
 AND speculative_retry = '99p';

CREATE CUSTOM INDEX ann_index_googleflant5large ON
doc.embeddings_googleflant5large (embeddings) USING 'sai';
CREATE CUSTOM INDEX offset_index_googleflant5large ON
doc.embeddings_googleflant5large (offset) USING 'sai';
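A partition-restricted ANN query against this table and its SAI vector index would look roughly like the sketch below (the partition key values and the vector literal are placeholders, and the ALTER is only there to illustrate the LCS suggestion made earlier in the thread):

ALTER TABLE doc.embeddings_googleflant5large
  WITH compaction = {'class': 'LeveledCompactionStrategy'};

SELECT textdata
FROM doc.embeddings_googleflant5large
WHERE uuid = 'some-doc-uuid' AND type = 'chunk'   -- full partition key, so one replica can serve it
ORDER BY embeddings ANN OF [0.1, 0.2, 0.3]        -- resolved by the SAI index on embeddings
LIMIT 10;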

nodetool status -r

UN  cassandra-1.cassandra5.cassandra5-jos.svc.cluster.local  18.02 GiB  128  100.0%  f2989dea-908b-4c06-9caa-4aacad8ba0e8  rack1
UN  cassandra-2.cassandra5.cassandra5-jos.svc.cluster.local  17.98 GiB  128  100.0%  ec4e506d-5f0d-475a-a3c1-aafe58399412  rack1
UN  cassandra-0.cassandra5.cassandra5-jos.svc.cluster.local  18.16 GiB  128  100.0%  92c6d909-ee01-4124-ae03-3b9e2d5e74c0  rack1

nodetool tablestats doc.embeddings_googleflant5large

Total number of tables: 1

Keyspace: doc
 Read Count: 0
 Read Latency: NaN ms
 Write Count: 2893108
 Write Latency: 326.3586520174843 ms
 Pending Flushes: 0
 Table: embeddings_googleflant5large
 SSTable count: 6
 Old SSTable count: 0
 Max SSTable size: 5.108GiB
 Space used (live): 19318114423
 Space used (total): 19318114423
 Space used by snapshots (total): 0
 Off heap memory used (total): 4874912
 SSTable Compression Ratio: 0.97448
 Number of partitions (estimate): 58399
 Memtable cell count: 0

Re: Token Ring Gaps in a 2 DC Setup

2012-03-23 Thread Caleb Rackliffe
Yup, all repairs are complete.  I'm reading at a CL of ONE pretty much 
everywhere.

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Tue, 20 Mar 2012 13:15:27 -0400
To: user@cassandra.apache.org
Subject: Re: Token Ring Gaps in a 2 DC Setup

mmm, has repair completed on all nodes ?

Also, while I was digging around, I noticed that we do a LOT of reads 
immediately after writes, and almost every read from the first DC was bringing 
a read-repair along with it.
What CL are you using ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/03/2012, at 7:39 AM, Caleb Rackliffe wrote:

Hey Aaron,

I've run cleanup jobs across all 15 nodes, and after that, I still have about a 
24 million to 15 million key ratio between the data centers.  The first DC is a 
few months older than the second, and it also began its life before 1.0.7 was 
out, whereas the second started at 1.0.7.  I wonder if running an 
upgradesstables would be interesting?

Also, while I was digging around, I noticed that we do a LOT of reads 
immediately after writes, and almost every read from the first DC was bringing 
a read-repair along with it.  (Possibly because the distant DC had not yet 
received certain mutations?)  I ended up turning RR off entirely, since I've 
got HH in place to handle short-duration failures :)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Mon, 19 Mar 2012 13:34:38 -0400
To: user@cassandra.apache.org
Subject: Re: Token Ring Gaps in a 2 DC Setup

 I've also run repair on a few nodes in both data centers, but the sizes are 
still vastly different.
If repair is completing on all the nodes then the data is fully distributed.

If you want to dig around…

Take a look at the data files on disk. Do the nodes in DC 1 have some larger, 
older, data files? These may be waiting for compaction to catch up to them.

If you have done any token moves, did you run cleanup afterwards?


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/03/2012, at 8:35 PM, Caleb Rackliffe wrote:

More detail…

I'm running 1.0.7 on these boxes, and the keyspace readout from the CLI looks 
like this:

create keyspace Users
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC2 : 1, DC1 : 2}
  and durable_writes = true;

Thanks!

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Caleb Rackliffe ca...@steelhouse.com
Date: Sun, 18 Mar 2012 02:47:05 -0400
To: user@cassandra.apache.org
Subject: Token Ring Gaps in a 2 DC Setup

Hi Everyone,

I have a cluster using NetworkTopologyStrategy that looks like this:

10.41.116.22    DC1  RAC1   Up  Normal  13.21 GB  10.00%  0
10.54.149.202   DC2  RAC1   Up  Normal   6.98 GB   0.00%  1
10.41.116.20    DC1  RAC2   Up  Normal  12.75 GB  10.00%  1701411830
10.41.116.16    DC1  RAC3   Up  Normal  12.62 GB  10.00%  3402823670
10.54.149.203   DC2  RAC1   Up  Normal   6.7 GB    0.00%  3402823671
10.41.116.18    DC1  RAC4   Up  Normal  10.8 GB   10.00%  5104235500
10.41.116.14    DC1  RAC5   Up  Normal  10.27 GB  10.00%  6805647340
10.54.149.204   DC2  RAC1   Up  Normal   6.7 GB    0.00%  6805647341
10.41.116.12    DC1  RAC6   Up  Normal  10.58 GB  10.00%  8507059170
10.41.116.10    DC1  RAC7   Up  Normal  10.89 GB  10.00%  10208471000
10.54.149.205   DC2  RAC1   Up  Normal   7.51 GB   0.00%  10208471001
10.41.116.8     DC1  RAC8   Up  Normal  10.48 GB  10.00

Re: Token Ring Gaps in a 2 DC Setup

2012-03-19 Thread Caleb Rackliffe
Hey Aaron,

I've run cleanup jobs across all 15 nodes, and after that, I still have about a 
24 million to 15 million key ratio between the data centers.  The first DC is a 
few months older than the second, and it also began its life before 1.0.7 was 
out, whereas the second started at 1.0.7.  I wonder if running an 
upgradesstables would be interesting?

Also, while I was digging around, I noticed that we do a LOT of reads 
immediately after writes, and almost every read from the first DC was bringing 
a read-repair along with it.  (Possibly because the distant DC had not yet 
received certain mutations?)  I ended up turning RR off entirely, since I've 
got HH in place to handle short-duration failures :)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Mon, 19 Mar 2012 13:34:38 -0400
To: user@cassandra.apache.org
Subject: Re: Token Ring Gaps in a 2 DC Setup

 I've also run repair on a few nodes in both data centers, but the sizes are 
still vastly different.
If repair is completing on all the nodes then the data is fully distributed.

If you want to dig around…

Take a look at the data files on disk. Do the nodes in DC 1 have some larger, 
older, data files? These may be waiting for compaction to catch up to them.

If you have done any token moves, did you run cleanup afterwards?


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/03/2012, at 8:35 PM, Caleb Rackliffe wrote:

More detail…

I'm running 1.0.7 on these boxes, and the keyspace readout from the CLI looks 
like this:

create keyspace Users
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC2 : 1, DC1 : 2}
  and durable_writes = true;

Thanks!

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Caleb Rackliffe ca...@steelhouse.com
Date: Sun, 18 Mar 2012 02:47:05 -0400
To: user@cassandra.apache.org
Subject: Token Ring Gaps in a 2 DC Setup

Hi Everyone,

I have a cluster using NetworkTopologyStrategy that looks like this:

10.41.116.22    DC1  RAC1   Up  Normal  13.21 GB  10.00%  0
10.54.149.202   DC2  RAC1   Up  Normal   6.98 GB   0.00%  1
10.41.116.20    DC1  RAC2   Up  Normal  12.75 GB  10.00%  1701411830
10.41.116.16    DC1  RAC3   Up  Normal  12.62 GB  10.00%  3402823670
10.54.149.203   DC2  RAC1   Up  Normal   6.7 GB    0.00%  3402823671
10.41.116.18    DC1  RAC4   Up  Normal  10.8 GB   10.00%  5104235500
10.41.116.14    DC1  RAC5   Up  Normal  10.27 GB  10.00%  6805647340
10.54.149.204   DC2  RAC1   Up  Normal   6.7 GB    0.00%  6805647341
10.41.116.12    DC1  RAC6   Up  Normal  10.58 GB  10.00%  8507059170
10.41.116.10    DC1  RAC7   Up  Normal  10.89 GB  10.00%  10208471000
10.54.149.205   DC2  RAC1   Up  Normal   7.51 GB   0.00%  10208471001
10.41.116.8     DC1  RAC8   Up  Normal  10.48 GB  10.00%  11909882800
10.41.116.24    DC1  RAC9   Up  Normal  10.89 GB  10.00%  13611294700
10.54.149.206   DC2  RAC1   Up  Normal   6.37 GB   0.00%  13611294701
10.41.116.26    DC1  RAC10  Up  Normal  11.17 GB  10.00%  15312706500

There are two data centers, one with 10 nodes/2 replicas and one with 5 nodes/1 
replica.  What I've attempted to do with my token assignments is have each node 
in the smaller DC handle 20% of the keyspace, and this would mean that I should 
see roughly equal usage on all 15 boxes.  It just doesn't seem to be happening 
that way, though.  It looks like the 1 replica nodes are carrying about half 
the data the 2 replica nodes are.  It's almost as if those nodes are only 
handling 10% of the keyspace instead of 20%.

Does anybody have any suggestions as to what might be going on?  I've run 
nodetool getendpoints against a bunch of keys, and I always get back three 
nodes, so I'm

Token Ring Gaps in a 2 DC Setup

2012-03-18 Thread Caleb Rackliffe
Hi Everyone,

I have a cluster using NetworkTopologyStrategy that looks like this:

10.41.116.22    DC1  RAC1   Up  Normal  13.21 GB  10.00%  0
10.54.149.202   DC2  RAC1   Up  Normal   6.98 GB   0.00%  1
10.41.116.20    DC1  RAC2   Up  Normal  12.75 GB  10.00%  1701411830
10.41.116.16    DC1  RAC3   Up  Normal  12.62 GB  10.00%  3402823670
10.54.149.203   DC2  RAC1   Up  Normal   6.7 GB    0.00%  3402823671
10.41.116.18    DC1  RAC4   Up  Normal  10.8 GB   10.00%  5104235500
10.41.116.14    DC1  RAC5   Up  Normal  10.27 GB  10.00%  6805647340
10.54.149.204   DC2  RAC1   Up  Normal   6.7 GB    0.00%  6805647341
10.41.116.12    DC1  RAC6   Up  Normal  10.58 GB  10.00%  8507059170
10.41.116.10    DC1  RAC7   Up  Normal  10.89 GB  10.00%  10208471000
10.54.149.205   DC2  RAC1   Up  Normal   7.51 GB   0.00%  10208471001
10.41.116.8     DC1  RAC8   Up  Normal  10.48 GB  10.00%  11909882800
10.41.116.24    DC1  RAC9   Up  Normal  10.89 GB  10.00%  13611294700
10.54.149.206   DC2  RAC1   Up  Normal   6.37 GB   0.00%  13611294701
10.41.116.26    DC1  RAC10  Up  Normal  11.17 GB  10.00%  15312706500

There are two data centers, one with 10 nodes/2 replicas and one with 5 nodes/1 
replica.  What I've attempted to do with my token assignments is have each node 
in the smaller DC handle 20% of the keyspace, and this would mean that I should 
see roughly equal usage on all 15 boxes.  It just doesn't seem to be happening 
that way, though.  It looks like the 1 replica nodes are carrying about half 
the data the 2 replica nodes are.  It's almost as if those nodes are only 
handling 10% of the keyspace instead of 20%.
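For what it's worth, the expected shares work out from just the RF and node counts above:

  DC1: RF 2 spread across 10 nodes -> 2/10 = 20% of all keys per node
  DC2: RF 1 spread across  5 nodes -> 1/5  = 20% of all keys per node

So per-node load should indeed come out roughly equal, which makes the roughly 2:1 gap look like something other than token placement.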

Does anybody have any suggestions as to what might be going on?  I've run 
nodetool getendpoints against a bunch of keys, and I always get back three 
nodes, so I'm pretty confused.  I've also run repair on a few nodes in both 
data centers, but the sizes are still vastly different.

Thanks!

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


Re: consistency level question

2012-03-18 Thread Caleb Rackliffe
If your replication factor is set to one, your cluster is obviously in a bad 
state following any node failure.  At best, I think it would make sense that 
about a third of your operations fail, but I'm not sure why all of them would.  
I don't know if Hector just refuses to work with a compromised cluster, etc.

I guess I'm wondering why your replication factor is set to 1…
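If bumping it is an option, the change itself is a one-liner in the cassandra-cli of this era (the keyspace name below is a placeholder, and after raising the factor you'd want to run repair so the new replicas pick up the existing data):

update keyspace MyKeyspace with strategy_options = {replication_factor : 2};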

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Tamar Fraenkel ta...@tok-media.com
Reply-To: user@cassandra.apache.org
Date: Sun, 18 Mar 2012 03:15:53 -0400
To: cassandra-u...@incubator.apache.org
Subject: consistency level question

Hi!
I have a 3 node cassandra cluster.
I use Hector API.

I give Hector one of the node's IP addresses
I call setAutoDiscoverHosts(true) and setRunAutoDiscoveryAtStartup(true).

The describe on one node returns:

Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:1]

The odd thing is that when I take one of the nodes down, expecting all to
continue running smoothly, I get exceptions of the format seen below, and no
read or write succeeds. When I bring the node back up, the exceptions stop and
reads and writes resume.

Any idea or explanation why this is the case?
Thanks!


me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough 
replicas present to handle consistency level.
at 
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:66)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:285)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:268)
at 
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
at 
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:289)
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53)
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49)
at 
me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at 
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48)
at 
me.prettyprint.cassandra.service.ColumnSliceIterator.hasNext(ColumnSliceIterator.java:60)
at


Tamar Fraenkel
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956




Re: Token Ring Gaps in a 2 DC Setup

2012-03-18 Thread Caleb Rackliffe
More detail…

I'm running 1.0.7 on these boxes, and the keyspace readout from the CLI looks 
like this:

create keyspace Users
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC2 : 1, DC1 : 2}
  and durable_writes = true;

Thanks!

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Caleb Rackliffe ca...@steelhouse.com
Date: Sun, 18 Mar 2012 02:47:05 -0400
To: user@cassandra.apache.org
Subject: Token Ring Gaps in a 2 DC Setup

Hi Everyone,

I have a cluster using NetworkTopologyStrategy that looks like this:

10.41.116.22    DC1  RAC1   Up  Normal  13.21 GB  10.00%  0
10.54.149.202   DC2  RAC1   Up  Normal   6.98 GB   0.00%  1
10.41.116.20    DC1  RAC2   Up  Normal  12.75 GB  10.00%  1701411830
10.41.116.16    DC1  RAC3   Up  Normal  12.62 GB  10.00%  3402823670
10.54.149.203   DC2  RAC1   Up  Normal   6.7 GB    0.00%  3402823671
10.41.116.18    DC1  RAC4   Up  Normal  10.8 GB   10.00%  5104235500
10.41.116.14    DC1  RAC5   Up  Normal  10.27 GB  10.00%  6805647340
10.54.149.204   DC2  RAC1   Up  Normal   6.7 GB    0.00%  6805647341
10.41.116.12    DC1  RAC6   Up  Normal  10.58 GB  10.00%  8507059170
10.41.116.10    DC1  RAC7   Up  Normal  10.89 GB  10.00%  10208471000
10.54.149.205   DC2  RAC1   Up  Normal   7.51 GB   0.00%  10208471001
10.41.116.8     DC1  RAC8   Up  Normal  10.48 GB  10.00%  11909882800
10.41.116.24    DC1  RAC9   Up  Normal  10.89 GB  10.00%  13611294700
10.54.149.206   DC2  RAC1   Up  Normal   6.37 GB   0.00%  13611294701
10.41.116.26    DC1  RAC10  Up  Normal  11.17 GB  10.00%  15312706500

There are two data centers, one with 10 nodes/2 replicas and one with 5 nodes/1 
replica.  What I've attempted to do with my token assignments is have each node 
in the smaller DC handle 20% of the keyspace, and this would mean that I should 
see roughly equal usage on all 15 boxes.  It just doesn't seem to be happening 
that way, though.  It looks like the 1 replica nodes are carrying about half 
the data the 2 replica nodes are.  It's almost as if those nodes are only 
handling 10% of the keyspace instead of 20%.

Does anybody have any suggestions as to what might be going on?  I've run 
nodetool getendpoints against a bunch of keys, and I always get back three 
nodes, so I'm pretty confused.  I've also run repair on a few nodes in both 
data centers, but the sizes are still vastly different.

Thanks!

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


Re: consistency level question

2012-03-18 Thread Caleb Rackliffe
That sounds right to me :)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Tamar Fraenkel ta...@tok-media.com
Reply-To: user@cassandra.apache.org
Date: Sun, 18 Mar 2012 04:20:58 -0400
To: user@cassandra.apache.org
Subject: Re: consistency level question

Thanks!
I updated the replication factor to 2, and now when I took one node down everything
continued running (I did see Hector complaining about the node being down), and
things were still saved to the db and read from it.

Just so I understand: now, having a replication factor of 2, if I have 2 out of 3
nodes running, all my reads and writes with CL=1 should work, right?


Tamar Fraenkel
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Sun, Mar 18, 2012 at 9:57 AM, Watanabe Maki watanabe.m...@gmail.com wrote:
Because your RF is 1, you need all nodes up.

maki


On 2012/03/18, at 16:15, Tamar Fraenkel ta...@tok-media.com wrote:

Hi!
I have a 3 node cassandra cluster.
I use Hector API.

I give Hector one of the node's IP addresses
I call setAutoDiscoverHosts(true) and setRunAutoDiscoveryAtStartup(true).

The describe on one node returns:

Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:1]

The odd thing is that when I take one of the nodes down, expecting all to
continue running smoothly, I get exceptions of the format seen below, and no
read or write succeeds. When I bring the node back up, the exceptions stop and
reads and writes resume.

Any idea or explanation why this is the case?
Thanks!


me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough 
replicas present to handle consistency level.
at 
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:66)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:285)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:268)
at 
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
at 
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:289)
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53)
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49)
at 
me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at 
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48)
at 
me.prettyprint.cassandra.service.ColumnSliceIterator.hasNext(ColumnSliceIterator.java:60)
at


Tamar Fraenkel
Senior Software Engineer, TOK Media



ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





Re: Lots and Lots of CompactionReducer Threads

2012-01-08 Thread Caleb Rackliffe
With the exception of a few little warnings on start-up about the Memtable live 
ratio, there is nothing at WARN or above in the logs.  Just before the JVM 
terminates, there are about 10,000 threads in Reducer executor pools that look 
like this in JConsole …


Name: CompactionReducer:1
State: TIMED_WAITING on 
java.util.concurrent.SynchronousQueue$TransferStack@72938aea
Total blocked: 0  Total waited: 1

Stack trace:
 sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:722)


The results from tpstats don't look too interesting…

Pool Name                Active   Pending   Completed   Blocked   All time blocked
ReadStage                     0         0     3455159         0                  0
RequestResponseStage          0         0    10133276         0                  0
MutationStage                 0         0     5898833         0                  0
ReadRepairStage               0         0     2078449         0                  0
ReplicateOnWriteStage         0         0           0         0                  0
GossipStage                   0         0      236388         0                  0
AntiEntropyStage              0         0           0         0                  0
MigrationStage                0         0           0         0                  0
MemtablePostFlusher           0         0         231         0                  0
StreamStage                   0         0           0         0                  0
FlushWriter                   0         0         231         0                  0
MiscStage                     0         0           0         0                  0
InternalResponseStage         0         0           0         0                  0
HintedHandoff                 0         0          35         0                  0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
BINARY   0
READ 0
MUTATION 0
REQUEST_RESPONSE 0

The results from info seem unremarkable as well…

Token: 15312706500
Gossip active: true
Load : 5.6 GB
Generation No: 1325995515
Uptime (seconds) : 67199
Heap Memory (MB) : 970.32 / 1968.00
Data Center  : datacenter1
Rack : rack1
Exceptions   : 0

I'm using LeveledCompactionStrategy with no throttling, and I'm not changing 
the default on the number of concurrent compactors.

What is interesting to me here is that Cassandra creates an executor for every 
single compaction in ParallelCompactionIterable.  Why couldn't we just create a 
pool with Runtime.availableProcessors() Threads and be done with it?
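For clarity, the shape I have in mind is nothing more exotic than the standard bounded pool below (a sketch of the suggestion, not the actual Cassandra code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ReducerPoolSketch {
    // one shared, bounded pool sized to the machine and reused across compactions,
    // instead of a fresh executor (and its threads) per compaction
    static final ExecutorService REDUCER_POOL =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
}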

Let me know if I left any info out.

Thanks!

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Sun, 8 Jan 2012 16:51:50 -0500
To: user@cassandra.apache.org
Subject: Re: Lots and Lots of CompactionReducer Threads

How many threads ? Any errors in the server logs ?

What do nodetool tpstats and nodetool compactionstats say?

Did you change compaction_strategy for the CF's ?

By default cassandra will use as many compaction threads as you have cores, see 
concurrent_compactors in cassandra.yaml

Have you set the JVM heap settings ? What does nodetool info show ?

Hope that helps.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/01/2012, at 3:51 PM, Caleb Rackliffe wrote:

Hi Everybody,

JConsole tells me I've got CompactionReducer threads stacking up, consuming 
memory, and never going away.  Eventually, my Java process fails because it 
can't allocate any more native threads.  Here's my setup…

Cassandra 1.0.5 on CentOS 6.0
4 GB of RAM
50 GB SSD HD
Memtable flush threshold = 128 MB
compaction throughput limit = 16 MB/sec
Multithreaded compaction = true

It may very well be that I'm doing something strange here, but it seems like 
those compaction threads should go away eventually.  I'm hoping the combination 
of a low Memtable

Re: Lots and Lots of CompactionReducer Threads

2012-01-08 Thread Caleb Rackliffe
After some searching, I think I may have found something in the code itself, 
and so I've filed a bug report - 
https://issues.apache.org/jira/browse/CASSANDRA-3711

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


From: Caleb Rackliffe ca...@steelhouse.com
Reply-To: user@cassandra.apache.org
Date: Sun, 8 Jan 2012 17:48:59 -0500
To: user@cassandra.apache.org
Cc: aa...@thelastpickle.com
Subject: Re: Lots and Lots of CompactionReducer Threads

With the exception of a few little warnings on start-up about the Memtable live 
ratio, there is nothing at WARN or above in the logs.  Just before the JVM 
terminates, there are about 10,000 threads in Reducer executor pools that look 
like this in JConsole …


Name: CompactionReducer:1
State: TIMED_WAITING on 
java.util.concurrent.SynchronousQueue$TransferStack@72938aea
Total blocked: 0  Total waited: 1

Stack trace:
 sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:722)


The results from tpstats don't look too interesting…

Pool Name                Active   Pending   Completed   Blocked   All time blocked
ReadStage                     0         0     3455159         0                  0
RequestResponseStage          0         0    10133276         0                  0
MutationStage                 0         0     5898833         0                  0
ReadRepairStage               0         0     2078449         0                  0
ReplicateOnWriteStage         0         0           0         0                  0
GossipStage                   0         0      236388         0                  0
AntiEntropyStage              0         0           0         0                  0
MigrationStage                0         0           0         0                  0
MemtablePostFlusher           0         0         231         0                  0
StreamStage                   0         0           0         0                  0
FlushWriter                   0         0         231         0                  0
MiscStage                     0         0           0         0                  0
InternalResponseStage         0         0           0         0                  0
HintedHandoff                 0         0          35         0                  0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
BINARY   0
READ 0
MUTATION 0
REQUEST_RESPONSE 0

The results from info seem unremarkable as well…

Token: 15312706500
Gossip active: true
Load : 5.6 GB
Generation No: 1325995515
Uptime (seconds) : 67199
Heap Memory (MB) : 970.32 / 1968.00
Data Center  : datacenter1
Rack : rack1
Exceptions   : 0

I'm using LeveledCompactionStrategy with no throttling, and I'm not changing 
the default on the number of concurrent compactors.

What is interesting to me here is that Cassandra creates an executor for every 
single compaction in ParallelCompactionIterable.  Why couldn't we just create a 
pool with Runtime.availableProcessors() Threads and be done with it?

Let me know if I left any info out.

Thanks!

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Sun, 8 Jan 2012 16:51:50 -0500
To: user@cassandra.apache.org
Subject: Re: Lots and Lots of CompactionReducer Threads

How many threads ? Any errors in the server logs ?

What do nodetool tpstats and nodetool compactionstats say?

Did you change compaction_strategy for the CF's ?

By default cassandra will use as many compaction threads as you have cores, see 
concurrent_compactors in cassandra.yaml

Have

Lots and Lots of CompactionReducer Threads

2012-01-07 Thread Caleb Rackliffe
Hi Everybody,

JConsole tells me I've got CompactionReducer threads stacking up, consuming 
memory, and never going away.  Eventually, my Java process fails because it 
can't allocate any more native threads.  Here's my setup…

Cassandra 1.0.5 on CentOS 6.0
4 GB of RAM
50 GB SSD HD
Memtable flush threshold = 128 MB
compaction throughput limit = 16 MB/sec
Multithreaded compaction = true

It may very well be that I'm doing something strange here, but it seems like 
those compaction threads should go away eventually.  I'm hoping the combination 
of a low Memtable flush threshold, low compaction T/P limit, and heavy write 
load doesn't mean those threads are hanging around because they're actually not 
done doing their compaction tasks.

Thanks,

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com



OutOfMemory Errors with Cassandra 1.0.5

2012-01-06 Thread Caleb Rackliffe
)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:172)
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:57)
at 
org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:134)
at 
org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:114)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

Has anybody seen this sort of problem before?

Thanks to anyone who takes a look.  I can provide more information than this, 
but I figure that's enough to start…

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


Re: OutOfMemory Errors with Cassandra 1.0.5

2012-01-06 Thread Caleb Rackliffe
One other item…

java -version

java version "1.7.0_01"
Java(TM) SE Runtime Environment (build 1.7.0_01-b08)
Java HotSpot(TM) 64-Bit Server VM (build 21.1-b02, mixed mode)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


From: Caleb Rackliffe ca...@steelhouse.com
Reply-To: user@cassandra.apache.org
Date: Fri, 6 Jan 2012 15:28:30 -0500
To: user@cassandra.apache.org
Subject: OutOfMemory Errors with Cassandra 1.0.5

Hi Everybody,

I have a 10-node cluster running 1.0.5.  The hardware/configuration for each 
box looks like this:

Hardware: 4 GB RAM, 400 GB SATAII HD for commitlog, 50 GB SATAIII SSD for data 
directory, 1 GB SSD swap partition
OS: CentOS 6, vm.swappiness = 0
Cassandra: disk access mode = standard, max memtable size = 128 MB, max new 
heap = 800 MB, max heap = 2 GB, stack size = 128k

I explicitly didn't put JNA on the classpath because I had a hard time figuring 
out how much native memory it would actually need.

After a node runs for a couple of days, my swap partition is almost completely 
full, and even though the resident size of my Java process is right under 3 GB, 
I get this sequence in the logs, with death coming on a failure to allocate 
another thread…

 WARN [pool-1-thread-1] 2012-01-05 09:06:38,078 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 65.58206914005034
 WARN [pool-1-thread-1] 2012-01-05 09:08:14,405 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 1379.0945945945946
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,593 GCInspector.java (line 146) 
Heap is 0.7523060581548427 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,611 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [pool-1-thread-1] 2012-01-05 13:45:29,934 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.004297106677189052
 WARN [pool-1-thread-1] 2012-01-06 02:23:18,175 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.0018187309961539236
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,202 GCInspector.java (line 146) 
Heap is 0.7635993298476305 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,203 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,588 GCInspector.java (line 146) 
Heap is 0.7617639564886326 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,612 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
ERROR [CompactionExecutor:6880] 2012-01-06 19:45:49,336 
AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
Thread[CompactionExecutor:6880,1,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:691)
at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1325)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getCompactedRow(ParallelCompactionIterable.java:190)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced(ParallelCompactionIterable.java:164)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced(ParallelCompactionIterable.java:144)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:116)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135

Re: OutOfMemory Errors with Cassandra 1.0.5

2012-01-06 Thread Caleb Rackliffe
I saw this article - http://comments.gmane.org/gmane.comp.db.cassandra.user/2225

I'm using the Hector client (for connection pooling), with ~3200 threads active 
according to JConsole.
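Rough arithmetic with the numbers already in this thread (an estimate, not a measurement):

  ~3200 threads x 128 KB of stack each ≈ 400 MB of native memory for stacks alone,
  on top of a 2 GB max heap on a 4 GB box.

So failing to allocate yet another native thread doesn't seem too surprising.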

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Caleb Rackliffe ca...@steelhouse.com
Date: Fri, 6 Jan 2012 15:40:26 -0500
To: user@cassandra.apache.org
Subject: Re: OutOfMemory Errors with Cassandra 1.0.5

One other item…

java -version

java version "1.7.0_01"
Java(TM) SE Runtime Environment (build 1.7.0_01-b08)
Java HotSpot(TM) 64-Bit Server VM (build 21.1-b02, mixed mode)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


From: Caleb Rackliffe ca...@steelhouse.com
Reply-To: user@cassandra.apache.org
Date: Fri, 6 Jan 2012 15:28:30 -0500
To: user@cassandra.apache.org
Subject: OutOfMemory Errors with Cassandra 1.0.5

Hi Everybody,

I have a 10-node cluster running 1.0.5.  The hardware/configuration for each 
box looks like this:

Hardware: 4 GB RAM, 400 GB SATAII HD for commitlog, 50 GB SATAIII SSD for data 
directory, 1 GB SSD swap partition
OS: CentOS 6, vm.swappiness = 0
Cassandra: disk access mode = standard, max memtable size = 128 MB, max new 
heap = 800 MB, max heap = 2 GB, stack size = 128k

I explicitly didn't put JNA on the classpath because I had a hard time figuring 
out how much native memory it would actually need.

After a node runs for a couple of days, my swap partition is almost completely 
full, and even though the resident size of my Java process is right under 3 GB, 
I get this sequence in the logs, with death coming on a failure to allocate 
another thread…

 WARN [pool-1-thread-1] 2012-01-05 09:06:38,078 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 65.58206914005034
 WARN [pool-1-thread-1] 2012-01-05 09:08:14,405 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 1379.0945945945946
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,593 GCInspector.java (line 146) 
Heap is 0.7523060581548427 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,611 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [pool-1-thread-1] 2012-01-05 13:45:29,934 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.004297106677189052
 WARN [pool-1-thread-1] 2012-01-06 02:23:18,175 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.0018187309961539236
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,202 GCInspector.java (line 146) 
Heap is 0.7635993298476305 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,203 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,588 GCInspector.java (line 146) 
Heap is 0.7617639564886326 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,612 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
ERROR [CompactionExecutor:6880] 2012-01-06 19:45:49,336 
AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
Thread[CompactionExecutor:6880,1,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:691)
at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1325)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getCompactedRow(ParallelCompactionIterable.java:190)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced

Re: OutOfMemory Errors with Cassandra 1.0.5 (fixed)

2012-01-06 Thread Caleb Rackliffe
Okay, it looks like I was slightly underestimating the number of connections 
open on the cluster.  This probably won't be a problem after I tighten up the 
Hector pool maximums.

Sorry for the spam…

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


From: Caleb Rackliffe ca...@steelhouse.com
Reply-To: user@cassandra.apache.org
Date: Fri, 6 Jan 2012 20:13:37 -0500
To: user@cassandra.apache.org
Subject: Re: OutOfMemory Errors with Cassandra 1.0.5

I saw this article - http://comments.gmane.org/gmane.comp.db.cassandra.user/2225

I'm using the Hector client (for connection pooling), with ~3200 threads active 
according to JConsole.

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Caleb Rackliffe ca...@steelhouse.com
Date: Fri, 6 Jan 2012 15:40:26 -0500
To: user@cassandra.apache.org
Subject: Re: OutOfMemory Errors with Cassandra 1.0.5

One other item…

java -version

java version "1.7.0_01"
Java(TM) SE Runtime Environment (build 1.7.0_01-b08)
Java HotSpot(TM) 64-Bit Server VM (build 21.1-b02, mixed mode)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


From: Caleb Rackliffe ca...@steelhouse.com
Reply-To: user@cassandra.apache.org
Date: Fri, 6 Jan 2012 15:28:30 -0500
To: user@cassandra.apache.org
Subject: OutOfMemory Errors with Cassandra 1.0.5

Hi Everybody,

I have a 10-node cluster running 1.0.5.  The hardware/configuration for each 
box looks like this:

Hardware: 4 GB RAM, 400 GB SATAII HD for commitlog, 50 GB SATAIII SSD for data 
directory, 1 GB SSD swap partition
OS: CentOS 6, vm.swappiness = 0
Cassandra: disk access mode = standard, max memtable size = 128 MB, max new 
heap = 800 MB, max heap = 2 GB, stack size = 128k

I explicitly didn't put JNA on the classpath because I had a hard time figuring 
out how much native memory it would actually need.

After a node runs for a couple of days, my swap partition is almost completely 
full, and even though the resident size of my Java process is right under 3 GB, 
I get this sequence in the logs, with death coming on a failure to allocate 
another thread…

 WARN [pool-1-thread-1] 2012-01-05 09:06:38,078 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 65.58206914005034
 WARN [pool-1-thread-1] 2012-01-05 09:08:14,405 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 1379.0945945945946
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,593 GCInspector.java (line 146) 
Heap is 0.7523060581548427 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,611 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [pool-1-thread-1] 2012-01-05 13:45:29,934 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.004297106677189052
 WARN [pool-1-thread-1] 2012-01-06 02:23:18,175 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.0018187309961539236
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,202 GCInspector.java (line 146) 
Heap is 0.7635993298476305 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,203 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,588 GCInspector.java (line 146) 
Heap is 0.7617639564886326 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,612 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
ERROR [CompactionExecutor:6880] 2012-01-06 19:45:49,336 
AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
Thread

Memtable live ratio of infinity

2011-12-15 Thread Caleb Rackliffe
Hi All,

I saw the following log message today on a node running cassandra 1.0.5:

WARN [pool-1-thread-1] 2011-12-15 20:28:53,915 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of Infinity

I guess this means calculated throughput is either very low or the Memtable is 
huge.  Either way, the following line in Memtable looks a bit weird:

double newRatio = (double) deepSize / currentThroughput.get();
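A tiny standalone check of that suspicion (illustrative values only, not the real code path): if no throughput has been recorded yet the denominator is zero, and double division by zero quietly yields Infinity instead of throwing.

class LiveRatioSketch {
    public static void main(String[] args) {
        long deepSize = 1000000L;        // illustrative memtable deep size
        long currentThroughput = 0L;     // no writes counted yet
        double newRatio = (double) deepSize / currentThroughput;
        System.out.println(newRatio);    // prints Infinity
    }
}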

Has anyone seen this warning before?

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH

2011-12-11 Thread Caleb Rackliffe
Hi All,

I'm trying to start up Cassandra 1.0.5 on a Cent OS 6 machine.  I installed JNA 
through yum and made a symbolic link to jna.jar in my Cassandra lib directory.  
When I run bin/cassandra -f, I get the following:

 INFO 09:14:31,552 Logging initialized
 INFO 09:14:31,555 JVM vendor/version: Java HotSpot(TM) 64-Bit Server 
VM/1.6.0_29
 INFO 09:14:31,555 Heap size: 3405774848/3405774848
 INFO 09:14:31,555 Classpath: 
bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.0.5.jar:bin/../lib/apache-cassandra-clientutil-1.0.5.jar:bin/../lib/apache-cassandra-thrift-1.0.5.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.6.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/jamm-0.2.5.jar
Killed

If I remove the symlink to JNA, it starts up just fine.

Also, I do have entries in my limits.conf for JNA:

root soft memlock unlimited
root hard memlock unlimited

Has anyone else seen this behavior?

Thanks,

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

Causes of a High Memtable Live Ratio

2011-11-19 Thread Caleb Rackliffe
Hi All,

From what I've read in the source, a Memtable's live ratio is the ratio of 
Memtable usage to the current write throughput.  If this is too high, I 
imagine the system could be in a possibly unsafe state, as the comment in 
Memtable.java indicates.

Today, while bulk loading some data, I got the following message:

WARN [pool-1-thread-1] 2011-11-18 21:08:57,331 Memtable.java (line 172) setting 
live ratio to maximum of 64 instead of 78.87903667214012

Should I be worried?  If so, does anybody have any suggestions for how to 
address it?

Thanks :)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

Re: super sub slice query?

2011-10-27 Thread Caleb Rackliffe
I had the same question you did, I think.  Below is as far as I got with Hector…

I have a column family of super-columns with long names.  The columns in each 
super-column also have long names.  I'm using Hector, and what I want to do is 
get the last column in each super-column, for a range of super-columns.  I was 
able to get the last column in a column family  like this…


Cluster cluster = HFactory.getOrCreateCluster("Cortex", config);

Keyspace keyspace = HFactory.createKeyspace("Products", cluster);

RangeSlicesQuery<String, String, String> rangeSlicesQuery =
    HFactory.createRangeSlicesQuery(keyspace, StringSerializer.get(),
        StringSerializer.get(), StringSerializer.get());

rangeSlicesQuery.setColumnFamily("Attributes");

rangeSlicesQuery.setKeys("id0", "id0");

rangeSlicesQuery.setRange("", "", true, 1);

QueryResult<OrderedRows<String, String, String>> result =
    rangeSlicesQuery.execute();


…but no luck with the additional dimension.

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Guy Incognito dnd1...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Thu, 27 Oct 2011 06:34:08 -0400
To: user@cassandra.apache.org
Subject: super sub slice query?

is there such a thing?  a query that runs against a SC family and returns a 
subset of subcolumns from a set of super-columns?

is there a way to have eg a slice query (or super slice query) only return the 
column names, rather than the value as well?


Reading Last Values From a SuperColumn

2011-10-26 Thread Caleb Rackliffe
Hi Everybody,

I have a column family of super-columns with long names.  The columns in each 
super-column also have long names.  I'm using Hector, and what I want to do is 
get the last column in each super-column, for a range of super-columns.  I was 
able to get the last column in a column family  like this…


Cluster cluster = HFactory.getOrCreateCluster("Cortex", config);

Keyspace keyspace = HFactory.createKeyspace("Products", cluster);

RangeSlicesQuery<String, String, String> rangeSlicesQuery =
    HFactory.createRangeSlicesQuery(keyspace, StringSerializer.get(),
        StringSerializer.get(), StringSerializer.get());

rangeSlicesQuery.setColumnFamily("Attributes");

rangeSlicesQuery.setKeys("id0", "id0");

rangeSlicesQuery.setRange("", "", true, 1);

QueryResult<OrderedRows<String, String, String>> result =
    rangeSlicesQuery.execute();


…but no luck with the additional dimension.


Thanks in advance!

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com