CQL and counters

2013-11-22 Thread Bryce Godfrey
I'm looking for some guidance on how to model some stat tracking over time, 
bucketed to some type of interval (15 min, hour, etc).

As an example, let's say I would like to track network traffic throughput and 
bucket it to 15 minute intervals.  In our old model, using Thrift, I would 
create a counter column family and use timestamp ticks as the column names for 
a total and a count column.  As data was sampled, we would increment count by 
one and increment the total with the sampled value for that time bucket.  The 
column name gives us the datetime for the values, as well as a convenient row 
slice query to get a date range for any given statistic.

Key        | 1215 | 1230 | 1245
NIC1:Total | 100  | 56   | 872
NIC1:Count | 15   | 15   | 15

Then given the total/count I can show an average over time.

In CQL it seems like I can't add new counter columns at runtime unless they are 
defined in the schema first or I run an ALTER statement, which may not be the 
correct way to go.  So is there a better way to model this type of data in the 
new CQL world?  I also don't know how to query that type of data in a way 
similar to the row slice by column name.

Thanks,
Bryce


RE: High bandwidth usage between datacenters for cluster

2012-10-27 Thread Bryce Godfrey
NetworkTopologyStrategy with the topology file filled out is already the 
configuration we are using.

From: sankalp kohli [mailto:kohlisank...@gmail.com]
Sent: Thursday, October 25, 2012 11:55 AM
To: user@cassandra.apache.org
Subject: Re: High bandwidth usage between datacenters for cluster

Use placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' 
and also fill out the topology.properties file. This will tell Cassandra that 
you have two DCs. You can verify that by looking at the output of the ring command.

If your DCs are set up properly, only one request will go over the WAN, though 
the responses from all nodes in the other DC will go over the WAN.
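For illustration, a minimal cassandra-topology.properties sketch using the DC 
names that appear later in these threads (Fisher, TierPoint); the 10.20.8.x 
addresses come from the thread, while the 10.30.8.x addresses and the rack names 
are placeholders, and every node in both DCs needs its own entry:

# cassandra-topology.properties (format: node_ip=DC:rack)
10.20.8.1=Fisher:RAC1
10.20.8.2=Fisher:RAC1
10.30.8.1=TierPoint:RAC1
10.30.8.2=TierPoint:RAC1
default=Fisher:RAC1

With that in place, the output of nodetool ring should list each node under its 
data center, which is the verification mentioned above.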

On Thu, Oct 25, 2012 at 10:44 AM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
We have a 5 node cluster, with a matching 5 nodes for DR in another data 
center.  With a replication factor of 3, does the node I send a write to 
attempt to send it to the 3 servers in the DR also?  Or does it send it to 1 
and let it replicate locally in the DR environment to save bandwidth across the 
WAN?
Normally this isn't an issue for us, but at times we are writing approximately 
1MB a sec of data, and seeing a corresponding 3MB of traffic across the WAN to 
all the Cassandra DR servers.

If my assumptions are right, is this configurable somehow for writing to one 
node and letting it do local replication?  We are on 1.1.5

Thanks



High bandwidth usage between datacenters for cluster

2012-10-25 Thread Bryce Godfrey
We have a 5 node cluster, with a matching 5 nodes for DR in another data 
center.  With a replication factor of 3, does the node I send a write to 
attempt to send it to the 3 servers in the DR also?  Or does it send it to 1 
and let it replicate locally in the DR environment to save bandwidth across the 
WAN?
Normally this isn't an issue for us, but at times we are writing approximately 
1MB a sec of data, and seeing a corresponding 3MB of traffic across the WAN to 
all the Cassandra DR servers.

If my assumptions are right, is this configurable somehow for writing to one 
node and letting it do local replication?  We are on 1.1.5

Thanks


Prevent queries from OOM nodes

2012-09-24 Thread Bryce Godfrey
Is there anything I can do on the configuration side to prevent nodes from 
going OOM due to queries that will read large amounts of data and exceed the 
heap available?

For the past few days we had some nodes consistently freezing/crashing with 
OOM.  We got a heap dump into MAT and figured out the nodes were dying due to 
some queries for a few extremely large data sets.  Tracked it back to an app 
that just didn't prevent users from doing these large queries, but it seems 
like Cassandra could be smart enough to guard against this type of thing?

Basically some kind of setting like: if the data needed to satisfy the query 
exceeds the available heap, then throw an error to the caller and abort the 
query.  I would much rather return errors to clients than crash a node, as the 
error is easier to track down and resolve that way.

Thanks.


RE: Expanding cluster to include a new DR datacenter

2012-08-29 Thread Bryce Godfrey
Well I tried to drop the keyspace, but it's still there.  No errors in logs and 
Cassandra-cli showed the schema agreement after the command.  I took a snapshot 
of the system keyspace first.  Nothing is crashing in the clients yet either, 
still able to read/write to that keyspace.

[default@EBonding] drop keyspace EBonding;
2eb11095-b8a8-31cd-80c3-c748d32a4208
Waiting for schema agreement...
... schemas agree across the cluster

[default@unknown] use EBonding;
Authenticated to keyspace: EBonding
[default@EBonding] describe;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:2]

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, August 29, 2012 2:36 AM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

It would be handy to work out what the corruption is.  Could you snapshot the 
system keyspace and store it somewhere, just in case we can look at it later?

Is there a way I can confirm this?
Errors in the client and/or the server log are the traditional way.

go about cleaning up/restoring the proper schema?
If you need to get it back, and can handle the downtime, the simplest thing is 
to drop the KS and re-create it.  Remember to take a snapshot first; drop 
keyspace takes one automatically, but it's the sort of thing I would do myself 
anyway.

Or you can _try_ nodetool resetlocalschema.  Without knowing what the error is, 
it's hard to say if it would work.
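As a rough sketch of that sequence, using the keyspace and host names that 
appear in this thread (EBonding, 10.20.8.1) and a replication setting that is 
only illustrative; this assumes the full keyspace/CF definitions are on hand to 
re-create:

# 1. Snapshot the system keyspace on each node, in case the schema needs forensics.
nodetool -h 10.20.8.1 snapshot system

# 2. In cassandra-cli, drop the keyspace and re-create it from the saved definition.
drop keyspace EBonding;
create keyspace EBonding
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = {Fisher:2}
  and durable_writes = true;

# 3. Or, on a node whose local copy of the schema looks wrong, try rebuilding it
#    from the rest of the cluster instead:
nodetool -h 10.20.8.1 resetlocalschema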

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/08/2012, at 9:10 AM, Bryce Godfrey bryce.godf...@azaleos.com wrote:


I believe what may be really going on is that my schema is in a bad or corrupt 
state.  I also have one keyspace that I just cannot drop an existing column 
family from even though it shows no errors.

So right now I was able to get 4 of my 6 keyspaces over to Network Topology 
strategy.

I think I got into this bad state after pointing Opscenter at this cluster for 
the first time, as it started throwing errors after that and crashed a couple 
of my nodes until I stopped it and its agents.

Is there a way I can confirm this or go about cleaning up/restoring the proper 
schema?

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Tuesday, August 28, 2012 11:09 AM
To: user@cassandra.apache.org
Subject: RE: Expanding cluster to include a new DR datacenter

So in an interesting turn of events, this works on my other 4 keyspaces but 
just not this 'EBonding' one which will not recognize the changes.  I can 
probably get around this by dropping and re-creating this keyspace since its 
uptime is not too important for us.

[default@AlertStats] describe AlertStats;
Keyspace: AlertStats:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [Fisher:3]

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 3:50 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

Can you describe your schema again with TierPoint in it?
On Mon, Aug 27, 2012 at 3:22 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
Same results.  I restarted the node also to see if it just wasn't picking up 
the changes and it still shows Simple.

When I specify the DC for strategy_options I should be using the DC name from 
the property file snitch, right?  Ours is Fisher and TierPoint so that's what I 
used.

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 1:21 PM

To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

In your update command is it possible to specify RF for both DC? You could just 
do DC1:2, DC2:0.
On Mon, Aug 27, 2012 at 11:16 AM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
Show schema output shows the simple strategy still:
[default@unknown] show schema EBonding;
create keyspace EBonding
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 2}
  and durable_writes = true;

This is the only thing I see in the system log at the time on all the nodes:

INFO [MigrationStage:1] 2012-08-27 10:54:18,608 ColumnFamilyStore.java (line 
659) Enqueuing flush of Memtable-schema_keyspaces@1157216346(183/228 
serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,612 Memtable.java (line 264) Writing 
Memtable-schema_keyspaces@1157216346(183/228 serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,627 Memtable.java (line 305) 
Completed flushing 
/opt/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-he-34817-Data.db
 (241 bytes) for commitlog p$


Should

RE: Expanding cluster to include a new DR datacenter

2012-08-28 Thread Bryce Godfrey
So in an interesting turn of events, this works on my other 4 keyspaces but 
just not this 'EBonding' one which will not recognize the changes.  I can 
probably get around this by dropping and re-creating this keyspace since its 
uptime is not too important for us.

[default@AlertStats] describe AlertStats;
Keyspace: AlertStats:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [Fisher:3]

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 3:50 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

Can you describe your schema again with TierPoint in it?
On Mon, Aug 27, 2012 at 3:22 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
Same results.  I restarted the node also to see if it just wasn't picking up 
the changes and it still shows Simple.

When I specify the DC for strategy_options I should be using the DC name from 
the property file snitch, right?  Ours is Fisher and TierPoint so that's what I 
used.

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 1:21 PM

To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

In your update command is it possible to specify RF for both DC? You could just 
do DC1:2, DC2:0.
On Mon, Aug 27, 2012 at 11:16 AM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
Show schema output shows the simple strategy still:
[default@unknown] show schema EBonding;
create keyspace EBonding
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 2}
  and durable_writes = true;

This is the only thing I see in the system log at the time on all the nodes:

INFO [MigrationStage:1] 2012-08-27 10:54:18,608 ColumnFamilyStore.java (line 
659) Enqueuing flush of Memtable-schema_keyspaces@1157216346(183/228 
serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,612 Memtable.java (line 264) Writing 
Memtable-schema_keyspaces@1157216346(183/228 serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,627 Memtable.java (line 305) 
Completed flushing 
/opt/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-he-34817-Data.db
 (241 bytes) for commitlog p$


Should I turn the logging level up on something to see some more info maybe?

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, August 27, 2012 1:35 AM

To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

I did a quick test on a clean 1.1.4 and it worked

Can you check the logs for errors ? Can you see your schema change in there ?

Also what is the output from show schema; in the cli ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 6:53 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:

Yes

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5, 10.20.8.1, 
10.20.8.2, 10.20.8.3]

From: Mohit Anchlia [mailto:mohitanchlia@gmail.com]
Sent: Friday, August 24, 2012 1:55 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

That's interesting, can you do describe cluster?
On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
So I'm at the point of updating the keyspaces from Simple to NetworkTopology 
and I'm not sure if the changes are being accepted using Cassandra-cli.

I issue the change:

[default@EBonding] update keyspace EBonding
... with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy'
... and strategy_options={Fisher:2};
9511e292-f1b6-3f78-b781-4c90aeb6b0f6
Waiting for schema agreement...
... schemas agree across the cluster

Then I do a describe and it still shows the old strategy.  Is there something 
else that I need to do?  I've exited and restarted Cassandra-cli and it still 
shows the SimpleStrategy for that keyspace.  Other nodes show the same 
information.

[default@EBonding] describe EBonding;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:2]


From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Thursday, August 23, 2012 11:06 AM
To: user@cassandra.apache.org
Subject: RE: Expanding cluster to include

RE: Expanding cluster to include a new DR datacenter

2012-08-28 Thread Bryce Godfrey
I believe what may be really going on is that my schema is in a bad or corrupt 
state.  I also have one keyspace that I just cannot drop an existing column 
family from even though it shows no errors.

So right now I was able to get 4 of my 6 keyspaces over to Network Topology 
strategy.

I think I got into this bad state after pointing Opscenter at this cluster for 
the first time, as it started throwing errors after that and crashed a couple 
of my nodes until I stopped it and its agents.

Is there a way I can confirm this or go about cleaning up/restoring the proper 
schema?

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Tuesday, August 28, 2012 11:09 AM
To: user@cassandra.apache.org
Subject: RE: Expanding cluster to include a new DR datacenter

So in an interesting turn of events, this works on my other 4 keyspaces but 
just not this 'EBonding' one which will not recognize the changes.  I can 
probably get around this by dropping and re-creating this keyspace since its 
uptime is not too important for us.

[default@AlertStats] describe AlertStats;
Keyspace: AlertStats:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [Fisher:3]

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 3:50 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

Can you describe your schema again with TierPoint in it?
On Mon, Aug 27, 2012 at 3:22 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
Same results.  I restarted the node also to see if it just wasn't picking up 
the changes and it still shows Simple.

When I specify the DC for strategy_options I should be using the DC name from 
the property file snitch, right?  Ours is Fisher and TierPoint so that's what I 
used.

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 1:21 PM

To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

In your update command is it possible to specify RF for both DC? You could just 
do DC1:2, DC2:0.
On Mon, Aug 27, 2012 at 11:16 AM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
Show schema output shows the simple strategy still:
[default@unknown] show schema EBonding;
create keyspace EBonding
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 2}
  and durable_writes = true;

This is the only thing I see in the system log at the time on all the nodes:

INFO [MigrationStage:1] 2012-08-27 10:54:18,608 ColumnFamilyStore.java (line 
659) Enqueuing flush of Memtable-schema_keyspaces@1157216346(183/228 
serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,612 Memtable.java (line 264) Writing 
Memtable-schema_keyspaces@1157216346(183/228 serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,627 Memtable.java (line 305) 
Completed flushing 
/opt/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-he-34817-Data.db
 (241 bytes) for commitlog p$


Should I turn the logging level up on something to see some more info maybe?

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, August 27, 2012 1:35 AM

To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

I did a quick test on a clean 1.1.4 and it worked

Can you check the logs for errors ? Can you see your schema change in there ?

Also what is the output from show schema; in the cli ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 6:53 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:

Yes

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5, 10.20.8.1, 
10.20.8.2, 10.20.8.3]

From: Mohit Anchlia [mailto:mohitanchlia@gmail.com]
Sent: Friday, August 24, 2012 1:55 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

That's interesting, can you do describe cluster?
On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
So I'm at the point of updating the keyspaces from Simple to NetworkTopology 
and I'm not sure if the changes are being accepted using Cassandra-cli.

I issue the change:

[default@EBonding] update keyspace EBonding
... with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy

RE: Expanding cluster to include a new DR datacenter

2012-08-27 Thread Bryce Godfrey
Show schema output shows the simple strategy still:
[default@unknown] show schema EBonding;
create keyspace EBonding
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 2}
  and durable_writes = true;

This is the only thing I see in the system log at the time on all the nodes:

INFO [MigrationStage:1] 2012-08-27 10:54:18,608 ColumnFamilyStore.java (line 
659) Enqueuing flush of Memtable-schema_keyspaces@1157216346(183/228 
serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,612 Memtable.java (line 264) Writing 
Memtable-schema_keyspaces@1157216346(183/228 serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,627 Memtable.java (line 305) 
Completed flushing 
/opt/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-he-34817-Data.db
 (241 bytes) for commitlog p$


Should I turn the logging level up on something to see some more info maybe?

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, August 27, 2012 1:35 AM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

I did a quick test on a clean 1.1.4 and it worked

Can you check the logs for errors ? Can you see your schema change in there ?

Also what is the output from show schema; in the cli ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 6:53 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:


Yes

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5, 10.20.8.1, 
10.20.8.2, 10.20.8.3]

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Friday, August 24, 2012 1:55 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

That's interesting, can you do describe cluster?
On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
So I'm at the point of updating the keyspaces from Simple to NetworkTopology 
and I'm not sure if the changes are being accepted using Cassandra-cli.

I issue the change:

[default@EBonding] update keyspace EBonding
... with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy'
... and strategy_options={Fisher:2};
9511e292-f1b6-3f78-b781-4c90aeb6b0f6
Waiting for schema agreement...
... schemas agree across the cluster

Then I do a describe and it still shows the old strategy.  Is there something 
else that I need to do?  I've exited and restarted Cassandra-cli and it still 
shows the SimpleStrategy for that keyspace.  Other nodes show the same 
information.

[default@EBonding] describe EBonding;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:2]


From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Thursday, August 23, 2012 11:06 AM
To: user@cassandra.apache.org
Subject: RE: Expanding cluster to include a new DR datacenter

Thanks for the information!  Answers my questions.

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, August 22, 2012 7:10 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

If you didn't see this particular section, you may find it useful: 
http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster

Some comments inline:
On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
We are in the process of building out a new DR system in another Data Center, 
and we want to mirror our Cassandra environment to that DR.  I have a couple 
questions on the best way to do this after reading the documentation on the 
Datastax website.  We didn't initially plan for this to be a DR setup when 
first deployed a while ago due to budgeting, but now we need to.  So I'm just 
trying to nail down the order of doing this as well as any potential issues.

For the nodes, we don't plan on querying the servers in this DR until we fail 
over to this data center.   We are going to have 5 similar nodes in the DR, 
should I join them into the ring at token+1?

Join them at token+10 just to leave a little space.  Make sure you're using 
LOCAL_QUORUM for your queries instead of regular QUORUM.


All keyspaces are set to the replication strategy of SimpleStrategy.  Can I 
change the replication strategy after joining the new nodes in the DR to 
NetworkTopologyStrategy with the updated replication factor for each DC?

Switch your keyspaces over

RE: Expanding cluster to include a new DR datacenter

2012-08-27 Thread Bryce Godfrey
Same results.  I restarted the node also to see if it just wasn't picking up 
the changes and it still shows Simple.

When I specify the DC for strategy_options I should be using the DC name from 
the property file snitch, right?  Ours is Fisher and TierPoint so that's what I 
used.

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 1:21 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

In your update command is it possible to specify RF for both DC? You could just 
do DC1:2, DC2:0.
On Mon, Aug 27, 2012 at 11:16 AM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
Show schema output shows the simple strategy still:
[default@unknown] show schema EBonding;
create keyspace EBonding
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 2}
  and durable_writes = true;

This is the only thing I see in the system log at the time on all the nodes:

INFO [MigrationStage:1] 2012-08-27 10:54:18,608 ColumnFamilyStore.java (line 
659) Enqueuing flush of Memtable-schema_keyspaces@1157216346(183/228 
serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,612 Memtable.java (line 264) Writing 
Memtable-schema_keyspaces@1157216346(183/228 serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,627 Memtable.java (line 305) 
Completed flushing 
/opt/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-he-34817-Data.db
 (241 bytes) for commitlog p$


Should I turn the logging level up on something to see some more info maybe?

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, August 27, 2012 1:35 AM

To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

I did a quick test on a clean 1.1.4 and it worked

Can you check the logs for errors ? Can you see your schema change in there ?

Also what is the output from show schema; in the cli ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 6:53 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:

Yes

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5, 10.20.8.1, 
10.20.8.2, 10.20.8.3]

From: Mohit Anchlia [mailto:mohitanchlia@gmail.com]
Sent: Friday, August 24, 2012 1:55 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

That's interesting, can you do describe cluster?
On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
So I'm at the point of updating the keyspaces from Simple to NetworkTopology 
and I'm not sure if the changes are being accepted using Cassandra-cli.

I issue the change:

[default@EBonding] update keyspace EBonding
... with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy'
... and strategy_options={Fisher:2};
9511e292-f1b6-3f78-b781-4c90aeb6b0f6
Waiting for schema agreement...
... schemas agree across the cluster

Then I do a describe and it still shows the old strategy.  Is there something 
else that I need to do?  I've exited and restarted Cassandra-cli and it still 
shows the SimpleStrategy for that keyspace.  Other nodes show the same 
information.

[default@EBonding] describe EBonding;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:2]


From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Thursday, August 23, 2012 11:06 AM
To: user@cassandra.apache.org
Subject: RE: Expanding cluster to include a new DR datacenter

Thanks for the information!  Answers my questions.

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, August 22, 2012 7:10 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

If you didn't see this particular section, you may find it useful: 
http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster

Some comments inline:
On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
We are in the process of building out a new DR system in another Data Center, 
and we want to mirror our Cassandra environment to that DR.  I have a couple 
questions on the best way to do this after reading the documentation on the 
Datastax website.  We didn't initially plan

RE: Expanding cluster to include a new DR datacenter

2012-08-25 Thread Bryce Godfrey
Yes

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5, 10.20.8.1, 
10.20.8.2, 10.20.8.3]

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Friday, August 24, 2012 1:55 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

That's interesting, can you do describe cluster?
On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
So I'm at the point of updating the keyspaces from Simple to NetworkTopology 
and I'm not sure if the changes are being accepted using Cassandra-cli.

I issue the change:

[default@EBonding] update keyspace EBonding
... with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy'
... and strategy_options={Fisher:2};
9511e292-f1b6-3f78-b781-4c90aeb6b0f6
Waiting for schema agreement...
... schemas agree across the cluster

Then I do a describe and it still shows the old strategy.  Is there something 
else that I need to do?  I've exited and restarted Cassandra-cli and it still 
shows the SimpleStrategy for that keyspace.  Other nodes show the same 
information.

[default@EBonding] describe EBonding;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:2]


From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Thursday, August 23, 2012 11:06 AM
To: user@cassandra.apache.org
Subject: RE: Expanding cluster to include a new DR datacenter

Thanks for the information!  Answers my questions.

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, August 22, 2012 7:10 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

If you didn't see this particular section, you may find it useful: 
http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster

Some comments inline:
On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
We are in the process of building out a new DR system in another Data Center, 
and we want to mirror our Cassandra environment to that DR.  I have a couple 
questions on the best way to do this after reading the documentation on the 
Datastax website.  We didn't initially plan for this to be a DR setup when 
first deployed a while ago due to budgeting, but now we need to.  So I'm just 
trying to nail down the order of doing this as well as any potential issues.

For the nodes, we don't plan on querying the servers in this DR until we fail 
over to this data center.   We are going to have 5 similar nodes in the DR, 
should I join them into the ring at token+1?

Join them at token+10 just to leave a little space.  Make sure you're using 
LOCAL_QUORUM for your queries instead of regular QUORUM.


All keyspaces are set to the replication strategy of SimpleStrategy.  Can I 
change the replication strategy after joining the new nodes in the DR to 
NetworkTopologyStrategy with the updated replication factor for each DC?

Switch your keyspaces over to NetworkTopologyStrategy before adding the new 
nodes.  For the strategy options, just list the first dc until the second is up 
(e.g. {main_dc: 3}).
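As a rough illustration of that staged approach in cassandra-cli, using the DC 
names from this thread; the replica counts are placeholders, not a recommendation.

Before the DR nodes join, list only the existing DC:

update keyspace EBonding
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = {Fisher:2};

Once the TierPoint nodes are in the ring, add the second DC:

update keyspace EBonding
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = {Fisher:2, TierPoint:2};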


Lastly, is changing the snitch from the default of SimpleSnitch to 
RackInferringSnitch going to cause any issues?  Since it's in the cassandra.yaml 
file I assume a rolling restart to pick up the value would be OK?

This is the first thing you'll want to do.  Unless your node IPs would 
naturally put all nodes in a DC in the same rack, I recommend using 
PropertyFileSnitch, explicitly using the same rack.  (I tend to prefer PFSnitch 
regardless; it's harder to accidentally mess up.)  A rolling restart is 
required to pick up the change.  Make sure to fill out 
cassandra-topology.properties first if using PFSnitch.


This is all on Cassandra 1.1.4, Thanks for any help!





--
Tyler Hobbs
DataStax
http://datastax.com/



RE: Expanding cluster to include a new DR datacenter

2012-08-24 Thread Bryce Godfrey
So I'm at the point of updating the keyspaces from Simple to NetworkTopology 
and I'm not sure if the changes are being accepted using Cassandra-cli.

I issue the change:

[default@EBonding] update keyspace EBonding
... with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy'
... and strategy_options={Fisher:2};
9511e292-f1b6-3f78-b781-4c90aeb6b0f6
Waiting for schema agreement...
... schemas agree across the cluster

Then I do a describe and it still shows the old strategy.  Is there something 
else that I need to do?  I've exited and restarted Cassandra-cli and it still 
shows the SimpleStrategy for that keyspace.  Other nodes show the same 
information.

[default@EBonding] describe EBonding;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:2]


From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Thursday, August 23, 2012 11:06 AM
To: user@cassandra.apache.org
Subject: RE: Expanding cluster to include a new DR datacenter

Thanks for the information!  Answers my questions.

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, August 22, 2012 7:10 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

If you didn't see this particular section, you may find it useful: 
http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster

Some comments inline:
On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
We are in the process of building out a new DR system in another Data Center, 
and we want to mirror our Cassandra environment to that DR.  I have a couple 
questions on the best way to do this after reading the documentation on the 
Datastax website.  We didn't initially plan for this to be a DR setup when 
first deployed a while ago due to budgeting, but now we need to.  So I'm just 
trying to nail down the order of doing this as well as any potential issues.

For the nodes, we don't plan on querying the servers in this DR until we fail 
over to this data center.   We are going to have 5 similar nodes in the DR, 
should I join them into the ring at token+1?

Join them at token+10 just to leave a little space.  Make sure you're using 
LOCAL_QUORUM for your queries instead of regular QUORUM.


All keyspaces are set to the replication strategy of SimpleStrategy.  Can I 
change the replication strategy after joining the new nodes in the DR to 
NetworkTopologyStrategy with the updated replication factor for each DC?

Switch your keyspaces over to NetworkTopologyStrategy before adding the new 
nodes.  For the strategy options, just list the first dc until the second is up 
(e.g. {main_dc: 3}).


Lastly, is changing the snitch from the default of SimpleSnitch to 
RackInferringSnitch going to cause any issues?  Since it's in the cassandra.yaml 
file I assume a rolling restart to pick up the value would be OK?

This is the first thing you'll want to do.  Unless your node IPs would 
naturally put all nodes in a DC in the same rack, I recommend using 
PropertyFileSnitch, explicitly using the same rack.  (I tend to prefer PFSnitch 
regardless; it's harder to accidentally mess up.)  A rolling restart is 
required to pick up the change.  Make sure to fill out 
cassandra-topology.properties first if using PFSnitch.


This is all on Cassandra 1.1.4, Thanks for any help!





--
Tyler Hobbs
DataStax
http://datastax.com/


RE: Expanding cluster to include a new DR datacenter

2012-08-23 Thread Bryce Godfrey
Thanks for the information!  Answers my questions.

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, August 22, 2012 7:10 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

If you didn't see this particular section, you may find it useful: 
http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster

Some comments inline:
On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
We are in the process of building out a new DR system in another Data Center, 
and we want to mirror our Cassandra environment to that DR.  I have a couple 
questions on the best way to do this after reading the documentation on the 
Datastax website.  We didn't initially plan for this to be a DR setup when 
first deployed a while ago due to budgeting, but now we need to.  So I'm just 
trying to nail down the order of doing this as well as any potential issues.

For the nodes, we don't plan on querying the servers in this DR until we fail 
over to this data center.   We are going to have 5 similar nodes in the DR, 
should I join them into the ring at token+1?

Join them at token+10 just to leave a little space.  Make sure you're using 
LOCAL_QUORUM for your queries instead of regular QUORUM.


All keyspaces are set to the replication strategy of SimpleStrategy.  Can I 
change the replication strategy after joining the new nodes in the DR to 
NetworkTopologyStrategy with the updated replication factor for each DC?

Switch your keyspaces over to NetworkTopologyStrategy before adding the new 
nodes.  For the strategy options, just list the first dc until the second is up 
(e.g. {main_dc: 3}).


Lastly, is changing the snitch from the default of SimpleSnitch to 
RackInferringSnitch going to cause any issues?  Since it's in the cassandra.yaml 
file I assume a rolling restart to pick up the value would be OK?

This is the first thing you'll want to do.  Unless your node IPs would 
naturally put all nodes in a DC in the same rack, I recommend using 
PropertyFileSnitch, explicitly using the same rack.  (I tend to prefer PFSnitch 
regardless; it's harder to accidentally mess up.)  A rolling restart is 
required to pick up the change.  Make sure to fill out 
cassandra-topology.properties first if using PFSnitch.


This is all on Cassandra 1.1.4, Thanks for any help!





--
Tyler Hobbs
DataStax
http://datastax.com/


Expanding cluster to include a new DR datacenter

2012-08-22 Thread Bryce Godfrey
We are in the process of building out a new DR system in another Data Center, 
and we want to mirror our Cassandra environment to that DR.  I have a couple 
questions on the best way to do this after reading the documentation on the 
Datastax website.  We didn't initially plan for this to be a DR setup when 
first deployed a while ago due to budgeting, but now we need to.  So I'm just 
trying to nail down the order of doing this as well as any potential issues.

For the nodes, we don't plan on querying the servers in this DR until we fail 
over to this data center.   We are going to have 5 similar nodes in the DR, 
should I join them into the ring at token+1?

All keyspaces are set to the replication strategy of SimpleStrategy.  Can I 
change the replication strategy after joining the new nodes in the DR to 
NetworkTopologyStrategy with the updated replication factor for each DC?

Lastly, is changing the snitch from the default of SimpleSnitch to 
RackInferringSnitch going to cause any issues?  Since it's in the cassandra.yaml 
file I assume a rolling restart to pick up the value would be OK?

This is all on Cassandra 1.1.4, Thanks for any help!




Joining DR nodes in new data center

2012-08-02 Thread Bryce Godfrey
What is the process for joining a new data center to an existing cluster as DR?

We have a 5 node cluster in our primary DC, and want to bring up 5 more in our 
2nd data center purely for DR.  How should these new nodes be joined to the 
cluster and be seen as the 2nd data center?  Do the new nodes mirror the 
configuration of the existing nodes but with some setting to indicate they are 
in another DC?

Our existing cluster is mostly using the defaults of network placement strategy 
and simple snitch.

Thanks.


2 nodes throwing exceptions trying to compact after upgrade to 1.1.2 from 1.1.0

2012-07-16 Thread Bryce Godfrey
This may not be directly related to the upgrade to 1.1.2, but I was running on 
1.1.0 for a while with no issues, and I did the upgrade to 1.1.2 a few days ago.

2 of my nodes started throwing lots of promote exceptions, and then a lot of 
the beforeAppend exceptions from then on every few minutes.  This is on the 
high update CF that's using leveled compaction and compression.  The other 3 
nodes are not experiencing this.  I can send entire log files if desired.
These 2 nodes now have much higher load #'s than the other 3, and I'm assuming 
that's because they are failing with the compaction errors?

$
INFO [CompactionExecutor:1783] 2012-07-13 07:35:23,268 CompactionTask.java 
(line 109) Compacting 
[SSTableReader(path='/opt/cassandra/data/MonitoringData/Properties/MonitoringData-Properties-hd-392322-Data$
ERROR [CompactionExecutor:1783] 2012-07-13 07:35:29,696 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[CompactionExecutor:1783,1,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
at 
org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
at 
org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
at 
org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

INFO [CompactionExecutor:3310] 2012-07-16 11:14:02,481 CompactionTask.java 
(line 109) Compacting 
[SSTableReader(path='/opt/cassandra/data/MonitoringData/Properties/MonitoringData-Properties-hd-369173-Data$
ERROR [CompactionExecutor:3310] 2012-07-16 11:14:04,031 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[CompactionExecutor:3310,1,main]
java.lang.RuntimeException: Last written key 
DecoratedKey(150919285004100953907590722809541628889, 
5b30363334353237652d383966382d653031312d623131632d3030313535643031373530325d5b436f6d70757465725b4d5350422d$
at 
org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
at 
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)


RE: 2 nodes throwing exceptions trying to compact after upgrade to 1.1.2 from 1.1.0

2012-07-16 Thread Bryce Godfrey
Thanks, is there a way around this for now or should I fall back to 1.1.0?


From: Rudolf van der Leeden [mailto:rudolf.vanderlee...@scoreloop.com]
Sent: Monday, July 16, 2012 12:55 PM
To: user@cassandra.apache.org
Cc: Rudolf van der Leeden
Subject: Re: 2 nodes throwing exceptions trying to compact after upgrade to 
1.1.2 from 1.1.0

See https://issues.apache.org/jira/browse/CASSANDRA-4411
The bug is related to LCS (leveled compaction) and has been fixed.


On 16.07.2012, at 20:32, Bryce Godfrey wrote:


This may not be directly related to the upgrade to 1.1.2, but I was running on 
1.1.0 for a while with no issues, and I did the upgrade to 1.1.2 a few days ago.

2 of my nodes started throwing lots of promote exceptions, and then a lot of 
the beforeAppend exceptions from then on every few minutes.  This is on the 
high update CF that's using leveled compaction and compression.  The other 3 
nodes are not experiencing this.  I can send entire log files if desired.
These 2 nodes now have much higher load #'s than the other 3, and I'm assuming 
that's because they are failing with the compaction errors?

$
INFO [CompactionExecutor:1783] 2012-07-13 07:35:23,268 CompactionTask.java 
(line 109) Compacting 
[SSTableReader(path='/opt/cassandra/data/MonitoringData/Properties/MonitoringData-Properties-hd-392322-Data$
ERROR [CompactionExecutor:1783] 2012-07-13 07:35:29,696 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[CompactionExecutor:1783,1,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
at 
org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
at 
org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
at 
org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

INFO [CompactionExecutor:3310] 2012-07-16 11:14:02,481 CompactionTask.java 
(line 109) Compacting 
[SSTableReader(path='/opt/cassandra/data/MonitoringData/Properties/MonitoringData-Properties-hd-369173-Data$
ERROR [CompactionExecutor:3310] 2012-07-16 11:14:04,031 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[CompactionExecutor:3310,1,main]
java.lang.RuntimeException: Last written key 
DecoratedKey(150919285004100953907590722809541628889, 
5b30363334353237652d383966382d653031312d623131632d3030313535643031373530325d5b436f6d70757465725b4d5350422d$
at 
org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
at 
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)



RE: Problem joining new node to cluster in 1.1.1

2012-06-08 Thread Bryce Godfrey
https://issues.apache.org/jira/browse/CASSANDRA-4323

Not sure if it's a dupe of what Brandon sent (4251), so I created the bug anyway.

-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
Sent: Friday, June 08, 2012 9:08 AM
To: user@cassandra.apache.org
Subject: Re: Problem joining new node to cluster in 1.1.1

That very much looks like a bug. Would you mind opening a ticket on 
https://issues.apache.org/jira/browse/CASSANDRA with those stack traces and 
maybe a little bit more precision on what you were doing when that happened?

--
Sylvain

On Fri, Jun 8, 2012 at 12:28 AM, Bryce Godfrey bryce.godf...@azaleos.com 
wrote:
 As the new node starts up I get this error before bootstrap starts:



 INFO 08:20:51,584 Enqueuing flush of 
 Memtable-schema_columns@1493418651(0/0
 serialized/live bytes, 1 ops)

 INFO 08:20:51,584 Writing Memtable-schema_columns@1493418651(0/0
 serialized/live bytes, 1 ops)

 INFO 08:20:51,589 Completed flushing
 /opt/cassandra/data/system/schema_columns/system-schema_columns-hc-1-D
 ata.db
 (61 bytes)

 ERROR 08:20:51,889 Exception in thread Thread[MigrationStage:1,5,main]

 java.lang.IllegalArgumentException: value already present: 1015

     at
 com.google.common.base.Preconditions.checkArgument(Preconditions.java:
 115)

     at
 com.google.common.collect.AbstractBiMap.putInBothMaps(AbstractBiMap.ja
 va:111)

     at
 com.google.common.collect.AbstractBiMap.put(AbstractBiMap.java:96)

     at com.google.common.collect.HashBiMap.put(HashBiMap.java:84)

     at org.apache.cassandra.config.Schema.load(Schema.java:385)

     at
 org.apache.cassandra.db.DefsTable.addColumnFamily(DefsTable.java:426)

     at
 org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:3
 61)

     at 
 org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:270)

     at
 org.apache.cassandra.db.DefsTable.mergeRemoteSchema(DefsTable.java:248
 )

     at
 org.apache.cassandra.service.MigrationManager$MigrationTask.runMayThro
 w(MigrationManager.java:416)

     at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30
 )

     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
 Source)

     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown 
 Source)

     at java.util.concurrent.FutureTask.run(Unknown Source)

     at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
 Source)

     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)

     at java.lang.Thread.run(Unknown Source)

 INFO 08:20:51,931 Enqueuing flush of
 Memtable-schema_keyspaces@833041663(943/1178 serialized/live bytes, 20 
 ops)

 INFO 08:20:51,932 Writing Memtable-schema_keyspaces@833041663(943/1178
 serialized/live bytes, 20 ops)





 Then it starts spewing these errors nonstop until I kill it.



 ERROR 08:21:45,959 Error in row mutation

 org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find
 cfId=1019

     at
 org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamil
 ySerializer.java:126)

     at
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(
 RowMutation.java:439)

     at
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(
 RowMutation.java:447)

     at
 org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)

     at
 org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbH
 andler.java:42)

     at
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.j
 ava:59)

     at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
 Source)

     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)

     at java.lang.Thread.run(Unknown Source)

 ERROR 08:21:45,814 Error in row mutation

 org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find
 cfId=1019

     at
 org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamil
 ySerializer.java:126)

     at
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(
 RowMutation.java:439)

     at
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(
 RowMutation.java:447)

     at
 org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)

     at
 org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbH
 andler.java:42)

     at
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.j
 ava:59)

     at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
 Source)

     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)

     at java.lang.Thread.run(Unknown Source)

 ERROR 08:21:45,813 Error in row mutation

 org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find
 cfId=1020

     at
 org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamil

Problem joining new node to cluster in 1.1.1

2012-06-07 Thread Bryce Godfrey
As the new node starts up I get this error before bootstrap starts:

INFO 08:20:51,584 Enqueuing flush of Memtable-schema_columns@1493418651(0/0 
serialized/live bytes, 1 ops)
INFO 08:20:51,584 Writing Memtable-schema_columns@1493418651(0/0 
serialized/live bytes, 1 ops)
INFO 08:20:51,589 Completed flushing 
/opt/cassandra/data/system/schema_columns/system-schema_columns-hc-1-Data.db 
(61 bytes)
ERROR 08:20:51,889 Exception in thread Thread[MigrationStage:1,5,main]
java.lang.IllegalArgumentException: value already present: 1015
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
at 
com.google.common.collect.AbstractBiMap.putInBothMaps(AbstractBiMap.java:111)
at com.google.common.collect.AbstractBiMap.put(AbstractBiMap.java:96)
at com.google.common.collect.HashBiMap.put(HashBiMap.java:84)
at org.apache.cassandra.config.Schema.load(Schema.java:385)
at org.apache.cassandra.db.DefsTable.addColumnFamily(DefsTable.java:426)
at 
org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:361)
at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:270)
at 
org.apache.cassandra.db.DefsTable.mergeRemoteSchema(DefsTable.java:248)
at 
org.apache.cassandra.service.MigrationManager$MigrationTask.runMayThrow(MigrationManager.java:416)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
INFO 08:20:51,931 Enqueuing flush of 
Memtable-schema_keyspaces@833041663(943/1178 serialized/live bytes, 20 ops)
INFO 08:20:51,932 Writing Memtable-schema_keyspaces@833041663(943/1178 
serialized/live bytes, 20 ops)


Then it starts spewing these errors nonstop until I kill it.

ERROR 08:21:45,959 Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1019
at 
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:439)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:447)
at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)
at 
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
ERROR 08:21:45,814 Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1019
at 
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:439)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:447)
at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)
at 
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
ERROR 08:21:45,813 Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1020
at 
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:439)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:447)
at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)
at 
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
ERROR 08:21:45,813 Error in row mutation


I'm guessing the first error caused some column families to not be created?  

RE: 1.1 not removing commit log files?

2012-06-04 Thread Bryce Godfrey
I'll try to get some log files for this with DEBUG enabled.  Tough on 
production though.
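
For reference, the change I have in mind is something like this (a minimal 
sketch against conf/log4j-server.properties; the file location is my 
assumption, the class names are the ones Aaron mentions below):

# log when a CF is considered clean and no flush takes place
log4j.logger.org.apache.cassandra.db.ColumnFamilyStore=DEBUG
# log when a commit log segment cannot be deleted because a dirty CF was not flushed
log4j.logger.org.apache.cassandra.db.commitlog.CommitLog=DEBUG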

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, June 04, 2012 11:15 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

Applying the local hint mutation follows the same code path as regular mutations.

When the commit log is being truncated you should see flush activity, logged 
from the ColumnFamilyStore with "Enqueuing flush of ..." messages.

If you set DEBUG logging for org.apache.cassandra.db.ColumnFamilyStore it 
will log if it thinks the CF is clean and no flush takes place.

If you set DEBUG logging on org.apache.cassandra.db.commitlog.CommitLog we will 
see if the commit log file could not be deleted because a dirty CF was not 
flushed.

Cheers
A


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/06/2012, at 4:43 AM, Rob Coli wrote:


On Thu, May 31, 2012 at 7:01 PM, aaron morton aa...@thelastpickle.com wrote:

But that talks about segments not being cleared at startup. Does not explain
why they were allowed to get past the limit in the first place.

Perhaps the commit log size tracking for this limit does not, for some
reason, track hints? This seems like the obvious answer given the
state which appears to trigger it? This doesn't explain why the files
aren't getting deleted after the hints are delivered, of course...

=Rob

--
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb



RE: 1.1 not removing commit log files?

2012-05-31 Thread Bryce Godfrey
So this happened to me again, but it was only when the cluster had a node down 
for a while.  Then the commit logs started piling up past the limit I set in 
the config file, and filled the drive.
After the node recovered and hints had replayed, the space was never reclaimed.  
A flush or drain did not reclaim the space or delete any log files either.

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation <http://www.azaleos.com/>

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Tuesday, May 22, 2012 1:10 PM
To: user@cassandra.apache.org
Subject: RE: 1.1 not removing commit log files?

The nodes appear to be holding steady at the 8G that I set it to in the config 
file now.  I'll keep an eye on them.
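
For reference, the only related setting in my cassandra.yaml is the total 
space cap (a sketch of the relevant line; 8192 is simply the 8G value I chose):

commitlog_total_space_in_mb: 8192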

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, May 22, 2012 4:08 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

4096 is also the internal hard coded default for commitlog_total_space_in_mb

If you are seeing more than 4GB of commit log files let us know.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/05/2012, at 6:35 AM, Bryce Godfrey wrote:

Thanks, I'll give it a try.

-Original Message-
From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Monday, May 21, 2012 2:12 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

commitlog_total_space_in_mb: 4096

By default this line is commented out in 1.0.x, if I remember well, and I guess it is 
the same in 1.1. You really should uncomment this line or your commit logs will 
entirely fill up your disk, as happened to me a while ago.

Alain

2012/5/21 Pieter Callewaert pieter.callewa...@be-mobile.be:
Hi,



In 1.1 the commitlog files are pre-allocated with files of 128MB.
(https://issues.apache.org/jira/browse/CASSANDRA-3411) This should
however not exceed your commitlog size in Cassandra.yaml.



commitlog_total_space_in_mb: 4096



Kind regards,

Pieter Callewaert



From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Monday, May 21, 2012 9:52
To: user@cassandra.apache.org
Subject: 1.1 not removing commit log files?



The commit log drives on my nodes keep slowly filling up.  I don't see
any errors in my logs that are indicating any issues that I can map to
this issue.



Is this how 1.1 is supposed to work now?  Previous versions seemed to
keep this drive at a minimum as it flushed.



/dev/mapper/mpathf 25G   21G  4.2G  83% /opt/cassandra/commitlog





RE: 1.1 not removing commit log files?

2012-05-22 Thread Bryce Godfrey
The nodes appear to be holding steady at the 8G that I set it to in the config 
file now.  I'll keep an eye on them.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, May 22, 2012 4:08 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

4096 is also the internal hard coded default for commitlog_total_space_in_mb

If you are seeing more than 4GB of commit log files let us know.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/05/2012, at 6:35 AM, Bryce Godfrey wrote:


Thanks, I'll give it a try.

-Original Message-
From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Monday, May 21, 2012 2:12 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

commitlog_total_space_in_mb: 4096

By default this line is commented out in 1.0.x, if I remember well, and I guess it is 
the same in 1.1. You really should uncomment this line or your commit logs will 
entirely fill up your disk, as happened to me a while ago.

Alain

2012/5/21 Pieter Callewaert pieter.callewa...@be-mobile.be:

Hi,



In 1.1 the commitlog files are pre-allocated with files of 128MB.
(https://issues.apache.org/jira/browse/CASSANDRA-3411) This should
however not exceed your commitlog size in Cassandra.yaml.



commitlog_total_space_in_mb: 4096



Kind regards,

Pieter Callewaert



From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Monday, May 21, 2012 9:52
To: user@cassandra.apache.org
Subject: 1.1 not removing commit log files?



The commit log drives on my nodes keep slowly filling up.  I don't see
any errors in my logs that are indicating any issues that I can map to
this issue.



Is this how 1.1 is supposed to work now?  Previous versions seemed to
keep this drive at a minimum as it flushed.



/dev/mapper/mpathf 25G   21G  4.2G  83% /opt/cassandra/commitlog





1.1 not removing commit log files?

2012-05-21 Thread Bryce Godfrey
The commit log drives on my nodes keep slowly filling up.  I don't see any 
errors in my logs that are indicating any issues that I can map to this issue.

Is this how 1.1 is supposed to work now?  Previous versions seemed to keep this 
drive at a minimum as it flushed.

/dev/mapper/mpathf 25G   21G  4.2G  83% /opt/cassandra/commitlog



Node join streaming stuck at 100%

2012-04-26 Thread Bryce Godfrey
This is the second node I've joined to my cluster in the last few days, and so 
far both have become stuck at 100% on a large file according to netstats.  This 
is on 1.0.9; is there anything I can do to make it move on besides restarting 
Cassandra?  I don't see any errors or warnings in the logs for either server, and 
there is plenty of disk space.

On the sender side I see this:
Streaming to: /10.20.1.152
   /opt/cassandra/data/MonitoringData/PropertyTimeline-hc-80540-Data.db 
sections=1 progress=82393861085/82393861085 - 100%

On the node joining I don't see this file in netstats, and all pending streams 
are sitting at 0%





RE: size tiered compaction - improvement

2012-04-18 Thread Bryce Godfrey
Per-CF or per-row TTL would be very useful for me also with our timeseries 
data.

-Original Message-
From: Igor [mailto:i...@4friends.od.ua] 
Sent: Wednesday, April 18, 2012 6:06 AM
To: user@cassandra.apache.org
Subject: Re: size tiered compaction - improvement

For my use case it would be nice to have a per-CF TTL (to protect myself from an 
application bug and from a storage leak due to a missed TTL), but it seems you can't 
avoid tombstones even in this case, or if you change the CF TTL at runtime.

On 04/18/2012 03:06 PM, Viktor Jevdokimov wrote:
 Our use case requires Column TTL, not CF TTL, because it is variable, not 
 constant.


 Best regards/ Pagarbiai

 Viktor Jevdokimov
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063
 Fax: +370 5 261 0453

 J. Jasinskio 16C,
 LT-01112 Vilnius,
 Lithuania



 Disclaimer: The information contained in this message and attachments 
 is intended solely for the attention and use of the named addressee 
 and may be confidential. If you are not the intended recipient, you 
 are reminded that the information remains the property of the sender. 
 You must not use, disclose, distribute, copy, print or rely on this 
 e-mail. If you have received this message in error, please contact the 
 sender immediately and irrevocably delete this message and any 
 copies.-Original Message-
 From: Radim Kolar [mailto:h...@filez.com]
 Sent: Wednesday, April 18, 2012 12:57
 To: user@cassandra.apache.org
 Subject: Re: size tiered compaction - improvement


 Any compaction pass over A will first convert the TTL data into tombstones.

 Then, any subsequent pass that includes A *and all other sstables 
 containing rows with the same key* will drop the tombstones.
 that's why I proposed to attach the TTL to the entire CF. Tombstones would not 
 be needed



RE: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released

2012-04-17 Thread Bryce Godfrey
Sorry, I found the issue.  The server I was using had 32-bit Java installed.

-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
Sent: Monday, April 16, 2012 11:39 PM
To: user@cassandra.apache.org
Subject: Re: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released

On Mon, Apr 16, 2012 at 10:45 PM, Bryce Godfrey bryce.godf...@azaleos.com 
wrote:
 I keep running into this with my testing (on a windows box), Is this just a 
 OOM for RAM?

How much RAM do you have? Do you use completely standard settings? Do you also 
OOM if you try the same test with Cassandra 1.0.9?

--
Sylvain


 ERROR [COMMIT-LOG-ALLOCATOR] 2012-04-16 13:36:18,790 
 AbstractCassandraDaemon.java (line 134) Exception in thread 
 Thread[COMMIT-LOG-ALLOCATOR,5,main]
 java.io.IOError: java.io.IOException: Map failed
        at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSeg
 ment.java:127)
        at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.freshSegment(Commit
 LogSegment.java:80)
        at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator.createFreshSegmen
 t(CommitLogAllocator.java:244)
        at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator.access$500(Commit
 LogAllocator.java:49)
        at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(Com
 mitLogAllocator.java:104)
        at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30
 )
        at java.lang.Thread.run(Unknown Source) Caused by: 
 java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(Unknown Source)
        at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSeg
 ment.java:119)
        ... 6 more
 Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        ... 8 more
  INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961 
 CassandraDaemon.java (line 218) Stop listening to thrift clients
  INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961 
 MessagingService.java (line 539) Waiting for messaging service to 
 quiesce
  INFO [ACCEPT-/10.47.1.15] 2012-04-16 13:36:18,977 MessagingService.java 
 (line 695) MessagingService shutting down server thread.

 -Original Message-
 From: Sylvain Lebresne [mailto:sylv...@datastax.com]
 Sent: Friday, April 13, 2012 9:41 AM
 To: user@cassandra.apache.org
 Subject: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released

 The Cassandra team is pleased to announce the release of the first release 
 candidate for the future Apache Cassandra 1.1.

 Please first note that this is a release candidate, *not* the final release 
 yet.

 All help in testing this release candidate will be greatly appreciated. 
 Please report any problem you may encounter[3,4] and have a look at the 
 change log[1] and the release notes[2] to see where Cassandra 1.1 differs 
 from the previous series.

 Apache Cassandra 1.1.0-rc1[5] is available as usual from the cassandra 
 website (http://cassandra.apache.org/download/) and a debian package is 
 available using the 11x branch (see 
 http://wiki.apache.org/cassandra/DebianPackaging).

 Thank you for your help in testing and have fun with it.

 [1]: http://goo.gl/XwH7J (CHANGES.txt)
 [2]: http://goo.gl/JocLX (NEWS.txt)
 [3]: https://issues.apache.org/jira/browse/CASSANDRA
 [4]: user@cassandra.apache.org
 [5]: 
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=re
 fs/tags/cassandra-1.1.0-rc1


RE: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released

2012-04-16 Thread Bryce Godfrey
I keep running into this with my testing (on a Windows box). Is this just an OOM 
for RAM?

ERROR [COMMIT-LOG-ALLOCATOR] 2012-04-16 13:36:18,790 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[COMMIT-LOG-ALLOCATOR,5,main]
java.io.IOError: java.io.IOException: Map failed
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:127)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.freshSegment(CommitLogSegment.java:80)
at 
org.apache.cassandra.db.commitlog.CommitLogAllocator.createFreshSegment(CommitLogAllocator.java:244)
at 
org.apache.cassandra.db.commitlog.CommitLogAllocator.access$500(CommitLogAllocator.java:49)
at 
org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:104)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(Unknown Source)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:119)
... 6 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
... 8 more
 INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961 CassandraDaemon.java 
(line 218) Stop listening to thrift clients
 INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961 
MessagingService.java (line 539) Waiting for messaging service to quiesce
 INFO [ACCEPT-/10.47.1.15] 2012-04-16 13:36:18,977 MessagingService.java (line 
695) MessagingService shutting down server thread.

-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
Sent: Friday, April 13, 2012 9:41 AM
To: user@cassandra.apache.org
Subject: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released

The Cassandra team is pleased to announce the release of the first release 
candidate for the future Apache Cassandra 1.1.

Please first note that this is a release candidate, *not* the final release yet.

All help in testing this release candidate will be greatly appreciated. Please 
report any problem you may encounter[3,4] and have a look at the change log[1] 
and the release notes[2] to see where Cassandra 1.1 differs from the previous 
series.

Apache Cassandra 1.1.0-rc1[5] is available as usual from the cassandra website 
(http://cassandra.apache.org/download/) and a debian package is available using 
the 11x branch (see http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for your help in testing and have fun with it.

[1]: http://goo.gl/XwH7J (CHANGES.txt)
[2]: http://goo.gl/JocLX (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]: 
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.1.0-rc1


RE: Large hints column family

2012-03-16 Thread Bryce Godfrey
I took the 'reset the world' approach; things are much better now and the hints 
table is staying empty.  It's a bit disconcerting that it could get so large and not 
be able to recover itself, but at least there was a solution.  Thanks


From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, March 15, 2012 7:24 PM
To: user@cassandra.apache.org
Subject: Re: Large hints column family

These messages make it look like the node is having trouble delivering hints.
INFO [HintedHandoff:1] 2012-03-13 16:13:34,188 HintedHandOffManager.java (line 
284) Endpoint /192.168.20.4 died before hint delivery, aborting
INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 HintedHandOffManager.java (line 
354) Timed out replaying hints to /192.168.20.3; aborting further deliveries

Take another look at the logs on this machine and on 20.4 and 20.3.

I would be looking into why so many hints are being stored. GC? Are there also 
logs about dropped messages?

If you want to reset the world, make sure the nodes have all run repair and 
then drop the hints, either via JMX or by stopping the node and deleting the 
files on disk.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/03/2012, at 12:58 PM, Bryce Godfrey wrote:


We were having some occasional memory pressure issues, but we just added some 
more RAM to the nodes a few days ago and things are running more smoothly now; 
in general, nodes have not been going up and down.

I tried to do a list HintsColumnFamily from Cassandra-cli and it locks my 
Cassandra node and never returns, forcing me to kill the Cassandra process and 
restart it to get the node back.

Here is my settings which I believe are default since I don't remember changing 
them:

hinted_handoff_enabled: true
max_hint_window_in_ms: 360 # one hour
hinted_handoff_throttle_delay_in_ms: 50

Greping for Hinted in system log I get these
INFO [HintedHandoff:1] 2012-03-13 16:13:22,215 HintedHandOffManager.java (line 
373) Finished hinted handoff of 852703 rows to endpoint /192.168.20.3
INFO [HintedHandoff:1] 2012-03-13 16:13:34,188 HintedHandOffManager.java (line 
284) Endpoint /192.168.20.4 died before hint delivery, aborting
INFO [ScheduledTasks:1] 2012-03-13 16:15:32,569 StatusLogger.java (line 65) 
HintedHandoff 1 1 0
INFO [HintedHandoff:1] 2012-03-13 16:15:44,362 HintedHandOffManager.java (line 
296) Started hinted handoff for token: 113427455640312814857969558651062452224 
with IP: /192.168.20.3
INFO [HintedHandoff:1] 2012-03-13 16:21:37,266 HintedHandOffManager.java (line 
296) Started hinted handoff for token: 113427455640312814857969558651062452224 
with IP: /192.168.20.3
INFO [ScheduledTasks:1] 2012-03-13 16:23:07,662 StatusLogger.java (line 65) 
HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-13 16:25:49,330 StatusLogger.java (line 65) 
HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-13 16:30:52,503 StatusLogger.java (line 65) 
HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-13 16:42:22,202 StatusLogger.java (line 65) 
HintedHandoff 1 2 0
INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 HintedHandOffManager.java (line 
354) Timed out replaying hints to /192.168.20.3; aborting further deliveries
INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 ColumnFamilyStore.java (line 
704) Enqueuing flush of Memtable-HintsColumnFamily@661547256(34298224/74465815 
serialized/live bytes, 78808 ops)
INFO [HintedHandoff:1] 2012-03-13 17:11:00,098 HintedHandOffManager.java (line 
373) Finished hinted handoff of 44160 rows to endpoint /192.168.20.3
INFO [HintedHandoff:1] 2012-03-13 17:11:36,596 HintedHandOffManager.java (line 
296) Started hinted handoff for token: 56713727820156407428984779325531226112 
with IP: /192.168.20.4
INFO [ScheduledTasks:1] 2012-03-13 17:12:25,248 StatusLogger.java (line 65) 
HintedHandoff 1 2 0
INFO [HintedHandoff:1] 2012-03-13 18:47:56,151 HintedHandOffManager.java (line 
296) Started hinted handoff for token: 113427455640312814857969558651062452224 
with IP: /192.168.20.3
INFO [ScheduledTasks:1] 2012-03-13 18:50:24,326 StatusLogger.java (line 65) 
HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:12:48,177 StatusLogger.java (line 65) 
HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:13:57,685 StatusLogger.java (line 65) 
HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:14:57,258 StatusLogger.java (line 65) 
HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:14:58,260 StatusLogger.java (line 65) 
HintedHandoff 1 2 0
INFO [ScheduledTasks:1] 2012-03-14 12:15:59,093 StatusLogger.java (line 65) 
HintedHandoff

Large hints column family

2012-03-14 Thread Bryce Godfrey
The system HintsColumnFamily seems large in my cluster, and I want to track 
down why that is.  I tried invoking listEndpointsPendingHints() on 
o.a.c.db.HintedHandoffManager and it never returns; it also freezes the node 
that it's invoked against.  It's a 3-node cluster, and all nodes have been up 
and running without issue for a while.  Any help on where to start with this?
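
For completeness, this is roughly how I am invoking it (a minimal JMX sketch; 
the MBean object name and the default JMX port 7199 are assumptions on my 
part, and my setup has no JMX authentication):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListPendingHints {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // Default Cassandra JMX port; adjust if yours differs.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // Assumed object name for o.a.c.db.HintedHandoffManager.
            ObjectName hhm = new ObjectName(
                    "org.apache.cassandra.db:type=HintedHandoffManager");
            // This is the call that never returns for me.
            Object endpoints = mbs.invoke(hhm, "listEndpointsPendingHints",
                    new Object[0], new String[0]);
            System.out.println(endpoints);
        } finally {
            jmxc.close();
        }
    }
}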

   Column Family: HintsColumnFamily
SSTable count: 11
Space used (live): 11271669539
Space used (total): 11271669539
Number of Keys (estimate): 1408
Memtable Columns Count: 338
Memtable Data Size: 0
Memtable Switch Count: 1
Read Count: 3
Read Latency: 4354.669 ms.
Write Count: 848
Write Latency: 0.029 ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.0
Bloom Filter Space Used: 12656
Key cache capacity: 14
Key cache size: 11
Key cache hit rate: 0.
Row cache: disabled
Compacted row minimum size: 105779
Compacted row maximum size: 7152383774
Compacted row mean size: 590818614

Thanks,
Bryce


RE: Large hints column family

2012-03-14 Thread Bryce Godfrey
Forgot to mention that this is on 1.0.8

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Wednesday, March 14, 2012 12:34 PM
To: user@cassandra.apache.org
Subject: Large hints column family

The system HintsColumnFamily seems large in my cluster, and I want to track 
down why that is.  I tried invoking listEndpointsPendingHints() on 
o.a.c.db.HintedHandoffManager and it never returns; it also freezes the node 
that it's invoked against.  It's a 3-node cluster, and all nodes have been up 
and running without issue for a while.  Any help on where to start with this?

   Column Family: HintsColumnFamily
SSTable count: 11
Space used (live): 11271669539
Space used (total): 11271669539
Number of Keys (estimate): 1408
Memtable Columns Count: 338
Memtable Data Size: 0
Memtable Switch Count: 1
Read Count: 3
Read Latency: 4354.669 ms.
Write Count: 848
Write Latency: 0.029 ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.0
Bloom Filter Space Used: 12656
Key cache capacity: 14
Key cache size: 11
Key cache hit rate: 0.
Row cache: disabled
Compacted row minimum size: 105779
Compacted row maximum size: 7152383774
Compacted row mean size: 590818614

Thanks,
Bryce


RE: tmp files in /var/lib/cassandra/data

2011-12-14 Thread Bryce Godfrey
I'm seeing this also, and my nodes have started crashing with "too many open 
files" errors.  Running lsof, I see lots of these open tmp files.

java   8185  root  911u  REG   8,32 38  
129108266 
/opt/cassandra/data/MonitoringData/Properties-tmp-hc-268721-CompressionInfo.db
java   8185  root  912u  REG   8,32  0  
155320741 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1092-Data.db
java   8185  root  913u  REG   8,32  0  
155320742 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Index.db
java   8185  root  914u  REG   8,32  0  
155320743 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Data.db
java   8185  root  916u  REG   8,32  0  
155320754 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1113-Data.db
java   8185  root  918u  REG   8,32  0  
155320744 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Index.db
java   8185  root  919u  REG   8,32  0  
155320745 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Data.db
java   8185  root  920u  REG   8,32  0  
155320755 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Index.db
java   8185  root  921u  REG   8,32  0  
129108272 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-Data.db
java   8185  root  922u  REG   8,32 38  
129108273 
/opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-CompressionInfo.db
java   8185  root  923u  REG   8,32  0  
155320756 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Data.db
java   8185  root  929u  REG   8,32 38  
129108262 
/opt/cassandra/data/MonitoringData/Properties-tmp-hc-268822-CompressionInfo.db
java   8185  root  947u  REG   8,32  0  
129108284 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-Data.db
java   8185  root  948u  REG   8,32 38  
129108285 
/opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-CompressionInfo.db
java   8185  root  954u  REG   8,32  0  
155320746 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Index.db
java   8185  root  955u  REG   8,32  0  
155320747 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Data.db

Going to try rolling back to 1.0.5 for the time being even though I was hoping 
to use one of the fixes in 1.0.6
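
If rolling back does not help, one stopgap for the "too many open files" 
crashes would be raising the file handle limit for the user running Cassandra 
(a sketch for /etc/security/limits.conf; the numbers are only an example, and 
Cassandra runs as root here per the lsof output above):

root soft nofile 100000
root hard nofile 100000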

-Original Message-
From: Ramesh Natarajan [mailto:rames...@gmail.com] 
Sent: Wednesday, December 14, 2011 6:03 PM
To: user@cassandra.apache.org
Subject: tmp files in /var/lib/cassandra/data

We are using leveled compaction running cassandra 1.0.6.  I checked the data 
directory (/var/lib/cassandra/data) and I see these 0-byte tmp files.
What are these files?

thanks
Ramesh

-rw-r--r-- 1 root root0 Dec 14 17:15 uid-tmp-hc-106-Data.db
-rw-r--r-- 1 root root0 Dec 14 17:15 uid-tmp-hc-106-Index.db
-rw-r--r-- 1 root root0 Dec 14 17:23 uid-tmp-hc-117-Data.db
-rw-r--r-- 1 root root0 Dec 14 17:23 uid-tmp-hc-117-Index.db
-rw-r--r-- 1 root root0 Dec 14 15:51 uid-tmp-hc-11-Data.db
-rw-r--r-- 1 root root0 Dec 14 15:51 uid-tmp-hc-11-Index.db
-rw-r--r-- 1 root root0 Dec 14 17:31 uid-tmp-hc-129-Data.db
-rw-r--r-- 1 root root0 Dec 14 17:31 uid-tmp-hc-129-Index.db
-rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-142-Data.db
-rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-142-Index.db
-rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-145-Data.db
-rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-145-Index.db
-rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-158-Data.db
-rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-158-Index.db
-rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-162-Data.db
-rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-162-Index.db
-rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-175-Data.db
-rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-175-Index.db
-rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-179-Data.db
-rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-179-Index.db
-rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-193-Data.db
-rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-193-Index.db
-rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-197-Data.db
-rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-197-Index.db
-rw-r--r-- 1 root root0 Dec 14 16:02 uid-tmp-hc-19-Data.db
-rw-r--r-- 1 root root0 Dec 14 16:02 uid-tmp-hc-19-Index.db
-rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-200-Data.db
-rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-200-Index.db

RE: node stuck leaving on 1.0.5

2011-12-13 Thread Bryce Godfrey
So I got past the leaving problem once I found the removetoken force command.  
Now I'm trying to move tokens and that will never complete either, but as I was 
watching netstats for streaming to the moving node I noticed it seemed to stop 
all of a sudden and list no more pending streams.  At the same time on the 
moving node this is in the system log:

ERROR [Thread-455] 2011-12-13 16:15:51,939 AbstractCassandraDaemon.java (line 
133) Fatal exception in thread Thread[Thread-455,5,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:178)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:141)
at 
org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:481)
at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:275)
at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:237)
at 
org.apache.cassandra.db.DataTracker.addStreamedSSTable(DataTracker.java:242)
at 
org.apache.cassandra.db.ColumnFamilyStore.addSSTable(ColumnFamilyStore.java:920)
at 
org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:141)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103)
at 
org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

Now I see no streams going on between any nodes, and the node is still listed 
as moving when viewing the ring.
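
For anyone else who gets stuck the same way, the sequence that finally cleared 
the leaving state was along these lines (nodetool against 1.0.x; the host is 
just a placeholder):

nodetool -h localhost removetoken status
nodetool -h localhost removetoken force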

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Sunday, December 11, 2011 11:02 PM
To: user@cassandra.apache.org
Subject: node stuck leaving on 1.0.5

I have a dead node I need to remove from the cluster so that I can rebalance 
among the existing servers (can't replace it for a while).

I used nodetool removetoken and it's been stuck in the leaving state for over 
a day now.  I've tried a rolling restart, which kicks off some streaming for a 
while under netstats but now even that lists nothing going on.

I'm stuck on what to do next to get this node to finally leave so I can move 
the tokens around.

Only error I see in the system log:

ERROR [Thread-209] 2011-12-11 01:40:34,347 AbstractCassandraDaemon.java (line 
133) Fatal exception in thread Thread[Thread-209,5,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:178)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:141)
at 
org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:481)
at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:275)
at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:237)
at 
org.apache.cassandra.db.DataTracker.addStreamedSSTable(DataTracker.java:242)
at 
org.apache.cassandra.db.ColumnFamilyStore.addSSTable(ColumnFamilyStore.java:920)
at 
org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:141)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103)
at 
org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)


node stuck leaving on 1.0.5

2011-12-11 Thread Bryce Godfrey
I have a dead node I need to remove from the cluster so that I can rebalance 
among the existing servers (can't replace it for a while).

I used nodetool removetoken and it's been stuck in the leaving state for over 
a day now.  I've tried a rolling restart, which kicks off some streaming for a 
while under netstats but now even that lists nothing going on.

I'm stuck on what to do next to get this node to finally leave so I can move 
the tokens around.

Only error I see in the system log:

ERROR [Thread-209] 2011-12-11 01:40:34,347 AbstractCassandraDaemon.java (line 
133) Fatal exception in thread Thread[Thread-209,5,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:178)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:141)
at 
org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:481)
at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:275)
at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:237)
at 
org.apache.cassandra.db.DataTracker.addStreamedSSTable(DataTracker.java:242)
at 
org.apache.cassandra.db.ColumnFamilyStore.addSSTable(ColumnFamilyStore.java:920)
at 
org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:141)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103)
at 
org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)


RE: Client Timeouts on incrementing counters

2011-12-04 Thread Bryce Godfrey
I'm seeing this same problem after upgrading to 1.0.3 from 0.8

Nothing changed with the column family storing the counters, but now it just 
constantly times out trying to increment them.  No errors in the event logs or 
any other issues with my cluster.

Did you find a resolution?
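
For context, the kind of increment that now times out looks like this from 
cassandra-cli (the column family and key names here are just examples, not my 
real schema):

[default@MyKeyspace] incr Counters['stats-row']['total'] by 5;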

From: Carlos Rolo [mailto:c.r...@ocom.com]
Sent: Monday, November 14, 2011 12:34 AM
To: user@cassandra.apache.org
Subject: RE: Client Timeouts on incrementing counters

I have digged a bit more to try to find the root cause of the error, and I have 
some more information.

It seems that it all started after I upgraded Cassandra from 0.8.x to 1.0.0.
When I do an incr on the CLI I also get a timeout.
row_cache_save_period_in_seconds is set to 60sec.

Could be a problem from the upgrade? I just did a rolling restart of all nodes 
one-by-one.


From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Friday, November 11, 2011 20:18
To: user@cassandra.apache.org
Subject: Re: Client Timeouts on incrementing counters

On Fri, Nov 11, 2011 at 7:17 AM, Carlos Rolo c.r...@ocom.com wrote:
Also Cassandra logs have lots (as in, several times per second) of this message 
now:

INFO 14:15:25,740 Saved ClusterCassandra-CounterFamily-RowCache (52 items) in 1 
ms


What does the CLI say the row_cache_save_period_in_seconds for this CF is?

--
Tyler Hobbs
DataStax <http://datastax.com/>


RE: Problem after upgrade to 1.0.1

2011-11-08 Thread Bryce Godfrey
I have no errors in my system.log, just these types of warnings occasionally:
WARN [pool-1-thread-1] 2011-11-08 00:03:44,726 Memtable.java (line 167) setting 
live ratio to minimum of 1.0 instead of 0.9511448007676252

I did find the problem with my data drive consumption being so large, as I did 
not know that running scrub after the upgrade would take a snapshot of the 
data.  Once I removed all the snapshots, the data drive is back down to where 
I expect it to be, although the Load numbers reported by ring are much larger 
than what is in the data drive.
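
(In case it saves someone else the same hunt: the scrub snapshots can also be 
cleared with nodetool rather than deleting the snapshot directories by hand, 
something like the following on each node.)

nodetool -h localhost clearsnapshot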

I've also upgraded to 1.0.2 and re-ran scrub, and now I can run cfstats again, 
so thanks for that.  Although I'm still confused about why the hints CF has become 
so large on a few of the nodes:

Column Family: HintsColumnFamily
SSTable count: 11
Space used (live): 127490858389
Space used (total): 72123363085
Number of Keys (estimate): 1408
Memtable Columns Count: 43174
Memtable Data Size: 44376138
Memtable Switch Count: 103
Read Count: 494
Read Latency: NaN ms.
Write Count: 30970531
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 14
Key cache size: 10
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 88149
Compacted row maximum size: 53142810146
Compacted row mean size: 6065512727



-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Friday, November 04, 2011 9:29 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

One possibility: If you're overloading the cluster, replicas will drop updates 
to avoid OOMing.  (This is logged at WARN level.)  Before 1.x Cassandra would 
just let that slide, but with 1.0 it started recording hints for those.

On Thu, Nov 3, 2011 at 7:17 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 Thanks for the help so far.

 Is there any way to find out why my HintsColumnFamily is so large now, since 
 it wasn't this way before the upgrade and it seems to just keep climbing?

 I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints() 
 thinking I have a bunch of stale hints from upgrade issues, but it just 
 eventually times out.  Plus the node it gets invoked against gets thrashed 
 and stops responding, forcing me to restart cassandra.

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Thursday, November 03, 2011 5:06 PM
 To: user@cassandra.apache.org
 Subject: Re: Problem after upgrade to 1.0.1

 I found the problem and posted a patch on 
 https://issues.apache.org/jira/browse/CASSANDRA-3451.  If you build with that 
 patch and rerun scrub the exception should go away.

 On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 A restart fixed the load numbers, they are back to where I expect them to be 
 now, but disk utilization is double the load #.  I'm also still get the 
 cfstats exception from any node.

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Thursday, November 03, 2011 11:52 AM
 To: user@cassandra.apache.org
 Subject: Re: Problem after upgrade to 1.0.1

 Does restarting the node fix this?

 On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 Disk utilization is actually about 80% higher than what is reported 
 for nodetool ring across all my nodes on the data drive



 Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
 206.926.1978 | M: 206.849.2477



 From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
 Sent: Thursday, November 03, 2011 11:47 AM
 To: user@cassandra.apache.org
 Subject: RE: Problem after upgrade to 1.0.1



 Regarding load growth, presumably you are referring to the load as 
 reported by JMX/nodetool. Have you actually looked at the disk 
 utilization on the nodes themselves? Potential issue I have seen:
 http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html



 Dan



 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
 Sent: November-03-11 14:40
 To: user@cassandra.apache.org
 Subject: Problem after upgrade to 1.0.1



 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go 
 just fine with the rolling upgrade.  But now I'm having extreme load 
 growth on one of my nodes (and others are growing faster than usual 
 also).  I attempted to run a cfstats against the extremely large 
 node that was seeing 2x the load of others and I get this error below.
 I'm also went into the o.a.c.db.HintedHandoffManager mbean and 
 attempted to list pending hints to see if it was growing out of 
 control for some reason, but that just times out eventually for any node.  
 I'm not sure what to do next with this issue

Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just fine 
with the rolling upgrade.  But now I'm having extreme load growth on one of my 
nodes (and others are growing faster than usual also).  I attempted to run 
cfstats against the extremely large node that was seeing 2x the load of the others 
and got the error below.  I also went into the 
o.a.c.db.HintedHandoffManager mbean and attempted to list pending hints to see 
if it was growing out of control for some reason, but that just times out 
eventually for any node.  I'm not sure what to do next with this issue.

   Column Family: HintsColumnFamily
SSTable count: 3
Space used (live): 12681676437
Space used (total): 10233130272
Number of Keys (estimate): 384
Memtable Columns Count: 117704
Memtable Data Size: 115107307
Memtable Switch Count: 66
Read Count: 0
Read Latency: NaN ms.
Write Count: 21203290
Write Latency: 0.014 ms.
Pending Tasks: 0
Key cache capacity: 3
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 30130993
Compacted row maximum size: 9223372036854775807
Exception in thread main java.lang.IllegalStateException: Unable to compute 
ceiling for max when histogram overflowed
at 
org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
at 
org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
at 
org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at 
com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
at 
com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at 
javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation <http://www.azaleos.com/> | T: 206.926.1978 | M: 206.849.2477



RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
Nope.  I did alter two of my own column families to use Leveled compaction and 
then ran scrub on each node; that is the only change I have made since the upgrade.

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T: 206.926.1978 | 
M: 206.849.2477

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Thursday, November 03, 2011 11:44 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

Just to rule it out: you didn't do anything tricky like update 
HintsColumnFamily to use compression?

On Thu, Nov 3, 2011 at 1:39 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go 
 just fine with the rolling upgrade.  But now I'm having extreme load 
 growth on one of my nodes (and others are growing faster than usual 
 also).  I attempted to run a cfstats against the extremely large node 
 that was seeing 2x the load of others and I get this error below.  I'm 
 also went into the o.a.c.db.HintedHandoffManager mbean and attempted 
 to list pending hints to see if it was growing out of control for some 
 reason, but that just times out eventually for any node.  I'm not sure what 
 to do next with this issue.



    Column Family: HintsColumnFamily

     SSTable count: 3

     Space used (live): 12681676437

     Space used (total): 10233130272

     Number of Keys (estimate): 384

     Memtable Columns Count: 117704

     Memtable Data Size: 115107307

     Memtable Switch Count: 66

     Read Count: 0

     Read Latency: NaN ms.

     Write Count: 21203290

     Write Latency: 0.014 ms.

     Pending Tasks: 0

     Key cache capacity: 3

     Key cache size: 0

     Key cache hit rate: NaN

     Row cache: disabled

     Compacted row minimum size: 30130993

     Compacted row maximum size: 9223372036854775807

 Exception in thread main java.lang.IllegalStateException: Unable to 
 compute ceiling for max when histogram overflowed

     at
 org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.
 java:170)

     at
 org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:39
 5)

     at
 org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyS
 tore.java:293)

     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

     at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j
 ava:39)

     at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccess
 orImpl.java:25)

     at java.lang.reflect.Method.invoke(Method.java:597)

     at
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBe
 anIntrospector.java:93)

     at
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBe
 anIntrospector.java:27)

     at
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.ja
 va:208)

     at
 com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65
 )

     at
 com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:21
 6)

     at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(Def
 aultMBeanServerInterceptor.java:666)

     at
 com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.jav
 a:638)

     at
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectio
 nImpl.java:1404)

     at
 javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnection
 Impl.java:72)

     at
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(
 RMIConnectionImpl.java:1265)

     at
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RM
 IConnectionImpl.java:1360)

     at
 javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnecti
 onImpl.java:600)

     at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown 
 Source)

     at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccess
 orImpl.java:25)

     at java.lang.reflect.Method.invoke(Method.java:597)

     at
 sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)

     at sun.rmi.transport.Transport$1.run(Transport.java:159)

     at java.security.AccessController.doPrivileged(Native Method)

     at sun.rmi.transport.Transport.serviceCall(Transport.java:155)

     at
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:53
 5)

     at
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport
 .java:790)

     at
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.
 java:649)

     at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecu
 tor.java:886

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
Disk utilization is actually about 80% higher than what is reported for 
nodetool ring across all my nodes on the data drive

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation <http://www.azaleos.com/> | T: 206.926.1978 | M: 206.849.2477

From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
Sent: Thursday, November 03, 2011 11:47 AM
To: user@cassandra.apache.org
Subject: RE: Problem after upgrade to 1.0.1

Regarding load growth, presumably you are referring to the load as reported by 
JMX/nodetool. Have you actually looked at the disk utilization on the nodes 
themselves? Potential issue I have seen: 
http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html

Dan

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: November-03-11 14:40
To: user@cassandra.apache.org
Subject: Problem after upgrade to 1.0.1

I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just fine 
with the rolling upgrade.  But now I'm having extreme load growth on one of my 
nodes (and others are growing faster than usual also).  I attempted to run a 
cfstats against the extremely large node that was seeing 2x the load of others 
and I get this error below.  I'm also went into the 
o.a.c.db.HintedHandoffManager mbean and attempted to list pending hints to see 
if it was growing out of control for some reason, but that just times out 
eventually for any node.  I'm not sure what to do next with this issue.

   Column Family: HintsColumnFamily
SSTable count: 3
Space used (live): 12681676437
Space used (total): 10233130272
Number of Keys (estimate): 384
Memtable Columns Count: 117704
Memtable Data Size: 115107307
Memtable Switch Count: 66
Read Count: 0
Read Latency: NaN ms.
Write Count: 21203290
Write Latency: 0.014 ms.
Pending Tasks: 0
Key cache capacity: 3
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 30130993
Compacted row maximum size: 9223372036854775807
Exception in thread main java.lang.IllegalStateException: Unable to compute 
ceiling for max when histogram overflowed
at 
org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
at 
org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
at 
org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at 
com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
at 
com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at 
javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
A restart fixed the load numbers; they are back to where I expect them to be 
now, but disk utilization is double the load number.  I also still get the cfstats 
exception from any node.

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Thursday, November 03, 2011 11:52 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

Does restarting the node fix this?

On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 Disk utilization is actually about 80% higher than what is reported 
 for nodetool ring across all my nodes on the data drive



 Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
 206.926.1978 | M: 206.849.2477



 From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
 Sent: Thursday, November 03, 2011 11:47 AM
 To: user@cassandra.apache.org
 Subject: RE: Problem after upgrade to 1.0.1



 Regarding load growth, presumably you are referring to the load as 
 reported by JMX/nodetool. Have you actually looked at the disk 
 utilization on the nodes themselves? Potential issue I have seen:
 http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html



 Dan



 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
 Sent: November-03-11 14:40
 To: user@cassandra.apache.org
 Subject: Problem after upgrade to 1.0.1



 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go 
 just fine with the rolling upgrade.  But now I'm having extreme load 
 growth on one of my nodes (and others are growing faster than usual 
 also).  I attempted to run a cfstats against the extremely large node 
 that was seeing 2x the load of others and I get this error below.  I'm 
 also went into the o.a.c.db.HintedHandoffManager mbean and attempted 
 to list pending hints to see if it was growing out of control for some 
 reason, but that just times out eventually for any node.  I'm not sure what 
 to do next with this issue.



    Column Family: HintsColumnFamily

     SSTable count: 3

     Space used (live): 12681676437

     Space used (total): 10233130272

     Number of Keys (estimate): 384

     Memtable Columns Count: 117704

     Memtable Data Size: 115107307

     Memtable Switch Count: 66

     Read Count: 0

     Read Latency: NaN ms.

     Write Count: 21203290

     Write Latency: 0.014 ms.

     Pending Tasks: 0

     Key cache capacity: 3

     Key cache size: 0

     Key cache hit rate: NaN

     Row cache: disabled

     Compacted row minimum size: 30130993

     Compacted row maximum size: 9223372036854775807

 Exception in thread "main" java.lang.IllegalStateException: Unable to 
 compute ceiling for max when histogram overflowed

     at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
     at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
     at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
     at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
     at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
     at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
     at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
     at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
     at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
     at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
     at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
     at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
     at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
Thanks for the help so far.

Is there any way to find out why my HintsColumnFamily is so large now, since it 
wasn't this way before the upgrade and it seems to just keep climbing?

I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints(), thinking 
I have a bunch of stale hints from upgrade issues, but it just eventually times 
out.  Plus the node it gets invoked against gets thrashed and stops responding, 
forcing me to restart Cassandra.
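
(For reference, one way to make that call from the shell -- a sketch using jmxterm; 
the jar name, the default JMX port 7199, and the exact MBean object name are 
assumptions:)

  echo "run -b org.apache.cassandra.db:type=HintedHandoffManager countPendingHints" | \
    java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7199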

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Thursday, November 03, 2011 5:06 PM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

I found the problem and posted a patch on 
https://issues.apache.org/jira/browse/CASSANDRA-3451.  If you build with that 
patch and rerun scrub the exception should go away.
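
(A minimal sketch of that step, assuming nodetool from the patched build and the 
default host/port; the keyspace and column family arguments are placeholders:)

  nodetool -h localhost -p 7199 scrub <keyspace> [<columnfamily> ...]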

On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 A restart fixed the load numbers, they are back to where I expect them to be 
 now, but disk utilization is double the load #.  I'm also still get the 
 cfstats exception from any node.

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Thursday, November 03, 2011 11:52 AM
 To: user@cassandra.apache.org
 Subject: Re: Problem after upgrade to 1.0.1

 Does restarting the node fix this?

 On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 Disk utilization is actually about 80% higher than what is reported 
 for nodetool ring across all my nodes on the data drive



 Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
 206.926.1978 | M: 206.849.2477



 From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
 Sent: Thursday, November 03, 2011 11:47 AM
 To: user@cassandra.apache.org
 Subject: RE: Problem after upgrade to 1.0.1



 Regarding load growth, presumably you are referring to the load as 
 reported by JMX/nodetool. Have you actually looked at the disk 
 utilization on the nodes themselves? Potential issue I have seen:
 http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html



 Dan



 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
 Sent: November-03-11 14:40
 To: user@cassandra.apache.org
 Subject: Problem after upgrade to 1.0.1



 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go 
 just fine with the rolling upgrade.  But now I'm seeing extreme load 
 growth on one of my nodes (and the others are growing faster than usual 
 too).  I attempted to run cfstats against the extremely large node 
 that was seeing 2x the load of the others and I get the error below.  
 I also went into the o.a.c.db.HintedHandoffManager mbean and 
 attempted to list pending hints to see if it was growing out of 
 control for some reason, but that just times out eventually for any node.  
 I'm not sure what to do next with this issue.



    Column Family: HintsColumnFamily

     SSTable count: 3

     Space used (live): 12681676437

     Space used (total): 10233130272

     Number of Keys (estimate): 384

     Memtable Columns Count: 117704

     Memtable Data Size: 115107307

     Memtable Switch Count: 66

     Read Count: 0

     Read Latency: NaN ms.

     Write Count: 21203290

     Write Latency: 0.014 ms.

     Pending Tasks: 0

     Key cache capacity: 3

     Key cache size: 0

     Key cache hit rate: NaN

     Row cache: disabled

     Compacted row minimum size: 30130993

     Compacted row maximum size: 9223372036854775807

 Exception in thread "main" java.lang.IllegalStateException: Unable to 
 compute ceiling for max when histogram overflowed

     at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
     at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
     at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
     at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
     at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
     at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216

Running on Windows

2011-10-03 Thread Bryce Godfrey
I'm wondering what the consensus is on running a Cassandra cluster on top of 
Windows boxes.  We are currently running a small 5 node cluster on top of 
CentOS without problems, so I have no desire to move.  But we are a Windows 
shop, and I have an IT department that is scared of Linux since they don't know 
how to manage it.

My primary arguments against it are community support (I haven't seen or 
heard of anybody else doing it on Windows), performance, and stability.  The 
last two are mostly guesses by me, but my thought is that Java on Windows 
just does not perform as well.  We have a very high write load, and are adding 
about 5 GB a day of data with a 3 month retention.

I really don't want to move a stable system onto an unknown just because of my IT 
department's fear of the unknown, so I'm looking for some ammo.  Thanks :)

~Bryce


RE: Completely removing a node from the cluster

2011-08-23 Thread Bryce Godfrey
Taking the cluster down completely did remove the phantom node.  The 
HintsColumnFamily is causing a lot of commit logs to back up and threatening to 
run the commit log drive out of space.  A manual flush of that column family 
always clears out the files though.
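
(That flush is just against the system keyspace's hints CF; as a stopgap it can 
be run, or cronned, per node along these lines -- the host and schedule are only 
examples:)

  nodetool -h localhost flush system HintsColumnFamily
  # e.g. hourly from cron while chasing the root cause:
  # 0 * * * * /usr/bin/nodetool -h localhost flush system HintsColumnFamily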


-Original Message-
From: Brandon Williams [mailto:dri...@gmail.com] 
Sent: Tuesday, August 23, 2011 10:42 AM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

On Tue, Aug 23, 2011 at 2:26 AM, aaron morton aa...@thelastpickle.com wrote:
 I'm running low on ideas for this one. Anyone else ?

 If the phantom node is not listed in the ring, other nodes should not be 
 storing hints for it. You can see what nodes they are storing hints for via 
 JConsole.

I think I found it in https://issues.apache.org/jira/browse/CASSANDRA-3071

--Brandon


RE: Completely removing a node from the cluster

2011-08-22 Thread Bryce Godfrey
Could this ghost node be causing my hints column family to grow to this size?  
I also crash after about 24 hours because commit log growth takes up all the 
drive space.  A manual nodetool flush keeps it under control though.


Column Family: HintsColumnFamily
SSTable count: 6
Space used (live): 666480352
Space used (total): 666480352
Number of Keys (estimate): 768
Memtable Columns Count: 1043
Memtable Data Size: 461773
Memtable Switch Count: 3
Read Count: 38
Read Latency: 131.289 ms.
Write Count: 582108
Write Latency: 0.019 ms.
Pending Tasks: 0
Key cache capacity: 7
Key cache size: 6
Key cache hit rate: 0.8334
Row cache: disabled
Compacted row minimum size: 2816160
Compacted row maximum size: 386857368
Compacted row mean size: 120432714

Is there a way for me to manually remove this dead node?

-Original Message-
From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com] 
Sent: Sunday, August 21, 2011 9:09 PM
To: user@cassandra.apache.org
Subject: RE: Completely removing a node from the cluster

It's been at least 4 days now.

-Original Message-
From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Sunday, August 21, 2011 3:16 PM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

I see the mistake I made about ring: it gets the endpoint list from the same place 
but uses the tokens to drive the whole process. 

I'm guessing here, don't have time to check all the code. But there is a 3 day 
timeout in the gossip system. Not sure if it applies in this case. 

Anyone know ?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2011, at 6:23 AM, Bryce Godfrey wrote:

 Both .2 and .3 list the same from the mbean that Unreachable is empty 
 collection, and Live node lists all 3 nodes still:
 192.168.20.2
 192.168.20.3
 192.168.20.1
 
 The removetoken was done a few days ago, and I believe the remove was done 
 from .2
 
 Here is what the ring output looks like; not sure why I get that token on the 
 empty first line either:
 Address DC  RackStatus State   LoadOwns   
  Token
   
 85070591730234615865843651857942052864
 192.168.20.2datacenter1 rack1   Up Normal  79.53 GB   50.00%   0
 192.168.20.3datacenter1 rack1   Up Normal  42.63 GB   50.00%  
 85070591730234615865843651857942052864
 
 Yes, both nodes show the same thing when doing a describe cluster, that .1 is 
 unreachable.
 
 
 -Original Message-
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Sunday, August 21, 2011 4:23 AM
 To: user@cassandra.apache.org
 Subject: Re: Completely removing a node from the cluster
 
 Unreachable nodes either did not respond to the message or were known to 
 be down and were not sent a message. 
 The way the node lists are obtained for the ring command and describe cluster 
 are the same. So it's a bit odd. 
 
 Can you connect to JMX and have a look at the o.a.c.db.StorageService MBean ? 
 What do the LiveNodes and UnreachableNodes attributes say ? 
 
 Also how long ago did you remove the token and on which machine? Do both 20.2 
 and 20.3 think 20.1 is still around ? 
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 20/08/2011, at 9:48 AM, Bryce Godfrey wrote:
 
 I'm on 0.8.4
 
 I have removed a dead node from the cluster using nodetool removetoken 
 command, and moved one of the remaining nodes to rebalance the tokens.  
 Everything looks fine when I run nodetool ring now, as it only lists the 
 remaining 2 nodes and they both look fine, owning 50% of the tokens.
 
 However, I can still see it being considered as part of the cluster from the 
 Cassandra-cli (192.168.20.1 being the removed node) and I'm worried that the 
 cluster is still queuing up hints for the node, or any other issues it may 
 cause:
 
 Cluster Information:
  Snitch: org.apache.cassandra.locator.SimpleSnitch
  Partitioner: org.apache.cassandra.dht.RandomPartitioner
  Schema versions:
   dcc8f680-caa4-11e0--553d4dced3ff: [192.168.20.2, 192.168.20.3]
   UNREACHABLE: [192.168.20.1]
 
 
 Do I need to do something else to completely remove this node?
 
 Thanks,
 Bryce
 



RE: Completely removing a node from the cluster

2011-08-21 Thread Bryce Godfrey
Both .2 and .3 report the same thing from the mbean: Unreachable is an empty 
collection, and LiveNodes still lists all 3 nodes:
192.168.20.2
192.168.20.3
192.168.20.1

The removetoken was done a few days ago, and I believe the remove was done from 
.2
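
(For what it's worth, those attributes can also be pulled without JConsole; a rough 
jmxterm one-liner, with the jar name and the default JMX port 7199 as assumptions:)

  echo "get -b org.apache.cassandra.db:type=StorageService LiveNodes UnreachableNodes" | \
    java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7199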

Here is what the ring output looks like; not sure why I get that token on the 
empty first line either:
Address DC  RackStatus State   LoadOwns
Token
   
85070591730234615865843651857942052864
192.168.20.2datacenter1 rack1   Up Normal  79.53 GB   50.00%  0
192.168.20.3datacenter1 rack1   Up Normal  42.63 GB   50.00%  
85070591730234615865843651857942052864

Yes, both nodes show the same thing when doing a describe cluster, that .1 is 
unreachable.


-Original Message-
From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Sunday, August 21, 2011 4:23 AM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

Unreachable nodes either did not respond to the message or were known to be 
down and were not sent a message. 
The way the node lists are obtained for the ring command and describe cluster 
are the same. So it's a bit odd. 

Can you connect to JMX and have a look at the o.a.c.db.StorageService MBean ? 
What do the LiveNodes and UnreachableNodes attributes say ? 

Also how long ago did you remove the token and on which machine? Do both 20.2 
and 20.3 think 20.1 is still around ? 

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 20/08/2011, at 9:48 AM, Bryce Godfrey wrote:

 I'm on 0.8.4
 
 I have removed a dead node from the cluster using nodetool removetoken 
 command, and moved one of the remaining nodes to rebalance the tokens.  
 Everything looks fine when I run nodetool ring now, as it only lists the 
 remaining 2 nodes and they both look fine, owning 50% of the tokens.
 
 However, I can still see it being considered as part of the cluster from the 
 Cassandra-cli (192.168.20.1 being the removed node) and I'm worried that the 
 cluster is still queuing up hints for the node, or any other issues it may 
 cause:
 
 Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
dcc8f680-caa4-11e0--553d4dced3ff: [192.168.20.2, 192.168.20.3]
UNREACHABLE: [192.168.20.1]
 
 
 Do I need to do something else to completely remove this node?
 
 Thanks,
 Bryce



RE: Completely removing a node from the cluster

2011-08-21 Thread Bryce Godfrey
It's been at least 4 days now.

-Original Message-
From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Sunday, August 21, 2011 3:16 PM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

I see the mistake I made about ring: it gets the endpoint list from the same place 
but uses the tokens to drive the whole process. 

I'm guessing here, don't have time to check all the code. But there is a 3 day 
timeout in the gossip system. Not sure if it applies in this case. 

Anyone know ?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2011, at 6:23 AM, Bryce Godfrey wrote:

 Both .2 and .3 list the same from the mbean that Unreachable is empty 
 collection, and Live node lists all 3 nodes still:
 192.168.20.2
 192.168.20.3
 192.168.20.1
 
 The removetoken was done a few days ago, and I believe the remove was done 
 from .2
 
 Here is what the ring output looks like; not sure why I get that token on the 
 empty first line either:
 Address DC  RackStatus State   LoadOwns   
  Token
   
 85070591730234615865843651857942052864
 192.168.20.2datacenter1 rack1   Up Normal  79.53 GB   50.00%   0
 192.168.20.3datacenter1 rack1   Up Normal  42.63 GB   50.00%  
 85070591730234615865843651857942052864
 
 Yes, both nodes show the same thing when doing a describe cluster, that .1 is 
 unreachable.
 
 
 -Original Message-
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Sunday, August 21, 2011 4:23 AM
 To: user@cassandra.apache.org
 Subject: Re: Completely removing a node from the cluster
 
 Unreachable nodes either did not respond to the message or were known to 
 be down and were not sent a message. 
 The way the node lists are obtained for the ring command and describe cluster 
 are the same. So it's a bit odd. 
 
 Can you connect to JMX and have a look at the o.a.c.db.StorageService MBean ? 
 What do the LiveNodes and UnreachableNodes attributes say ? 
 
 Also how long ago did you remove the token and on which machine? Do both 20.2 
 and 20.3 think 20.1 is still around ? 
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 20/08/2011, at 9:48 AM, Bryce Godfrey wrote:
 
 I'm on 0.8.4
 
 I have removed a dead node from the cluster using nodetool removetoken 
 command, and moved one of the remaining nodes to rebalance the tokens.  
 Everything looks fine when I run nodetool ring now, as it only lists the 
 remaining 2 nodes and they both look fine, owning 50% of the tokens.
 
 However, I can still see it being considered as part of the cluster from the 
 Cassandra-cli (192.168.20.1 being the removed node) and I'm worried that the 
 cluster is still queuing up hints for the node, or any other issues it may 
 cause:
 
 Cluster Information:
  Snitch: org.apache.cassandra.locator.SimpleSnitch
  Partitioner: org.apache.cassandra.dht.RandomPartitioner
  Schema versions:
   dcc8f680-caa4-11e0--553d4dced3ff: [192.168.20.2, 192.168.20.3]
   UNREACHABLE: [192.168.20.1]
 
 
 Do I need to do something else to completely remove this node?
 
 Thanks,
 Bryce
 



Completely removing a node from the cluster

2011-08-19 Thread Bryce Godfrey
I'm on 0.8.4

I have removed a dead node from the cluster using the nodetool removetoken 
command, and moved one of the remaining nodes to rebalance the tokens.  Everything 
looks fine when I run nodetool ring now, as it only lists the remaining 2 nodes and 
they both look fine, each owning 50% of the tokens.
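
(Roughly the sequence that was run, with placeholders standing in for the actual 
hosts and tokens:)

  nodetool -h <live-node> removetoken <token-of-dead-node>
  nodetool -h <node-to-move> move <new-token>
  nodetool -h <live-node> ring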

However, I can still see it being treated as part of the cluster from the 
Cassandra-cli (192.168.20.1 being the removed node), and I'm worried that the 
cluster is still queuing up hints for the node, or that it may cause other 
issues:

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
dcc8f680-caa4-11e0--553d4dced3ff: [192.168.20.2, 192.168.20.3]
UNREACHABLE: [192.168.20.1]


Do I need to do something else to completely remove this node?

Thanks,
Bryce


RE: No space left on device problem when starting Cassandra

2011-05-31 Thread Bryce Godfrey
That did it.  Once I moved the logs over to a folder on the /dev drive and deleted 
the old logs directory, it started up.

Thanks!
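
(For anyone hitting the same thing later: the property in question is the one shown 
below; pointing it at a filesystem with free space is what fixed the startup here. 
The paths are only examples.)

  grep '^log4j.appender.R.File' conf/log4j-server.properties
  # typically something like:
  #   log4j.appender.R.File=/var/log/cassandra/system.log
  # change it to a mount with room, e.g.:
  #   log4j.appender.R.File=/media/data/cassandra-log/system.log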

-Original Message-
From: Maki Watanabe [mailto:watanabe.m...@gmail.com] 
Sent: Tuesday, May 31, 2011 6:40 PM
To: user@cassandra.apache.org
Subject: Re: No space left on device problem when starting Cassandra

at org.apache.log4j.Category.info(Category.java:666)

It seems that your Cassandra can't write its log because the device is full.
Check where your Cassandra log is written to. The log file path is configured by 
the log4j.appender.R.File property in conf/log4j-server.properties.

maki

2011/6/1 Bryce Godfrey bryce.godf...@azaleos.com:
 Hi there, I'm a bit new to Linux and Cassandra so I'm hoping someone 
 can help me with this.



 I've been evaluating Cassandra for the last few days and I'm now 
 having a problem starting up the service.   I receive the error below 
 and I'm unsure where I'm out of space, and how to free up more.



 azadmin@cassandra-01: $ sudo 
 /usr/tmp/apache-cassandra-0.7.6-2/bin/cassandra
 -f

 INFO 18:21:46,830 Logging initialized

 log4j:ERROR Failed to flush writer,
 java.io.IOException: No space left on device
     at java.io.FileOutputStream.writeBytes(Native Method)
     at java.io.FileOutputStream.write(FileOutputStream.java:297)
     at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
     at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:290)
     at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:294)
     at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)
     at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
     at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
     at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
     at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
     at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
     at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
     at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
     at org.apache.log4j.Category.callAppenders(Category.java:206)
     at org.apache.log4j.Category.forcedLog(Category.java:391)
     at org.apache.log4j.Category.info(Category.java:666)
     at org.apache.cassandra.service.AbstractCassandraDaemon.<clinit>(AbstractCassandraDaemon.java:79)

 INFO 18:21:46,841 Heap size: 16818110464/16819159040

 #

 # A fatal error has been detected by the Java Runtime Environment:

 #

 #  SIGBUS (0x7) at pc=0x7f35b493f571, pid=1234, 
 tid=139869156091648

 #

 # JRE version: 6.0_22-b22

 # Java VM: OpenJDK 64-Bit Server VM (20.0-b11 mixed mode linux-amd64 
 compressed oops)

 # Derivative: IcedTea6 1.10.1

 # Distribution: Ubuntu Natty (development branch), package
 6b22-1.10.1-0ubuntu1

 # Problematic frame:

 # C  [libffi.so.5+0x2571]  ffi_prep_java_raw_closure+0x541

 #

 # An error report file with more information is saved as:

 # /media/commitlogs/hs_err_pid1234.log

 #

 # If you would like to submit a bug report, please include

 # instructions how to reproduce the bug and visit:

 #   https://bugs.launchpad.net/ubuntu/+source/openjdk-6/

 # The crash happened outside the Java Virtual Machine in native code.

 # See problematic frame for where to report the bug.

 #



 I seem to have enough space everywhere except on 
 /dev/mapper/cassandra--01-root, and I'm not sure what that volume is anyway:

 azadmin@cassandra-01:/$ df -h

 Filesystem    Size  Used Avail Use% Mounted on

 /dev/mapper/cassandra--01-root

   1.2G  1.2G 0 100% /

 none   16G  236K   16G   1% /dev

 none   16G 0   16G   0% /dev/shm

 none   16G   36K   16G   1% /var/run

 none   16G 0   16G   0% /var/lock

 /dev/sdb1  33G  176M   33G   1% /media/commitlogs

 /dev/sdc1  66G  180M   66G   1% /media/data

 /dev/sda1 228M   23M  193M  11% /boot



 Thanks,

 ~Bryce