[jira] [Commented] (CASSANDRA-3909) Pig should handle wide rows

2012-04-19 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257556#comment-13257556
 ] 

Brandon Williams commented on CASSANDRA-3909:
-

bq. personally I do trust you on that this can't break anything

<3

bq. I do however think that in general there would be some merit to stick to 
more strict rules.

I agree; however, my reasoning is this: if we support wide rows in 1.1.0 (and 
we do), then why not Pig?

 Pig should handle wide rows
 ---

 Key: CASSANDRA-3909
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3909
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.1

 Attachments: 3909.txt


 Pig should be able to use the wide row support in CFIF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4159) isReadyForBootstrap doesn't compare schema UUID by timestamp as it should

2012-04-17 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255652#comment-13255652
 ] 

Brandon Williams commented on CASSANDRA-4159:
-

+1

 isReadyForBootstrap doesn't compare schema UUID by timestamp as it should
 -

 Key: CASSANDRA-4159
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4159
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.7
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 1.0.10

 Attachments: 4159.txt


 CASSANDRA-3629 introduced a wait to be sure the node is up to date on the 
 schema before starting bootstrap. However, the isReadyForBootstrap() method 
 compares schema versions using UUID.compareTo(), which doesn't compare UUIDs 
 by timestamp, while the rest of the code does compare by timestamp 
 (MigrationManager.updateHighestKnown).
 During a test where lots of nodes were bootstrapped simultaneously (and some 
 schema changes were made), we ended up having some nodes stuck in the 
 isReadyForBootstrap loop. Restarting the node fixed it, so while I can't 
 confirm it, I suspect this was the source of that problem.
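The mismatch is easy to demonstrate with plain java.util.UUID: compareTo() orders type-1 UUIDs by their raw bits, where time_low occupies the most significant bytes of the msb, so the ordering disagrees with timestamp order. A minimal sketch (the UUID bit patterns here are fabricated purely for illustration):

```java
import java.util.UUID;

public class UuidOrderDemo {
    // Build a type-1 (time-based) UUID from the given most-significant bits;
    // the lsb just carries the RFC 4122 variant bits.
    public static UUID typeOne(long msb) {
        return new UUID(msb, 0x8000000000000000L);
    }

    public static void main(String[] args) {
        // msb layout for version 1: time_low(32) | time_mid(16) | version(4) | time_hi(12)
        UUID older = typeOne(0x0000000100001000L); // timestamp() == 1
        UUID newer = typeOne(0x0000000000011000L); // timestamp() == 1L << 32

        System.out.println(older.timestamp() < newer.timestamp()); // true
        System.out.println(older.compareTo(newer) > 0);            // true: compareTo disagrees
    }
}
```

Comparing `uuid.timestamp()` values directly (as MigrationManager.updateHighestKnown does) avoids the inconsistency.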





[jira] [Commented] (CASSANDRA-3909) Pig should handle wide rows

2012-04-17 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255954#comment-13255954
 ] 

Brandon Williams commented on CASSANDRA-3909:
-

Sylvain, any reason we can't put this in 1.1.0?  It has to be explicitly 
enabled so it can't break anything existing, and it goes well with the hadoop 
wide row support we already put in 1.1.0.

 Pig should handle wide rows
 ---

 Key: CASSANDRA-3909
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3909
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.1

 Attachments: 3909.txt


 Pig should be able to use the wide row support in CFIF.





[jira] [Commented] (CASSANDRA-4151) Apache project branding requirements: DOAP file [PATCH]

2012-04-14 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254172#comment-13254172
 ] 

Brandon Williams commented on CASSANDRA-4151:
-

This appears to reference our svn repo, which is now dead as we have migrated 
to git.

 Apache project branding requirements: DOAP file [PATCH]
 ---

 Key: CASSANDRA-4151
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4151
 Project: Cassandra
  Issue Type: Improvement
Reporter: Shane Curcuru
  Labels: branding
 Attachments: doap_Cassandra.rdf


 Attached.  Re: http://www.apache.org/foundation/marks/pmcs
 See Also: http://projects.apache.org/create.html





[jira] [Commented] (CASSANDRA-3946) BulkRecordWriter shouldn't stream any empty data/index files that might be created at end of flush

2012-04-13 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253757#comment-13253757
 ] 

Brandon Williams commented on CASSANDRA-3946:
-

I suspect if there is a problem here it's actually in 
UnsortedSimpleSSTableWriter.

 BulkRecordWriter shouldn't stream any empty data/index files that might be 
 created at end of flush
 --

 Key: CASSANDRA-3946
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3946
 Project: Cassandra
  Issue Type: Bug
Reporter: Chris Goffinet
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 1.1.1

 Attachments: 
 0001-CASSANDRA-3946-BulkRecordWriter-shouldn-t-stream-any.patch


 If by chance, we flush sstables during BulkRecordWriter (we have seen it 
 happen), I want to make sure we don't try to stream them.





[jira] [Commented] (CASSANDRA-4115) UNREACHABLE schema after decommissioning a non-seed node

2012-04-12 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252550#comment-13252550
 ] 

Brandon Williams commented on CASSANDRA-4115:
-

This must be something in the environment; on 1.0 I see the node removed as 
soon as the decommissioned node begins to announce it has left (before it has 
completed announcing, and thus before nodetool would even return), and it 
never reappears.

 UNREACHABLE schema after decommissioning a non-seed node
 

 Key: CASSANDRA-4115
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4115
 Project: Cassandra
  Issue Type: Bug
 Environment: ccm using the following unavailable_schema_test.py dtest.
Reporter: Tyler Patterson
Assignee: Brandon Williams
Priority: Minor
 Attachments: 4115.txt


 decommission a non-seed node, sleep 30 seconds, then use thrift to check the 
 schema. UNREACHABLE is listed:
 {'75dc4c07-3c1a-3013-ad7d-11fb34208465': ['127.0.0.1'],
  'UNREACHABLE': ['127.0.0.2']}
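The condition the dtest checks for can be captured by a small agreement predicate over a describe_schema_versions-style map; this is an illustrative sketch of the check, not the actual dtest code:

```java
import java.util.*;

public class SchemaAgreement {
    // The cluster agrees on schema when exactly one version is reported
    // by live nodes and no endpoint is listed under UNREACHABLE.
    public static boolean agreed(Map<String, List<String>> versions) {
        Set<String> live = new HashSet<>(versions.keySet());
        boolean unreachable = live.remove("UNREACHABLE");
        return !unreachable && live.size() == 1;
    }

    public static void main(String[] args) {
        Map<String, List<String>> v = new HashMap<>();
        v.put("75dc4c07-3c1a-3013-ad7d-11fb34208465", Arrays.asList("127.0.0.1"));
        v.put("UNREACHABLE", Arrays.asList("127.0.0.2"));
        System.out.println(agreed(v)); // false: the decommissioned node lingers
    }
}
```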





[jira] [Commented] (CASSANDRA-4115) UNREACHABLE schema after decommissioning a non-seed node

2012-04-12 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252554#comment-13252554
 ] 

Brandon Williams commented on CASSANDRA-4115:
-

What does strike me as odd though is that your 1.0 test has no schema at all, 
hence the --1000-- uuid.

 UNREACHABLE schema after decommissioning a non-seed node
 

 Key: CASSANDRA-4115
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4115
 Project: Cassandra
  Issue Type: Bug
 Environment: ccm using the following unavailable_schema_test.py dtest.
Reporter: Tyler Patterson
Assignee: Brandon Williams
Priority: Minor
 Attachments: 4115.txt


 decommission a non-seed node, sleep 30 seconds, then use thrift to check the 
 schema. UNREACHABLE is listed:
 {'75dc4c07-3c1a-3013-ad7d-11fb34208465': ['127.0.0.1'],
  'UNREACHABLE': ['127.0.0.2']}





[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

2012-04-11 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251703#comment-13251703
 ] 

Brandon Williams commented on CASSANDRA-3883:
-

LGTM, +1

 CFIF WideRowIterator only returns batch size columns
 

 Key: CASSANDRA-3883
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3883
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Brandon Williams
Assignee: Jonathan Ellis
 Fix For: 1.1.0

 Attachments: 3883-v1.txt, 3883-v2.txt, 3883-v3.txt


 Most evident with the word count, where there are 1250 'word1' items in two 
 rows (1000 in one, 250 in another) and it counts 198 with the batch size set 
 to 99.
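The bug class here is a paging loop that stops after the first batch instead of continuing from the last column seen. A toy sketch of correct paging over an integer "row" (the names and shapes are invented for illustration; this is not CFIF's actual API):

```java
import java.util.*;

public class WideRowPager {
    // Hypothetical column source standing in for a get_slice-style call:
    // returns up to 'batch' columns starting at offset 'start'.
    public static List<Integer> page(List<Integer> row, int start, int batch) {
        return row.subList(start, Math.min(start + batch, row.size()));
    }

    // Fetch all columns by paging: keep requesting from where the last
    // page ended until a short (smaller-than-batch) page signals the end.
    public static List<Integer> fetchAll(List<Integer> row, int batch) {
        List<Integer> out = new ArrayList<>();
        int start = 0;
        while (true) {
            List<Integer> cols = page(row, start, batch);
            out.addAll(cols);
            if (cols.size() < batch)
                break; // short page: the row is exhausted
            start += cols.size();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> row = new ArrayList<>();
        for (int i = 0; i < 1250; i++) row.add(i);
        System.out.println(fetchAll(row, 99).size()); // 1250, not 99
    }
}
```

Dropping the loop (returning only the first `page` result) reproduces the symptom in the report: batch-size-many columns instead of the full row.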





[jira] [Commented] (CASSANDRA-4134) Do not send hints before a node is fully up

2012-04-10 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251138#comment-13251138
 ] 

Brandon Williams commented on CASSANDRA-4134:
-

Isn't this exactly what HHOM.waitForSchemaAgreement is doing though?

 Do not send hints before a node is fully up
 ---

 Key: CASSANDRA-4134
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4134
 Project: Cassandra
  Issue Type: Bug
Reporter: Joaquin Casares
Priority: Minor

 After seeing this on a cluster and working with Pavel, we have seen the 
 following errors disappear after all migrations have been applied:
 {noformat}
 ERROR [MutationStage:1] 2012-04-09 18:16:00,240 RowMutationVerbHandler.java 
 (line 61) Error in row mutation
 org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find 
 cfId=1028
   at 
 org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:129)
   at 
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401)
   at 
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409)
   at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357)
   at 
 org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
   at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 and
 ERROR [ReadStage:69] 2012-04-09 18:16:01,715 AbstractCassandraDaemon.java 
 (line 139) Fatal exception in thread Thread[ReadStage:69,5,main]
 java.lang.IllegalArgumentException: Unknown ColumnFamily content_indexes in 
 keyspace linkcurrent
   at org.apache.cassandra.config.Schema.getComparator(Schema.java:223)
   at 
 org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:300)
   at 
 org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:92)
   at 
 org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
   at 
 org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:106)
   at 
 org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:74)
   at 
 org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
   at 
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
   at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
 It seems as though as soon as the correct Migration is applied, the Hints are 
 accepted.





[jira] [Commented] (CASSANDRA-4092) Allow getting a simple Token-node map over thrift

2012-04-09 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250187#comment-13250187
 ] 

Brandon Williams commented on CASSANDRA-4092:
-

Sam, could you rebase against 1.1 since that is where we intend to commit it?

 Allow getting a simple Token-node map over thrift
 --

 Key: CASSANDRA-4092
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4092
 Project: Cassandra
  Issue Type: Improvement
Reporter: Nick Bailey
Assignee: Sam Tunnicliffe
 Fix For: 1.1.1

 Attachments: 
 v1-0001-CASSANDRA-4092-New-thrift-call-to-get-simple-token-nod.txt, 
 v1-0002-CASSANDRA-4092-New-generated-java-following-change-to-.txt, 
 v2-0001-CASSANDRA-4092-New-thrift-call-to-get-simple-token-nod.txt, 
 v2-0002-CASSANDRA-4092-New-generated-java-following-change-to-.txt


 Right now the thrift describe_ring call is intended to be used to determine 
 ownership for a keyspace. It can also be (and often is) used by clients to 
 just get a view of what the ring looks like. Since it requires a keyspace as 
 an argument, though, it can sometimes be impossible to see what the ring 
 looks like. For example, consider a 2 DC / 2 node ring where keyspace X 
 exists only in dc1. The results of 'describe_ring X' would look something 
 like (with tokens 0 and 10):
 {noformat}
 {[0,10]: [node0], [10,0]: [node0]}
 {noformat}
 This indicates that node0 owns everything for that keyspace, since it only 
 exists in one datacenter. From this output, though, it is impossible to tell 
 which token (0 or 10) node0 owns, or what the other node in the cluster is.
 There are two options here. 
 * Allow running describe_ring with no parameters to get a view of token-ip 
 without taking replication into consideration.
 * Add a new thrift call to achieve this.
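For contrast, a replication-free token-to-node view answers both questions: which node holds which token, and who else is in the cluster. A sketch of such a map with the usual "first token clockwise" ownership lookup (illustrative only; not the proposed thrift call):

```java
import java.util.*;

public class TokenMap {
    // A replication-free view of the ring: token -> endpoint, sorted by token.
    public static NavigableMap<Long, String> ring(Map<Long, String> tokenToNode) {
        return new TreeMap<>(tokenToNode);
    }

    // The node owning a token: the first ring token >= it, wrapping
    // around to the smallest token at the end of the ring.
    public static String owner(NavigableMap<Long, String> ring, long token) {
        Map.Entry<Long, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> r = ring(Map.of(0L, "node0", 10L, "node1"));
        System.out.println(r);            // {0=node0, 10=node1}
        System.out.println(owner(r, 5L)); // node1
        System.out.println(owner(r, 11L)); // node0 (wraps around)
    }
}
```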





[jira] [Commented] (CASSANDRA-4128) stress tool hangs forever on timeout or error

2012-04-06 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248685#comment-13248685
 ] 

Brandon Williams commented on CASSANDRA-4128:
-

+1

 stress tool hangs forever on timeout or error
 -

 Key: CASSANDRA-4128
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4128
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
 Environment: This happens in every version of the stress tool, that I 
 know of, including calling it from the dtests.
Reporter: Tyler Patterson
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: stress
 Fix For: 1.1.1

 Attachments: CASSANDRA-4128.patch


 The stress tool hangs forever if it encounters a timeout or exception. CTRL-C 
 will kill it if run from a terminal, but when running it from a script (like 
 a dtest) it hangs the script forever. It would be great for scripting it if a 
 reasonable error code was returned when things go wrong.
 To duplicate, clear out /var/lib/cassandra and then run stress 
 --operation=READ.
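Until stress returns a useful exit code, the scripting side can defend itself by running it under a timeout. A sketch using ProcessBuilder; the 124 exit code just mimics coreutils' timeout(1) and is an arbitrary choice:

```java
import java.util.concurrent.TimeUnit;

public class RunWithTimeout {
    // Run a command, kill it if it exceeds timeoutSeconds, and always
    // surface an exit code so calling scripts never hang.
    public static int run(long timeoutSeconds, String... cmd) throws Exception {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            return 124; // timed out, mirroring coreutils' timeout(1)
        }
        return p.exitValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(60, "sh", "-c", "exit 0")); // 0 on a Unix-like system
    }
}
```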





[jira] [Commented] (CASSANDRA-3946) BulkRecordWriter shouldn't stream any empty data/index files that might be created at end of flush

2012-04-04 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246543#comment-13246543
 ] 

Brandon Williams commented on CASSANDRA-3946:
-

I'm not convinced that having the loader skip empty files is right yet.  Why 
are empty files being created?
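Whatever the root cause in UnsortedSimpleSSTableWriter turns out to be, the loader-side guard under discussion amounts to filtering zero-length components before streaming. A trivial sketch (a hypothetical helper, not the attached patch):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class SkipEmptySSTables {
    // An sstable component is worth streaming only if it has bytes in it.
    public static boolean shouldStream(long lengthInBytes) {
        return lengthInBytes > 0;
    }

    // Filter zero-length Data/Index files out of the streaming candidates.
    // Note File.length() returns 0 for both empty and nonexistent files.
    public static List<File> streamable(List<File> candidates) {
        List<File> out = new ArrayList<>();
        for (File f : candidates)
            if (shouldStream(f.length()))
                out.add(f);
        return out;
    }
}
```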

 BulkRecordWriter shouldn't stream any empty data/index files that might be 
 created at end of flush
 --

 Key: CASSANDRA-3946
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3946
 Project: Cassandra
  Issue Type: Bug
Reporter: Chris Goffinet
Assignee: Brandon Williams
Priority: Minor
 Fix For: 1.1.1

 Attachments: 
 0001-CASSANDRA-3946-BulkRecordWriter-shouldn-t-stream-any.patch


 If by chance, we flush sstables during BulkRecordWriter (we have seen it 
 happen), I want to make sure we don't try to stream them.





[jira] [Commented] (CASSANDRA-4099) IncomingTCPConnection recognizes from by doing socket.getInetAddress() instead of BroadCastAddress

2012-03-30 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242498#comment-13242498
 ] 

Brandon Williams commented on CASSANDRA-4099:
-

+1

 IncomingTCPConnection recognizes from by doing socket.getInetAddress() 
 instead of BroadCastAddress
 --

 Key: CASSANDRA-4099
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4099
 Project: Cassandra
  Issue Type: Bug
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Fix For: 1.0.9, 1.1.0

 Attachments: 0001-CASSANDRA-4099-v2.patch, 
 0001-CASSANDRA-4099-v3.patch, 0001-CASSANDRA-4099-v4.patch, 
 0001-CASSANDRA-4099.patch


 Change this.from = socket.getInetAddress() to understand the broadcast IP; 
 the problem is we don't know it until the first packet is received, so this 
 ticket is to work around the problem until it reads the first packet.





[jira] [Commented] (CASSANDRA-3722) Send Hints to Dynamic Snitch when Compaction or repair is going on for a node.

2012-03-30 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242802#comment-13242802
 ] 

Brandon Williams commented on CASSANDRA-3722:
-

bq. is it because the lack of a datapoint, isn't taken into account as slowness?

Exactly.  It's not receiving new data, so the score doesn't change and the dead 
host is still rated the best until the FD removes it as an option.  Doing it 
this way, time itself penalizes the host when it stops responding.
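The mechanism Brandon describes can be sketched as a score that folds in the time since a host's last sample, so silence itself degrades a stale-but-good latency score. This is illustrative arithmetic, not the actual DynamicEndpointSnitch formula:

```java
public class TimePenalizedScore {
    // Score = last recorded latency plus a penalty that grows with silence:
    // a host that stops responding stops producing samples, so instead of
    // freezing its (good) score, elapsed time since the last sample is added.
    public static double score(double lastLatencyMillis, long lastUpdateMillis, long nowMillis) {
        return lastLatencyMillis + Math.max(0, nowMillis - lastUpdateMillis);
    }

    public static void main(String[] args) {
        // A fast host silent for 10s scores worse than a slow but live one.
        System.out.println(score(2.0, 0L, 10_000L));     // 10002.0
        System.out.println(score(50.0, 9_900L, 10_000L)); // 150.0
    }
}
```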

 Send Hints to Dynamic Snitch when Compaction or repair is going on for a node.
 --

 Key: CASSANDRA-3722
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3722
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.1.0
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-3722-A1-V2.patch, 
 0001-CASSANDRA-3722-A1.patch, 0001-CASSANDRA-3722-v3.patch, 
 0001-CASSANDRA-3723-A2-Patch.patch, 
 0001-Expose-SP-latencies-in-nodetool-proxyhistograms.txt, 3722-v4.txt


 Currently the dynamic snitch looks at latency to figure out which node will 
 better serve requests. This works great, but part of the traffic is sent just 
 to collect this data... There is also a window when the snitch doesn't know 
 about some major event which is about to happen on the node that is going to 
 receive the data request.
 It would be great if we could send some sort of hints to the snitch so it can 
 score based on known events causing higher latencies.





[jira] [Commented] (CASSANDRA-4099) IncomingTCPConnection recognizes from by doing socket.getInetAddress() instead of BroadCastAddress

2012-03-29 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241258#comment-13241258
 ] 

Brandon Williams commented on CASSANDRA-4099:
-

I'm confused, how does 'from' differ from 'msg.getFrom' in this patch?  It 
seems like a no-op.

 IncomingTCPConnection recognizes from by doing socket.getInetAddress() 
 instead of BroadCastAddress
 --

 Key: CASSANDRA-4099
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4099
 Project: Cassandra
  Issue Type: Bug
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-4099-v2.patch, 0001-CASSANDRA-4099.patch


 Change this.from = socket.getInetAddress() to understand the broadcast IP; 
 the problem is we don't know it until the first packet is received, so this 
 ticket is to work around the problem until it reads the first packet.





[jira] [Commented] (CASSANDRA-4099) IncomingTCPConnection recognizes from by doing socket.getInetAddress() instead of BroadCastAddress

2012-03-29 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241351#comment-13241351
 ] 

Brandon Williams commented on CASSANDRA-4099:
-

I see, it's injecting another getFrom call.  +1 (though this version only 
applies to trunk)

 IncomingTCPConnection recognizes from by doing socket.getInetAddress() 
 instead of BroadCastAddress
 --

 Key: CASSANDRA-4099
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4099
 Project: Cassandra
  Issue Type: Bug
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-4099-v2.patch, 0001-CASSANDRA-4099.patch


 Change this.from = socket.getInetAddress() to understand the broadcast IP; 
 the problem is we don't know it until the first packet is received, so this 
 ticket is to work around the problem until it reads the first packet.





[jira] [Commented] (CASSANDRA-4101) Gossip should propagate MessagingService.version

2012-03-29 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241462#comment-13241462
 ] 

Brandon Williams commented on CASSANDRA-4101:
-

For people using broadcast_address, nodes keep no mapping between broadcast 
and local addresses, so changing the version stored in the Gossiper's map is 
difficult: the version ends up stored under one address and not the other.

 Gossip should propagate MessagingService.version
 

 Key: CASSANDRA-4101
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4101
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Critical
 Fix For: 1.1.0

 Attachments: 4101.txt


 In CASSANDRA-4099 it's becoming apparent that it's time to fix our hacky 
 versioning tricks we've used to remain backward-compatible.  As a first step, 
 let's communicate the version via gossip so we can eventually reason based on 
 that.





[jira] [Commented] (CASSANDRA-4099) IncomingTCPConnection recognizes from by doing socket.getInetAddress() instead of BroadCastAddress

2012-03-29 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241474#comment-13241474
 ] 

Brandon Williams commented on CASSANDRA-4099:
-

It overrides where from is set in the constructor:

{code}
this.from = socket.getInetAddress();
{code}
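The pattern the patch moves toward looks roughly like this: keep the socket peer only as a placeholder, and overwrite it with the broadcast address carried by the first deserialized message. A schematic sketch, not the actual IncomingTcpConnection code:

```java
import java.net.InetAddress;

public class IncomingFrom {
    // Placeholder taken from the socket; behind NAT or on EC2 this can be
    // a private address that matches no broadcast_address in the cluster.
    private InetAddress from;
    private boolean resolved = false;

    public IncomingFrom(InetAddress socketPeer) {
        this.from = socketPeer;
    }

    // First message read off the wire: from here on, trust the sender's
    // broadcast address (what msg.getFrom() carries) instead of the socket.
    public void onMessage(InetAddress broadcastFrom) {
        if (!resolved) {
            from = broadcastFrom;
            resolved = true;
        }
    }

    public InetAddress from() { return from; }
}
```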

 IncomingTCPConnection recognizes from by doing socket.getInetAddress() 
 instead of BroadCastAddress
 --

 Key: CASSANDRA-4099
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4099
 Project: Cassandra
  Issue Type: Bug
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-4099-v2.patch, 0001-CASSANDRA-4099.patch


 change this.from = socket.getInetAddress() to understand the broad cast IP, 
 but the problem is we dont know until the first packet is received, this 
 ticket is to work around the problem until it reads the first packet.





[jira] [Commented] (CASSANDRA-4099) IncomingTCPConnection recognizes from by doing socket.getInetAddress() instead of BroadCastAddress

2012-03-29 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241484#comment-13241484
 ] 

Brandon Williams commented on CASSANDRA-4099:
-

I agree, we should get out of the habit of examining sockets directly due to 
broadcast_address.

 IncomingTCPConnection recognizes from by doing socket.getInetAddress() 
 instead of BroadCastAddress
 --

 Key: CASSANDRA-4099
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4099
 Project: Cassandra
  Issue Type: Bug
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-4099-v2.patch, 0001-CASSANDRA-4099.patch


 Change this.from = socket.getInetAddress() to understand the broadcast IP; 
 the problem is we don't know it until the first packet is received, so this 
 ticket is to work around the problem until it reads the first packet.





[jira] [Commented] (CASSANDRA-3911) Basic QoS support for helping reduce OOMing cluster

2012-03-29 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241502#comment-13241502
 ] 

Brandon Williams commented on CASSANDRA-3911:
-

First off, this is setting result limits, more than quality of service, since 
we aren't [de]prioritizing anything, so I don't think QoS is the right term to 
be using here.

Secondly, I don't think this patch satisfies the goal of "limit how many 
columns may be returned (if count > N), throw exception before processing": 
the server actually does the processing, then limits what it will return, 
which only saves a copy in thrift.  You can still request all 2B columns in a 
row and OOM the server.
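The distinction Brandon draws can be sketched as validating the *requested* count before any read happens, rather than trimming the result afterwards. All the names and limit values below are hypothetical, invented for illustration:

```java
public class RequestLimits {
    // Hypothetical server-side limits (the proposal puts these in cassandra.yaml).
    static final int MAX_KEYS_PER_MULTIGET = 1024;
    static final int MAX_COLUMNS_PER_SLICE = 10_000;

    // Reject oversized requests up front, before any read is performed;
    // trimming the result after the read only saves the thrift copy.
    public static void validateMultiget(int keyCount) {
        if (keyCount > MAX_KEYS_PER_MULTIGET)
            throw new IllegalArgumentException(
                "multiget of " + keyCount + " keys exceeds " + MAX_KEYS_PER_MULTIGET);
    }

    public static void validateSlice(int requestedCount) {
        if (requestedCount > MAX_COLUMNS_PER_SLICE)
            throw new IllegalArgumentException(
                "slice of " + requestedCount + " columns exceeds " + MAX_COLUMNS_PER_SLICE);
    }
}
```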

 Basic QoS support for helping reduce OOMing cluster
 ---

 Key: CASSANDRA-3911
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3911
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Harish Doddi
Priority: Minor
 Fix For: 1.2

 Attachments: CASSANDRA-3911-trunk.txt


 We'd like to propose adding some basic QoS features to Cassandra. There is a 
 lot that could be done here, but for v1, to keep things less invasive while 
 still providing the basics, we would like to contribute the following 
 features and see if the community thinks this is OK.
 We would set these on the server (cassandra.yaml). If a threshold is crossed, 
 we throw an exception up to the client.
 1) Limit how many rows a client can fetch over RPC through multi-get.
 2) Limit how many columns may be returned (if count > N); throw an exception 
 before processing.
 3) Limit how many rows and columns a client can try to batch mutate.
 This can be added in our Thrift logic, before any processing can be done. The 
 big reason why we want to do this, is so that customers don't shoot 
 themselves in the foot, by making mistakes or not knowing how many columns 
 they might have returned.
 We can build logic like this into a basic client, but I propose one of the 
 features we might want in Cassandra is support for not being able to OOM a 
 node. We've done lots of work around memtable flushing, dropping messages, 
 etc.





[jira] [Commented] (CASSANDRA-4104) Cassandra appears to hang when JNA enabled and heapsize > free memory

2012-03-29 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241598#comment-13241598
 ] 

Brandon Williams commented on CASSANDRA-4104:
-

Also, it's not hanging, nor is it exiting without telling you what's wrong; 
you just aren't running it in foreground mode:

{noformat}
# bin/cassandra -f
Error occurred during initialization of VM
Could not reserve enough space for object heap
{noformat}

 Cassandra appears to hang when JNA enabled and heapsize > free memory
 -

 Key: CASSANDRA-4104
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4104
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.0.8
Reporter: Joaquin Casares
Priority: Minor
  Labels: datastax_qa

 When JNA is enabled and heapsize is larger than free memory, all that is 
 printed out is the classpath, then the printouts stop.
 If you hit enter again, you get the command line, but no Cassandra process is 
 running.
 Tested on both OpenJDK and Oracle Java.
 {noformat}
 datastax@datastax-image:~/repos/cassandra$ free -m
               total       used       free     shared    buffers     cached
  Mem:          2008        740       1267          0          3         54
  -/+ buffers/cache:        682       1326
  Swap:            0          0          0
 datastax@datastax-image:~/repos/cassandra$ sudo bin/cassandra
 datastax@datastax-image:~/repos/cassandra$  INFO 14:31:32,520 Logging 
 initialized
  INFO 14:31:32,533 JVM vendor/version: Java HotSpot(TM) 64-Bit Server 
 VM/1.6.0_31
  INFO 14:31:32,534 Heap size: 1247805440/1247805440
  INFO 14:31:32,534 Classpath: 
 bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.6.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/jamm-0.2.5.jar
 datastax@datastax-image:~/repos/cassandra$ ps auwx | grep cass
 datastax 18374  1.0  0.0  13448   904 pts/2S+   14:32   0:00 grep 
 --color=auto cass
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4101) Gossip should propagate MessagingService.version

2012-03-28 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240810#comment-13240810
 ] 

Brandon Williams commented on CASSANDRA-4101:
-

Set this to 1.1.0 so we can have a common first version where we can trust this 
info.

 Gossip should propagate MessagingService.version
 

 Key: CASSANDRA-4101
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4101
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Priority: Critical
 Fix For: 1.1.0


 In CASSANDRA-4099 it's becoming apparent that it's time to fix our hacky 
 versioning tricks we've used to remain backward-compatible.  As a first step, 
 let's communicate the version via gossip so we can eventually reason based on 
 that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4099) IncomingTCPConnection recognizes from by doing socket.getInetAddress() instead of BroadCastAddress

2012-03-28 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240811#comment-13240811
 ] 

Brandon Williams commented on CASSANDRA-4099:
-

This doesn't look like a perfect solution since all nodes will have to stream 
to all other nodes in order to learn the correct version, and thus be able to 
use newer-version features.  I'm not sure there's currently a way around this, 
though.  I created CASSANDRA-4101 to get us started there, but I'll look more 
closely here tomorrow.

 IncomingTCPConnection recognizes from by doing socket.getInetAddress() 
 instead of BroadCastAddress
 --

 Key: CASSANDRA-4099
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4099
 Project: Cassandra
  Issue Type: Bug
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-4099.patch


 change this.from = socket.getInetAddress() to understand the broadcast IP; 
 the problem is we don't know it until the first packet is received. This 
 ticket is to work around the problem until it reads the first packet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4099) IncomingTCPConnection recognizes from by doing socket.getInetAddress() instead of BroadCastAddress

2012-03-28 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240815#comment-13240815
 ] 

Brandon Williams commented on CASSANDRA-4099:
-

Won't 'from' still always be wrong in your configuration unless streaming 
occurs?

 IncomingTCPConnection recognizes from by doing socket.getInetAddress() 
 instead of BroadCastAddress
 --

 Key: CASSANDRA-4099
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4099
 Project: Cassandra
  Issue Type: Bug
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-4099.patch


 change this.from = socket.getInetAddress() to understand the broadcast IP; 
 the problem is we don't know it until the first packet is received. This 
 ticket is to work around the problem until it reads the first packet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4099) IncomingTCPConnection recognizes from by doing socket.getInetAddress() instead of BroadCastAddress

2012-03-28 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240822#comment-13240822
 ] 

Brandon Williams commented on CASSANDRA-4099:
-

I think I see: in your situation the version is correct for everything except 
streaming, hence 1)?  It seems like the problem here is that it will still 
accept streams from a lesser version, which is always version-specific.

 IncomingTCPConnection recognizes from by doing socket.getInetAddress() 
 instead of BroadCastAddress
 --

 Key: CASSANDRA-4099
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4099
 Project: Cassandra
  Issue Type: Bug
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-4099.patch


 change this.from = socket.getInetAddress() to understand the broadcast IP; 
 the problem is we don't know it until the first packet is received. This 
 ticket is to work around the problem until it reads the first packet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4086) decom should shut thrift down

2012-03-27 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239629#comment-13239629
 ] 

Brandon Williams commented on CASSANDRA-4086:
-

I've tried that, but it feels like it makes the logic for decom much less 
clear, and has a side effect that drain shuts the node down, which we don't 
want.

 decom should shut thrift down
 -

 Key: CASSANDRA-4086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4086
 Project: Cassandra
  Issue Type: Bug
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
 Fix For: 1.0.9, 1.1.0

 Attachments: 4086.txt


 If you decom a node and then try to use it, you get nothing but timeouts.  
 Instead, let's just kill thrift so intelligent clients can move along.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4086) decom should shut thrift down

2012-03-27 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239655#comment-13239655
 ] 

Brandon Williams commented on CASSANDRA-4086:
-

Historically I think the reasoning is you may have packaging that automatically 
restarts the process, which is something you don't really want with decom, but 
isn't a huge problem for drain.  David apparently ran into this problem on 
CASSANDRA-1483.

 decom should shut thrift down
 -

 Key: CASSANDRA-4086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4086
 Project: Cassandra
  Issue Type: Bug
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
 Fix For: 1.0.9, 1.1.0

 Attachments: 4086.txt


 If you decom a node and then try to use it, you get nothing but timeouts.  
 Instead, let's just kill thrift so intelligent clients can move along.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3937) nodetool describering should report the schema version

2012-03-26 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238637#comment-13238637
 ] 

Brandon Williams commented on CASSANDRA-3937:
-

bq. Firstly, is prepending the schema version string onto the existing output 
likely to break any existing tooling ?

If someone is naively automating nodetool instead of using JMX itself, that is 
their problem to solve, so I don't see a problem with this.

bq. Secondly, StorageProxy.describeSchemaVersions() messages all live nodes in 
the ring to find out their schema versions. I thought that maybe this might be 
a useful thing to expose via nodetool. Are there objections to adding nodetool 
commands that require network comms?

I think this kind of violates nodetool's current philosophy (ala 
CASSANDRA-2607) where it should do one thing against that machine only.  If 
someone wants to see all the versions they can automate nodetool quite easily.

 nodetool describering should report the schema version
 --

 Key: CASSANDRA-3937
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3937
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Priority: Trivial
  Labels: lhf
 Fix For: 1.1.0

 Attachments: 
 v1-0001-CASSANDRA-3937-Add-schema-version-to-nodetool-describe.txt


 Specifically to aid in debugging things like CASSANDRA-3931, now that you 
 can't just decode the UUIDs to see which one has the higher timestamp.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3722) Send Hints to Dynamic Snitch when Compaction or repair is going on for a node.

2012-03-23 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237102#comment-13237102
 ] 

Brandon Williams commented on CASSANDRA-3722:
-

We should probably avoid having Gossiper inject application states directly, if 
for nothing else than to not make life harder for CASSANDRA-3125

 Send Hints to Dynamic Snitch when Compaction or repair is going on for a node.
 --

 Key: CASSANDRA-3722
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3722
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.1.0
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-3722-A1.patch


 Currently the dynamic snitch looks at latency to figure out which node will 
 better serve requests. This works great, but part of the traffic is sent just 
 to collect this data... There is also a window when the snitch doesn't know 
 about some major event that is about to happen on the node (the node which is 
 going to receive the data request).
 It would be great if we could send some sort of hints to the snitch so it can 
 score based on known events that cause higher latencies.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4066) Cassandra cluster stops responding on time change (scheduling not using monotonic time?)

2012-03-20 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233648#comment-13233648
 ] 

Brandon Williams commented on CASSANDRA-4066:
-

I can confirm that this is indeed the GossipTask no longer running when the 
clock is pushed backward far enough.  As you mention, we use SES extensively 
and likely all those timed tasks have also quit firing, which could lead to an 
untold amount of confusion if we special-cased gossip, since there would be no 
immediate red flag to indicate a problem.  Starting a node far in the future 
has other consequences too, such as CASSANDRA-3654.  I think I would rather see 
the UAE and know that my machines have connectivity to identify this problem 
and fix it correctly.

Even if we do special-case the GossipTask, we'll also need to fix 
LoadBroadcaster so we don't end up with a broken view of the load on the ring, 
and at that point it feels like a slippery slope where we need to fix 
everything, or fail as quickly as possible, which is what the current behavior 
does.

Another interesting thing to note is that the node still replies to gossip syn 
messages with a gossip ack, but because we only update the FD on a 
version/generation change, and because the broken LoadBroadcaster gives the 
node no reason to generate new versions, it remains seen as down by the other 
nodes.  If LB did happen to work, we'd see the node flap every 90 seconds.
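
The monotonicity issue discussed above comes down to which clock a repeating 
task is keyed off: wall-clock time (System.currentTimeMillis) can jump backward 
when an operator resets the date, while System.nanoTime only measures elapsed 
time. A minimal illustration of the safer, monotonic measurement (not Cassandra 
code; the class name is hypothetical):

```java
// Illustration: System.currentTimeMillis() follows the wall clock and can
// jump backward when the system date is changed; System.nanoTime() only
// measures elapsed time, so it is the safer base for "run every N seconds"
// style scheduling. Not Cassandra code.
public class MonotonicClock {
    // Elapsed nanoseconds since 'startNanos'; unaffected by wall-clock
    // adjustments because both readings come from the monotonic source.
    public static long elapsedNanos(long startNanos) {
        return System.nanoTime() - startNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(50); // do some "work"
        System.out.println("elapsed ~" + (elapsedNanos(start) / 1_000_000) + " ms");
    }
}
```

With this measurement, a periodic task's "next run" point never moves into the 
past or far future when the date is reset, which is exactly the failure mode 
described for the GossipTask here.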

 Cassandra cluster stops responding on time change (scheduling not using 
 monotonic time?) 
 -

 Key: CASSANDRA-4066
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4066
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Linux; CentOS6 2.6.32-220.4.2.el6.x86_64
Reporter: David Daeschler
Assignee: Brandon Williams
Priority: Minor
  Labels: gossip
 Fix For: 1.1.1


 The server installation I set up did not have ntpd installed in the base 
 installation. When I noticed that the clocks were skewing I installed ntp and 
 set the date on all the servers in the cluster. A short time later, I started 
 getting UnavailableExceptions on the clients. 
 Also, one server seemed to be unaffected by the time change. That server 
 happened to have its time pushed forward, not backward like the other 3 in 
 the cluster. This leads me to believe something is running on a 
 timer/schedule that is not monotonic.
 I'm posting this as a bug, but I suppose it might just be part of the 
 communication protocols etc for the cluster and part of the design. But I 
 think the devs should be aware of what I saw.
 Otherwise, thank you for a fantastic product. Even after restarting 75% of 
 the cluster things seem to have recovered nicely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4058) Debian package does not create /var/lib/cassandra/data

2012-03-16 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231731#comment-13231731
 ] 

Brandon Williams commented on CASSANDRA-4058:
-

My point is that the 'cassandra' user's home directory is /var/lib/cassandra, 
so it should already own it when the package creates this user.  If you create 
this directory ahead of time with the wrong permissions, that is your mistake 
to correct, not the package's to solve with brute force.

 Debian package does not create /var/lib/cassandra/data
 --

 Key: CASSANDRA-4058
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4058
 Project: Cassandra
  Issue Type: Bug
  Components: Packaging
 Environment: Ubuntu 11.10
Reporter: Jacob Fenwick

 I installed Cassandra using the Debian packages as described here: 
 http://wiki.apache.org/cassandra/DebianPackaging
 When trying to start Cassandra using /etc/init.d/cassandra start I get this 
 error: java.io.IOError: java.io.IOException: unable to mkdirs 
 /var/lib/cassandra/data
 The directory /var/lib/cassandra exists, but the directory 
 /var/lib/cassandra/data does not.
 I would assume the data directory should have been created with the correct 
 permissions, but it was not.
 However, I tried creating /var/lib/cassandra/data, setting its permissions 
 to 666, and setting the user/group to cassandra/cassandra, and now I get this 
 error:
 java.lang.AssertionError: Directory /var/lib/cassandra/data is not accessible.
 So what could possibly be the problem here?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3811) Empty rpc_address prevents running MapReduce job outside a cluster

2012-03-15 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230424#comment-13230424
 ] 

Brandon Williams commented on CASSANDRA-3811:
-

It's an edge case because most people run Hadoop colocated with Cassandra.  
Why?  Because Hadoop is about moving computation to the data, not the other 
way around, and without colocation, moving the data is exactly what you're 
doing.

That said, we understand this is a problem that needs to be addressed, but it 
is hardly critical.

 Empty rpc_address prevents running MapReduce job outside a cluster
 --

 Key: CASSANDRA-3811
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3811
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.9, 0.8.10
 Environment: Debian Stable,
 Cassandra 0.8.9,
 Java(TM) SE Runtime Environment (build 1.6.0_26-b03),
 Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
Reporter: Patrik Modesto
Priority: Minor

 Setting rpc_address to empty to make Cassandra listen on all network 
 interfaces breaks running a MapReduce job from outside the cluster. The jobs 
 won't even start, showing these messages:
 {noformat}
 12/01/26 11:15:21 DEBUG  hadoop.ColumnFamilyInputFormat: failed
 connect to endpoint 0.0.0.0
 java.io.IOException: unable to connect to server
at 
 org.apache.cassandra.hadoop.ConfigHelper.createConnection(ConfigHelper.java:389)
at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:224)
at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:73)
at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:193)
at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:178)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.thrift.transport.TTransportException:
 java.net.ConnectException: Connection refused
at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
at 
 org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at 
 org.apache.cassandra.hadoop.ConfigHelper.createConnection(ConfigHelper.java:385)
... 9 more
 Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:211)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
... 11 more
 ...
 Caused by: java.util.concurrent.ExecutionException:
 java.io.IOException: failed connecting to all endpoints
 10.0.18.129,10.0.18.99,10.0.18.98
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:156)
... 19 more
 Caused by: java.io.IOException: failed connecting to all endpoints
 10.0.18.129,10.0.18.99,10.0.18.98
at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:241)
at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:73)
at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:193)
at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:178)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
 {noformat}
 Describe ring returns:
 {noformat}
 describe_ring returns:
 endpoints: 10.0.18.129,10.0.18.99,10.0.18.98
 rpc_endpoints: 0.0.0.0,0.0.0.0,0.0.0.0
 {noformat}
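 The describe_ring output above shows the core problem: every rpc_endpoint is 
 advertised as 0.0.0.0, which is not connectable from outside the cluster. One 
 hedged sketch of a client-side fallback (the class and method names here are 
 hypothetical, not the actual Hadoop-integration fix) is to substitute the 
 gossip endpoint whenever the advertised rpc_endpoint is the unspecified 
 address:

```java
// Hypothetical helper: pick a connectable address for a replica, falling
// back to the gossip endpoint when the advertised rpc_endpoint is 0.0.0.0
// (i.e. rpc_address was left empty on the server).
public class EndpointChooser {
    public static String chooseAddress(String rpcEndpoint, String gossipEndpoint) {
        if (rpcEndpoint == null || rpcEndpoint.equals("0.0.0.0"))
            return gossipEndpoint; // server listens on all interfaces
        return rpcEndpoint;
    }
}
```

 This only helps when the gossip address is itself reachable from the client, 
 which is true for the endpoints listed in the describe_ring output above.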
 [Michael 
 

[jira] [Commented] (CASSANDRA-3229) Remove ability to disable dynamic snitch entirely

2012-03-15 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230465#comment-13230465
 ] 

Brandon Williams commented on CASSANDRA-3229:
-

You can use the badness threshold from CASSANDRA-1519

 Remove ability to disable dynamic snitch entirely
 -

 Key: CASSANDRA-3229
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3229
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 1.0.0

 Attachments: 3229.txt


 We've moved the dynamic snitch from "new, default to off" to "well tested, 
 default to true", and it's time now to take the next step to "there is no 
 reason to disable it", since keeping the option around just lets people shoot 
 their foot off.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4020) System time suddenly changed & made gossip working abnormally

2012-03-14 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229401#comment-13229401
 ] 

Brandon Williams commented on CASSANDRA-4020:
-

Are you able to reproduce this consistently?  If so, can you give a list of 
explicit steps including when to start/stop nodes that can be performed to 
reproduce the issue?  

 System time suddenly changed & made gossip working abnormally 
 -

 Key: CASSANDRA-4020
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4020
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.6
Reporter: MaHaiyang

 I have four Cassandra nodes (A, B, C, D).
  I changed node A's system time to one hour ahead and changed it back to 
 normal after several seconds. Then I used nodetool's ring command on node B; 
 node B sees node A as down. It's the same on nodes C and D. But node A sees 
 itself as UP via the ring command.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

2012-03-14 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229855#comment-13229855
 ] 

Brandon Williams commented on CASSANDRA-4051:
-

It looks like we could extract/rebase the streaming changes from 
CASSANDRA-3112's first patch to solve this well enough for the bulk loader and 
BOF.

 Stream sessions can only fail via the FailureDetector
 -

 Key: CASSANDRA-4051
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.1


 If for some reason, FileStreamTask itself fails more than the number of retry 
 attempts but gossip continues to work, the stream session will never be 
 closed.  This is unlikely to happen in practice since it requires blocking 
 the storage port from new connections but keeping the existing ones, however 
 for the bulk loader this is especially problematic since it doesn't have 
 access to a failure detector and thus no way of knowing if a session failed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4026) EC2 snitch incorrectly reports regions

2012-03-12 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227733#comment-13227733
 ] 

Brandon Williams commented on CASSANDRA-4026:
-

bq. hence after recovery they will not be able to be recovered via repair

One replica will always be in the right spot so you can repair.

bq. BTW: The attached patch can break after AWS has 24 AZs, which is highly 
unlikely, but I will create a ticket requesting an API for regions instead of 
AZs.

That would be great.  Unfortunately when we have that, we'll still have to 
munge the name (and rack names) to be backwards compatible :(

This ticket makes me sad, but +1.

 EC2 snitch incorrectly reports regions
 --

 Key: CASSANDRA-4026
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4026
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.8
 Environment: Ubuntu 10.10 64 bit Oracle Java 6
Reporter: Todd Nine
Assignee: Vijay
 Attachments: 0001-CASSANDRA-4026.patch


 Currently the org.apache.cassandra.locator.Ec2Snitch reports us-west in 
 both the Oregon and the California data centers. This is incorrect, since 
 they are different regions.
 California = us-west-1
 Oregon = us-west-2
 wget http://169.254.169.254/latest/meta-data/placement/availability-zone 
 returns the value us-west-2a
 After parsing this returns
 DC = us-west Rack = 2a
 What it should return
 DC = us-west-2 Rack = a
 This makes it possible to use multi-region when both regions are on the west 
 coast.
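
 The corrected parsing described above amounts to splitting the EC2 
 availability-zone string after its last digit, so "us-west-2a" yields 
 datacenter "us-west-2" and rack "a". A minimal sketch of that split (class 
 and method names are illustrative, not the Ec2Snitch patch itself):

```java
// Sketch of the corrected availability-zone parsing: split the zone string
// after the trailing region digit, so "us-west-2a" -> DC "us-west-2",
// rack "a". Illustrative only; not the actual Ec2Snitch code.
public class Ec2ZoneParser {
    // Returns { datacenter, rack } for an EC2 availability-zone string.
    public static String[] parse(String availabilityZone) {
        int split = availabilityZone.length();
        // Walk back over the rack suffix (the trailing letters).
        while (split > 0 && !Character.isDigit(availabilityZone.charAt(split - 1)))
            split--;
        return new String[] { availabilityZone.substring(0, split),
                              availabilityZone.substring(split) };
    }
}
```

 The zone string itself would come from the instance metadata endpoint quoted 
 above (http://169.254.169.254/latest/meta-data/placement/availability-zone).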

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3991) Investigate importance of jsvc in debian packages

2012-03-12 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228203#comment-13228203
 ] 

Brandon Williams commented on CASSANDRA-3991:
-

bq. But if the main goal is to restart after a crash

I don't think that is a main goal.  Certainly OOMing in a loop, with heap dumps 
enabled, is not a thing we really ought to be doing.  Thanks for the JSW 
pointer, though!

 Investigate importance of jsvc in debian packages
 -

 Key: CASSANDRA-3991
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3991
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
 Fix For: 1.1.1


 jsvc seems to be buggy at best.  For instance, if you set a small heap like 
 128M it seems to completely ignore this and use as much memory as it wants.  
 I don't know what this is buying us over launching /usr/bin/cassandra 
 directly like the redhat scripts do, but I've seen multiple complaints about 
 its memory usage.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4021) CFS.scrubDataDirectories tries to delete nonexistent orphans

2012-03-11 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227276#comment-13227276
 ] 

Brandon Williams commented on CASSANDRA-4021:
-

I'm not sure how this happened; I tried to repro it artificially and wasn't able 
to.  Originally I was testing a patch that threw a TON of errors (all time was 
spent in logging), and after a ctrl-c and restart this happened.

Is it really important to confirm the deletion here?  Being unable to start 
rather sucks.

 CFS.scrubDataDirectories tries to delete nonexistent orphans
 

 Key: CASSANDRA-4021
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4021
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7 beta 2
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.11, 1.0.9

 Attachments: 4021.txt


 The check only looks for a missing data file, then deletes all other 
 components, however it's possible for the data file and another component to 
 be missing, causing an error:
 {noformat}
  WARN 17:19:28,765 Removing orphans for 
 /var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-24492:
  [Index.db, Filter.db, Digest.sha1, Statistics.db, Data.db]
 ERROR 17:19:28,766 Exception encountered during startup
 java.lang.AssertionError: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49)
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352)
 at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105)
 java.lang.AssertionError: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49)
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352)
 at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105)
 Exception encountered during startup: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 {noformat}





[jira] [Commented] (CASSANDRA-3676) Add snaptree dependency to maven central and update pom

2012-03-09 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226285#comment-13226285
 ] 

Brandon Williams commented on CASSANDRA-3676:
-

http://search.maven.org/#artifactdetails%7Ccom.yammer.metrics%7Cmetrics-core%7C2.0.3%7Cjar

 Add snaptree dependency to maven central and update pom
 ---

 Key: CASSANDRA-3676
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3676
 Project: Cassandra
  Issue Type: Sub-task
Reporter: T Jake Luciani
Assignee: Stephen Connolly
 Fix For: 1.1.0


 Snaptree dependency needs to be added to maven before we can release 1.1





[jira] [Commented] (CASSANDRA-4020) System time suddenly changed made gossip working abnormally

2012-03-08 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225245#comment-13225245
 ] 

Brandon Williams commented on CASSANDRA-4020:
-

I can't repro either by setting one clock forward an hour or backward an entire 
year, which makes sense, since the gossip generation is stored in the system 
table and reused.

 System time suddenly changed  made gossip working abnormally 
 -

 Key: CASSANDRA-4020
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4020
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.6
Reporter: MaHaiyang

 I have four cassandra nodes (A,B,C,D).
  I changed node A's system time to one hour ahead; then when I used nodetool's 
 ring command on node B, node B saw node A as down. The same thing happens on 
 nodes C and D. But node A sees itself as UP in the ring command output.





[jira] [Commented] (CASSANDRA-4019) java.util.ConcurrentModificationException in Gossiper

2012-03-08 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225255#comment-13225255
 ] 

Brandon Williams commented on CASSANDRA-4019:
-

It seems like the Right Way to solve this is to refactor BoundedStatsDeque to 
operate like the dsnitch's AdaptiveLatencyTracker, which *is* threadsafe, and 
then have ALT extend BSD.  BSD isn't used anywhere else except the FD.
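For illustration only, a minimal sketch of that direction under invented names (this is not Cassandra's actual BoundedStatsDeque or AdaptiveLatencyTracker code): a bounded deque whose mutators and aggregators share one monitor, so the gossip task can append arrival intervals while the failure detector computes the mean without hitting a ConcurrentModificationException.

```java
import java.util.ArrayDeque;

// Hypothetical sketch: all access goes through synchronized methods, so the
// ArrayDeque's fail-fast iterator is never raced by a concurrent add().
class SynchronizedBoundedStatsDeque {
    private final int maxSize;
    private final ArrayDeque<Double> deque = new ArrayDeque<Double>();

    SynchronizedBoundedStatsDeque(int maxSize) {
        this.maxSize = maxSize;
    }

    synchronized void add(double value) {
        if (deque.size() == maxSize)
            deque.removeFirst();          // evict the oldest sample to stay bounded
        deque.addLast(value);
    }

    synchronized double sum() {
        double total = 0;
        for (double v : deque)
            total += v;
        return total;
    }

    synchronized double mean() {
        return deque.isEmpty() ? 0 : sum() / deque.size();
    }

    synchronized int size() {
        return deque.size();
    }
}
```

Java's intrinsic locks are reentrant, so mean() calling sum() under the same monitor is safe.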

 java.util.ConcurrentModificationException in Gossiper
 -

 Key: CASSANDRA-4019
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4019
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.9
Reporter: Thibaut
Priority: Minor
 Fix For: 0.8.11


 I have never seen this one before; it might be triggered by a race condition 
 under heavy load. This error was triggered on 0.8.9:
 ERROR [GossipTasks:1] 2012-03-05 04:16:55,263 Gossiper.java (line 162) Gossip 
 error
 java.util.ConcurrentModificationException
 at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:605)
 at 
 org.apache.cassandra.utils.AbstractStatsDeque.sum(AbstractStatsDeque.java:37)
 at 
 org.apache.cassandra.utils.AbstractStatsDeque.mean(AbstractStatsDeque.java:60)
 at 
 org.apache.cassandra.gms.ArrivalWindow.mean(FailureDetector.java:259)
 at 
 org.apache.cassandra.gms.ArrivalWindow.phi(FailureDetector.java:282)
 at 
 org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:155)
 at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538)
 at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
 at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
  INFO [GossipStage:1] 2012-03-05 04:16:55,263 Gossiper.java (line 737) Node 
 /192.168.3.18 has restarted, now UP again
  INFO [GossipStage:1] 2012-03-05 04:16:55,264 Gossiper.java (line 705) 
 InetAddress /192.168.3.18 is now UP





[jira] [Commented] (CASSANDRA-4022) Compaction of hints can get stuck in a loop

2012-03-08 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225350#comment-13225350
 ] 

Brandon Williams commented on CASSANDRA-4022:
-

Yuki mentions that it may be caused by CASSANDRA-3442 too.

 Compaction of hints can get stuck in a loop
 ---

 Key: CASSANDRA-4022
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4022
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Critical
 Fix For: 1.1.0


 Not exactly sure how I caused this as I was working on something else in 
 trunk, but:
 {noformat}
  INFO 17:41:35,682 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-339-Data.db')]
  INFO 17:41:36,430 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-340-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 5.912220MB/s.  Time: 748ms.
  INFO 17:41:36,431 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-340-Data.db')]
  INFO 17:41:37,238 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-341-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 5.479976MB/s.  Time: 807ms.
  INFO 17:41:37,239 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-341-Data.db')]
  INFO 17:41:38,163 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-342-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.786083MB/s.  Time: 924ms.
  INFO 17:41:38,164 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-342-Data.db')]
  INFO 17:41:39,014 GC for ParNew: 274 ms for 1 collections, 541261288 used; 
 max is 1024458752
  INFO 17:41:39,151 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-343-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.485132MB/s.  Time: 986ms.
  INFO 17:41:39,151 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-343-Data.db')]
  INFO 17:41:40,016 GC for ParNew: 308 ms for 1 collections, 585582200 used; 
 max is 1024458752
  INFO 17:41:40,200 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-344-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.223821MB/s.  Time: 1,047ms.
  INFO 17:41:40,201 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-344-Data.db')]
  INFO 17:41:41,017 GC for ParNew: 252 ms for 1 collections, 617877904 used; 
 max is 1024458752
  INFO 17:41:41,178 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-345-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.526449MB/s.  Time: 977ms.
  INFO 17:41:41,179 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-345-Data.db')]
  INFO 17:41:41,885 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-346-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 6.263938MB/s.  Time: 706ms.
  INFO 17:41:41,887 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-346-Data.db')]
  INFO 17:41:42,617 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-347-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes for 1 keys at 
 6.066311MB/s.  Time: 729ms.
  INFO 17:41:42,618 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-347-Data.db')]
  INFO 17:41:43,376 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-348-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes for 1 keys at 
 5.834222MB/s.  Time: 758ms.
  INFO 17:41:43,377 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-348-Data.db')]
  INFO 17:41:44,307 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-349-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes for 1 keys at 
 4.760323MB/s.  Time: 929ms.
  INFO 17:41:44,308 Compacting 
 

[jira] [Commented] (CASSANDRA-4022) Compaction of hints can get stuck in a loop

2012-03-08 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225352#comment-13225352
 ] 

Brandon Williams commented on CASSANDRA-4022:
-

I should note that the machine does not hand anything off, so everything in 
these sstables must be tombstones.

 Compaction of hints can get stuck in a loop
 ---

 Key: CASSANDRA-4022
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4022
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Critical
 Fix For: 1.1.0


 Not exactly sure how I caused this as I was working on something else in 
 trunk, but:
 {noformat}
  INFO 17:41:35,682 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-339-Data.db')]
  INFO 17:41:36,430 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-340-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 5.912220MB/s.  Time: 748ms.
  INFO 17:41:36,431 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-340-Data.db')]
  INFO 17:41:37,238 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-341-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 5.479976MB/s.  Time: 807ms.
  INFO 17:41:37,239 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-341-Data.db')]
  INFO 17:41:38,163 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-342-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.786083MB/s.  Time: 924ms.
  INFO 17:41:38,164 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-342-Data.db')]
  INFO 17:41:39,014 GC for ParNew: 274 ms for 1 collections, 541261288 used; 
 max is 1024458752
  INFO 17:41:39,151 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-343-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.485132MB/s.  Time: 986ms.
  INFO 17:41:39,151 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-343-Data.db')]
  INFO 17:41:40,016 GC for ParNew: 308 ms for 1 collections, 585582200 used; 
 max is 1024458752
  INFO 17:41:40,200 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-344-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.223821MB/s.  Time: 1,047ms.
  INFO 17:41:40,201 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-344-Data.db')]
  INFO 17:41:41,017 GC for ParNew: 252 ms for 1 collections, 617877904 used; 
 max is 1024458752
  INFO 17:41:41,178 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-345-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.526449MB/s.  Time: 977ms.
  INFO 17:41:41,179 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-345-Data.db')]
  INFO 17:41:41,885 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-346-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 6.263938MB/s.  Time: 706ms.
  INFO 17:41:41,887 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-346-Data.db')]
  INFO 17:41:42,617 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-347-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes for 1 keys at 
 6.066311MB/s.  Time: 729ms.
  INFO 17:41:42,618 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-347-Data.db')]
  INFO 17:41:43,376 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-348-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes for 1 keys at 
 5.834222MB/s.  Time: 758ms.
  INFO 17:41:43,377 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-348-Data.db')]
  INFO 17:41:44,307 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-349-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes for 1 keys at 
 4.760323MB/s.  Time: 929ms.
  INFO 17:41:44,308 Compacting 
 

[jira] [Commented] (CASSANDRA-4023) Batch reading BloomFilters on startup

2012-03-08 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225393#comment-13225393
 ] 

Brandon Williams commented on CASSANDRA-4023:
-

The theory here is that the multithreaded load causes seek contention when 
opening the sstables.

 Batch reading BloomFilters on startup
 -

 Key: CASSANDRA-4023
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4023
 Project: Cassandra
  Issue Type: Improvement
Reporter: Joaquin Casares
  Labels: datastax_qa

 With the same amount of data, startup takes about 4x longer on a 1.0.7 cluster 
 than on a 0.8.7 cluster.
 It seems that 1.0.7 loads the BloomFilter by reading longs one at a time in a 
 multithreaded process, while 0.8.7 reads the entire object in one go.
 Perhaps we should update the new BloomFilter to do its reading in batch as well?
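A sketch of what batch reading could look like (invented names and an invented on-disk layout, not Cassandra's actual BloomFilter serializer): pull the filter's backing words off disk with one bulk readFully() instead of one small read per long, so concurrent table loaders issue far fewer seeks.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative only: one sequential read of the whole word array, then
// decoding from the in-memory copy.
public class BatchBloomRead {
    static long[] readWords(DataInputStream in) throws IOException {
        int count = in.readInt();
        byte[] raw = new byte[count * 8];
        in.readFully(raw);                       // single bulk read for all words
        DataInputStream words =
            new DataInputStream(new ByteArrayInputStream(raw));
        long[] out = new long[count];
        for (int i = 0; i < count; i++)
            out[i] = words.readLong();           // decode from memory, not disk
        return out;
    }
}
```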





[jira] [Commented] (CASSANDRA-4022) Compaction of hints can get stuck in a loop

2012-03-08 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225463#comment-13225463
 ] 

Brandon Williams commented on CASSANDRA-4022:
-

What happens, very reproducibly now, is: I start the node, five minutes later 
the forced compaction check in ACS kicks off, and then compaction loops on the 
hints, compacting only the last sstable over and over.

 Compaction of hints can get stuck in a loop
 ---

 Key: CASSANDRA-4022
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4022
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Critical
 Fix For: 1.1.0


 Not exactly sure how I caused this as I was working on something else in 
 trunk, but:
 {noformat}
  INFO 17:41:35,682 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-339-Data.db')]
  INFO 17:41:36,430 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-340-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 5.912220MB/s.  Time: 748ms.
  INFO 17:41:36,431 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-340-Data.db')]
  INFO 17:41:37,238 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-341-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 5.479976MB/s.  Time: 807ms.
  INFO 17:41:37,239 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-341-Data.db')]
  INFO 17:41:38,163 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-342-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.786083MB/s.  Time: 924ms.
  INFO 17:41:38,164 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-342-Data.db')]
  INFO 17:41:39,014 GC for ParNew: 274 ms for 1 collections, 541261288 used; 
 max is 1024458752
  INFO 17:41:39,151 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-343-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.485132MB/s.  Time: 986ms.
  INFO 17:41:39,151 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-343-Data.db')]
  INFO 17:41:40,016 GC for ParNew: 308 ms for 1 collections, 585582200 used; 
 max is 1024458752
  INFO 17:41:40,200 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-344-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.223821MB/s.  Time: 1,047ms.
  INFO 17:41:40,201 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-344-Data.db')]
  INFO 17:41:41,017 GC for ParNew: 252 ms for 1 collections, 617877904 used; 
 max is 1024458752
  INFO 17:41:41,178 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-345-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 4.526449MB/s.  Time: 977ms.
  INFO 17:41:41,179 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-345-Data.db')]
  INFO 17:41:41,885 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-346-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes 
 for 1 keys at 6.263938MB/s.  Time: 706ms.
  INFO 17:41:41,887 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-346-Data.db')]
  INFO 17:41:42,617 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-347-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes for 1 keys at 
 6.066311MB/s.  Time: 729ms.
  INFO 17:41:42,618 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-347-Data.db')]
  INFO 17:41:43,376 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-348-Data.db,].
   4,637,160 to 4,637,160 (~100% of original) bytes for 1 keys at 
 5.834222MB/s.  Time: 758ms.
  INFO 17:41:43,377 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-348-Data.db')]
  INFO 17:41:44,307 Compacted to 
 [/var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-349-Data.db,].
   4,637,160 to 4,637,160 (~100% of 

[jira] [Commented] (CASSANDRA-3555) Bootstrapping to handle more failure

2012-03-08 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225607#comment-13225607
 ] 

Brandon Williams commented on CASSANDRA-3555:
-

Why comment out the info message in SS?

 Bootstrapping to handle more failure
 

 Key: CASSANDRA-3555
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3555
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.5
Reporter: Vijay
Assignee: Vijay
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-3555-v1.patch, 
 3555-bootstrap-with-down-node-test.txt, 3555-bootstrap-with-down-node.txt


 We might want to handle failures in bootstrapping:
 1) When none of the seeds are available to communicate with, throw an exception.
 2) When any one of the nodes it is bootstrapping from fails, try the next in 
 the list (and if the list is exhausted, throw an exception).
 3) Clean all the existing files in the data dir before starting, just in case 
 we retry.
 4) Currently when one node is down in the cluster, the bootstrap will fail, 
 because the bootstrapping node doesn't understand which one is actually down.
 Also print the nodetool ring output in the logs so we can troubleshoot later 
 if it fails.
 Currently if any one of the above happens, the node skips the bootstrap or 
 hangs.





[jira] [Commented] (CASSANDRA-3555) Bootstrapping to handle more failure

2012-03-08 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225615#comment-13225615
 ] 

Brandon Williams commented on CASSANDRA-3555:
-

+1 otherwise

 Bootstrapping to handle more failure
 

 Key: CASSANDRA-3555
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3555
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.5
Reporter: Vijay
Assignee: Vijay
 Fix For: 1.2

 Attachments: 0001-CASSANDRA-3555-v1.patch, 
 3555-bootstrap-with-down-node-test.txt, 3555-bootstrap-with-down-node.txt


 We might want to handle failures in bootstrapping:
 1) When none of the seeds are available to communicate with, throw an exception.
 2) When any one of the nodes it is bootstrapping from fails, try the next in 
 the list (and if the list is exhausted, throw an exception).
 3) Clean all the existing files in the data dir before starting, just in case 
 we retry.
 4) Currently when one node is down in the cluster, the bootstrap will fail, 
 because the bootstrapping node doesn't understand which one is actually down.
 Also print the nodetool ring output in the logs so we can troubleshoot later 
 if it fails.
 Currently if any one of the above happens, the node skips the bootstrap or 
 hangs.





[jira] [Commented] (CASSANDRA-4023) Batch reading BloomFilters on startup

2012-03-08 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225621#comment-13225621
 ] 

Brandon Williams commented on CASSANDRA-4023:
-

bq. (Maybe it's time to add random vs sequential speed ratio as a setting, 
which at least is general enough to be useful in other places.)

This sounds like a good idea; we're never going to strike a sufficient balance 
between SSD and rotational media without a knob to turn.

 Batch reading BloomFilters on startup
 -

 Key: CASSANDRA-4023
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4023
 Project: Cassandra
  Issue Type: Improvement
Reporter: Joaquin Casares
  Labels: datastax_qa

 With the same amount of data, startup takes about 4x longer on a 1.0.7 cluster 
 than on a 0.8.7 cluster.
 It seems that 1.0.7 loads the BloomFilter by reading longs one at a time in a 
 multithreaded process, while 0.8.7 reads the entire object in one go.
 Perhaps we should update the new BloomFilter to do its reading in batch as well?





[jira] [Commented] (CASSANDRA-4021) CFS.scrubDataDirectories tries to delete nonexistent orphans

2012-03-08 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225671#comment-13225671
 ] 

Brandon Williams commented on CASSANDRA-4021:
-

Aren't we actually firing this when the data does not exist, though?

{code}
if (components.contains(Component.DATA) && dataFile.length() > 0)
    // everything appears to be in order... moving on.
    continue;

// missing the DATA file! all components are orphaned
{code}
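A sketch of the direction a fix could take (assumed, not the literal 4021.txt patch): when the Data component is gone, remove only the sibling components that actually exist, instead of asserting on files that may also be missing.

```java
import java.io.File;

// Hypothetical helper: tolerate components that are already gone, so a
// partially-deleted sstable no longer aborts startup.
public class OrphanCleanup {
    // Returns how many orphaned component files were actually removed.
    static int deleteExistingComponents(File[] components) {
        int deleted = 0;
        for (File component : components)
            if (component.exists() && component.delete())  // skip missing files
                deleted++;
        return deleted;
    }
}
```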

 CFS.scrubDataDirectories tries to delete nonexistent orphans
 

 Key: CASSANDRA-4021
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4021
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7 beta 2
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.11, 1.0.9

 Attachments: 4021.txt


 The check only looks for a missing data file, then deletes all other 
 components, however it's possible for the data file and another component to 
 be missing, causing an error:
 {noformat}
  WARN 17:19:28,765 Removing orphans for 
 /var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-24492:
  [Index.db, Filter.db, Digest.sha1, Statistics.db, Data.db]
 ERROR 17:19:28,766 Exception encountered during startup
 java.lang.AssertionError: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49)
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352)
 at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105)
 java.lang.AssertionError: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49)
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352)
 at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105)
 Exception encountered during startup: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 {noformat}
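The assertion above suggests the shape of a fix: tolerate components that are already gone instead of asserting on them. A minimal standalone sketch of such a guarded cleanup, with illustrative names (this is not Cassandra's actual FileUtils/ColumnFamilyStore code):

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class OrphanCleaner
{
    // SSTable component suffixes that may be left behind (illustrative subset).
    static final List<String> COMPONENTS =
        Arrays.asList("Index.db", "Filter.db", "Digest.sha1", "Statistics.db", "Data.db");

    /**
     * If the Data.db component is missing, delete whichever sibling
     * components still exist, skipping ones that are already gone
     * rather than asserting.  Returns the number of files deleted.
     */
    public static int cleanOrphans(File dir, String prefix)
    {
        if (new File(dir, prefix + "-Data.db").exists())
            return 0; // data file present: not an orphan

        int deleted = 0;
        for (String component : COMPONENTS)
        {
            File f = new File(dir, prefix + "-" + component);
            if (f.exists() && f.delete()) // guard: a missing component is fine
                deleted++;
        }
        return deleted;
    }
}
```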

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4026) EC2 snitch incorrectly reports regions

2012-03-08 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225711#comment-13225711
 ] 

Brandon Williams commented on CASSANDRA-4026:
-

bq. If we change EC2Snitch/Ec2Multiregion snitch, we also need to change the 
schema for the existing cluster

Not the schema per se, but the datacenter name.  This is doable though if 
you're willing to repair afterwards.  Another option is to switch an entire 
DC's snitch at a time.

bq. Option 1: Leave the existing snitch as it is and add a new snitch.

Ugh, that will cause tremendous confusion for new users.  It would, however, be 
nice to get rid of this wart at some point.

bq. Option 2: Parse us-west-1 as us-west and parse us-west-2 as us-west2; 
as us-west-2 is fairly new, it won't affect a lot of us?

There aren't a lot of good options here, and I'm not sure how I feel about this 
one since it's definitely a hack, but only appending the number to the DC when 
it's > 1 might be the least painful for existing users.
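The two parses are easy to compare side by side. A minimal sketch (illustrative names, not the actual Ec2Snitch code) of the current split on the last '-' versus splitting off only the trailing zone letter:

```java
public class Ec2ZoneParser
{
    /** Current behavior: split on the last '-', so "us-west-2a" -> DC "us-west", rack "2a". */
    public static String[] parseLegacy(String az)
    {
        int split = az.lastIndexOf('-');
        return new String[] { az.substring(0, split), az.substring(split + 1) };
    }

    /** Proposed behavior: the rack is the trailing zone letter, so "us-west-2a" -> DC "us-west-2", rack "a". */
    public static String[] parse(String az)
    {
        int split = az.length() - 1;
        return new String[] { az.substring(0, split), az.substring(split) };
    }
}
```

Note that under the proposed parse, "us-east-1a" would yield DC "us-east-1" rather than today's "us-east", which is why appending the region number only when it is greater than 1 comes up as the least disruptive option for existing users.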

 EC2 snitch incorrectly reports regions
 --

 Key: CASSANDRA-4026
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4026
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.8
 Environment: Ubuntu 10.10 64 bit Oracle Java 6
Reporter: Todd Nine
Assignee: Vijay

 Currently the org.apache.cassandra.locator.Ec2Snitch reports "us-west" in 
 both the Oregon and the California data centers.  This is incorrect, since 
 they are different regions.
 California = us-west-1
 Oregon = us-west-2
 wget http://169.254.169.254/latest/meta-data/placement/availability-zone 
 returns the value "us-west-2a".
 After parsing, this returns:
 DC = us-west, Rack = 2a
 What it should return:
 DC = us-west-2, Rack = a
 This makes it possible to use multi-region when both regions are on the west 
 coast.





[jira] [Commented] (CASSANDRA-3671) provide JMX counters for unavailables/timeouts for reads and writes

2012-03-07 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224553#comment-13224553
 ] 

Brandon Williams commented on CASSANDRA-3671:
-

Oops, I thought it wasn't.  But really it should not have been since 1.1.0 is 
frozen.

 provide JMX counters for unavailables/timeouts for reads and writes
 ---

 Key: CASSANDRA-3671
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3671
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Peter Schuller
Assignee: Peter Schuller
Priority: Minor
 Fix For: 1.1.0

 Attachments: CASSANDRA-3671-trunk-coda-metrics-203-withjar.txt, 
 CASSANDRA-3671-trunk-coda-metrics-v1.txt, 
 CASSANDRA-3671-trunk-coda-metrics-v2.txt, CASSANDRA-3671-trunk-v2.txt, 
 CASSANDRA-3671-trunk.txt, v1-0001-CASSANDRA-3671-trunk-coda-metrics-v2.txt.txt


 Attaching patch against trunk.





[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns

2012-03-07 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224622#comment-13224622
 ] 

Brandon Williams commented on CASSANDRA-3883:
-

Optimally, we'd have a way to express "I'm at this column offset in this row; 
give me the next X columns, even if it requires going to the next row."  But 
I'm not sure how to do that sanely, either.  I know Jake is using a special 
CFIF for Hive to handle wide rows that basically just grabs one row at a time 
and paginates it, which is fine if all the rows are wide, but will take a 
performance hit if they are not.  Still, that might be the best thing to do 
since using get_paged_slice is currently so hairy.

 CFIF WideRowIterator only returns batch size columns
 

 Key: CASSANDRA-3883
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3883
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Brandon Williams
 Fix For: 1.1.0

 Attachments: 3883-v1.txt


 Most evident with the word count, where there are 1250 'word1' items in two 
 rows (1000 in one, 250 in another) and it counts 198 with the batch size set 
 to 99.





[jira] [Commented] (CASSANDRA-3880) Random Partitioner does not check if tokens are outside of its range

2012-03-07 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224648#comment-13224648
 ] 

Brandon Williams commented on CASSANDRA-3880:
-

A token greater than 2^127 is invalid.

 Random Partitioner does not check if tokens are outside of its range
 

 Key: CASSANDRA-3880
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3880
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 1.0.7
Reporter: Marcel Steinbach
Assignee: Harish Doddi
Priority: Minor

 Setting up a ring where the tokens are outside RP's token range leads to an 
 unbalanced cluster. The partitioner still reports equally distributed 
 ownership since it calculates ownership only with the _distances_ of the 
 tokens in relation to the maximum token. 
 E.g. maximum token = 15
 token1 = 5
 token2 = 10
 token3 = 15
 token4 = 20
 ownership4 = (token4 - token3) / maximum_token = 5 / 15 = 1/3
 So token4 claims to own 33.33% of the ring but is not responsible for any 
 primary replicas.
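The arithmetic in the report can be reproduced with the toy maximum of 15 in place of RandomPartitioner's 2^127. A minimal sketch of the distance-based ownership formula, plus the range check the partitioner is missing (illustrative code, not the actual partitioner):

```java
public class TokenOwnership
{
    /**
     * Ownership as the partitioner reports it: the distance from the
     * previous token divided by the maximum token (tokens sorted, i >= 1).
     */
    public static double ownership(long[] tokens, int i, long maximum)
    {
        return (double) (tokens[i] - tokens[i - 1]) / maximum;
    }

    /** The missing validation: a token must lie within [0, maximum]. */
    public static boolean isValid(long token, long maximum)
    {
        return token >= 0 && token <= maximum;
    }
}
```

With tokens {5, 10, 15, 20} and maximum 15, ownership(tokens, 3, 15) reports 1/3 even though token 20 is out of range and owns no primary replicas.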





[jira] [Commented] (CASSANDRA-3991) Investigate importance of jsvc in debian packages

2012-03-06 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223937#comment-13223937
 ] 

Brandon Williams commented on CASSANDRA-3991:
-

bq. If jsvc is buggy (is this memory thing the only problem?)

I've also heard of output.log getting huge; as you noted on IRC, we can 
logrotate that problem away.

bq. Try to properly daemonize entirely from shell (I tried doing this with 
bin/cassandra FWIW, I don't think it's practical)

Out of curiosity, what was the problem?  I'm doing this on a machine right now 
(by removing jsvc from the init) and it hasn't been a problem (though I'll 
admit I'm running it as root, heh).

bq. Looking at the source, jsvc seems pretty simple. I might be willing to take 
a crack at bug-fixing in the weeks to come assuming a) I knew how to reproduce 
the issue(s), and b) everyone doesn't already have their hearts set on #4.

I don't see anything pressing enough to jump straight to 4 yet, but if that 
changes it's the easiest option to implement so I'm willing to wait and try to 
do things the Right Way.

 Investigate importance of jsvc in debian packages
 -

 Key: CASSANDRA-3991
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3991
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.1


 jsvc seems to be buggy at best.  For instance, if you set a small heap like 
 128M it seems to completely ignore this and use as much memory as it wants.  
 I don't know what this is buying us over launching /usr/bin/cassandra 
 directly like the redhat scripts do, but I've seen multiple complaints about 
 its memory usage.





[jira] [Commented] (CASSANDRA-3671) provide JMX counters for unavailables/timeouts for reads and writes

2012-03-05 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222549#comment-13222549
 ] 

Brandon Williams commented on CASSANDRA-3671:
-

I don't see any reason this can't go in 1.1

 provide JMX counters for unavailables/timeouts for reads and writes
 ---

 Key: CASSANDRA-3671
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3671
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Peter Schuller
Assignee: Peter Schuller
Priority: Minor
 Fix For: 1.2

 Attachments: CASSANDRA-3671-trunk-coda-metrics-203-withjar.txt, 
 CASSANDRA-3671-trunk-coda-metrics-v1.txt, 
 CASSANDRA-3671-trunk-coda-metrics-v2.txt, CASSANDRA-3671-trunk-v2.txt, 
 CASSANDRA-3671-trunk.txt, v1-0001-CASSANDRA-3671-trunk-coda-metrics-v2.txt.txt


 Attaching patch against trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3992) Add setloglevel command to Cli to call setLog4jLevel JMX method

2012-03-03 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221637#comment-13221637
 ] 

Brandon Williams commented on CASSANDRA-3992:
-

I can see how this would seem handy for development, but really it's less 
powerful than changing log4j-server.properties since it's all or nothing - you 
can't set a specific class/package to DEBUG, for instance.  I haven't tested 
this in a while, but if you edit the log4j file Cassandra should re-read it 
after a few seconds and enact the changes.
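For reference, per-class granularity comes from ordinary logger entries in log4j-server.properties (the class named below is only an example), and Cassandra's periodic re-read of the file picks the change up without a restart:

```properties
# Root logger stays at INFO...
log4j.rootLogger=INFO,stdout,R
# ...while one class is turned up to DEBUG
log4j.logger.org.apache.cassandra.db.HintedHandOffManager=DEBUG
```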

 Add setloglevel command to Cli to call setLog4jLevel JMX method
 ---

 Key: CASSANDRA-3992
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3992
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 1.0.0
Reporter: Maki Watanabe
Priority: Minor
 Attachments: 0001-Add-setloglevel-command.patch


 Add a setloglevel command to the CLI which calls the setLog4jLevel method.
 Syntax:
   setloglevel class level
 Supported levels are TRACE, DEBUG, INFO, WARN, ERROR.





[jira] [Commented] (CASSANDRA-3991) Investigate importance of jsvc in debian packages

2012-03-03 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221765#comment-13221765
 ] 

Brandon Williams commented on CASSANDRA-3991:
-

bq. Are we sure it actually restart the process? I could be wrong but last time 
I checked, jsvc was only restarting a crashed process if the exit code was 123 
(or something like that)

I know from experience if you set the heap to a small value like 64M and use 
jsvc on a machine with a small amount of memory (like 256M) it will a) use more 
memory than the system has, b) get killed by the oom killer, and c) restart the 
cycle again.

 Investigate importance of jsvc in debian packages
 -

 Key: CASSANDRA-3991
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3991
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.1


 jsvc seems to be buggy at best.  For instance, if you set a small heap like 
 128M it seems to completely ignore this and use as much memory as it wants.  
 I don't know what this is buying us over launching /usr/bin/cassandra 
 directly like the redhat scripts do, but I've seen multiple complaints about 
 its memory usage.





[jira] [Commented] (CASSANDRA-3988) NullPointerException in org.apache.cassandra.service.AntiEntropyService when repair finds a keyspace with no CFs

2012-03-02 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221084#comment-13221084
 ] 

Brandon Williams commented on CASSANDRA-3988:
-

We assert this can't happen:

{code}
java.lang.AssertionError: Repairing no column families seems pointless, doesn't 
it
{code}

So I'm not sure how we get into this situation, but apparently we can.

 NullPointerException in org.apache.cassandra.service.AntiEntropyService when 
 repair finds a keyspace with no CFs
 

 Key: CASSANDRA-3988
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3988
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.7
Reporter: Bill Hathaway
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0.9

 Attachments: 3988.txt


 2012-03-01 21:38:09,039 [RMI TCP Connection(142)-10.253.106.21] INFO  
 StorageService - Starting repair command #15, repairing 3 ranges.
 2012-03-01 21:38:09,039 [AntiEntropySessions:14] INFO  AntiEntropyService - 
 [repair #d68369f0-63e6-11e1--8add8b9398fd] new session: will sync 
 /10.253.106.21, /10.253.106.248, /10.253.106.247 on range 
 (85070591730234615865843651857942052864,106338239662793269832304564822427566080]
  for PersonalizationDataService2.[]
 2012-03-01 21:38:09,039 [AntiEntropySessions:14] ERROR 
 AbstractCassandraDaemon - Fatal exception in thread 
 Thread[AntiEntropySessions:14,5,RMI Runtime]
 java.lang.NullPointerException
 at 
 org.apache.cassandra.service.AntiEntropyService$RepairSession.runMayThrow(AntiEntropyService.java:691)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)





[jira] [Commented] (CASSANDRA-3991) Investigate importance of jsvc in debian packages

2012-03-02 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221252#comment-13221252
 ] 

Brandon Williams commented on CASSANDRA-3991:
-

One thing jsvc does is restart a crashed process -- but there are other, better 
ways of accomplishing this.

 Investigate importance of jsvc in debian packages
 -

 Key: CASSANDRA-3991
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3991
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.1.1


 jsvc seems to be buggy at best.  For instance, if you set a small heap like 
 128M it seems to completely ignore this and use as much memory as it wants.  
 I don't know what this is buying us over launching /usr/bin/cassandra 
 directly like the redhat scripts do, but I've seen multiple complaints about 
 its memory usage.





[jira] [Commented] (CASSANDRA-3671) provide JMX counters for unavailables/timeouts for reads and writes

2012-03-01 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220035#comment-13220035
 ] 

Brandon Williams commented on CASSANDRA-3671:
-

+1

 provide JMX counters for unavailables/timeouts for reads and writes
 ---

 Key: CASSANDRA-3671
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3671
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Peter Schuller
Assignee: Peter Schuller
Priority: Minor
 Fix For: 1.2

 Attachments: CASSANDRA-3671-trunk-coda-metrics-203-withjar.txt, 
 CASSANDRA-3671-trunk-coda-metrics-v1.txt, 
 CASSANDRA-3671-trunk-coda-metrics-v2.txt, CASSANDRA-3671-trunk-v2.txt, 
 CASSANDRA-3671-trunk.txt


 Attaching patch against trunk.





[jira] [Commented] (CASSANDRA-3555) Bootstrapping to handle more failure

2012-03-01 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220102#comment-13220102
 ] 

Brandon Williams commented on CASSANDRA-3555:
-

I'm +1 on the posted patches with the minor nit of instead using a single if 
statement with an or clause.  Are we still intending to do 2 and 3?

 Bootstrapping to handle more failure
 

 Key: CASSANDRA-3555
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3555
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.5
Reporter: Vijay
Assignee: Vijay
 Fix For: 1.2

 Attachments: 3555-bootstrap-with-down-node-test.txt, 
 3555-bootstrap-with-down-node.txt


 We might want to handle failures in bootstrapping:
 1) When none of the Seeds are available to communicate then throw exception
 2) When any one of the node which it is bootstrapping fails then try next in 
 the list (and if the list is exhausted then throw exception).
 3) Clean all the existing files in the data Dir before starting just in case 
 we retry.
 4) Currently when one node is down in the cluster the bootstrapping will 
 fail, because the bootstrapping node doesn't understand which one is actually 
 down.
 Also print the nodetool ring output in the logs so we can troubleshoot later 
 if it fails.
 Currently, if any one of the above happens, the node skips the bootstrap 
 or hangs.





[jira] [Commented] (CASSANDRA-3980) Cli should be able to define CompositeType comparators

2012-02-29 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219257#comment-13219257
 ] 

Brandon Williams commented on CASSANDRA-3980:
-

If we can, we should add an example in the help.  I asked Pavel before creating 
this though and he told me to create it :)

 Cli should be able to define CompositeType comparators
 --

 Key: CASSANDRA-3980
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3980
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Pavel Yaskevich
 Fix For: 1.0.9


 There is currently no way to define, for instance, 
 CompositeType(UTF8Type,Int32Type) in a CF definition.  





[jira] [Commented] (CASSANDRA-3980) Cli should be able to define CompositeType comparators

2012-02-29 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219378#comment-13219378
 ] 

Brandon Williams commented on CASSANDRA-3980:
-

+1

 Cli should be able to define CompositeType comparators
 --

 Key: CASSANDRA-3980
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3980
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Pavel Yaskevich
 Fix For: 1.0.9

 Attachments: CASSANDRA-3980.patch


 There is currently no way to define, for instance, 
 CompositeType(UTF8Type,Int32Type) in a CF definition.  





[jira] [Commented] (CASSANDRA-3972) HintedHandoff fails to deliver any hints

2012-02-28 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218143#comment-13218143
 ] 

Brandon Williams commented on CASSANDRA-3972:
-

At least one problem is that CFS.removeDeleted is removing everything here 
(though listing the hints family in the cli works correctly):

{code}
ColumnFamily hintsPage = 
ColumnFamilyStore.removeDeleted(hintStore.getColumnFamily(filter), 
Integer.MAX_VALUE);
{code}

At DEBUG you can see the columns collected, but then hintsPage is null.  If we 
simply change this to:

{code}
ColumnFamily hintsPage = hintStore.getColumnFamily(filter);
{code}

Then at least some of the hints get sent.

 HintedHandoff fails to deliver any hints
 

 Key: CASSANDRA-3972
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3972
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Blocker
  Labels: hintedhandoff
 Fix For: 1.1.0


 Summary says it all.  Whether in a memtable or sstable, no hints are 
 delivered.





[jira] [Commented] (CASSANDRA-3859) Add Progress Reporting to Cassandra OutputFormats

2012-02-28 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218193#comment-13218193
 ] 

Brandon Williams commented on CASSANDRA-3859:
-

bq. It means that with the patch, progress reporting is working, but it is not 
reporting progress every second while loading (can you explain this?), because 
the same patch throws a timeout exception with a 10-second timeout.

My guess is 10s is too artificial an amount of time and the JVM is still 
forking, or warming up, or something.

 Add Progress Reporting to Cassandra OutputFormats
 -

 Key: CASSANDRA-3859
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3859
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop, Tools
Affects Versions: 1.1.0
Reporter: Samarth Gahire
Assignee: Brandon Williams
Priority: Minor
  Labels: bulkloader, hadoop, mapreduce, sstableloader
 Fix For: 1.1.0

 Attachments: 0001-add-progress-reporting-to-BOF.txt, 
 0002-Add-progress-to-CFOF.txt

   Original Estimate: 48h
  Remaining Estimate: 48h

 When we are using the BulkOutputFormat to load data into Cassandra, we 
 should report progress to the Hadoop job from within the SSTable loader, 
 because if streaming takes a long time for a particular task and no progress 
 is reported to the job, it may kill the task with a timeout 
 exception. 





[jira] [Commented] (CASSANDRA-2261) During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted.

2012-02-28 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218564#comment-13218564
 ] 

Brandon Williams commented on CASSANDRA-2261:
-

+1

 During Compaction, Corrupt SSTables with rows that cause failures should be 
 identified and blacklisted.
 ---

 Key: CASSANDRA-2261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2261
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benjamin Coverston
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: not_a_pony
 Fix For: 1.1.1

 Attachments: 2261-v2.patch, 2261.patch, CASSANDRA-2261-v3.patch


 When a compaction of a set of SSTables fails because of corruption it will 
 continue to try to compact that SSTable causing pending compactions to build 
 up.
 One way to mitigate this problem would be to log the error, then identify the 
 specific SSTable that caused the failure, subsequently blacklisting that 
 SSTable and ensuring that it is no longer included in future compactions. For 
 this we could simply store the problematic SSTable's name in memory.
 If it's not possible to identify the SSTable that caused the issue, then 
 perhaps blacklisting the (ordered) permutation of SSTables to be compacted 
 together is something that can be done to solve this problem in a more 
 general case, and avoid issues where two (or more) SSTables have trouble 
 compacting a particular row. For this option we would probably want to store 
 the lists of the bad combinations in the system table somewhere s.t. these 
 can survive a node failure (there have been a few cases where I have seen a 
 compaction cause a node failure).





[jira] [Commented] (CASSANDRA-3975) Hints Should Be Dropped When Missing CFid Implies Deleted ColumnFamily

2012-02-28 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218763#comment-13218763
 ] 

Brandon Williams commented on CASSANDRA-3975:
-

+1

 Hints Should Be Dropped When Missing CFid Implies Deleted ColumnFamily
 --

 Key: CASSANDRA-3975
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3975
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Chris Herron
Assignee: Jonathan Ellis
  Labels: datastax_qa
 Fix For: 1.0.9, 1.1.0


 If hints have accumulated for a CF that has been deleted, Hinted Handoff 
 repeatedly fails until manual intervention removes those hints. For 1.0.7, 
 UnserializableColumnFamilyException is thrown only when a CFid is unknown on 
 the sending node. As discussed on #cassandra-dev, if the schema is in 
 agreement, the affected hint(s) should be deleted to avoid indefinite repeat 
 failures.





[jira] [Commented] (CASSANDRA-3294) a node whose TCP connection is not up should be considered down for the purpose of reads and writes

2012-02-27 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217235#comment-13217235
 ] 

Brandon Williams commented on CASSANDRA-3294:
-

bq. How about we assign probability to be alive to each of the nodes in the 
ring

This sounds like reinventing the existing failure detector to me.

 a node whose TCP connection is not up should be considered down for the 
 purpose of reads and writes
 ---

 Key: CASSANDRA-3294
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3294
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Schuller
Assignee: Peter Schuller

 Cassandra fails to handle the most simple of cases intelligently - a process 
 gets killed and the TCP connection dies. I cannot see a good reason to wait 
 for a bunch of RPC timeouts and thousands of hung requests to realize that we 
 shouldn't be sending messages to a node when the only possible means of 
 communication is confirmed down. This is why one has to disablegossip and 
 wait for a while to restar a node on a busy cluster (especially without 
 CASSANDRA-2540 but that only helps under certain circumstances).
 A more generalized approach where by one e.g. weights in the number of 
 currently outstanding RPC requests to a node, would likely take care of this 
 case as well. But until such a thing exists and works well, it seems prudent 
 to have the very common and controlled form of failure be handled better.
 Are there difficulties I'm not seeing?
 I can see that one may want to distinguish between considering something 
 "really down" (and e.g. fail a repair because it's down) from what I'm 
 talking about, so maybe there are different concepts (say one is "currently 
 unreachable" rather than "down") being conflated. But in the specific case of 
 sending reads/writes to a node we *know* we cannot talk to, it seems 
 unnecessarily detrimental.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3294) a node whose TCP connection is not up should be considered down for the purpose of reads and writes

2012-02-27 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217243#comment-13217243
 ] 

Brandon Williams commented on CASSANDRA-3294:
-

One thing that occurs to me here is that the FD is sort of a one-way device: we 
can send it hints that something is alive, but we can't send it hints that 
something is dead.  Thus, the only way a node can be marked down is by its phi 
decaying over time.  If we added the ability to negatively affect the phi 
directly (TCP connection isn't present, or has been refused, etc) this could 
speed failure detection up considerably.
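
The idea of letting external evidence push phi up can be sketched with a toy
phi-accrual detector. This is illustrative only, not Cassandra's actual
FailureDetector; the class and the forceConviction hook are hypothetical names:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy phi-accrual failure detector with a "negative hint" hook.
class PhiDetector
{
    private static final int WINDOW = 1000;
    private final Deque<Long> intervals = new ArrayDeque<>();
    private long lastHeartbeatMs = -1;
    private boolean forcedDown = false;

    // Called whenever a heartbeat (gossip message) arrives.
    synchronized void report(long nowMs)
    {
        if (lastHeartbeatMs >= 0)
        {
            intervals.addLast(nowMs - lastHeartbeatMs);
            if (intervals.size() > WINDOW)
                intervals.removeFirst();
        }
        lastHeartbeatMs = nowMs;
        forcedDown = false; // a fresh heartbeat clears any negative hint
    }

    // Negative hint: e.g. the TCP connection was refused. Convict immediately
    // instead of waiting for phi to grow past the threshold on its own.
    synchronized void forceConviction()
    {
        forcedDown = true;
    }

    // Phi grows with the time since the last heartbeat relative to the mean
    // inter-arrival interval (exponential-distribution approximation).
    synchronized double phi(long nowMs)
    {
        if (forcedDown)
            return Double.POSITIVE_INFINITY;
        if (intervals.isEmpty())
            return 0.0;
        double mean = intervals.stream().mapToLong(Long::longValue).average().orElse(1.0);
        return (nowMs - lastHeartbeatMs) / (mean * Math.log(10));
    }
}
```

With a hook like this, a refused connection convicts the endpoint in one call
rather than waiting for the interval-based phi to climb.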






[jira] [Commented] (CASSANDRA-3294) a node whose TCP connection is not up should be considered down for the purpose of reads and writes

2012-02-27 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217261#comment-13217261
 ] 

Brandon Williams commented on CASSANDRA-3294:
-

I see.  We can do that by sorting on the current phi scores, but we'd need to 
respect the badness threshold for those doing replica pinning.  Sounds like 
we're starting to bump up against CASSANDRA-3722 here.
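
A sketch of what "sorting on the current phi scores while respecting the
badness threshold" could look like, in the spirit of the dynamic snitch. The
class and method names here are hypothetical, not the actual snitch code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: order endpoints by failure-detector score, but keep
// the pinned replica order unless some replica's score is worse than the
// best by more than the badness threshold.
class ScoreSortedEndpoints
{
    private final Map<String, Double> scores = new HashMap<>();
    private final double badnessThreshold; // e.g. 0.1 = tolerate 10% worse

    ScoreSortedEndpoints(double badnessThreshold)
    {
        this.badnessThreshold = badnessThreshold;
    }

    void updateScore(String endpoint, double score)
    {
        scores.put(endpoint, score);
    }

    List<String> order(List<String> pinnedOrder)
    {
        double best = pinnedOrder.stream()
                                 .mapToDouble(e -> scores.getOrDefault(e, 0.0))
                                 .min().orElse(0.0);
        boolean exceeded = pinnedOrder.stream()
                                      .anyMatch(e -> scores.getOrDefault(e, 0.0) > best * (1 + badnessThreshold));
        if (!exceeded)
            return pinnedOrder; // respect replica pinning
        List<String> sorted = new ArrayList<>(pinnedOrder);
        sorted.sort(Comparator.comparingDouble(e -> scores.getOrDefault(e, 0.0)));
        return sorted;
    }
}
```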






[jira] [Commented] (CASSANDRA-3958) Remove random HH delay

2012-02-27 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217320#comment-13217320
 ] 

Brandon Williams commented on CASSANDRA-3958:
-

bq. large hint loads (which are the ones that matter most) are going to overlap 
anyway even with the maximum 60s difference

True, but isn't it better to have some entropy at the start and ramp up to hint 
overload, rather than have all the machines attempt at once and cause it more 
quickly?

 Remove random HH delay
 --

 Key: CASSANDRA-3958
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3958
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Trivial
  Labels: hintedhandoff
 Fix For: 1.1.0

 Attachments: 3958.txt


 {code}
 // sleep a random amount to stagger handoff delivery from different
 // replicas.
 // (if we had to wait, then gossiper randomness took care of that for
 // us already.)
 if (waited == 0)
 {
     // use a 'rounded' sleep interval because of a strange bug with
     // windows: CASSANDRA-3375
     int sleep = FBUtilities.threadLocalRandom().nextInt(2000) * 30;
     logger_.debug("Sleeping {}ms to stagger hint delivery", sleep);
     Thread.sleep(sleep);
 }
 {code}
 This is obsolete now that we have the per-hint configurable delay.  And large 
 hint loads (which are the ones that matter most) are going to overlap anyway 
 even with the maximum 60s difference.





[jira] [Commented] (CASSANDRA-3958) Remove random HH delay

2012-02-27 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217333#comment-13217333
 ] 

Brandon Williams commented on CASSANDRA-3958:
-

wfm, +1






[jira] [Commented] (CASSANDRA-3671) provide JMX counters for unavailables/timeouts for reads and writes

2012-02-27 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217346#comment-13217346
 ] 

Brandon Williams commented on CASSANDRA-3671:
-

bq. It does create a split-world syndrome of new style vs. old style 
metrics though. I'd love input.

I'm +1 on this approach, because we need to keep things backwards-compatible 
for existing monitoring solutions, but ultimately I'm unhappy with the current 
state of JMX and starting fresh like you have here is a good way to solve it.  
Fleshing out o.a.c.metrics looks like the future to me.
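
The backwards-compatibility idea (one underlying counter visible under both a
legacy-style name and a new o.a.c.metrics-style name) can be sketched with a
toy registry. This is not the metrics library or the real JMX wiring, and all
metric names below are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Toy bridge: one underlying counter registered under every given name, so
// existing dashboards keep reading the old name while new code uses the new.
class MetricsBridge
{
    private final Map<String, AtomicLong> registry = new HashMap<>();

    // Register a single counter under all of the supplied names.
    AtomicLong counter(String... names)
    {
        AtomicLong c = new AtomicLong();
        for (String n : names)
            registry.put(n, c);
        return c;
    }

    long value(String name)
    {
        return registry.get(name).get();
    }
}
```

Incrementing the counter through either handle is then visible under both
names, which is the split-world compromise being discussed.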

 provide JMX counters for unavailables/timeouts for reads and writes
 ---

 Key: CASSANDRA-3671
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3671
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Peter Schuller
Assignee: Peter Schuller
Priority: Minor
 Fix For: 1.2

 Attachments: CASSANDRA-3671-trunk-coda-metrics-v1.txt, 
 CASSANDRA-3671-trunk-coda-metrics-v2.txt, CASSANDRA-3671-trunk-v2.txt, 
 CASSANDRA-3671-trunk.txt


 Attaching patch against trunk.





[jira] [Commented] (CASSANDRA-3294) a node whose TCP connection is not up should be considered down for the purpose of reads and writes

2012-02-27 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217437#comment-13217437
 ] 

Brandon Williams commented on CASSANDRA-3294:
-

bq. I think CASSANDRA-3722's original premise doesn't address the concerns I 
see in real life (I don't want special cases trying to communicate X is 
happening), but towards the end I start agreeing with the ticket more.

I agree; the original premise there was jumping the gun with a solution a bit, 
but I think ultimately we end up in very similar places.






[jira] [Commented] (CASSANDRA-3797) StorageProxy static initialization not triggered until thrift requests come in

2012-02-27 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217580#comment-13217580
 ] 

Brandon Williams commented on CASSANDRA-3797:
-

I tested this with CASSANDRA-3671 and everything worked.

 StorageProxy static initialization not triggered until thrift requests come in
 --

 Key: CASSANDRA-3797
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3797
 Project: Cassandra
  Issue Type: Bug
Reporter: Peter Schuller
Assignee: Peter Schuller
Priority: Minor
 Fix For: 1.1.0

 Attachments: 3797-forname.txt, CASSANDRA-3797-trunk-v1.txt


 While plugging in the metrics library for CASSANDRA-3671 I realized (because 
 the metrics library was trying to add a shutdown hook on metric creation) 
 that starting cassandra and simply shutting it down, causes StorageProxy to 
 not be initialized until the drain shutdown hook.
 Effects:
 * StorageProxy mbean missing in visualvm/jconsole after initial startup 
 (seriously, I thought I was going nuts ;))
 * And in general anything that makes assumptions about running early, or at 
 least not during JVM shutdown, such as the metrics library, will be 
 problematic





[jira] [Commented] (CASSANDRA-3962) CassandraStorage can't cast fields from a CF correctly

2012-02-26 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216836#comment-13216836
 ] 

Brandon Williams commented on CASSANDRA-3962:
-

I think implementing a LoadCaster will fix this, but it's strange to me that 
pig doesn't allow going the other way, casting a chararray to a bytearray, 
since that's the only thing guaranteed to work here, in case the Bytes CF has 
keys that won't map to UTF8.
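
The "keys that won't map to UTF8" concern is concrete: a strict decoder
(which is effectively what a LoadCaster's bytesToCharArray conversion needs)
rejects arbitrary key bytes. A self-contained illustration, not Pig code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Arbitrary row keys from a BytesType column family need not be valid UTF-8,
// so a bytearray -> chararray cast cannot be guaranteed to succeed.
class Utf8CastCheck
{
    static String toCharArray(byte[] key) throws CharacterCodingException
    {
        return StandardCharsets.UTF_8.newDecoder()
                                     .onMalformedInput(CodingErrorAction.REPORT)
                                     .onUnmappableCharacter(CodingErrorAction.REPORT)
                                     .decode(ByteBuffer.wrap(key))
                                     .toString();
    }
}
```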

 CassandraStorage can't cast fields from a CF correctly
 --

 Key: CASSANDRA-3962
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3962
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.0.8
 Environment: OSX 10.6.latest, Pig 0.9.2.
Reporter: Janne Jalkanen
Assignee: Brandon Williams
  Labels: hadoop, pig
 Attachments: test.cli, test.pig


 Included scripts demonstrate the problem.  Regardless of whether the key is 
 cast as a chararray or not, the Pig scripts fail with 
 {code}
 java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be 
 cast to java.lang.String
   at 
 org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:72)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:117)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
 {code}





[jira] [Commented] (CASSANDRA-3859) Add Progress Reporting to Cassandra OutputFormats

2012-02-24 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215670#comment-13215670
 ] 

Brandon Williams commented on CASSANDRA-3859:
-

Correct.

 Add Progress Reporting to Cassandra OutputFormats
 -

 Key: CASSANDRA-3859
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3859
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop, Tools
Affects Versions: 1.1.0
Reporter: Samarth Gahire
Assignee: Brandon Williams
Priority: Minor
  Labels: bulkloader, hadoop, mapreduce, sstableloader
 Fix For: 1.1.0

 Attachments: 0001-add-progress-reporting-to-BOF.txt, 
 0002-Add-progress-to-CFOF.txt

   Original Estimate: 48h
  Remaining Estimate: 48h

 When we are using the BulkOutputFormat to load data into Cassandra, we 
 should report progress to the Hadoop job from within the sstable loader, 
 because if streaming takes a long time for a particular task and no progress 
 is reported, the job may kill the task with a timeout exception.
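
The mechanism being asked for can be sketched with a small heartbeat helper.
Progressable below is a local stand-in for Hadoop's
org.apache.hadoop.util.Progressable; the real patch would call the task
context's progress() during streaming:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Keep a long-running task alive: a background thread calls progress()
// periodically so the framework does not kill the task for inactivity.
class ProgressHeartbeat implements AutoCloseable
{
    interface Progressable
    {
        void progress();
    }

    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    ProgressHeartbeat(Progressable reporter, long periodMs)
    {
        timer.scheduleAtFixedRate(reporter::progress, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close()
    {
        timer.shutdownNow();
    }
}
```

Wrapping the streaming phase in such a heartbeat (started before the stream,
closed after) is one way to stop the timeout from firing.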





[jira] [Commented] (CASSANDRA-3935) Hints not delivered possibly because of pagination issue

2012-02-24 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215674#comment-13215674
 ] 

Brandon Williams commented on CASSANDRA-3935:
-

bq. after that periodically logs that "Finished hinted handoff of 0 rows to 
endpoint ..."

This happens when there is one sstable for hints, since it won't currently 
compact it away and tombstones remain inside it.

 Hints not delivered possibly because of pagination issue
 

 Key: CASSANDRA-3935
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3935
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: B. Todd Burruss

 I'm testing hinted handoff in 1.1 beta1 and cannot seem to get a hint 
 delivered.  3 node cluster, RF = 3, writing with CL = ONE.  killed a host 
 then did the write using the CLI on another node.  I can see hint waiting 
 using CLI and I see the log messages at the end of this email.  This suggests 
 the hints exist but are not being delivered (and I'll see the log messages 
 over and over.)
 I did see tracing with debugger and see that in 
 HintedHandoffManager.deliverHintsToEndpointInternal, this line will remove 
 the hint because of the Integer.MAX_VALUE
 ColumnFamily hintsPage = 
 ColumnFamilyStore.removeDeleted(hintStore.getColumnFamily(filter), 
 Integer.MAX_VALUE);
 I'm not sure I quite understand why Integer.MAX_VALUE is used when the same 
 remove is done in getColumnFamily(filter).  Regardless of whether it is 
 useful or not, it prevents the hints from being delivered.
 any thoughts?
 [default@unknown] use system;
 Authenticated to keyspace: system
 [default@system] list hintscolumnfamily;
 Using default limit of 100
 ---
 RowKey: 00
 = (super_column=493ecfa05c1411e10da23097c7ff,
  (column=6b6579, value=6b35, timestamp=132999580, ttl=86400)
  (column=6d75746174696f6e, 
 value=000662746f64646200026b35000103e80103e87fff80010002633504b96d055fd13c68696e746564207772697465,
  timestamp=132999579, ttl=86400)
  (column=7461626c65, value=62746f646462, timestamp=132999580, 
 ttl=86400)
  (column=76657273696f6e, value=0004, timestamp=132999580, 
 ttl=86400))
 1 Row Returned.
 Elapsed time: 58 msec(s).
 INFO [HintedHandoff:1] 2012-02-20 14:44:53,811 HintedHandOffManager.java 
 (line 296) Started hinted handoff for token: 0 with IP: /192.168.56.1
 INFO [HintedHandoff:1] 2012-02-20 14:44:53,815 HintedHandOffManager.java 
 (line 373) Finished hinted handoff of 0 rows to endpoint /192.168.56.1
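
On the Integer.MAX_VALUE question in the quoted report: gcBefore is the
cutoff below which tombstones may be purged, so passing MAX_VALUE purges
every tombstone, along with whatever the tombstone shadows. A toy model (not
the real ColumnFamilyStore.removeDeleted) of why the page can come back empty:

```java
import java.util.*;

// Toy model of removeDeleted's gcBefore parameter. A column is represented
// as name -> localDeletionTime in seconds, or null if the column is live.
// A tombstone is purged only if its deletion time is before gcBefore, so
// gcBefore = Integer.MAX_VALUE purges all tombstones.
class RemoveDeletedModel
{
    static Map<String, Integer> removeDeleted(Map<String, Integer> columns, int gcBefore)
    {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : columns.entrySet())
        {
            boolean tombstonePurgeable = e.getValue() != null && e.getValue() < gcBefore;
            if (!tombstonePurgeable)
                result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```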





[jira] [Commented] (CASSANDRA-3955) HintedHandoff won't compact a single sstable, resulting in repeated log messages

2012-02-24 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215724#comment-13215724
 ] 

Brandon Williams commented on CASSANDRA-3955:
-

+1

 HintedHandoff won't compact a single sstable, resulting in repeated log 
 messages
 

 Key: CASSANDRA-3955
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3955
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.7
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
 Fix For: 1.0.9

 Attachments: 3955-v2.txt, 3955.txt


 First introduced by CASSANDRA-3554, and then mostly solved in CASSANDRA-3733, 
 there is still one special case where the HH log message will repeat every 10 
 mins for 0 rows: when there have previously been hints delivered to the node, 
 but now only a single sstable exists.  Because we refuse to compact a 
 single sstable, and it contains tombstones for the hints, the message repeats.





[jira] [Commented] (CASSANDRA-3829) make seeds *only* be seeds, not special in gossip

2012-02-24 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215763#comment-13215763
 ] 

Brandon Williams commented on CASSANDRA-3829:
-

If I understand correctly, we're going to reduce RING_DELAY as follows:

* gossip a full round to every seed, sleep one extra gossip interval (1s)
* announce the pending range setup to each seed, sleep one extra gossip 
interval

Step 1 is to learn about all nodes in the ring, and make sure they know about 
us.  Step 2 is roughly the same, but with the pending range announced.  The 
catch here, however, is that we're exploiting the 'seed optimization' (meaning 
that all other nodes will have gossiped with one of the seeds during the gossip 
interval we slept for) which means that seed list homogeneity is now even more 
important than before; if any node has a differing list we can't guarantee that 
it saw our updates in this time frame.
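
Under those assumptions, the reduced startup could be sketched roughly as
follows. All names here are hypothetical; the real change would live in the
Gossiper/StorageService, not in a standalone class:

```java
import java.util.*;

// Two-phase startup sketch: push state to every seed, wait one gossip
// interval (so every other node has gossiped with some seed), then repeat
// with the pending ranges announced.
class SeedAnnounce
{
    interface Gossiper
    {
        void pushStateTo(String seed);
    }

    static void announce(Gossiper gossiper, List<String> seeds,
                         long gossipIntervalMs, Runnable announcePendingRanges)
            throws InterruptedException
    {
        // Step 1: a full round to every seed, so we learn the ring and the
        // seeds learn about us; then one extra gossip interval.
        for (String seed : seeds)
            gossiper.pushStateTo(seed);
        Thread.sleep(gossipIntervalMs);

        // Step 2: the same round, now carrying our pending ranges.
        announcePendingRanges.run();
        for (String seed : seeds)
            gossiper.pushStateTo(seed);
        Thread.sleep(gossipIntervalMs);
    }
}
```

Note the scheme only works if every node's seed list overlaps ours, which is
exactly the homogeneity caveat above.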

 make seeds *only* be seeds, not special in gossip 
 --

 Key: CASSANDRA-3829
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3829
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Peter Schuller
Priority: Minor

 First, a little bit of framing on how seeds work:
 The concept of seed hosts makes fundamental sense; you need to
 seed a new node with some information required in order to join a
 cluster. Seed hosts is the information Cassandra uses for this
 purpose.
 But seed hosts play a role even after the initial start-up of a new
 node in a ring. Specifically, seed hosts continue to be gossiped to
 separately by the Gossiper throughout the life of a node and the
 cluster.
 Generally, operators must be careful to ensure that all nodes in a
 cluster are appropriately configured to refer to an overlapping set of
 seed hosts. Strictly speaking this should not be necessary (see
 further down though), but is the general recommendation. An
 unfortunate side-effect of this is that whenever you are doing ring
 management, such as replacing nodes, removing nodes, etc, you have to
 keep in mind which nodes are seeds.
 For example, if you bring a new node into the cluster, doing
 everything right with token assignment and auto_bootstrap=true, it
 will just enter the cluster without bootstrap - causing inconsistent
 reads. This is dangerous.
 And worse - changing the notion of which nodes are seeds across a
 cluster requires a *rolling restart*. It can be argued that it should
 actually be okay for nodes other than the one being fiddled with to
 incorrectly treat the fiddled-with node as a seed node, but this fact
 is highly opaque to most users that are not intimately familiar with
 Cassandra internals.
 This adds additional complexity to operations, as it introduces a
 reason why you cannot view the ring as completely homogeneous, despite
 the fundamental idea of Cassandra that all nodes should be equal.
 Now, fast forward a bit to what we are doing over here to avoid this
 problem: We have a zookeeper based systems for keeping track of hosts
 in a cluster, which is used by our Cassandra client to discover nodes
 to talk to. This works well.
 In order to avoid the need to manually keep track of seeds, we wanted
 to make seeds be automatically discoverable in order to eliminate as
 an operational concern. We have implemented a seed provider that does
 this for us, based on the data we keep in zookeeper.
 We could see essentially three ways of plugging this in:
 * (1) We could simply rely on not needing overlapping seeds and grab whatever 
 we have when a node starts.
 * (2) We could do something like continually treat all other nodes as seeds 
 by dynamically changing the seed list (involves some other changes, like 
 having the Gossiper update its notion of seeds).
 * (3) We could completely eliminate the use of seeds *except* for the very 
 specific purpose of initial start-up of an unbootstrapped node, and keep 
 using a static (for the duration of the node's uptime) seed list.
 (3) was attractive because it felt like this was the original intent
 of seeds; that they be used for *seeding*, and not be constantly
 required during cluster operation once nodes are already joined.
 Now before I make the suggestion, let me explain how we are currently
 (though not yet in production) handling seeds and start-up.
 First, we have the following relevant cases to consider during a normal 
 start-up:
 * (a) we are starting up a cluster for the very first time
 * (b) we are starting up a new clean node in order to join it to a 
 pre-existing cluster
 * (c) we are starting up a pre-existing already joined node in a pre-existing 
 cluster
 First, we proceeded on the assumption that we wanted to remove the use
 of 

[jira] [Commented] (CASSANDRA-3706) Back up configuration files on startup

2012-02-24 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215774#comment-13215774
 ] 

Brandon Williams commented on CASSANDRA-3706:
-

bq. As Jonathan points out, it makes no sense to save this information in the 
system keyspace then, as the point of this storage is for easing disaster 
recovery, and if the node goes down, the yaml as depicted in the keyspace is no 
more likely to survive than the yaml file in the conf directory. If this 
feature is to have value at all, it must be replicated, so a secondary 
quasi-system keyspace that is replicated would be needed.

I'm not convinced this is necessarily true.  If each node only stores its own 
config and you backup a snapshot of the node, you can restore it.  If all data 
is lost and you have nothing to restore from, another node having a copy of the 
dead node's config is hardly useful; configs only diverge by initial_token in 
practice, and copying the config files directly from another node and changing 
the initial_token is more practical than extracting the files from a CF and 
then copying them (there is no way for the 'restored' node to do this 
automatically without a config to start with.) 

 Back up configuration files on startup
 --

 Key: CASSANDRA-3706
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3706
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Dave Brosius
Priority: Minor
  Labels: lhf
 Fix For: 1.1.1

 Attachments: save_configuration.diff, save_configuration_2.diff, 
 save_configuration_3.diff, save_configuration_4.diff, 
 save_configuration_6.diff, save_configuration_7.diff


 Snapshot can backup user data, but it's also nice to be able to have 
 known-good configurations saved as well in case of accidental snafus or even 
 catastrophic loss of a cluster.  If we check for changes to cassandra.yaml, 
 cassandra-env.sh, and maybe log4j-server.properties on startup, we can back 
 them up to a columnfamily that can then be handled by normal snapshot/backup 
 procedures.





[jira] [Commented] (CASSANDRA-3671) provide JMX counters for unavailables/timeouts for reads and writes

2012-02-24 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215791#comment-13215791
 ] 

Brandon Williams commented on CASSANDRA-3671:
-

Can you rebase?

 provide JMX counters for unavailables/timeouts for reads and writes
 ---

 Key: CASSANDRA-3671
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3671
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Peter Schuller
Assignee: Peter Schuller
Priority: Minor
 Fix For: 1.2

 Attachments: CASSANDRA-3671-trunk-coda-metrics-v1.txt, 
 CASSANDRA-3671-trunk-v2.txt, CASSANDRA-3671-trunk.txt


 Attaching patch against trunk.





[jira] [Commented] (CASSANDRA-3649) Code style changes, aka The Big Reformat

2012-02-24 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215838#comment-13215838
 ] 

Brandon Williams commented on CASSANDRA-3649:
-

Wow.  LGTM so far.

 Code style changes, aka The Big Reformat
 

 Key: CASSANDRA-3649
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3649
 Project: Cassandra
  Issue Type: Wish
  Components: Core
Reporter: Brandon Williams
 Fix For: 1.2


 With a new major release coming soon and not having a ton of huge pending 
 patches that have prevented us from doing this in the past, post-freeze looks 
 like a good time to finally do this.  Mostly this will include the removal of 
 underscores in private variables, and no more brace-on-newline policy.





[jira] [Commented] (CASSANDRA-3829) make seeds *only* be seeds, not special in gossip

2012-02-24 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216123#comment-13216123
 ] 

Brandon Williams commented on CASSANDRA-3829:
-

bq. That sounds reasonable. (And would imply wait a couple seconds between 
bootstrapping nodes, right?)

Right, plus some padding for timer skew and processing on the seeds.

 make seeds *only* be seeds, not special in gossip 
 --

 Key: CASSANDRA-3829
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3829
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Peter Schuller
Priority: Minor

 First, a little bit of framing on how seeds work:
 The concept of seed hosts makes fundamental sense; you need to
 seed a new node with some information required in order to join a
 cluster. Seed hosts are the information Cassandra uses for this
 purpose.
 But seed hosts play a role even after the initial start-up of a new
 node in a ring. Specifically, seed hosts continue to be gossiped to
 separately by the Gossiper throughout the life of a node and the
 cluster.
 Generally, operators must be careful to ensure that all nodes in a
 cluster are appropriately configured to refer to an overlapping set of
 seed hosts. Strictly speaking this should not be necessary (see
 further down though), but is the general recommendation. An
 unfortunate side-effect of this is that whenever you are doing ring
 management, such as replacing nodes, removing nodes, etc, you have to
 keep in mind which nodes are seeds.
 For example, if you bring a new node into the cluster, doing
 everything right with token assignment and auto_bootstrap=true, it
 will just enter the cluster without bootstrap - causing inconsistent
 reads. This is dangerous.
 And worse - changing the notion of which nodes are seeds across a
 cluster requires a *rolling restart*. It can be argued that it should
 actually be okay for nodes other than the one being fiddled with to
 incorrectly treat the fiddled-with node as a seed node, but this fact
 is highly opaque to most users that are not intimately familiar with
 Cassandra internals.
 This adds additional complexity to operations, as it introduces a
 reason why you cannot view the ring as completely homogeneous, despite
 the fundamental idea of Cassandra that all nodes should be equal.
 Now, fast forward a bit to what we are doing over here to avoid this
 problem: We have a zookeeper based systems for keeping track of hosts
 in a cluster, which is used by our Cassandra client to discover nodes
 to talk to. This works well.
 In order to avoid the need to manually keep track of seeds, we wanted
 to make seeds automatically discoverable, eliminating them as an
 operational concern. We have implemented a seed provider that does
 this for us, based on the data we keep in zookeeper.
 We could see essentially three ways of plugging this in:
 * (1) We could simply rely on not needing overlapping seeds and grab whatever 
 we have when a node starts.
 * (2) We could do something like continually treating all other nodes as seeds 
 by dynamically changing the seed list (this involves some other changes, like 
 having the Gossiper update its notion of seeds).
 * (3) We could completely eliminate the use of seeds *except* for the very 
 specific purpose of initial start-up of an unbootstrapped node, and keep 
 using a static (for the duration of the node's uptime) seed list.
 (3) was attractive because it felt like this was the original intent
 of seeds; that they be used for *seeding*, and not be constantly
 required during cluster operation once nodes are already joined.
 Now before I make the suggestion, let me explain how we are currently
 (though not yet in production) handling seeds and start-up.
 First, we have the following relevant cases to consider during a normal 
 start-up:
 * (a) we are starting up a cluster for the very first time
 * (b) we are starting up a new clean node in order to join it to a 
 pre-existing cluster
 * (c) we are starting up a pre-existing already joined node in a pre-existing 
 cluster
 First, we proceeded on the assumption that we wanted to remove the use
 of seeds during regular gossip (other than on initial startup). This
 means that for the (c) case, we can *completely* ignore seeds. We
 never even have to discover the seed list, or if we do, we don't have
 to use them.
 This leaves (a) and (b). In both cases, the critical invariant we want
 to achieve is that we must have one or more *valid* seeds (valid means
 for (b) that the seed is in the cluster, and for (a) that it is one of
 the nodes that are part of the initial cluster setup).
 In the (c) case the problem is trivial - ignore seeds.
 In the (a) case, the 

[jira] [Commented] (CASSANDRA-3943) Too many small size sstables after loading data using sstableloader or BulkOutputFormat increases compaction time.

2012-02-23 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214808#comment-13214808
 ] 

Brandon Williams commented on CASSANDRA-3943:
-

bq. As the no of nodes in cluster goes increasing, size of each sstable loaded 
to cassandra node decreases

This is unavoidable: as the ring grows, a generated sstable's contents are split 
across more replica-specific ranges, so each node receives a smaller piece.

bq. Such small size sstables take too much time to compact (minor compaction)

Assuming SizeTieredStrategy, increasing the maximum threshold may help this to 
some degree, so that the nodes compact more tiny sstables at a time.

bq. Is there any solution to this in existing versions or are you fixing this 
in future version?

I'm open to ideas, but have no plans as there is no clear solution.
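Brandon's suggestion above (raising the SizeTiered maximum compaction threshold so each minor compaction merges more of the tiny sstables) could be applied per column family from cassandra-cli. A hedged sketch, assuming the 1.0-era cli syntax; the column family name `Standard1` and the value 64 are illustrative placeholders, not from this ticket:

```
update column family Standard1 with max_compaction_threshold = 64;
```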

 Too many small size sstables after loading data using sstableloader or 
 BulkOutputFormat increases compaction time.
 --

 Key: CASSANDRA-3943
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3943
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop, Tools
Affects Versions: 0.8.2, 1.1.0
Reporter: Samarth Gahire
Assignee: Brandon Williams
Priority: Minor
  Labels: bulkloader, hadoop, sstableloader, streaming, tools
   Original Estimate: 168h
  Remaining Estimate: 168h

 When we create sstables using SimpleUnsortedWriter or BulkOutputFormat, the 
 size of the sstables created is around the buffer size provided.
 But after loading, the sstables created on the cluster nodes are of a size around
 {code}( (sstable_size_before_loading) * replication_factor ) / 
 No_Of_Nodes_In_Cluster{code}
 As the number of nodes in the cluster increases, the size of each sstable loaded 
 to a Cassandra node decreases. Such small sstables take much more time to 
 compact (minor compaction) than relatively large sstables.
 One solution we have tried is to increase the buffer size while generating 
 sstables, but as we increase the buffer size, the time taken to generate 
 sstables increases. Is there any solution to this in existing versions, or are 
 you fixing this in a future version?
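The formula above can be checked with hypothetical numbers (a 512 MB generated sstable, RF=3, a 24-node cluster; all values are illustrative, not from this ticket):

```java
// Hypothetical numbers illustrating the formula above:
// loaded size ~= (sstable_size_before_loading * replication_factor) / nodes.
public class SSTableSizeEstimate
{
    public static void main(String[] args)
    {
        long sizeBeforeLoadingMb = 512; // one sstable out of SimpleUnsortedWriter
        int replicationFactor = 3;
        int nodesInCluster = 24;

        // Each node receives roughly this many MB per loaded sstable: many
        // tiny sstables, which is the minor-compaction pain described above.
        long perNodeMb = sizeBeforeLoadingMb * replicationFactor / nodesInCluster;
        System.out.println(perNodeMb); // prints 64
    }
}
```

Doubling the cluster to 48 nodes halves this to 32 MB, which is why the effect worsens as the cluster grows.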





[jira] [Commented] (CASSANDRA-3934) Short read protection is broken

2012-02-22 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213538#comment-13213538
 ] 

Brandon Williams commented on CASSANDRA-3934:
-

+1

 Short read protection is broken
 ---

 Key: CASSANDRA-3934
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3934
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.0.2
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0.8, 1.1.0

 Attachments: 3934.txt


 When a read needs to do more than one retry (due to short reads), the 
 originalCount is not preserved by the retry leading to returning more than 
 the requested number of columns.
 Moreover, when a retried read checks whether more retry is needed, it doesn't 
 compare the number of live column retrieved against the original number of 
 columns requested by the user, but against the number of columns requested 
 during the retry, making it much more likely to actually do one more retry.
 This is caught by the two tests 'short_read_test' and 'short_read_reversed_test' 
 at https://github.com/riptano/cassandra-dtest/blob/master/consistency_test.py, 
 which fail intermittently.





[jira] [Commented] (CASSANDRA-3859) Add Progress Reporting to Cassandra OutputFormats

2012-02-22 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213550#comment-13213550
 ] 

Brandon Williams commented on CASSANDRA-3859:
-

bq. one thing I could think of, is if they are adding a lot of batches, we 
don't actually call progress until the loop is over

I'm not sure what you mean; we report progress inside the loop over mutations 
in write().

 Add Progress Reporting to Cassandra OutputFormats
 -

 Key: CASSANDRA-3859
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3859
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop, Tools
Affects Versions: 1.1.0
Reporter: Samarth Gahire
Assignee: Brandon Williams
Priority: Minor
  Labels: bulkloader, hadoop, mapreduce, sstableloader
 Fix For: 1.1.0

 Attachments: 0001-add-progress-reporting-to-BOF.txt, 
 0002-Add-progress-to-CFOF.txt

   Original Estimate: 48h
  Remaining Estimate: 48h

 When we use BulkOutputFormat to load data into Cassandra, we should report 
 progress to the Hadoop job from within the sstable loader: if streaming takes 
 a long time for a particular task and no progress is reported to the job, it 
 may kill the task with a timeout exception.





[jira] [Commented] (CASSANDRA-3931) gossipers notion of schema differs from reality as reported by the nodes in question

2012-02-21 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212627#comment-13212627
 ] 

Brandon Williams commented on CASSANDRA-3931:
-

Hmm, does hinted handoff work in this state?  I ask because we've had this 
problem before and addressed it there:

{code}
waited = 0;
// then wait for the correct schema version.
// usually we use DD.getDefsVersion, which checks the local schema uuid 
as stored in the system table.
// here we check the one in gossip instead; this serves as a canary to 
warn us if we introduce a bug that
// causes the two to diverge (see CASSANDRA-2946)
{code}

 gossipers notion of schema differs from reality as reported by the nodes in 
 question
 

 Key: CASSANDRA-3931
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3931
 Project: Cassandra
  Issue Type: Bug
Reporter: Peter Schuller
Assignee: Peter Schuller
 Fix For: 1.1.0


 On a 1.1 cluster we happened to notice that {{nodetool gossipinfo | grep 
 SCHEMA}} reported disagreement:
 {code}
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:b0d7bab7-c13c-37d9-9adb-8ab8a5b7215d
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:bcdbd318-82df-3518-89e3-6b72227b3f66
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:bcdbd318-82df-3518-89e3-6b72227b3f66
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
 {code}
 However, the result of a thrift {{describe_ring}} on the cluster claims they 
 all agree and that {{b0d7bab7-c13c-37d9-9adb-8ab8a5b7215d}} is the schema 
 they have.
 The schemas seem to actually propagate; e.g. dropping a keyspace actually 
 drops the keyspace.





[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-21 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212679#comment-13212679
 ] 

Brandon Williams commented on CASSANDRA-3843:
-

I'm unable to repro against 1.0 HEAD.

 Unnecessary  ReadRepair request during RangeScan
 

 Key: CASSANDRA-3843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Philip Andronov
Assignee: Jonathan Ellis
 Fix For: 1.0.8

 Attachments: 3843-v2.txt, 3843.txt


 During reads at Quorum with a replication factor greater than 2, Cassandra 
 sends at least one ReadRepair even when there is no need to. Because read 
 requests wait until the ReadRepair finishes, this slows down requests a lot, 
 up to the Timeout :(
 It seems the problem was introduced by CASSANDRA-2494; unfortunately I do not 
 have enough knowledge of Cassandra internals to fix it without breaking the 
 CASSANDRA-2494 functionality, so my report comes without a patch.
 Code explanations:
 {code:title=RangeSliceResponseResolver.java|borderStyle=solid}
 class RangeSliceResponseResolver {
     // ...
     private class Reducer extends MergeIterator.Reducer<Pair<Row, InetAddress>, Row>
     {
         // ...
         protected Row getReduced()
         {
             ColumnFamily resolved = versions.size() > 1
                                   ? RowRepairResolver.resolveSuperset(versions)
                                   : versions.get(0);
             if (versions.size() < sources.size())
             {
                 for (InetAddress source : sources)
                 {
                     if (!versionSources.contains(source))
                     {
                         // [PA] Here we are adding a null ColumnFamily.
                         // Later it will be compared with the resolved
                         // version and will give us a fake difference, which
                         // forces Cassandra to send a ReadRepair to that source.
                         versions.add(null);
                         versionSources.add(source);
                     }
                 }
             }
             // ...
             if (resolved != null)
                 repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
             // ...
         }
     }
 }
 {code}
 {code:title=RowRepairResolver.java|borderStyle=solid}
 public class RowRepairResolver extends AbstractRowResolver {
     // ...
     public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey<?> key, List<ColumnFamily> versions, List<InetAddress> endpoints)
     {
         List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
         for (int i = 0; i < versions.size(); i++)
         {
             // On some iteration we have to compare null and resolved, which
             // are obviously not equal, so it will fire a ReadRepair even
             // though none is needed here.
             ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
             if (diffCf == null)
                 continue;
             // ...
 {code}
 Imagine the following situation:
 NodeA has X.1 // row X with the version 1
 NodeB has X.2 
 NodeC has X.? // Unknown version, but because write was with Quorum it is 1 
 or 2
 During the Quorum read from nodes A and B, Cassandra creates version 12 and 
 sends ReadRepair, so the nodes then have the following content:
 NodeA has X.12
 NodeB has X.12
 which is correct; however, Cassandra will also fire a ReadRepair to NodeC. There 
 is no need to do that: the next consistent read has a chance to be served by 
 nodes {A, B} (no ReadRepair) or by the pair {?, C}, in which case a ReadRepair 
 will be fired and will bring NodeC to a consistent state.
 Right now we are reading from the index a lot, and starting from some point in 
 time we get TimeOutException because the cluster is overloaded by 
 ReadRepair requests *even* when all nodes have the same data :(





[jira] [Commented] (CASSANDRA-3936) Gossip should have a 'goodbye' command to indicate shutdown

2012-02-21 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212726#comment-13212726
 ] 

Brandon Williams commented on CASSANDRA-3936:
-

This could be helpful to mitigate the problem of CASSANDRA-2540 too.

 Gossip should have a 'goodbye' command to indicate shutdown
 ---

 Key: CASSANDRA-3936
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3936
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.2


 Cassandra is crash-only; however, there are times when you _know_ you are 
 taking the node down (rolling restarts, for instance) where it would be 
 advantageous to have the node instantly marked down rather than waiting on the 
 FD.  We could also improve the efficacy of the 'disablegossip' command this 
 way.





[jira] [Commented] (CASSANDRA-3859) Add Progress Reporting to Cassandra OutputFormats

2012-02-21 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212863#comment-13212863
 ] 

Brandon Williams commented on CASSANDRA-3859:
-

I'd be fine with adding counters, but I'd like to know why progress reporting 
isn't solving this.  I fear we may add counters and still not have a resolution 
without understanding why this isn't working.

 Add Progress Reporting to Cassandra OutputFormats
 -

 Key: CASSANDRA-3859
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3859
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop, Tools
Affects Versions: 1.1.0
Reporter: Samarth Gahire
Assignee: Brandon Williams
Priority: Minor
  Labels: bulkloader, hadoop, mapreduce, sstableloader
 Fix For: 1.1.0

 Attachments: 0001-add-progress-reporting-to-BOF.txt, 
 0002-Add-progress-to-CFOF.txt

   Original Estimate: 48h
  Remaining Estimate: 48h

 When we use BulkOutputFormat to load data into Cassandra, we should report 
 progress to the Hadoop job from within the sstable loader: if streaming takes 
 a long time for a particular task and no progress is reported to the job, it 
 may kill the task with a timeout exception.





[jira] [Commented] (CASSANDRA-3939) occasional failure of CliTest

2012-02-21 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212940#comment-13212940
 ] 

Brandon Williams commented on CASSANDRA-3939:
-

+1

 occasional failure of CliTest
 -

 Key: CASSANDRA-3939
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3939
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.0.7
Reporter: Eric Evans
Assignee: Eric Evans
Priority: Minor
 Fix For: 1.0.8

 Attachments: 
 v1-0001-CASSANDRA-3939-properly-initialize-CliSessionState.sch.txt


 {{CliTest}} will occasionally fail with an NPE.
 {noformat}
 [junit] Testcase: testCli(org.apache.cassandra.cli.CliTest):  Caused an ERROR
 [junit] java.lang.NullPointerException
 [junit] java.lang.RuntimeException: java.lang.NullPointerException
 [junit]   at 
 org.apache.cassandra.cli.CliClient.executeAddColumnFamily(CliClient.java:1039)
 [junit]   at 
 org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:228)
 [junit]   at 
 org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:213)
 [junit]   at org.apache.cassandra.cli.CliTest.testCli(CliTest.java:241)
 [junit] Caused by: java.lang.NullPointerException
 [junit]   at 
 org.apache.cassandra.cli.CliClient.validateSchemaIsSettled(CliClient.java:2855)
 [junit]   at 
 org.apache.cassandra.cli.CliClient.executeAddColumnFamily(CliClient.java:1030)
 {noformat}
 This occurs because no default for {{schema_mwt}} is applied unless 
 {{main()}} is invoked.
 (Trivial) patch to follow.





[jira] [Commented] (CASSANDRA-3412) make nodetool ring ownership smarter

2012-02-17 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210296#comment-13210296
 ] 

Brandon Williams commented on CASSANDRA-3412:
-

When I have multiple keyspaces (one simple, one NTS) and I specify the NTS 
keyspace, it still gives me the warning and tells me to specify a keyspace, 
even though I did.

 make nodetool ring ownership smarter
 

 Key: CASSANDRA-3412
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3412
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jackson Chung
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-3412.patch


 just a thought.. the ownership info currently just looks at the token and 
 calculates the % between nodes. It would be nice if it could do more, such as 
 discriminating between nodes of each DC, replica set, etc. 
 The ticket is open for suggestions...





[jira] [Commented] (CASSANDRA-3412) make nodetool ring ownership smarter

2012-02-17 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210480#comment-13210480
 ] 

Brandon Williams commented on CASSANDRA-3412:
-

Here are some simple steps to repro:

* run stress as a cheap way to create a simple ks
* create keyspace nts with placement_strategy 
'org.apache.cassandra.locator.NetworkTopologyStrategy' and 
strategy_options={DC1:2, DC2:2};


 make nodetool ring ownership smarter
 

 Key: CASSANDRA-3412
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3412
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jackson Chung
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-3412.patch


 just a thought.. the ownership info currently just looks at the token and 
 calculates the % between nodes. It would be nice if it could do more, such as 
 discriminating between nodes of each DC, replica set, etc. 
 The ticket is open for suggestions...





[jira] [Commented] (CASSANDRA-3412) make nodetool ring ownership smarter

2012-02-17 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210603#comment-13210603
 ] 

Brandon Williams commented on CASSANDRA-3412:
-

+1, but let's consolidate the 'warning' output lines to:

Note: Ownership information does not include topology, please specify a keyspace

 make nodetool ring ownership smarter
 

 Key: CASSANDRA-3412
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3412
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jackson Chung
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-3412-v2.patch, 0001-CASSANDRA-3412.patch


 just a thought.. the ownership info currently just looks at the token and 
 calculates the % between nodes. It would be nice if it could do more, such as 
 discriminating between nodes of each DC, replica set, etc. 
 The ticket is open for suggestions...





[jira] [Commented] (CASSANDRA-3910) make phi_convict_threshold Float

2012-02-16 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209563#comment-13209563
 ] 

Brandon Williams commented on CASSANDRA-3910:
-

bq. most messages sent between nodes timeouts withing 30 sec timeout limit

What 30s limit?

Why is a phi of 8 not sufficient here?  What fraction between 8 and 9 solves 
this problem?

 make phi_convict_threshold Float
 

 Key: CASSANDRA-3910
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3910
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.7
Reporter: Radim Kolar

 I would like phi_convict_threshold to be a floating-point number instead of 
 an integer. Value 8 is too low for me and value 9 is too high. Converting 
 to floating point would allow finer tuning.
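If the requested change were made, the tuning the reporter wants would look like this in cassandra.yaml (a hedged sketch; 8.5 is an arbitrary illustrative value, not a recommendation):

```yaml
# Failure detector sensitivity: for this cluster, 8 convicts nodes too
# eagerly and 9 too slowly, so a fractional value in between is desired.
phi_convict_threshold: 8.5
```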





[jira] [Commented] (CASSANDRA-3907) Support compression using BulkWriter

2012-02-15 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208419#comment-13208419
 ] 

Brandon Williams commented on CASSANDRA-3907:
-

Your understanding is correct.

 Support compression using BulkWriter
 

 Key: CASSANDRA-3907
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3907
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Chris Goffinet
Assignee: Chris Goffinet
 Fix For: 1.1.0

 Attachments: 0001-Add-compression-support-to-BulkWriter.patch


 Currently there is no way to enable compression using BulkWriter. 





[jira] [Commented] (CASSANDRA-3915) Fix LazilyCompactedRowTest

2012-02-15 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208543#comment-13208543
 ] 

Brandon Williams commented on CASSANDRA-3915:
-

+1

 Fix LazilyCompactedRowTest
 --

 Key: CASSANDRA-3915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3915
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0.8, 1.1.0

 Attachments: 3915.patch


 LazilyCompactedRowTest.testTwoRowSuperColumn has never really worked. It uses 
 LazilyCompactedRowTest.assertBytes(), which assumes standard columns, even 
 though the test is for super columns. For some reason, the deserialization 
 of the super columns as columns was not breaking anything, so the test passed, 
 but CASSANDRA-3872 changed that, and 
 LazilyCompactedRowTest.testTwoRowSuperColumn fails on the current cassandra-1.1 
 branch (it's not CASSANDRA-3872's fault; the test itself is buggy).





[jira] [Commented] (CASSANDRA-3677) NPE during HH delivery when gossip turned off on target

2012-02-15 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208580#comment-13208580
 ] 

Brandon Williams commented on CASSANDRA-3677:
-

It doesn't make a lot of sense.  a) we have no way to quickly find such hints, 
and b) if you removetoken the node, the data from existing replicas will be 
copied to restore the RF, so the hint isn't necessary (unless you wrote at ANY, 
in which case you've already lived dangerously.)

 NPE during HH delivery when gossip turned off on target
 ---

 Key: CASSANDRA-3677
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3677
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.0.7
Reporter: Radim Kolar
Assignee: Brandon Williams
Priority: Trivial
 Fix For: 1.0.8

 Attachments: 3677-v1.patch, 3677.txt


 probably not important bug
 ERROR [OptionalTasks:1] 2011-12-27 21:44:25,342 AbstractCassandraDaemon.java (line 138) Fatal exception in thread Thread[OptionalTasks:1,5,main]
 java.lang.NullPointerException
 at org.cliffc.high_scale_lib.NonBlockingHashMap.hash(NonBlockingHashMap.java:113)
 at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:553)
 at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:348)
 at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfAbsent(NonBlockingHashMap.java:319)
 at org.cliffc.high_scale_lib.NonBlockingHashSet.add(NonBlockingHashSet.java:32)
 at org.apache.cassandra.db.HintedHandOffManager.scheduleHintDelivery(HintedHandOffManager.java:371)
 at org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:356)
 at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:84)
 at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:119)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:679)
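The trace bottoms out in NonBlockingHashMap.hash(), which, like the java.util.concurrent maps, dereferences the key immediately and so throws NPE for a null key; with gossip off on the target, a null endpoint evidently reaches scheduleHintDelivery(). A minimal sketch of that failure mode, using ConcurrentHashMap as a stand-in for high-scale-lib's NonBlockingHashMap (the class name NullKeyDemo and the map contents are illustrative, not from the patch):

```java
import java.util.concurrent.ConcurrentHashMap;

// Concurrent maps reject null keys: hash(key) runs before any insertion,
// so a null endpoint passed to putIfAbsent() fails exactly as in the
// HintedHandOffManager trace above.
public class NullKeyDemo {
    static boolean nullKeyThrows() {
        ConcurrentHashMap<String, String> queuedDeliveries = new ConcurrentHashMap<>();
        try {
            queuedDeliveries.putIfAbsent(null, "hint"); // NPE here, nothing is stored
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nullKeyThrows()); // prints "true"
    }
}
```

The v1 patch's null check on the endpoint before scheduling is the straightforward guard against this.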

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3917) System test failures in 1.1

2012-02-15 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208604#comment-13208604
 ] 

Brandon Williams commented on CASSANDRA-3917:
-

+1

 System test failures in 1.1
 ---

 Key: CASSANDRA-3917
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3917
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 1.1.0

 Attachments: 3917.txt


 On branch 1.1, I currently see two system test failures:
 {noformat}
 ==
 FAIL: 
 system.test_thrift_server.TestMutations.test_get_range_slice_after_deletion
 --
 Traceback (most recent call last):
   File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
     self.test(*self.arg)
   File "/home/mcmanus/Git/cassandra/test/system/test_thrift_server.py", line 1937, in test_get_range_slice_after_deletion
     assert len(result[0].columns) == 1
 AssertionError
 {noformat}
 and
 {noformat}
 ==
 FAIL: Test that column ttled expires from KEYS index
 --
 Traceback (most recent call last):
   File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
     self.test(*self.arg)
   File "/home/mcmanus/Git/cassandra/test/system/test_thrift_server.py", line 1908, in test_index_scan_expiring
     assert len(result) == 1, result
 AssertionError: []
 --
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
