[Cassandra Wiki] Trivial Update of Habiba umar by Habiba umar

2011-08-16 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The Habiba umar page has been changed by Habiba umar:
http://wiki.apache.org/cassandra/Habiba%20umar

New page:
##master-page:HomepageTemplate
#format wiki
#language en
== @``ME@ ==

Email: MailTo(you AT SPAMFREE example DOT com)
## You can even more obfuscate your email address by adding more uppercase 
letters followed by a leading and trailing blank.

...


CategoryHomepage


[jira] [Commented] (CASSANDRA-2973) fatal errrors after nodetool cleanup

2011-08-16 Thread Wojciech Meler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085698#comment-13085698
 ] 

Wojciech Meler commented on CASSANDRA-2973:
---

I'm back. As Michał said, the cluster history was: 0.7.0 - 0.7.2 - 0.7.3 - 0.7.4 
- 0.8.0 - 0.8.1.
It started with 6 nodes. After migrating to 0.8.0 the cluster grew to 12, and after 
0.8.1 to 18 nodes.

It's hard to say which CF got read errors, but the exceptions from scrub suggest 
that it was mta_logs, which is a plain CF.

 fatal errrors after nodetool cleanup
 

 Key: CASSANDRA-2973
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2973
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Wojciech Meler
Assignee: Sylvain Lebresne

 After adding nodes to the cluster and running cleanup I get scary exceptions in 
 the log:
 2011-07-30 00:00:05:506 CEST ERROR 
 [ReadStage:2335][org.apache.cassandra.service.AbstractCassandraDaemon] Fatal 
 exception in thread Thread[ReadStage:2335,5,main]
 java.io.IOError: java.io.IOException: mmap segment underflow; remaining is 
 4394 but 60165 requested
 at 
 org.apache.cassandra.db.columniterator.IndexedSliceReader.init(IndexedSliceReader.java:80)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:91)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:67)
 at 
 org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1292)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1189)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1146)
 at org.apache.cassandra.db.Table.getRow(Table.java:385)
 at 
 org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:61)
 at 
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:69)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
  Caused by: java.io.IOException: mmap segment underflow; remaining is 4394 
 but 60165 requested
 at 
 org.apache.cassandra.io.util.MappedFileDataInput.readBytes(MappedFileDataInput.java:117)
 at 
 org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:389)
 at 
 org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:368)
 at 
 org.apache.cassandra.io.sstable.IndexHelper$IndexInfo.deserialize(IndexHelper.java:194)
 at 
 org.apache.cassandra.io.sstable.IndexHelper.deserializeIndex(IndexHelper.java:83)
 at 
 org.apache.cassandra.db.columniterator.IndexedSliceReader.init(IndexedSliceReader.java:73)
 ... 14 more
 exceptions disappeared after running scrub

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-3037) Could not get input splits Caused by: java.io.IOException: failed connecting to all endpoints slave1/123.198.69.242

2011-08-16 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-3037.
---

Resolution: Not A Problem

You need to figure out why the nodes can't connect to each other.  This looks 
like an environment or configuration problem, not a Cassandra bug.

 Could not get input splits Caused by: java.io.IOException: failed connecting 
 to all endpoints slave1/123.198.69.242
 ---

 Key: CASSANDRA-3037
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3037
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
 Environment: Ubuntu 10.04 LTS
 Hadoop from Cloudera 0.20.203
 Latest java
Reporter: Anton Vedeshin
Priority: Blocker

 After upgrading Cassandra from 0.8.2 to 0.8.4 I get this error; before the 
 upgrade everything was working fine.
 I have restarted Cassandra, removed data, etc.; nothing helps.
 I have 6 identical machines in the cloud. netstat shows port 9160 listening, 
 and nodetool ... ring responds with 6 machines UP.
 Finally I truncated all 6 servers and reinstalled hadoop + cassandra 
 0.8.4 from scratch,
 compiled and tried to execute word_count,
 and received the same error as after the upgrade.
 Error:
 11/08/15 15:23:54 INFO WordCount: output reducer type: cassandra
 11/08/15 15:23:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
 processName=JobTracker, sessionId=
 Exception in thread main java.io.IOException: Could not get input splits
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:157)
 at 
 org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
 at WordCount.run(Unknown Source)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at WordCount.main(Unknown Source)
 Caused by: java.util.concurrent.ExecutionException: java.io.IOException: 
 failed connecting to all endpoints slave1/154.198.69.242
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
 at java.util.concurrent.FutureTask.get(FutureTask.java:83)
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:153)
 ... 7 more
 Caused by: java.io.IOException: failed connecting to all endpoints 
 slave1/154.198.69.242
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:234)
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:70)
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:190)
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:175)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)





[jira] [Updated] (CASSANDRA-2820) Re-introduce FastByteArrayInputStream (and Output equivalent)

2011-08-16 Thread Paul Loy (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Loy updated CASSANDRA-2820:


Attachment: fast_bytearray_iostreams_harmony-patch-6.txt

Rebased to trunk.

 Re-introduce FastByteArrayInputStream (and Output equivalent)
 -

 Key: CASSANDRA-2820
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2820
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0
 Environment: n/a
Reporter: Paul Loy
Priority: Minor
  Labels: bytearrayinputstream, bytearrayoutputstream, license, 
 synchronized
 Fix For: 1.0

 Attachments: fast_bytearray_iostreams_harmony-patch-2.txt, 
 fast_bytearray_iostreams_harmony-patch-3.txt, 
 fast_bytearray_iostreams_harmony-patch-4.txt, 
 fast_bytearray_iostreams_harmony-patch-5.txt, 
 fast_bytearray_iostreams_harmony-patch-6.txt


 In https://issues.apache.org/jira/browse/CASSANDRA-37 
 FastByteArrayInputStream and FastByteArrayOutputStream were removed due to 
 being code copied from the JDK and then subsequently modified. The JDK 
 license is incompatible with Apache 2 license so the code had to go.
 I have since had a look at the performance of the JDK ByteArrayInputStream 
 and a FastByteArrayInputStream (i.e. one with synchronized methods made 
 un-synchronized) and seen the difference is significant.
 After a warmup-period of 1 loops I get the following for 1 loops 
 through a 128000 byte array:
 bais : 3513ms
 fbais: 72ms
 This varies depending on the OS, machine and Java version, but it's always in 
 favour of the FastByteArrayInputStream as you might expect.
 Then, at Jonathan Ellis' suggestion, I tried this using a modified Apache 
 Harmony ByteArrayInputStream - i.e. one whose license is compatible - and the 
 results were the same. A significant boost.
 I will attach a patch with changes for the 0.8.0 tag.
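
 The comparison described above can be sketched roughly as follows. This is
 only an illustration of the idea (a byte-array stream whose read methods are
 not synchronized, timed against java.io.ByteArrayInputStream); the class name
 UnsyncByteArrayInputStream, the drain helper and the loop count are
 assumptions made for this sketch, not the attached patch or the Harmony code,
 and timings will differ between machines and JVMs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; this is neither the attached patch nor the Harmony source.
class UnsyncByteArrayInputStream extends InputStream {
    private final byte[] buf;
    private int pos;

    UnsyncByteArrayInputStream(byte[] buf) {
        this.buf = buf;
    }

    // Same contract as ByteArrayInputStream.read(), but without the synchronized keyword.
    @Override
    public int read() {
        return pos < buf.length ? (buf[pos++] & 0xff) : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
        if (off < 0 || len < 0 || len > dst.length - off) throw new IndexOutOfBoundsException();
        if (len == 0) return 0;
        if (pos >= buf.length) return -1;
        int n = Math.min(len, buf.length - pos);
        System.arraycopy(buf, pos, dst, off, n);
        pos += n;
        return n;
    }

    @Override
    public int available() {
        return buf.length - pos;
    }

    // Reads the stream byte by byte and returns the sum, so the loop has an observable result.
    static long drain(InputStream in) {
        try {
            long sum = 0;
            for (int b = in.read(); b != -1; b = in.read()) sum += b;
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[128000];
        int loops = 1000; // arbitrary loop count chosen for this sketch
        long t0 = System.nanoTime();
        for (int i = 0; i < loops; i++) drain(new ByteArrayInputStream(data));
        long t1 = System.nanoTime();
        for (int i = 0; i < loops; i++) drain(new UnsyncByteArrayInputStream(data));
        long t2 = System.nanoTime();
        System.out.println("bais : " + (t1 - t0) / 1_000_000 + "ms");
        System.out.println("fbais: " + (t2 - t1) / 1_000_000 + "ms");
    }
}
```

 A hand-rolled timing loop like this is only a rough indication; a proper
 benchmark would need a warmup phase and several runs.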





[jira] [Updated] (CASSANDRA-2820) Re-introduce FastByteArrayInputStream (and Output equivalent)

2011-08-16 Thread Paul Loy (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Loy updated CASSANDRA-2820:


Attachment: (was: fast_bytearray_iostreams_harmony-patch-6.txt)





[jira] [Updated] (CASSANDRA-2820) Re-introduce FastByteArrayInputStream (and Output equivalent)

2011-08-16 Thread Paul Loy (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Loy updated CASSANDRA-2820:


Attachment: fast_bytearray_iostreams_harmony-patch-6.txt

rebased to trunk.





[jira] [Updated] (CASSANDRA-2820) Re-introduce FastByteArrayInputStream (and Output equivalent)

2011-08-16 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2820:
--

Reviewer: brandon.williams  (was: slebresne)
Assignee: Paul Loy

I've asked Brandon to do some real-world performance testing too.

 Re-introduce FastByteArrayInputStream (and Output equivalent)
 -

 Key: CASSANDRA-2820
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2820
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0
 Environment: n/a
Reporter: Paul Loy
Assignee: Paul Loy
Priority: Minor
  Labels: bytearrayinputstream, bytearrayoutputstream, license, 
 synchronized
 Fix For: 1.0

 Attachments: fast_bytearray_iostreams_harmony-patch-2.txt, 
 fast_bytearray_iostreams_harmony-patch-3.txt, 
 fast_bytearray_iostreams_harmony-patch-4.txt, 
 fast_bytearray_iostreams_harmony-patch-5.txt, 
 fast_bytearray_iostreams_harmony-patch-6.txt


 In https://issues.apache.org/jira/browse/CASSANDRA-37 
 FastByteArrayInputStream and FastByteArrayOutputStream were removed due to 
 being code copied from the JDK and then subsequently modified. The JDK 
 license is incompatible with Apache 2 license so the code had to go.
 I have since had a look at the performance of the JDK ByteArrayInputStream 
 and a FastByteArrayInputStream (i.e. one with synchronized methods made 
 un-synchronized) and seen the difference is significant.
 After a warmup-period of 1 loops I get the following for 1 loops 
 through a 128000 byte array:
 bais : 3513ms
 fbais: 72ms
 This varies depending on the OS, machine and Java version, but it's always in 
 favour of the FastByteArrayInputStream as you might expect.
 Then, at Jonathan Ellis' suggestion, I tried this using a modified Apache 
 Harmony ByteArrayInputStream - i.e. one whose license is compatible - and the 
 results were the same. A significant boost.
 I will attach a patch with changes for the 0.8.0 tag.





[jira] [Commented] (CASSANDRA-2820) Re-introduce FastByteArrayInputStream (and Output equivalent)

2011-08-16 Thread Paul Loy (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085759#comment-13085759
 ] 

Paul Loy commented on CASSANDRA-2820:
-

Awesome! Here's hoping it gives some discernible positive effect!





[Cassandra Wiki] Trivial Update of FAQ by jeremyhanna

2011-08-16 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The FAQ page has been changed by jeremyhanna:
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=132&rev2=133

Comment:
Made an exception class not a link.

  Anchor(dropped_messages)
  == Why do I see ... messages dropped.. in the logs? ==
  
- Internode messages which are received by a node, but do not get not to be 
processed within rpc_timeout are dropped rather than processed. As the 
coordinator node will no longer be waiting for a response. If the Coordinator 
node does not receive Consistency Level responses before the rpc_timeout it 
will return a TimedOutExcpetion to the client. If the coordinator receives 
Consistency Level responses it will return success to the client. 
+ Internode messages which are received by a node, but do not get not to be 
processed within rpc_timeout are dropped rather than processed. As the 
coordinator node will no longer be waiting for a response. If the Coordinator 
node does not receive Consistency Level responses before the rpc_timeout it 
will return a !TimedOutExcpetion to the client. If the coordinator receives 
Consistency Level responses it will return success to the client. 
  
  For MUTATION messages this means that the mutation was not applied to all 
replicas it was sent to. The inconsistency will be repaired by Read Repair or 
Anti Entropy Repair. 
  


[jira] [Updated] (CASSANDRA-3036) Vague primary key references in CQL

2011-08-16 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3036:
--

Affects Version/s: 0.8.1
Fix Version/s: 0.8.5
 Assignee: Pavel Yaskevich

We should validate that column_metadata does not get created with the 
key_alias, or column insertions where the column name is the key_alias.  (For 
both CQL and old paths.)
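
A minimal sketch of the kind of check suggested above, assuming column names
and the key alias are compared as raw bytes. KeyAliasValidator, isValid and
utf8 are hypothetical names for illustration; the real validation would live
in Cassandra's own schema and mutation code paths.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not Cassandra code.
class KeyAliasValidator {
    // Returns true when none of the given column names collides with the key alias.
    static boolean isValid(byte[] keyAlias, List<byte[]> columnNames) {
        if (keyAlias == null) return true; // no alias defined, nothing to collide with
        for (byte[] name : columnNames) {
            if (Arrays.equals(keyAlias, name)) return false;
        }
        return true;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] alias = utf8("id");
        // A column named like the key alias is rejected, an unrelated one is accepted.
        System.out.println(isValid(alias, Collections.singletonList(utf8("id"))));
        System.out.println(isValid(alias, Collections.singletonList(utf8("name"))));
    }
}
```

The same predicate could be applied both when column_metadata is defined and
when a column is inserted, which is the pair of cases named in the comment.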

 Vague primary key references in CQL
 ---

 Key: CASSANDRA-3036
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3036
 Project: Cassandra
  Issue Type: Bug
  Components: API
Affects Versions: 0.8.1
Reporter: Kelley Reynolds
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: cql
 Fix For: 0.8.5


 create columnfamily wonk (id 'utf8' primary key, id int)
 update wonk set id=1 where id='test'
 create index wonk_id on wonk (id)
 This does what you would expect but then the results are unclear when using 
 'id' in a where clause.
 select * from wonk where id=1 returns nothing and select * from wonk where 
 id='test' works fine.
 Perhaps secondary indexes should not be allowed on columns that have the same 
 name as the key_alias? At least a warning or something should be thrown to 
 indicate you've just made a useless index.





[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-08-16 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085812#comment-13085812
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


After banging my head against this for a few days, I can say that this is more 
complicated than it sounds, for a few reasons:

 - We will need to support both the old and new directory structures, which requires 
major changes to how the Descriptor class works and to how the CFS and SSTable 
classes do file lookup and path generation.
 - It adds complexity to how we do backups, snapshots and 
recovery, which could potentially lead to some nasty bugs.
 - As Peter already mentioned, Cassandra won't be able to distinguish between 
an actual empty CF and a directory that wasn't mounted (or a symlink pointing 
to a non-mounted directory).

There are more, but those mentioned are the major ones. Let's skip this for now.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
Priority: Minor

 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.





[jira] [Updated] (CASSANDRA-2749) fine-grained control over data directories

2011-08-16 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-2749:
---

Fix Version/s: (was: 1.0)
 Assignee: (was: Pavel Yaskevich)

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor

 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.





[jira] [Commented] (CASSANDRA-3037) Could not get input splits Caused by: java.io.IOException: failed connecting to all endpoints slave1/123.198.69.242

2011-08-16 Thread Anton Vedeshin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085820#comment-13085820
 ] 

Anton Vedeshin commented on CASSANDRA-3037:
---

The configuration is 1 to 1 as it was in version 0.8.2 (first I tried to keep the 
same cassandra.yaml on each node; then, with the next installation, the defaults were 
installed and I changed them accordingly).
ssh without login and password works from the master to all 5 slaves by both 
hostname and IP.
ring shows that everything is up, etc.

How can I check the connection?

BTW, if I put 0.0.0.0 in rpc_server there is an error, and if I put the IP 
address there it is also a problem; only the hostname works. ping to the hostnames 
works, so everything is working.

How could I check the connection?

Thank you in advance!
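
One way to check the connection asked about above is to open a plain TCP
connection to port 9160 on each node, from the machine that submits the Hadoop
job. The sketch below is a generic diagnostic under that assumption; PortCheck
and canConnect are hypothetical names, not part of Cassandra or Hadoop, and a
successful TCP connect does not prove that the Thrift service behind the port
is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper, not part of Cassandra or Hadoop.
class PortCheck {
    // Returns true if a TCP connection to host:port can be opened within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host names are passed on the command line, e.g. the names the job would use.
        for (String host : args) {
            System.out.println(host + ":9160 reachable = " + canConnect(host, 9160, 2000));
        }
    }
}
```

Running it with both the hostname and the IP address of each node shows
whether the two resolve to an address the RPC port is actually bound to.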





[jira] [Commented] (CASSANDRA-2268) CQL-enabled stress.java

2011-08-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085830#comment-13085830
 ] 

Jonathan Ellis commented on CASSANDRA-2268:
---

Aaron, is this still on your radar?

 CQL-enabled stress.java
 ---

 Key: CASSANDRA-2268
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2268
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Eric Evans
Assignee: Aaron Morton
Priority: Minor
  Labels: cql
 Fix For: 0.8.5

 Attachments: 0001-2268-wip.patch


 It would be great if stress.java had a CQL mode.  For making the inevitable 
 RPC-CQL comparisons, but also as a basis for measuring optimizations, and 
 spotting performance regressions.





[jira] [Created] (CASSANDRA-3040) Refactor and optimize ColumnFamilyStore.files(...) and Descriptor.fromFilename and few other places responsible for work with SSTable files.

2011-08-16 Thread Pavel Yaskevich (JIRA)
Refactor and optimize ColumnFamilyStore.files(...) and Descriptor.fromFilename 
and few other places responsible for work with SSTable files.


 Key: CASSANDRA-3040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.8.5


The ColumnFamilyStore.files(...) methods are not optimal in terms of the work they 
do: they scan the whole table directory for files and directories, but their 
mission is to locate CF-specific files only.
Descriptor.fromFilename could be refactored to use the getParentFile and getName 
methods instead of manual parsing of the path. Small refactorings in this sense 
are planned for the Component and SSTable classes.
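
As a rough illustration of the refactoring described above, the sketch below
derives the keyspace and column family from a java.io.File using
getParentFile and getName, and filters a directory listing down to one column
family. The class SSTablePathSketch, its method names, and the assumed layout
(keyspace directory containing files named like Standard1-g-1-Data.db) are
assumptions for this example, not the actual Descriptor or ColumnFamilyStore
code.

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical sketch; layout and names are assumptions, not Cassandra's implementation.
class SSTablePathSketch {
    // Keyspace name taken from the parent directory instead of splitting the path string by hand.
    static String keyspaceOf(File sstableFile) {
        return sstableFile.getParentFile().getName();
    }

    // Column family name, assumed to be everything before the first '-' in the file name.
    static String columnFamilyOf(File sstableFile) {
        String name = sstableFile.getName();
        int dash = name.indexOf('-');
        return dash < 0 ? name : name.substring(0, dash);
    }

    // Filter that accepts only the files belonging to one column family.
    static FilenameFilter filesOf(final String columnFamily) {
        return new FilenameFilter() {
            @Override
            public boolean accept(File dir, String name) {
                return name.startsWith(columnFamily + "-");
            }
        };
    }

    public static void main(String[] args) {
        File f = new File("/var/lib/cassandra/data/Keyspace1/Standard1-g-1-Data.db");
        System.out.println(keyspaceOf(f) + " / " + columnFamilyOf(f));
    }
}
```

Passing filesOf(...) to File.listFiles still reads the directory once, but it
keeps unrelated entries out of the result that callers have to process.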





[jira] [Commented] (CASSANDRA-3037) Could not get input splits Caused by: java.io.IOException: failed connecting to all endpoints slave1/123.198.69.242

2011-08-16 Thread Anton Vedeshin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085832#comment-13085832
 ] 

Anton Vedeshin commented on CASSANDRA-3037:
---

netstat is showing listening on all needed ports: 9160, 7199

 Could not get input splits Caused by: java.io.IOException: failed connecting 
 to all endpoints slave1/123.198.69.242
 ---

 Key: CASSANDRA-3037
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3037
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
 Environment: Ubuntu 10.04 LTS
 Hadoop from Cloudera 0.20.203
 Latest java
Reporter: Anton Vedeshin
Priority: Blocker

 After upgrading Cassandra from 0.8.2 to 0.8.4 I get this error; before the 
 upgrade everything was working fine.
 I have restarted Cassandra, removed data, etc.; nothing helps.
 I have 6 identical machines in the cloud. netstat shows port 9160 listening, 
 and nodetool ... ring responds with 6 machines UP.
 Finally I truncated all 6 servers and reinstalled hadoop + cassandra 
 0.8.4 from scratch,
 compiled and tried to execute word_count,
 and received the same error as after the upgrade.
 Error:
 11/08/15 15:23:54 INFO WordCount: output reducer type: cassandra
 11/08/15 15:23:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
 processName=JobTracker, sessionId=
 Exception in thread main java.io.IOException: Could not get input splits
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:157)
 at 
 org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
 at WordCount.run(Unknown Source)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at WordCount.main(Unknown Source)
 Caused by: java.util.concurrent.ExecutionException: java.io.IOException: 
 failed connecting to all endpoints slave1/154.198.69.242
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
 at java.util.concurrent.FutureTask.get(FutureTask.java:83)
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:153)
 ... 7 more
 Caused by: java.io.IOException: failed connecting to all endpoints 
 slave1/154.198.69.242
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:234)
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:70)
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:190)
 at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:175)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)





[jira] [Commented] (CASSANDRA-3040) Refactor and optimize ColumnFamilyStore.files(...) and Descriptor.fromFilename and few other places responsible for work with SSTable files.

2011-08-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085835#comment-13085835
 ] 

Jonathan Ellis commented on CASSANDRA-3040:
---

Happy to see what you come up with, but I warn that historically this has been 
a fragile area.  Let's keep to 1.0-only.

 Refactor and optimize ColumnFamilyStore.files(...) and 
 Descriptor.fromFilename and few other places responsible for work with 
 SSTable files.
 

 Key: CASSANDRA-3040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.8.5


 ColumnFamilyStore.files(...) methods are not optimal in terms of the work they 
 do: they scan the whole table directory for files and directories, while 
 their mission is to locate CF-specific files only.
 Descriptor.fromFilename could be refactored to use the getParentFile and getName 
 methods instead of manually parsing the path. Small refactorings along these 
 lines are planned for the Component and SSTable classes.





[jira] [Updated] (CASSANDRA-3040) Refactor and optimize ColumnFamilyStore.files(...) and Descriptor.fromFilename and few other places responsible for work with SSTable files.

2011-08-16 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3040:
---

Fix Version/s: (was: 0.8.5)
   1.0

 Refactor and optimize ColumnFamilyStore.files(...) and 
 Descriptor.fromFilename and few other places responsible for work with 
 SSTable files.
 

 Key: CASSANDRA-3040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 1.0


 ColumnFamilyStore.files(...) methods are not optimal in terms of the work they 
 do: they scan the whole table directory for files and directories, while 
 their mission is to locate CF-specific files only.
 Descriptor.fromFilename could be refactored to use the getParentFile and getName 
 methods instead of manually parsing the path. Small refactorings along these 
 lines are planned for the Component and SSTable classes.





[jira] [Commented] (CASSANDRA-3040) Refactor and optimize ColumnFamilyStore.files(...) and Descriptor.fromFilename and few other places responsible for work with SSTable files.

2011-08-16 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085838#comment-13085838
 ] 

Pavel Yaskevich commented on CASSANDRA-3040:


Sure

 Refactor and optimize ColumnFamilyStore.files(...) and 
 Descriptor.fromFilename and few other places responsible for work with 
 SSTable files.
 

 Key: CASSANDRA-3040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 1.0


 ColumnFamilyStore.files(...) methods are not optimal in terms of the work they 
 do: they scan the whole table directory for files and directories, while 
 their mission is to locate CF-specific files only.
 Descriptor.fromFilename could be refactored to use the getParentFile and getName 
 methods instead of manually parsing the path. Small refactorings along these 
 lines are planned for the Component and SSTable classes.





[jira] [Updated] (CASSANDRA-2868) Native Memory Leak

2011-08-16 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2868:
--

Attachment: 2868-v3.txt

bq. I've never actually been able to get > 1 to happen, but we can add it to 
the logging

I'm sure it's possible w/ a small enough heap, especially since GCInspector is 
paused along w/ everything else for STW collections (including new gen).

v3 attached to accommodate this and add durationPerCollection.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.5

 Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png


 We have memory issues with long-running servers. These have been confirmed by 
 several users on the user list, which is why I am reporting them.
 The memory consumption of the cassandra java process increases steadily until 
 it's killed by the OS because of OOM (with no swap).
 Our server is started with -Xmx3000M and has been running for around 23 days.
 pmap -x shows
 Total SST: 1961616 (mem mapped data and index files)
 Anon  RSS: 6499640
 Total RSS: 8478376
 This shows that > 3G are 'overallocated'.
 We will use BRAF on one of our less important nodes to check whether it is 
 related to mmap and report back.
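
The 3G figure can be reproduced with a little arithmetic. This is a rough sketch, assuming pmap -x reports sizes in kilobytes and that the entire -Xmx3000M heap is resident; it ignores other legitimate non-heap memory (permgen, thread stacks), and the class and method names are illustrative only.

```java
// Illustrative arithmetic only; names are hypothetical.
final class RssArithmeticSketch
{
    // Anonymous RSS (in KB) left over after subtracting the configured max heap (in MB).
    static long overallocatedKb(long anonRssKb, long maxHeapMb)
    {
        return anonRssKb - maxHeapMb * 1024L;
    }

    // Convenience conversion from KB to GB (1 GB = 1024 * 1024 KB).
    static double kbToGb(long kb)
    {
        return kb / (1024.0 * 1024.0);
    }
}
```

With the numbers reported above, overallocatedKb(6499640, 3000) gives 3427640 KB, a little over 3.2 GB.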





[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085850#comment-13085850
 ] 

Jonathan Ellis commented on CASSANDRA-2868:
---

dirty working directory.  GCI is the only relevant file.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.5

 Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png


 We have memory issues with long-running servers. These have been confirmed by 
 several users on the user list, which is why I am reporting them.
 The memory consumption of the cassandra java process increases steadily until 
 it's killed by the OS because of OOM (with no swap).
 Our server is started with -Xmx3000M and has been running for around 23 days.
 pmap -x shows
 Total SST: 1961616 (mem mapped data and index files)
 Anon  RSS: 6499640
 Total RSS: 8478376
 This shows that > 3G are 'overallocated'.
 We will use BRAF on one of our less important nodes to check whether it is 
 related to mmap and report back.





[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-16 Thread Patricio Echague (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085858#comment-13085858
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

Thanks Jonathan for the snippet of code. I didn't notice it was broken.

I don't see where CallbackInfo.shouldHint is broken. 

{code}
public boolean shouldHint()
{
    if (StorageProxy.shouldHint(target) && isMutation)
    {
        try
        {
            ((IWriteResponseHandler) callback).get(); // 1)
            return true;
        }
        catch (TimeoutException e)
        {
            // CL was not achieved. We should not hint.
        }
    }
    return false;
}
{code}

I process the callback after the message has expired. If the CL was achieved (and 
the requirements for a hint are met), I return true for this target, meaning 
that a hint needs to be written. 
On the other hand, if the message expires and the CL was not achieved, I 
return false (for this target).

Perhaps it needs special treatment during shutdown?

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

 Attachments: 2034-formatting.txt, CASSANDRA-2034-trunk-v10.patch, 
 CASSANDRA-2034-trunk-v11.patch, CASSANDRA-2034-trunk-v11.patch, 
 CASSANDRA-2034-trunk-v12.patch, CASSANDRA-2034-trunk-v13.patch, 
 CASSANDRA-2034-trunk-v14.patch, CASSANDRA-2034-trunk-v2.patch, 
 CASSANDRA-2034-trunk-v3.patch, CASSANDRA-2034-trunk-v4.patch, 
 CASSANDRA-2034-trunk-v5.patch, CASSANDRA-2034-trunk-v6.patch, 
 CASSANDRA-2034-trunk-v7.patch, CASSANDRA-2034-trunk-v8.patch, 
 CASSANDRA-2034-trunk-v9.patch, CASSANDRA-2034-trunk.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Currently, HH is purely an optimization -- if a machine goes down, enabling 
 HH means RR/AES will have less work to do, but you can't disable RR entirely 
 in most situations since HH doesn't kick in until the FailureDetector does.
 Let's add a scheduled task to the mutate path, such that we return to the 
 client normally after ConsistencyLevel is achieved, but after RpcTimeout we 
 check the responseHandler write acks and write local hints for any missing 
 targets.
 This would make disabling RR when HH is enabled a much more reasonable 
 option, which has a huge impact on read throughput.
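
The proposal above can be sketched roughly as follows. This is not the code from any of the attached patches: HintAfterTimeoutSketch, onAck and scheduleHintCheck are hypothetical names, targets are plain strings, and "writing a hint" is reduced to returning the set of targets that never acknowledged. Only the java.util.concurrent calls are real API.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: tracks write acks and, after a timeout, reports the
// targets that would need a local hint.
final class HintAfterTimeoutSketch
{
    private final Set<String> targets;
    private final Set<String> acked = ConcurrentHashMap.newKeySet();

    HintAfterTimeoutSketch(Set<String> targets)
    {
        this.targets = new HashSet<>(targets);
    }

    // Called when a replica acknowledges the write.
    void onAck(String target)
    {
        acked.add(target);
    }

    // Targets that have not acknowledged yet.
    Set<String> missingTargets()
    {
        Set<String> missing = new HashSet<>(targets);
        missing.removeAll(acked);
        return missing;
    }

    // Schedules a single check after rpcTimeoutMillis; the future yields the
    // targets for which a hint would be written.
    ScheduledFuture<Set<String>> scheduleHintCheck(ScheduledExecutorService executor,
                                                   long rpcTimeoutMillis)
    {
        Callable<Set<String>> check = this::missingTargets;
        return executor.schedule(check, rpcTimeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

The client-facing path is unaffected in this sketch: the caller can return as soon as the consistency level is met, and the scheduled check runs independently once the timeout has elapsed.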





[jira] [Commented] (CASSANDRA-2820) Re-introduce FastByteArrayInputStream (and Output equivalent)

2011-08-16 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085885#comment-13085885
 ] 

Brandon Williams commented on CASSANDRA-2820:
-

I'm seeing up to 10% improvement with stress on real iron with this patch.

 Re-introduce FastByteArrayInputStream (and Output equivalent)
 -

 Key: CASSANDRA-2820
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2820
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0
 Environment: n/a
Reporter: Paul Loy
Assignee: Paul Loy
Priority: Minor
  Labels: bytearrayinputstream, bytearrayoutputstream, license, 
 synchronized
 Fix For: 1.0

 Attachments: fast_bytearray_iostreams_harmony-patch-2.txt, 
 fast_bytearray_iostreams_harmony-patch-3.txt, 
 fast_bytearray_iostreams_harmony-patch-4.txt, 
 fast_bytearray_iostreams_harmony-patch-5.txt, 
 fast_bytearray_iostreams_harmony-patch-6.txt


 In https://issues.apache.org/jira/browse/CASSANDRA-37 
 FastByteArrayInputStream and FastByteArrayOutputStream were removed due to 
 being code copied from the JDK and then subsequently modified. The JDK 
 license is incompatible with Apache 2 license so the code had to go.
 I have since had a look at the performance of the JDK ByteArrayInputStream 
 and a FastByteArrayInputStream (i.e. one with synchronized methods made 
 un-synchronized) and seen the difference is significant.
 After a warmup-period of 1 loops I get the following for 1 loops 
 through a 128000 byte array:
 bais : 3513ms
 fbais: 72ms
 This varies depending on the OS, machine and Java version, but it's always in 
 favour of the FastByteArrayInputStream as you might expect.
 Then, at Jonathan Ellis' suggestion, I tried this using a modified Apache 
 Harmony ByteArrayInputStream - i.e. one whose license is compatible - and the 
 results were the same. A significant boost.
 I will attach a patch with changes for the 0.8.0 tag.
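
For illustration, an unsynchronized byte-array stream might look roughly like the sketch below. It is written from scratch for this note and is neither the Harmony-derived class from the attached patches nor the JDK code; the class name UnsyncByteArrayInputStream is hypothetical. The point is only that read() carries no synchronized modifier, unlike java.io.ByteArrayInputStream.

```java
import java.io.InputStream;

// Illustrative only: a minimal, unsynchronized in-memory input stream.
final class UnsyncByteArrayInputStream extends InputStream
{
    private final byte[] buf;
    private int pos;

    UnsyncByteArrayInputStream(byte[] buf)
    {
        this.buf = buf;
    }

    // Returns the next byte as an unsigned value, or -1 at end of data.
    @Override
    public int read()
    {
        return pos < buf.length ? (buf[pos++] & 0xFF) : -1;
    }

    // Copies up to len bytes into b; returns the count copied, or -1 at end of data.
    @Override
    public int read(byte[] b, int off, int len)
    {
        if (off < 0 || len < 0 || len > b.length - off)
            throw new IndexOutOfBoundsException();
        if (len == 0)
            return 0;
        if (pos >= buf.length)
            return -1;
        int n = Math.min(len, buf.length - pos);
        System.arraycopy(buf, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override
    public int available()
    {
        return buf.length - pos;
    }
}
```

Such a class is only safe when a stream instance is confined to a single thread, which is the trade-off behind dropping the locks.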





[jira] [Updated] (CASSANDRA-3040) Refactor and optimize ColumnFamilyStore.files(...) and Descriptor.fromFilename and few other places responsible for work with SSTable files.

2011-08-16 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3040:
---

Attachment: CASSANDRA-3040.patch

 Refactor and optimize ColumnFamilyStore.files(...) and 
 Descriptor.fromFilename and few other places responsible for work with 
 SSTable files.
 

 Key: CASSANDRA-3040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 1.0

 Attachments: CASSANDRA-3040.patch


 ColumnFamilyStore.files(...) methods are not optimal in terms of the work they 
 do: they scan the whole table directory for files and directories, while 
 their mission is to locate CF-specific files only.
 Descriptor.fromFilename could be refactored to use the getParentFile and getName 
 methods instead of manually parsing the path. Small refactorings along these 
 lines are planned for the Component and SSTable classes.





[jira] [Updated] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-16 Thread Patricio Echague (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patricio Echague updated CASSANDRA-2034:


Attachment: CASSANDRA-2034-trunk-v15.patch

v15 replaces v14.

- Waiting for hints is handled in the IWriteResponseHandler
- Fix broken SP.shouldHint

Note: I think Callback.shouldHint needs an enhancement for the case when 
MessagingService is shutting down.

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

 Attachments: 2034-formatting.txt, CASSANDRA-2034-trunk-v10.patch, 
 CASSANDRA-2034-trunk-v11.patch, CASSANDRA-2034-trunk-v11.patch, 
 CASSANDRA-2034-trunk-v12.patch, CASSANDRA-2034-trunk-v13.patch, 
 CASSANDRA-2034-trunk-v14.patch, CASSANDRA-2034-trunk-v15.patch, 
 CASSANDRA-2034-trunk-v2.patch, CASSANDRA-2034-trunk-v3.patch, 
 CASSANDRA-2034-trunk-v4.patch, CASSANDRA-2034-trunk-v5.patch, 
 CASSANDRA-2034-trunk-v6.patch, CASSANDRA-2034-trunk-v7.patch, 
 CASSANDRA-2034-trunk-v8.patch, CASSANDRA-2034-trunk-v9.patch, 
 CASSANDRA-2034-trunk.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Currently, HH is purely an optimization -- if a machine goes down, enabling 
 HH means RR/AES will have less work to do, but you can't disable RR entirely 
 in most situations since HH doesn't kick in until the FailureDetector does.
 Let's add a scheduled task to the mutate path, such that we return to the 
 client normally after ConsistencyLevel is achieved, but after RpcTimeout we 
 check the responseHandler write acks and write local hints for any missing 
 targets.
 This would make disabling RR when HH is enabled a much more reasonable 
 option, which has a huge impact on read throughput.





[jira] [Updated] (CASSANDRA-3025) PHP/PDO driver for Cassandra CQL

2011-08-16 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3025:
--

Reviewer: thepaul

 PHP/PDO driver for Cassandra CQL
 

 Key: CASSANDRA-3025
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3025
 Project: Cassandra
  Issue Type: New Feature
  Components: API
Reporter: Mikko Koppanen
  Labels: php
 Attachments: pdo_cassandra-0.1.0.tgz


 Hello,
 attached is the initial version of the PDO driver for the Cassandra CQL language. 
 This is a native PHP extension written in what I would call a combination of 
 C and C++, due to PHP being C. The Thrift API used is the C++ one.
 The API looks roughly as follows:
 {code}
 <?php
 $db = new PDO('cassandra:host=127.0.0.1;port=9160');
 $db->exec ("CREATE KEYSPACE mytest with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;");
 $db->exec ("USE mytest");
 $db->exec ("CREATE COLUMNFAMILY users (
   my_key varchar PRIMARY KEY,
   full_name varchar );");

 $stmt = $db->prepare ("INSERT INTO users (my_key, full_name) VALUES (:key, :full_name);");
 $stmt->execute (array (':key' => 'mikko', ':full_name' => 'Mikko K' ));
 {code}
 Currently prepared statements are emulated on the client side, but I 
 understand that there is a plan to add prepared statements to the Cassandra CQL 
 API as well. I will add this feature to the extension as soon as it is 
 implemented.
 Additional documentation can be found on GitHub at 
 https://github.com/mkoppanen/php-pdo_cassandra, in the form of a rendered 
 Markdown file. Tests are currently not included in the package file; they 
 can be found on GitHub for now as well.
 I have created documentation in DocBook format as well, but have not yet 
 rendered it.
 Comments and feedback are welcome.
 Thanks,
 Mikko





[jira] [Commented] (CASSANDRA-2820) Re-introduce FastByteArrayInputStream (and Output equivalent)

2011-08-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085908#comment-13085908
 ] 

Hudson commented on CASSANDRA-2820:
---

Integrated in Cassandra #1026 (See 
[https://builds.apache.org/job/Cassandra/1026/])
Re-introduce FastByteArrayInputStream (and Output equivalent)
Patch by Paul Loy, reviewed by brandonwilliams for CASSANDRA-2820

brandonwilliams : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1158410
Files : 
* /cassandra/trunk/src/java/org/apache/cassandra/db/ReadVerbHandler.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/Truncation.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/TruncateResponse.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/AntiEntropyService.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/db/CounterMutationVerbHandler.java
* /cassandra/trunk/src/java/org/apache/cassandra/gms/Gossiper.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/io/util/FastByteArrayOutputStream.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLog.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/io/util/FastByteArrayInputStream.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/IndexScanCommand.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/gms/GossipDigestAck2VerbHandler.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/TruncateVerbHandler.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/WriteResponse.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/streaming/StreamReplyVerbHandler.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/service/AbstractRowResolver.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/streaming/StreamRequestMessage.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/util/OutputBuffer.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/RangeSliceCommand.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ReadCommand.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/RangeSliceReply.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutationVerbHandler.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/MigrationManager.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/gms/GossipDigestSynVerbHandler.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ReadRepairVerbHandler.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/gms/GossipDigestAckVerbHandler.java
* /cassandra/trunk/src/java/org/apache/cassandra/thrift/CassandraServer.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/CounterMutation.java
* /cassandra/trunk/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/StorageProxy.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/streaming/StreamRequestVerbHandler.java
* /cassandra/trunk/src/java/org/apache/cassandra/streaming/StreamReply.java


 Re-introduce FastByteArrayInputStream (and Output equivalent)
 -

 Key: CASSANDRA-2820
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2820
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0
 Environment: n/a
Reporter: Paul Loy
Assignee: Paul Loy
Priority: Minor
  Labels: bytearrayinputstream, bytearrayoutputstream, license, 
 synchronized
 Fix For: 1.0

 Attachments: fast_bytearray_iostreams_harmony-patch-2.txt, 
 fast_bytearray_iostreams_harmony-patch-3.txt, 
 fast_bytearray_iostreams_harmony-patch-4.txt, 
 fast_bytearray_iostreams_harmony-patch-5.txt, 
 fast_bytearray_iostreams_harmony-patch-6.txt


 In https://issues.apache.org/jira/browse/CASSANDRA-37 
 FastByteArrayInputStream and FastByteArrayOutputStream were removed due to 
 being code copied from the JDK and then subsequently modified. The JDK 
 license is incompatible with Apache 2 license so the code had to go.
 I have since had a look at the performance of the JDK ByteArrayInputStream 
 and a FastByteArrayInputStream (i.e. one with synchronized methods made 
 un-synchronized) and seen the difference is significant.
 After a warmup-period of 1 loops I get the following for 1 loops 
 through a 128000 byte array:
 bais : 3513ms
 fbais: 72ms
 This varies depending on the OS, machine and Java version, but it's always in 
 favour of the FastByteArrayInputStream as you might expect.
 Then, at Jonathan Ellis' suggestion, I tried this using a modified Apache 
 Harmony ByteArrayInputStream - i.e. one whose license is compatible - and the 
 results were the same. A significant boost.
 I will attach a patch with changes for the 0.8.0 tag.


[jira] [Created] (CASSANDRA-3041) Move streams data to too many nodes.

2011-08-16 Thread Nick Bailey (JIRA)
Move streams data to too many nodes.


 Key: CASSANDRA-3041
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3041
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.4, 0.7.8
Reporter: Nick Bailey
 Fix For: 1.0


When you decommission a node, it only streams data to the node that is just now 
gaining responsibility for the node's primary range.

When you move a node it streams data to every node that is responsible for the 
node's primary range. This is way more than it needs to, and could be bad in 
multi-dc setups. We should absolutely use this bug as a chance/reason to better 
unify that code, since move should be doing the same thing decom does.

This might be worth backporting to 0.8 as well.





[jira] [Commented] (CASSANDRA-2806) Expose gossip/FD info to JMX

2011-08-16 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085918#comment-13085918
 ] 

Brandon Williams commented on CASSANDRA-2806:
-

I don't see a way to force eviction with this patch.

 Expose gossip/FD info to JMX
 

 Key: CASSANDRA-2806
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2806
 Project: Cassandra
  Issue Type: Improvement
Reporter: Brandon Williams
Assignee: Patricio Echague
Priority: Minor
 Fix For: 0.8.5

 Attachments: CASSANDRA-2806-0.8-CHANGES.patch, 
 CASSANDRA-2806-0.8-v1.patch, CASSANDRA-2806-0.8-v2.patch, screenshot-1.jpg








[jira] [Created] (CASSANDRA-3042) Implement authentication in Pig loadFunc

2011-08-16 Thread Nate McCall (JIRA)
Implement authentication in Pig loadFunc


 Key: CASSANDRA-3042
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3042
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Nate McCall
Priority: Minor


Using the already existing options for authentication in ConfigHelper, add a 
call to client#login just before client#set_keyspace in 
CassandraStorage#initSchema.





[jira] [Commented] (CASSANDRA-2806) Expose gossip/FD info to JMX

2011-08-16 Thread Patricio Echague (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085925#comment-13085925
 ] 

Patricio Echague commented on CASSANDRA-2806:
-

Doesn't evictEndpoint(String address) work as expected?

I used the same logic that doStatusCheck() uses to evict an endpoint.

If that is not right, please advise.

 Expose gossip/FD info to JMX
 

 Key: CASSANDRA-2806
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2806
 Project: Cassandra
  Issue Type: Improvement
Reporter: Brandon Williams
Assignee: Patricio Echague
Priority: Minor
 Fix For: 0.8.5

 Attachments: CASSANDRA-2806-0.8-CHANGES.patch, 
 CASSANDRA-2806-0.8-v1.patch, CASSANDRA-2806-0.8-v2.patch, screenshot-1.jpg








[jira] [Commented] (CASSANDRA-2806) Expose gossip/FD info to JMX

2011-08-16 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085947#comment-13085947
 ] 

Brandon Williams commented on CASSANDRA-2806:
-

We need the custom quarantine time, otherwise it's going to be difficult to 
make it useful (invoking it on all machines in under the standard quarantine time 
is doable, but not easy).

 Expose gossip/FD info to JMX
 

 Key: CASSANDRA-2806
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2806
 Project: Cassandra
  Issue Type: Improvement
Reporter: Brandon Williams
Assignee: Patricio Echague
Priority: Minor
 Fix For: 0.8.5

 Attachments: CASSANDRA-2806-0.8-CHANGES.patch, 
 CASSANDRA-2806-0.8-v1.patch, CASSANDRA-2806-0.8-v2.patch, screenshot-1.jpg








svn commit: r1158008 - in /cassandra/branches/cassandra-0.8: ./ src/java/org/apache/cassandra/db/ src/java/org/apache/cassandra/service/ src/java/org/apache/cassandra/tools/

2011-08-16 Thread xedin
Author: xedin
Date: Mon Aug 15 21:01:39 2011
New Revision: 1158008

URL: http://svn.apache.org/viewvc?rev=1158008&view=rev
Log:
Add 'load new SSTables' functionality to JMX and corresponding refresh 
command to the nodetool
patch by Pavel Yaskevich; reviewed by Brandon Williams for CASSANDRA-2991

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStore.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStoreMBean.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageService.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageServiceMBean.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeProbe.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1158008&r1=1158007&r2=1158008&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Mon Aug 15 21:01:39 2011
@@ -7,7 +7,8 @@
in a commitlog segment (CASSANDRA-3021)
  * fix cassandra.bat when CASSANDRA_HOME contains spaces (CASSANDRA-2952)
  * fix to SSTableSimpleUnsortedWriter bufferSize calculation (CASSANDRA-3027)
-
+ * add a 'load new SSTables' functionality to JMX and corresponding refresh
+   command to the nodetool (CASSANDRA-2991)
 
 0.8.4
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStore.java?rev=1158008&r1=1158007&r2=1158008&view=diff
==
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStore.java (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStore.java Mon Aug 15 21:01:39 2011
@@ -274,22 +274,10 @@ public class ColumnFamilyStore implement
         List<SSTableReader> sstables = new ArrayList<SSTableReader>();
         for (Map.Entry<Descriptor, Set<Component>> sstableFiles : files(table.name, columnFamilyName, false, false).entrySet())
         {
-            SSTableReader sstable;
-            try
-            {
-                sstable = SSTableReader.open(sstableFiles.getKey(), sstableFiles.getValue(), savedKeys, data, metadata, this.partitioner);
-            }
-            catch (FileNotFoundException ex)
-            {
-                logger.error("Missing sstable component in " + sstableFiles + "; skipped because of " + ex.getMessage());
-                continue;
-            }
-            catch (IOException ex)
-            {
-                logger.error("Corrupt sstable " + sstableFiles + "; skipped", ex);
-                continue;
-            }
-            sstables.add(sstable);
+            SSTableReader reader = openSSTableReader(sstableFiles, savedKeys, data, metadata, partitioner);
+
+            if (reader != null) // if == null, logger errors where already fired
+                sstables.add(reader);
         }
         data.addSSTables(sstables);
 
@@ -465,7 +453,99 @@ public class ColumnFamilyStore implement
 
         return new ColumnFamilyStore(table, columnFamily, partitioner, value, metadata);
     }
-
+
+    /**
+     * See #{@code StorageService.loadNewSSTables(String, String)} for more info
+     *
+     * @param ksName The keyspace name
+     * @param cfName The columnFamily name
+     */
+    public static synchronized void loadNewSSTables(String ksName, String cfName)
+    {
+        /** ks/cf existence checks will be done by open and getCFS methods for us */
+        Table table = Table.open(ksName);
+        table.getColumnFamilyStore(cfName).loadNewSSTables();
+    }
+
+    /**
+     * #{@inheritDoc}
+     */
+    public synchronized void loadNewSSTables()
+    {
+        logger.info("Loading new SSTables for " + table.name + "/" + columnFamily + "...");
+
+        // current view over ColumnFamilyStore
+        DataTracker.View view = data.getView();
+        // descriptors of currently registered SSTables
+        Set<Descriptor> currentDescriptors = new HashSet<Descriptor>();
+        // going to hold new SSTable view of the CFS containing old and new SSTables
+        Set<SSTableReader> sstables = new HashSet<SSTableReader>();
+        Set<DecoratedKey> savedKeys = keyCache.readSaved();
+        // get the max generation number, to prevent generation conflicts
+        int generation = 0;
+
+        for (SSTableReader reader : view.sstables)
+        {
+            sstables.add(reader);

svn commit: r1158425 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/db/compaction/CompactionManager.java

2011-08-16 Thread xedin
Author: xedin
Date: Tue Aug 16 19:22:27 2011
New Revision: 1158425

URL: http://svn.apache.org/viewvc?rev=1158425&view=rev
Log:
Make cleanup and normal compaction able to skip empty rows (rows containing 
nothing but expired tombstones).
patch by Jonathan Ellis; reviewed by Pavel Yaskevich for CASSANDRA-3039

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1158425&r1=1158424&r2=1158425&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Tue Aug 16 19:22:27 2011
@@ -9,6 +9,8 @@
  * fix to SSTableSimpleUnsortedWriter bufferSize calculation (CASSANDRA-3027)
  * add a 'load new SSTables' functionality to JMX and corresponding refresh
command to the nodetool (CASSANDRA-2991)
+ * make cleanup and normal compaction able to skip empty rows
+   (rows containing nothing but expired tombstones) (CASSANDRA-3039)
 
 0.8.4
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java?rev=1158425&r1=1158424&r2=1158425&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
 Tue Aug 16 19:22:27 2011
@@ -569,6 +569,9 @@ public class CompactionManager implement
                 while (nni.hasNext())
                 {
                     AbstractCompactedRow row = nni.next();
+                    if (row.isEmpty())
+                        continue;
+
                     long position = writer.append(row);
                     totalkeysWritten++;
 
@@ -862,8 +865,11 @@ public class CompactionManager implement
                     SSTableIdentityIterator row = (SSTableIdentityIterator) scanner.next();
                     if (Range.isTokenInRanges(row.getKey().token, ranges))
                     {
+                        AbstractCompactedRow compactedRow = controller.getCompactedRow(row);
+                        if (compactedRow.isEmpty())
+                            continue;
                         writer = maybeCreateWriter(cfs, compactionFileLocation, expectedBloomFilterSize, writer, Collections.singletonList(sstable));
-                        writer.append(controller.getCompactedRow(row));
+                        writer.append(compactedRow);
                         totalkeysWritten++;
                     }
                     else
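
The effect of the two hunks above can be sketched in isolation. This is only an illustration, not the Cassandra code: SketchRow is a hypothetical stand-in for AbstractCompactedRow, and liveColumns is an assumed field that models "what is left once expired tombstones are purged".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AbstractCompactedRow; only isEmpty() matters here.
final class SketchRow {
    final String key;
    final int liveColumns; // assumed: columns left after expired tombstones are purged

    SketchRow(String key, int liveColumns) {
        this.key = key;
        this.liveColumns = liveColumns;
    }

    boolean isEmpty() {
        return liveColumns == 0;
    }
}

final class SkipEmptyRowsSketch {
    // Returns the keys that would be appended to the new SSTable.
    static List<String> compact(List<SketchRow> rows) {
        List<String> written = new ArrayList<String>();
        for (SketchRow row : rows) {
            if (row.isEmpty())
                continue; // nothing but expired tombstones: skip instead of writing
            written.add(row.key);
        }
        return written;
    }
}
```

Skipping before the append keeps the writer from ever seeing a row with no content, which is the situation the AssertionError in CASSANDRA-3039 was raised in.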




[jira] [Updated] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-16 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2034:
--

Attachment: 2034-v16.txt

bq. I process the callback after the message expired

That makes sense.

bq. I think the Callback.shouldHint needs an enhancement for when we are 
shutting down in MessagingService

Yes.  We should probably either wait for the messages to time out (which is 
mildly annoying to the user) or just write hints for everything (which may be 
confusing: why are there hints being sent after I restart, when no node was 
ever down?)  I don't see a perfect solution.

Also, still need to address this:

bq. currentHintsQueueSize [now totalHints] increment needs to be done OUTSIDE 
the runnable or it will never get above the number of task executors

v16 attached: rebased to current head, fixed import ordering, and added some 
comments.

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

 Attachments: 2034-formatting.txt, 2034-v16.txt, 
 CASSANDRA-2034-trunk-v10.patch, CASSANDRA-2034-trunk-v11.patch, 
 CASSANDRA-2034-trunk-v11.patch, CASSANDRA-2034-trunk-v12.patch, 
 CASSANDRA-2034-trunk-v13.patch, CASSANDRA-2034-trunk-v14.patch, 
 CASSANDRA-2034-trunk-v15.patch, CASSANDRA-2034-trunk-v2.patch, 
 CASSANDRA-2034-trunk-v3.patch, CASSANDRA-2034-trunk-v4.patch, 
 CASSANDRA-2034-trunk-v5.patch, CASSANDRA-2034-trunk-v6.patch, 
 CASSANDRA-2034-trunk-v7.patch, CASSANDRA-2034-trunk-v8.patch, 
 CASSANDRA-2034-trunk-v9.patch, CASSANDRA-2034-trunk.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Currently, HH is purely an optimization -- if a machine goes down, enabling 
 HH means RR/AES will have less work to do, but you can't disable RR entirely 
 in most situations since HH doesn't kick in until the FailureDetector does.
 Let's add a scheduled task to the mutate path, such that we return to the 
 client normally after ConsistencyLevel is achieved, but after RpcTimeout we 
 check the responseHandler write acks and write local hints for any missing 
 targets.
 This would make disabling RR when HH is enabled a much more reasonable 
 option, which has a huge impact on read throughput.
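
The scheduled check described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the patch itself: HintAfterTimeoutSketch, missingTargets and hintsAfterTimeout are hypothetical names, endpoints are modelled as plain strings, and the set of acks is fixed up front (a real write path would collect acks concurrently and would not block on the result).

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class HintAfterTimeoutSketch {
    // Targets that have not acknowledged the write when the timeout fires.
    static Set<String> missingTargets(Set<String> targets, Set<String> acked) {
        Set<String> missing = new TreeSet<String>(targets);
        missing.removeAll(acked);
        return missing;
    }

    // Schedules the ack check after rpcTimeoutMillis and waits for it to run.
    // Blocking here only keeps the sketch easy to execute.
    static Set<String> hintsAfterTimeout(final Set<String> targets,
                                         final Set<String> acked,
                                         long rpcTimeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            return scheduler.schedule(new Callable<Set<String>>() {
                public Set<String> call() {
                    // each returned endpoint would get a local hint written for it
                    return missingTargets(targets, acked);
                }
            }, rpcTimeoutMillis, TimeUnit.MILLISECONDS).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            scheduler.shutdown();
        }
    }
}
```

With targets n1, n2, n3 and acks from n1 and n3, the check yields n2 as the only endpoint needing a hint.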

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-16 Thread Patricio Echague (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085966#comment-13085966
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

{quote}
currentHintsQueueSize [now totalHints] increment needs to be done OUTSIDE the 
runnable or it will never get above the number of task executors
{quote}

Interesting. I must have forgotten it after one of the patches. I remember 
fixing it before.

{quote}
Yes. We should probably either wait for the messages to time out (which is 
mildly annoying to the user) or just write hints for everything (which may be 
confusing: why are there hints being sent after I restart, when no node was 
ever down?) I don't see a perfect solution.
{quote}

I think I prefer making the user wait for RpcTimeout, since it is not that long 
and is perhaps clearer than saving the hints just in case.





[jira] [Commented] (CASSANDRA-3040) Refactor and optimize ColumnFamilyStore.files(...) and Descriptor.fromFilename and few other places responsible for work with SSTable files.

2011-08-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085975#comment-13085975
 ] 

Jonathan Ellis commented on CASSANDRA-3040:
---

+1

 Refactor and optimize ColumnFamilyStore.files(...) and 
 Descriptor.fromFilename and few other places responsible for work with 
 SSTable files.
 

 Key: CASSANDRA-3040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 1.0

 Attachments: CASSANDRA-3040.patch


 ColumnFamilyStore.files(...) methods are not optimal in terms of the work they 
 do: they scan the whole table directory for files and directories, but 
 their mission is to locate CF-specific files only.
 Descriptor.fromFilename could be refactored to use getParentFile and getName 
 methods instead of manual parsing of the path. Small refactorings in this 
 vein are planned for the Component and SSTable classes.
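
A rough sketch of the getParentFile/getName approach, for illustration only. DescriptorSketch is a hypothetical class, not the real Descriptor, and it assumes file names shaped like <cf>-<version>-<generation>-<component> inside a directory named after the keyspace (as in the Schema-g-58-Data.db paths quoted elsewhere in this archive). Temporary files with a "tmp" marker are deliberately not handled.

```java
import java.io.File;

final class DescriptorSketch {
    final String keyspace;
    final String columnFamily;
    final String version;
    final int generation;
    final String component;

    private DescriptorSketch(String keyspace, String columnFamily, String version,
                             int generation, String component) {
        this.keyspace = keyspace;
        this.columnFamily = columnFamily;
        this.version = version;
        this.generation = generation;
        this.component = component;
    }

    // Parses <dir>/<keyspace>/<cf>-<version>-<generation>-<component> (assumed layout).
    static DescriptorSketch fromFilename(String filename) {
        File file = new File(filename);
        // the parent directory name is the keyspace; no manual separator handling
        String keyspace = file.getParentFile().getName();
        String[] parts = file.getName().split("-");
        if (parts.length != 4)
            throw new IllegalArgumentException("unexpected sstable file name: " + file.getName());
        return new DescriptorSketch(keyspace, parts[0], parts[1],
                                    Integer.parseInt(parts[2]), parts[3]);
    }
}
```

Letting java.io.File split the path avoids depending on the platform's separator character, which is the main attraction over parsing the string by hand.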





[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-16 Thread Patricio Echague (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085978#comment-13085978
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

{quote}
Also, still need to address this:

currentHintsQueueSize [now totalHints] increment needs to be done OUTSIDE the 
runnable or it will never get above the number of task executors
{quote}

hintsInProgress.incrementAndGet() happens outside of the executor, before the 
task is even scheduled.
totalHints.incrementAndGet(), on the other hand, runs within the task, right 
after the hint has been written.

Is that not right?
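
The difference between the two counters can be sketched as follows. This is an illustration under assumptions, not the patch: HintCountersSketch and its methods are hypothetical, the pool size of 2 is arbitrary, and a CountDownLatch stands in for the actual hint write.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class HintCountersSketch {
    final AtomicInteger hintsInProgress = new AtomicInteger(); // queued + running
    final AtomicInteger totalHints = new AtomicInteger();      // hints actually written
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    void submitHint(final CountDownLatch gate) {
        hintsInProgress.incrementAndGet(); // outside the runnable: queued tasks count too
        executor.execute(new Runnable() {
            public void run() {
                try {
                    gate.await();                 // stands in for the hint write
                    totalHints.incrementAndGet(); // inside the task, after the "write"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    hintsInProgress.decrementAndGet();
                }
            }
        });
    }

    // Submits n hints while the gate is closed; returns
    // { in-progress while blocked, total written, in-progress at the end }.
    static int[] demo(int n) {
        HintCountersSketch s = new HintCountersSketch();
        CountDownLatch gate = new CountDownLatch(1);
        for (int i = 0; i < n; i++)
            s.submitHint(gate);
        int whileBlocked = s.hintsInProgress.get(); // n, although only 2 threads run
        gate.countDown();
        s.executor.shutdown();
        try {
            s.executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { whileBlocked, s.totalHints.get(), s.hintsInProgress.get() };
    }
}
```

A counter meant to bound the backlog has to be incremented at submission time; had it been incremented inside run(), it could never exceed the number of executor threads. A counter of completed hints, by contrast, belongs inside the task.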





[jira] [Commented] (CASSANDRA-3039) AssertionError on nodetool cleanup

2011-08-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085984#comment-13085984
 ] 

Hudson commented on CASSANDRA-3039:
---

Integrated in Cassandra-0.8 #281 (See 
[https://builds.apache.org/job/Cassandra-0.8/281/])
Make cleanup and normal compaction able to skip empty rows (rows containing 
nothing but expired tombstones).
patch by Jonathan Ellis; reviewed by Pavel Yaskevich for CASSANDRA-3039

xedin : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1158425
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java


 AssertionError on nodetool cleanup
 --

 Key: CASSANDRA-3039
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3039
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.4
 Environment: Distributor ID:  Ubuntu
 Description:  Ubuntu 10.10
 Release:  10.10
 Codename: maverick
 AWS: m2.xlarge instance
 6 Node Cluster
Reporter: Ray Slakinski
Assignee: Jonathan Ellis
  Labels: exception, nodetool
 Fix For: 0.8.5

 Attachments: 3039.txt


 While doing a cleanup I got the following AssertionError. I tried a 
 scrub and a major compaction before the cleanup, which did not help.
 ST:
  INFO 18:49:58,540 Scrubbing 
 SSTableReader(path='/vol/cassandra/data/system/LocationInfo-g-93-Data.db')
  INFO 18:49:58,834 Scrub of 
 SSTableReader(path='/vol/cassandra/data/system/LocationInfo-g-93-Data.db') 
 complete: 4 rows in new sstable and 0 empty (tombstoned) rows dropped
  INFO 18:49:58,913 Scrubbing 
 SSTableReader(path='/vol/cassandra/data/system/Migrations-g-56-Data.db')
  INFO 18:49:59,218 Scrub of 
 SSTableReader(path='/vol/cassandra/data/system/Migrations-g-56-Data.db') 
 complete: 1 rows in new sstable and 0 empty (tombstoned) rows dropped
  INFO 18:49:59,256 Scrubbing 
 SSTableReader(path='/vol/cassandra/data/system/Schema-g-58-Data.db')
  INFO 18:49:59,323 Scrub of 
 SSTableReader(path='/vol/cassandra/data/system/Schema-g-58-Data.db') 
 complete: 34 rows in new sstable and 0 empty (tombstoned) rows dropped
  INFO 18:49:59,416 Scrubbing 
 SSTableReader(path='/vol/cassandra/data/SpiderServices/Content2-g-5074-Data.db')
  INFO 18:50:50,137 Scrub of 
 SSTableReader(path='/vol/cassandra/data/SpiderServices/Content2-g-5074-Data.db')
  complete: 91735 rows in new sstable and 32 empty (tombstoned) rows dropped
  INFO 18:50:50,137 Scrubbing 
 SSTableReader(path='/vol/cassandra/data/SpiderServices/Content2-g-5075-Data.db')
  INFO 18:50:53,075 Scrub of 
 SSTableReader(path='/vol/cassandra/data/SpiderServices/Content2-g-5075-Data.db')
  complete: 27940 rows in new sstable and 0 empty (tombstoned) rows dropped
  INFO 18:50:53,089 Scrubbing 
 SSTableReader(path='/vol/cassandra/data/SpiderServices/Content-g-238-Data.db')
  INFO 18:51:10,302 Scrub of 
 SSTableReader(path='/vol/cassandra/data/SpiderServices/Content-g-238-Data.db')
  complete: 70815 rows in new sstable and 0 empty (tombstoned) rows dropped
  INFO 18:53:05,420 Cleaning up 
 SSTableReader(path='/vol/cassandra/data/SpiderServices/Content2-g-5078-Data.db')
  INFO 18:53:13,266 Cleaned up to 
 /vol/cassandra/data/SpiderServices/Content2-tmp-g-5079-Data.db.  198,705,176 
 to 198,705,176 (~100% of original) bytes for 27,940 keys.  Time: 7,846ms.
  INFO 18:53:13,267 Cleaning up 
 SSTableReader(path='/vol/cassandra/data/SpiderServices/Content2-g-5077-Data.db')
 ERROR 18:53:33,913 Fatal exception in thread 
 Thread[CompactionExecutor:21,1,RMI Runtime]
 java.lang.AssertionError
   at 
 org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:107)
   at 
 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:132)
   at 
 org.apache.cassandra.db.compaction.CompactionManager.doCleanupCompaction(CompactionManager.java:866)
   at 
 org.apache.cassandra.db.compaction.CompactionManager.access$500(CompactionManager.java:65)
   at 
 org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:204)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)





[jira] [Commented] (CASSANDRA-2061) Missing logging for some exceptions

2011-08-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085985#comment-13085985
 ] 

Hudson commented on CASSANDRA-2061:
---

Integrated in Cassandra #1027 (See 
[https://builds.apache.org/job/Cassandra/1027/])
Fix missing logging for some exceptions
patch by Jonathan Ellis; reviewed by Pavel Yaskevich for CASSANDRA-2061

xedin : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1158439
Files : 
* /cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/concurrent/RetryingScheduledThreadPoolExecutor.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/concurrent/DebuggableScheduledThreadPoolExecutor.java
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/gms/Gossiper.java
* 
/cassandra/trunk/test/unit/org/apache/cassandra/locator/DynamicEndpointSnitchTest.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/concurrent/DebuggableThreadPoolExecutor.java


 Missing logging for some exceptions
 ---

 Key: CASSANDRA-2061
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2061
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Stu Hood
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 1.0

 Attachments: 2061-0.7.txt, 2061-v3.txt, 2061.txt

   Original Estimate: 8h
  Remaining Estimate: 8h

 {quote}Since you are using ScheduledThreadPoolExecutor.schedule(), the 
 exception was swallowed by the FutureTask.
 You will have to call get() on the ScheduledFuture, and you will 
 get an ExecutionException if any exception occurred in run().{quote}
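
The behaviour described in the quote can be reproduced with the JDK alone. The sketch below is illustrative and is not the CASSANDRA-2061 patch; SwallowedExceptionSketch and failureMessage are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class SwallowedExceptionSketch {
    // Schedules a task that throws, then recovers the failure through get().
    static String failureMessage() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        try {
            ScheduledFuture<?> future = executor.schedule(new Runnable() {
                public void run() {
                    throw new IllegalStateException("boom");
                }
            }, 1, TimeUnit.MILLISECONDS);
            future.get(); // without this call nothing reports the exception
            return null;
        } catch (ExecutionException e) {
            return e.getCause().getMessage(); // the original exception is the cause
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            executor.shutdown();
        }
    }
}
```

Because the exception is captured in the future rather than thrown on the worker thread, an uncaught-exception handler never sees it; either get() is called, or the task body has to log its own failures.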





[jira] [Updated] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-16 Thread Patricio Echague (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patricio Echague updated CASSANDRA-2034:


Attachment: 2034-v17.txt

Fix defective v1 patch.

CallbackInfo and CreationTimeAware were missing.

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

 Attachments: 2034-formatting.txt, 2034-v16.txt, 2034-v17.txt, 
 CASSANDRA-2034-trunk-v10.patch, CASSANDRA-2034-trunk-v11.patch, 
 CASSANDRA-2034-trunk-v11.patch, CASSANDRA-2034-trunk-v12.patch, 
 CASSANDRA-2034-trunk-v13.patch, CASSANDRA-2034-trunk-v14.patch, 
 CASSANDRA-2034-trunk-v15.patch, CASSANDRA-2034-trunk-v2.patch, 
 CASSANDRA-2034-trunk-v3.patch, CASSANDRA-2034-trunk-v4.patch, 
 CASSANDRA-2034-trunk-v5.patch, CASSANDRA-2034-trunk-v6.patch, 
 CASSANDRA-2034-trunk-v7.patch, CASSANDRA-2034-trunk-v8.patch, 
 CASSANDRA-2034-trunk-v9.patch, CASSANDRA-2034-trunk.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Currently, HH is purely an optimization -- if a machine goes down, enabling 
 HH means RR/AES will have less work to do, but you can't disable RR entirely 
 in most situations since HH doesn't kick in until the FailureDetector does.
 Let's add a scheduled task to the mutate path, such that we return to the 
 client normally after ConsistencyLevel is achieved, but after RpcTimeout we 
 check the responseHandler write acks and write local hints for any missing 
 targets.
 This would make disabling RR when HH is enabled a much more reasonable 
 option, which has a huge impact on read throughput.





[jira] [Updated] (CASSANDRA-3041) Move streams data to too many nodes.

2011-08-16 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-3041:
---

Fix Version/s: (was: 1.0)
   0.8.5

Marking this for the next 0.8.x version since the data duplication caused by 
this bug is data that does actually belong on the nodes, so it can't be removed 
by cleanup. It either has to be unduplicated with a major compaction or rely on 
regular compaction.

 Move streams data to too many nodes.
 

 Key: CASSANDRA-3041
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3041
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.8, 0.8.4
Reporter: Nick Bailey
 Fix For: 0.8.5


 When you decommission a node, it only streams data to the node that is just 
 now gaining responsibility for the node's primary range.
 When you move a node it streams data to every node that is responsible for 
 the node's primary range. This is far more than it needs to stream, and could be bad 
 in multi-dc setups. We should absolutely use this bug as a chance/reason to 
 better unify that code, since move should be doing the same thing decom does.
 This might be worth backporting to 0.8 as well.





[jira] [Updated] (CASSANDRA-3041) Move streams data to too many nodes.

2011-08-16 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-3041:
---


Marking this for the next 0.8.x version since the data duplication caused by 
this bug is data that does actually belong on the nodes, so it can't be removed 
by cleanup. It either has to be unduplicated with a major compaction or rely on 
regular compaction.





[jira] [Updated] (CASSANDRA-3041) Move streams data to too many nodes.

2011-08-16 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-3041:
---

Affects Version/s: (was: 0.7.8)

 Move streams data to too many nodes.
 

 Key: CASSANDRA-3041
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3041
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.4
Reporter: Nick Bailey
 Fix For: 0.8.5


 When you decommission a node, it only streams data to the node that is just 
 now gaining responsibility for the node's primary range.
 When you move a node it streams data to every node that is responsible for 
 the node's primary range. This is far more than it needs to stream, and could be bad 
 in multi-dc setups. We should absolutely use this bug as a chance/reason to 
 better unify that code, since move should be doing the same thing decom does.
 This might be worth backporting to 0.8 as well.





[jira] [Updated] (CASSANDRA-3041) Move streams data to too many nodes.

2011-08-16 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-3041:
---

Comment: was deleted

(was: Marking this for the next 0.8.x version since the data duplication caused 
by this bug is data that does actually belong on the nodes, so it can't be 
removed by cleanup. It either has to be unduplicated with a major compaction or 
rely on regular compaction.)





[jira] [Updated] (CASSANDRA-2498) Improve read performance in update-intensive workload

2011-08-16 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2498:
--

Reviewer: jbellis
Assignee: Daniel Doubleday  (was: Sylvain Lebresne)

 Improve read performance in update-intensive workload
 -

 Key: CASSANDRA-2498
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2498
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Daniel Doubleday
Priority: Minor
  Labels: ponies
 Fix For: 1.0

 Attachments: 2498-v2.txt, 2498-v3.txt, 
 supersede-name-filter-collations.patch


 Read performance in an update-heavy environment relies heavily on compaction 
 to maintain good throughput. (This is not the case for workloads where rows 
 are only inserted once, because the bloom filter keeps us from having to 
 check sstables unnecessarily.)
 Very early versions of Cassandra attempted to mitigate this by checking 
 sstables in descending generation order (mostly equivalent to descending 
 mtime): once all the requested columns were found, it would not check any 
 older sstables.
 This was incorrect, because data timestamp will not correspond to sstable 
 timestamp, both because compaction has the side effect of refreshing data 
 to a newer sstable, and because hinted handoff may send us data older than 
 what we already have.
 Instead, we could create a per-sstable piece of metadata containing the most 
 recent (client-specified) timestamp for any column in the sstable.  We could 
 then sort sstables by this timestamp instead, and perform a similar 
 optimization (if the remaining sstable client-timestamps are older than the 
 oldest column found in the desired result set so far, we don't need to look 
 further). Since under almost every workload, client timestamps of data in a 
 given sstable will tend to be similar, we expect this to cut the number of 
 sstables down proportionally to how frequently each column in the row is 
 updated. (If each column is updated with each write, we only have to check a 
 single sstable.)
 This may also be useful information when deciding which SSTables to compact.
 (Note that this optimization is only appropriate for named-column queries, 
 not slice queries, since we don't know what non-overlapping columns may exist 
 in older sstables.)
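
The proposed optimization can be sketched as follows. This is only a model of the idea, not the eventual implementation: SSTableSketch is a hypothetical in-memory stand-in in which each "sstable" is a map from column name to client timestamp, and maxTimestamp plays the role of the proposed per-sstable metadata.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TimestampOrderedReadSketch {
    // Hypothetical sstable model: column name -> client timestamp.
    static final class SSTableSketch {
        final Map<String, Long> columns;
        final long maxTimestamp; // the proposed per-sstable metadata

        SSTableSketch(Map<String, Long> columns) {
            this.columns = columns;
            this.maxTimestamp = columns.isEmpty() ? Long.MIN_VALUE
                                                  : Collections.max(columns.values());
        }
    }

    // Returns how many sstables had to be checked to resolve the named columns.
    static int sstablesChecked(List<SSTableSketch> sstables, Set<String> requested) {
        List<SSTableSketch> ordered = new ArrayList<SSTableSketch>(sstables);
        Collections.sort(ordered, new Comparator<SSTableSketch>() {
            public int compare(SSTableSketch a, SSTableSketch b) {
                return Long.compare(b.maxTimestamp, a.maxTimestamp); // newest first
            }
        });
        Map<String, Long> found = new HashMap<String, Long>();
        int checked = 0;
        for (SSTableSketch sstable : ordered) {
            // Stop once every requested column is found and this sstable (and,
            // given the ordering, all later ones) is older than the oldest result.
            if (!found.isEmpty()
                    && found.keySet().containsAll(requested)
                    && sstable.maxTimestamp < Collections.min(found.values()))
                break;
            checked++;
            for (String name : requested) {
                Long ts = sstable.columns.get(name);
                if (ts != null && (!found.containsKey(name) || ts > found.get(name)))
                    found.put(name, ts);
            }
        }
        return checked;
    }
}
```

When every write updates all requested columns, the newest sstable alone satisfies the query and the loop stops after one check; when columns are spread over several sstables, all of them still have to be read.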





[jira] [Commented] (CASSANDRA-3040) Refactor and optimize ColumnFamilyStore.files(...) and Descriptor.fromFilename and few other places responsible for work with SSTable files.

2011-08-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086021#comment-13086021
 ] 

Hudson commented on CASSANDRA-3040:
---

Integrated in Cassandra #1028 (See 
[https://builds.apache.org/job/Cassandra/1028/])
Refactor and optimize ColumnFamilyStore.files(...) and 
Descriptor.fromFilename(String) and few other places responsible for work with 
SSTable files
patch by Pavel Yaskevich; reviewed by Jonathan Ellis for CASSANDRA-3040

xedin : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1158452
Files : 
* /cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTable.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/sstable/Descriptor.java
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java


 Refactor and optimize ColumnFamilyStore.files(...) and 
 Descriptor.fromFilename and few other places responsible for work with 
 SSTable files.
 

 Key: CASSANDRA-3040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 1.0

 Attachments: CASSANDRA-3040.patch


 ColumnFamilyStore.files(...) methods do more work than they need to: they scan 
 the whole table directory for files and directories, although their job is to 
 locate CF-specific files only.
 Descriptor.fromFilename could be refactored to use the getParentFile and getName 
 methods instead of parsing the path manually. Small refactorings along these 
 lines are planned for the Component and SSTable classes.
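
A rough sketch of the getParentFile/getName approach mentioned above. It assumes a file layout of <keyspace>/<columnfamily>-<version>-<generation>-<component> and a column family name without hyphens; SketchDescriptor is a hypothetical class and is not the code from the attached patch.

```java
import java.io.File;

/** Hypothetical, simplified descriptor; not Cassandra's Descriptor class. */
final class SketchDescriptor {
    final String keyspace;
    final String columnFamily;
    final String version;
    final int generation;
    final String component;

    private SketchDescriptor(String keyspace, String columnFamily, String version,
                             int generation, String component) {
        this.keyspace = keyspace;
        this.columnFamily = columnFamily;
        this.version = version;
        this.generation = generation;
        this.component = component;
    }

    /** Derives the descriptor from File accessors instead of splitting the full path by hand. */
    static SketchDescriptor fromFile(File file) {
        File parent = file.getParentFile();
        if (parent == null)
            throw new IllegalArgumentException("no keyspace directory in " + file);
        // Assumed naming scheme: <cf>-<version>-<generation>-<component>
        String[] parts = file.getName().split("-");
        if (parts.length != 4)
            throw new IllegalArgumentException("unexpected sstable file name: " + file.getName());
        return new SketchDescriptor(parent.getName(), parts[0], parts[1],
                                    Integer.parseInt(parts[2]), parts[3]);
    }
}
```

Because File handles the platform's separator itself, no separator character appears in the parsing code.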





[jira] [Created] (CASSANDRA-3043) SSTableImportTest fails on Windows because of because of malformed file path

2011-08-16 Thread Vladimir Loncar (JIRA)
SSTableImportTest fails on Windows because of because of malformed file path


 Key: CASSANDRA-3043
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3043
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
 Environment: Windows
Reporter: Vladimir Loncar
Assignee: Vladimir Loncar
Priority: Trivial


SSTableImportTest uses URL.getPath() to build the path to its JSON files. This fails 
on Windows in many cases (for example, spaces in the path are encoded as %20, 
which Windows does not resolve). The trick is to create a URI from the URL, 
which works on all platforms.
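
The difference can be sketched as follows. This is only an illustration of the URL-to-URI conversion, not the contents of the attached patch, and the ResourcePaths helper name is made up for the example.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical helper illustrating the URL -> URI -> File conversion. */
final class ResourcePaths {
    /**
     * Converts a file: URL into a platform path. Going through URI decodes
     * escapes such as %20, which URL.getPath() leaves in place.
     */
    static String toFilePath(URL url) {
        try {
            return new File(url.toURI()).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("not a valid URI: " + url, e);
        }
    }
}
```

In a test this would typically be called on the result of getClassLoader().getResource(...), which returns a URL.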





[jira] [Updated] (CASSANDRA-3043) SSTableImportTest fails on Windows because of because of malformed file path

2011-08-16 Thread Vladimir Loncar (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Loncar updated CASSANDRA-3043:
---

Attachment: windows-sstable-import-fix.patch

 SSTableImportTest fails on Windows because of because of malformed file path
 

 Key: CASSANDRA-3043
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3043
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
 Environment: Windows
Reporter: Vladimir Loncar
Assignee: Vladimir Loncar
Priority: Trivial
  Labels: windows
 Attachments: windows-sstable-import-fix.patch


 SSTableImportTest uses URL.getPath() to create path to JSON files. This fails 
 on Windows in many cases (for example if there are spaces in path which get 
 encoded as %20 which Windows doesn't like). Trick is to create URI from URL 
 which satisfies all platforms.





[jira] [Updated] (CASSANDRA-3043) SSTableImportTest fails on Windows because malformed file path

2011-08-16 Thread Vladimir Loncar (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Loncar updated CASSANDRA-3043:
---

Summary: SSTableImportTest fails on Windows because malformed file path  
(was: SSTableImportTest fails on Windows because of because of malformed file 
path)

 SSTableImportTest fails on Windows because malformed file path
 --

 Key: CASSANDRA-3043
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3043
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
 Environment: Windows
Reporter: Vladimir Loncar
Assignee: Vladimir Loncar
Priority: Trivial
  Labels: windows
 Attachments: windows-sstable-import-fix.patch


 SSTableImportTest uses URL.getPath() to create path to JSON files. This fails 
 on Windows in many cases (for example if there are spaces in path which get 
 encoded as %20 which Windows doesn't like). Trick is to create URI from URL 
 which satisfies all platforms.





[jira] [Updated] (CASSANDRA-3043) SSTableImportTest fails on Windows because of malformed file path

2011-08-16 Thread Vladimir Loncar (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Loncar updated CASSANDRA-3043:
---

Summary: SSTableImportTest fails on Windows because of malformed file path  
(was: SSTableImportTest fails on Windows because malformed file path)

 SSTableImportTest fails on Windows because of malformed file path
 -

 Key: CASSANDRA-3043
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3043
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
 Environment: Windows
Reporter: Vladimir Loncar
Assignee: Vladimir Loncar
Priority: Trivial
  Labels: windows
 Attachments: windows-sstable-import-fix.patch


 SSTableImportTest uses URL.getPath() to create path to JSON files. This fails 
 on Windows in many cases (for example if there are spaces in path which get 
 encoded as %20 which Windows doesn't like). Trick is to create URI from URL 
 which satisfies all platforms.





[jira] [Created] (CASSANDRA-3044) Hector NodeAutoDiscoverService fails to resolve hosts due to / being part of the IP address

2011-08-16 Thread Aaron Turner (JIRA)
Hector NodeAutoDiscoverService fails to resolve hosts due to / being part of 
the IP address
---

 Key: CASSANDRA-3044
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3044
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Drivers
Affects Versions: 0.8.4
 Environment: Cassandra 0.8.4, hector 0.8.0
Reporter: Aaron Turner


I did not see this problem with Cassandra 0.8.2; it started happening under 0.8.4. 
A temporary workaround was to disable node auto discovery. It seems to be related 
to:

http://svn.apache.org/viewvc?view=rev&revision=1155157
http://issues.apache.org/jira/browse/CASSANDRA-1777

LOG:

240514 [pool-2-thread-1] INFO 
me.prettyprint.cassandra.connection.NodeAutoDiscoverService - using existing 
hosts [10.255.255.176(10.255.255.176):9160, 10.255.255.175(10.255.255.175):9160]
240553 [pool-2-thread-1] ERROR me.prettyprint.cassandra.service.CassandraHost - 
Unable to resolve host /10.255.255.176
240553 [pool-2-thread-1] INFO 
me.prettyprint.cassandra.connection.NodeAutoDiscoverService - Found a node we 
don't know about /10.255.255.176(/10.255.255.176):9160 for TokenRange 
TokenRange(start_token:33370589793653380361461751202224080323, 
end_token:93518639523624865529944734322199113946, endpoints:[/10.255.255.176])
240553 [pool-2-thread-1] INFO 
me.prettyprint.cassandra.connection.NodeAutoDiscoverService - Found 1 new 
host(s) in Ring
240553 [pool-2-thread-1] INFO 
me.prettyprint.cassandra.connection.NodeAutoDiscoverService - Addding found 
host /10.255.255.176(/10.255.255.176):9160 to pool
240554 [pool-2-thread-1] ERROR 
me.prettyprint.cassandra.connection.HConnectionManager - General exception host 
to HConnectionManager: /10.255.255.176(/10.255.255.176):9160
java.lang.IllegalArgumentException: protocol = socket host = null
at 
sun.net.spi.DefaultProxySelector.select(DefaultProxySelector.java:151)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:358)
at java.net.Socket.connect(Socket.java:529)
at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
at 
org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at 
me.prettyprint.cassandra.connection.HThriftClient.open(HThriftClient.java:123)
at 
me.prettyprint.cassandra.connection.ConcurrentHClientPool.init(ConcurrentHClientPool.java:43)
at 
me.prettyprint.cassandra.connection.RoundRobinBalancingPolicy.createConnection(RoundRobinBalancingPolicy.java:68)
at 
me.prettyprint.cassandra.connection.HConnectionManager.addCassandraHost(HConnectionManager.java:103)
at 
me.prettyprint.cassandra.connection.NodeAutoDiscoverService.doAddNodes(NodeAutoDiscoverService.java:68)
at 
me.prettyprint.cassandra.connection.NodeAutoDiscoverService$QueryRing.run(NodeAutoDiscoverService.java:53)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
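
For context, java.net.InetAddress.toString() returns "hostname/literal-address" and leaves the hostname part empty when it is not known, which yields strings like "/10.255.255.176". Whether that is the exact source of the slash in this report is an assumption. A client-side normalization could look like the sketch below; EndpointNames is a hypothetical helper and not the fix that was applied in Cassandra or Hector.

```java
/** Hypothetical helper; strips the "hostname/" prefix from an endpoint string. */
final class EndpointNames {
    /**
     * Returns the part after the last '/', so "/10.255.255.176" becomes
     * "10.255.255.176". Strings without a slash are returned unchanged.
     */
    static String hostPart(String endpoint) {
        int slash = endpoint.lastIndexOf('/');
        return slash < 0 ? endpoint : endpoint.substring(slash + 1);
    }
}
```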





[jira] [Resolved] (CASSANDRA-3044) Hector NodeAutoDiscoverService fails to resolve hosts due to / being part of the IP address

2011-08-16 Thread Aaron Turner (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Turner resolved CASSANDRA-3044.
-

   Resolution: Fixed
Fix Version/s: 0.8.5

driftx says on IRC that this bug was already fixed so closing it myself.

 Hector NodeAutoDiscoverService fails to resolve hosts due to / being part of 
 the IP address
 ---

 Key: CASSANDRA-3044
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3044
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Drivers
Affects Versions: 0.8.4
 Environment: Cassandra 0.8.4, hector 0.8.0
Reporter: Aaron Turner
 Fix For: 0.8.5


 Didn't get this problem with Cassandra 0.8.2- started happening under 0.8.4.  
 Temporary work around was to disable node auto discovery.  Seems to be 
 related to:
 http://svn.apache.org/viewvc?view=rev&revision=1155157
 http://issues.apache.org/jira/browse/CASSANDRA-1777
 LOG:
 240514 [pool-2-thread-1] INFO 
 me.prettyprint.cassandra.connection.NodeAutoDiscoverService - using existing 
 hosts [10.255.255.176(10.255.255.176):9160, 
 10.255.255.175(10.255.255.175):9160]
 240553 [pool-2-thread-1] ERROR me.prettyprint.cassandra.service.CassandraHost 
 - Unable to resolve host /10.255.255.176
 240553 [pool-2-thread-1] INFO 
 me.prettyprint.cassandra.connection.NodeAutoDiscoverService - Found a node we 
 don't know about /10.255.255.176(/10.255.255.176):9160 for TokenRange 
 TokenRange(start_token:33370589793653380361461751202224080323, 
 end_token:93518639523624865529944734322199113946, endpoints:[/10.255.255.176])
 240553 [pool-2-thread-1] INFO 
 me.prettyprint.cassandra.connection.NodeAutoDiscoverService - Found 1 new 
 host(s) in Ring
 240553 [pool-2-thread-1] INFO 
 me.prettyprint.cassandra.connection.NodeAutoDiscoverService - Addding found 
 host /10.255.255.176(/10.255.255.176):9160 to pool
 240554 [pool-2-thread-1] ERROR 
 me.prettyprint.cassandra.connection.HConnectionManager - General exception 
 host to HConnectionManager: /10.255.255.176(/10.255.255.176):9160
 java.lang.IllegalArgumentException: protocol = socket host = null
 at 
 sun.net.spi.DefaultProxySelector.select(DefaultProxySelector.java:151)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:358)
 at java.net.Socket.connect(Socket.java:529)
 at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
 at 
 org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
 at 
 me.prettyprint.cassandra.connection.HThriftClient.open(HThriftClient.java:123)
 at 
 me.prettyprint.cassandra.connection.ConcurrentHClientPool.init(ConcurrentHClientPool.java:43)
 at 
 me.prettyprint.cassandra.connection.RoundRobinBalancingPolicy.createConnection(RoundRobinBalancingPolicy.java:68)
 at 
 me.prettyprint.cassandra.connection.HConnectionManager.addCassandraHost(HConnectionManager.java:103)
 at 
 me.prettyprint.cassandra.connection.NodeAutoDiscoverService.doAddNodes(NodeAutoDiscoverService.java:68)
 at 
 me.prettyprint.cassandra.connection.NodeAutoDiscoverService$QueryRing.run(NodeAutoDiscoverService.java:53)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)





[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-16 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086041#comment-13086041
 ] 

Brandon Williams commented on CASSANDRA-2868:
-

+1 to GCI changes.  Also, it is indeed possible to get 1 with a tiny heap.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.5

 Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png


 We have memory issues with long running servers. These have been confirmed by 
 several users in the user list. That's why I report.
 The memory consumption of the cassandra java process increases steadily until 
 it's killed by the os because of oom (with no swap)
 Our server is started with -Xmx3000M and running for around 23 days.
 pmap -x shows
 Total SST: 1961616 (mem mapped data and index files)
 Anon  RSS: 6499640
 Total RSS: 8478376
 This shows that > 3G are 'overallocated'.
 We will use BRAF on one of our less important nodes to check whether it is 
 related to mmap and report back.





[jira] [Created] (CASSANDRA-3045) Update ColumnFamilyOutputFormat to use new bulkload API

2011-08-16 Thread Jonathan Ellis (JIRA)
Update ColumnFamilyOutputFormat to use new bulkload API
---

 Key: CASSANDRA-3045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3045
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.0


The bulk loading interface added in CASSANDRA-1278 is a great fit for Hadoop 
jobs.





svn commit: r1158490 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/service/GCInspector.java

2011-08-16 Thread jbellis
Author: jbellis
Date: Wed Aug 17 02:20:48 2011
New Revision: 1158490

URL: http://svn.apache.org/viewvc?rev=1158490&view=rev
Log:
work around native memory leak in com.sun.management.GarbageCollectorMXBean
patch by brandonwilliams and jbellis for CASSANDRA-2868

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/GCInspector.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1158490&r1=1158489&r2=1158490&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Wed Aug 17 02:20:48 2011
@@ -11,6 +11,9 @@
command to the nodetool (CASSANDRA-2991)
  * make cleanup and normal compaction able to skip empty rows
(rows containing nothing but expired tombstones) (CASSANDRA-3039)
+ * work around native memory leak in com.sun.management.GarbageCollectorMXBean
+   (CASSANDRA-2868)
+
 
 0.8.4
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/GCInspector.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/GCInspector.java?rev=1158490&r1=1158489&r2=1158490&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/GCInspector.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/GCInspector.java
 Wed Aug 17 02:20:48 2011
@@ -20,11 +20,13 @@ package org.apache.cassandra.service;
  * 
  */
 
+import java.lang.management.GarbageCollectorMXBean;
 import java.lang.management.ManagementFactory;
+import java.lang.management.MemoryMXBean;
 import java.lang.management.MemoryUsage;
-import java.lang.reflect.InvocationTargetException;
-import java.lang.reflect.Method;
-import java.util.*;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
 import java.util.concurrent.TimeUnit;
 import javax.management.MBeanServer;
 import javax.management.ObjectName;
@@ -45,32 +47,22 @@ public class GCInspector
 public static final GCInspector instance = new GCInspector();
 
 private HashMap<String, Long> gctimes = new HashMap<String, Long>();
+private HashMap<String, Long> gccounts = new HashMap<String, Long>();
+
+List<GarbageCollectorMXBean> beans = new ArrayList<GarbageCollectorMXBean>();
+MemoryMXBean membean = ManagementFactory.getMemoryMXBean();
 
-List<Object> beans = new ArrayList<Object>(); // these are instances of com.sun.management.GarbageCollectorMXBean
 private volatile boolean cacheSizesReduced;
 
 public GCInspector()
 {
-// we only want this class to do its thing on sun jdks, or when the sun classes are present.
-Class gcBeanClass = null;
-try
-{
-gcBeanClass = Class.forName("com.sun.management.GarbageCollectorMXBean");
-Class.forName("com.sun.management.GcInfo");
-}
-catch (ClassNotFoundException ex)
-{
-// this happens when using a non-sun jdk.
-logger.warn("Cannot load sun GC monitoring classes. GCInspector is disabled.");
-}
-
 MBeanServer server = ManagementFactory.getPlatformMBeanServer();
 try
 {
 ObjectName gcName = new ObjectName(ManagementFactory.GARBAGE_COLLECTOR_MXBEAN_DOMAIN_TYPE + ",*");
 for (ObjectName name : server.queryNames(gcName, null))
 {
-Object gc = ManagementFactory.newPlatformMXBeanProxy(server, name.getCanonicalName(), gcBeanClass);
+GarbageCollectorMXBean gc = ManagementFactory.newPlatformMXBeanProxy(server, name.getCanonicalName(), GarbageCollectorMXBean.class);
 beans.add(gc);
 }
 }
@@ -97,43 +89,42 @@ public class GCInspector
 
 private void logGCResults()
 {
-for (Object gc : beans)
+for (GarbageCollectorMXBean gc : beans)
 {
-SunGcWrapper gcw = new SunGcWrapper(gc);
-if (gcw.isLastGcInfoNull())
+Long previousTotal = gctimes.get(gc.getName());
+Long total = gc.getCollectionTime();
+if (previousTotal == null)
+previousTotal = 0L;
+if (previousTotal.equals(total))
 continue;
-
-Long previous = gctimes.get(gcw.getName());
-if (previous != null && previous.longValue() == gcw.getCollectionTime().longValue())
-continue;
-gctimes.put(gcw.getName(), gcw.getCollectionTime());
-
-long previousMemoryUsed = 0;
-long memoryUsed = 0;
-long memoryMax = 0;
-for 
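
The diff above is cut off in this archive. As a rough, standalone sketch of the approach it takes (polling the standard java.lang.management.GarbageCollectorMXBean and reporting the change since the previous poll, rather than using the com.sun.management variant), something like the following would work. GcDeltaPoller is a hypothetical class and is not the committed GCInspector code.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical poller; tracks cumulative GC totals per collector and reports deltas. */
final class GcDeltaPoller {
    private final Map<String, Long> lastTimes = new HashMap<>();
    private final Map<String, Long> lastCounts = new HashMap<>();

    /**
     * Returns {elapsedMillis, collections} since the previous call for this
     * collector, or null when the cumulative totals have not changed.
     */
    long[] delta(String name, long totalTime, long totalCount) {
        long previousTime = lastTimes.getOrDefault(name, 0L);
        long previousCount = lastCounts.getOrDefault(name, 0L);
        if (previousTime == totalTime && previousCount == totalCount)
            return null;
        lastTimes.put(name, totalTime);
        lastCounts.put(name, totalCount);
        return new long[] { totalTime - previousTime, totalCount - previousCount };
    }

    /** Polls every collector exposed by the platform and prints what changed. */
    void poll() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long[] d = delta(gc.getName(), gc.getCollectionTime(), gc.getCollectionCount());
            if (d != null)
                System.out.printf("GC for %s: %d ms for %d collections%n", gc.getName(), d[0], d[1]);
        }
    }
}
```

Calling poll() from a scheduled task once per second would approximate periodic GC logging using only the portable management API.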

[jira] [Updated] (CASSANDRA-3043) SSTableImportTest fails on Windows because of malformed file path

2011-08-16 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3043:
--

Attachment: 3043-v2.txt

v2 moves the logic to a method and adds comments.  Does that look good to you?

 SSTableImportTest fails on Windows because of malformed file path
 -

 Key: CASSANDRA-3043
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3043
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
 Environment: Windows
Reporter: Vladimir Loncar
Assignee: Vladimir Loncar
Priority: Trivial
  Labels: windows
 Attachments: 3043-v2.txt, windows-sstable-import-fix.patch


 SSTableImportTest uses URL.getPath() to create path to JSON files. This fails 
 on Windows in many cases (for example if there are spaces in path which get 
 encoded as %20 which Windows doesn't like). Trick is to create URI from URL 
 which satisfies all platforms.





[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086100#comment-13086100
 ] 

Hudson commented on CASSANDRA-2868:
---

Integrated in Cassandra-0.8 #282 (See 
[https://builds.apache.org/job/Cassandra-0.8/282/])
work around native memory leak in com.sun.management.GarbageCollectorMXBean
patch by brandonwilliams and jbellis for CASSANDRA-2868

jbellis : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1158490
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/GCInspector.java


 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.5

 Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png


 We have memory issues with long running servers. These have been confirmed by 
 several users in the user list. That's why I report.
 The memory consumption of the cassandra java process increases steadily until 
 it's killed by the os because of oom (with no swap)
 Our server is started with -Xmx3000M and running for around 23 days.
 pmap -x shows
 Total SST: 1961616 (mem mapped data and index files)
 Anon  RSS: 6499640
 Total RSS: 8478376
 This shows that > 3G are 'overallocated'.
 We will use BRAF on one of our less important nodes to check whether it is 
 related to mmap and report back.





[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-16 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086102#comment-13086102
 ] 

Jeremiah Jordan commented on CASSANDRA-2868:


Can we get this in 0.7.X as well?

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.5

 Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png


 We have memory issues with long running servers. These have been confirmed by 
 several users in the user list. That's why I report.
 The memory consumption of the cassandra java process increases steadily until 
 it's killed by the os because of oom (with no swap)
 Our server is started with -Xmx3000M and running for around 23 days.
 pmap -x shows
 Total SST: 1961616 (mem mapped data and index files)
 Anon  RSS: 6499640
 Total RSS: 8478376
 This shows that > 3G are 'overallocated'.
 We will use BRAF on one of our less important nodes to check whether it is 
 related to mmap and report back.





[jira] [Created] (CASSANDRA-3046) [patch] No need to use .equals on enums, just opens up chance of NPE

2011-08-16 Thread Dave Brosius (JIRA)
[patch] No need to use .equals on enums, just opens up chance of NPE


 Key: CASSANDRA-3046
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3046
 Project: Cassandra
  Issue Type: Improvement
Reporter: Dave Brosius
Priority: Trivial


Pretty trivial patch that changes enum1.equals(enum2) into enum1 == enum2, as 
.equals isn't needed and just opens up the possibility of NPEs, where == 
handles nulls correctly.
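
A small illustration of the point, using a made-up Stage enum (not taken from the patch): calling .equals on a null reference throws, while == simply evaluates to false.

```java
/** Hypothetical example contrasting the two comparison styles. */
final class EnumEquality {
    enum Stage { READ, MUTATION }

    /** Throws NullPointerException when a is null. */
    static boolean sameWithEquals(Stage a, Stage b) {
        return a.equals(b);
    }

    /** Null-safe: enum constants are singletons, so identity comparison is sufficient. */
    static boolean sameWithIdentity(Stage a, Stage b) {
        return a == b;
    }
}
```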





[jira] [Updated] (CASSANDRA-3046) [patch] No need to use .equals on enums, just opens up chance of NPE

2011-08-16 Thread Dave Brosius (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Brosius updated CASSANDRA-3046:


Attachment: enum_equality.diff

 [patch] No need to use .equals on enums, just opens up chance of NPE
 

 Key: CASSANDRA-3046
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3046
 Project: Cassandra
  Issue Type: Improvement
Reporter: Dave Brosius
Priority: Trivial
 Attachments: enum_equality.diff


 Pretty trivial patch that changes enum1.equals(enum2) into enum1 == enum2, as 
 .equals isn't needed and just opens up the possibility of NPEs, where == 
 handles nulls correctly.





svn commit: r1158528 - in /cassandra/trunk: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/cql/ src/java/org/apache/cassandra/db/commitlog/ src/java/o

2011-08-16 Thread jbellis
Author: jbellis
Date: Wed Aug 17 05:52:45 2011
New Revision: 1158528

URL: http://svn.apache.org/viewvc?rev=1158528&view=rev
Log:
merge from 0.8

Modified:
cassandra/trunk/   (props changed)
cassandra/trunk/CHANGES.txt
cassandra/trunk/contrib/   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)
cassandra/trunk/src/java/org/apache/cassandra/cql/Cql.g
cassandra/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLog.java

cassandra/trunk/src/java/org/apache/cassandra/db/compaction/CompactionManager.java

cassandra/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java
cassandra/trunk/src/java/org/apache/cassandra/service/GCInspector.java

Propchange: cassandra/trunk/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Wed Aug 17 05:52:45 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291
 /cassandra/branches/cassandra-0.7:1026516-1151306
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
-/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1157377
+/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1158501
 /cassandra/branches/cassandra-0.8.0:1125021-1130369
 /cassandra/branches/cassandra-0.8.1:1101014-1125018
 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689

Modified: cassandra/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1158528&r1=1158527&r2=1158528&view=diff
==
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Wed Aug 17 05:52:45 2011
@@ -45,6 +45,10 @@
in a commitlog segment (CASSANDRA-3021)
  * fix cassandra.bat when CASSANDRA_HOME contains spaces (CASSANDRA-2952)
  * fix to SSTableSimpleUnsortedWriter bufferSize calculation (CASSANDRA-3027)
+ * make cleanup and normal compaction able to skip empty rows
+   (rows containing nothing but expired tombstones) (CASSANDRA-3039)
+ * work around native memory leak in com.sun.management.GarbageCollectorMXBean
+   (CASSANDRA-2868)
 
 
 0.8.4

Propchange: cassandra/trunk/contrib/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Wed Aug 17 05:52:45 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009
 /cassandra/branches/cassandra-0.7/contrib:1026516-1151306
 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654
-/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1157377
+/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1158501
 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369
 /cassandra/branches/cassandra-0.8.1/contrib:1101014-1125018
 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689

Propchange: 
cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Wed Aug 17 05:52:45 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
 
/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1151306
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654
-/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125019-1157377
+/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125019-1158501
 
/cassandra/branches/cassandra-0.8.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1125021-1130369
 
/cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1101014-1125018
 
/cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1051699-1053689

Propchange: 
cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Wed Aug 17