date:20101210


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970102#action_12970102
 ] 

Peter Schuller commented on CASSANDRA-1470:
---

Just to clarify, then: as jbellis surmised, my comments were indeed based on the 
fact that writes will be synchronous. In particular, what write caching normally 
gives you is the ability to defer the actual writing, such that:

(1) future writes can be coalesced with past writes, which in the extreme case 
translates seek-bound I/O into huge slabs of sequential I/O
(2) pages that are re-written repeatedly aren't re-written on disk each time
(3) the program can continue (e.g. churning CPU) without stopping to wait for 
disk I/O
(4) it decouples the size of the individual writes the application happens to 
make from the way the data gets written out to disk
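The deferral benefits in points (1), (2) and (4) can be illustrated with a toy model of a write-back cache. This is a minimal sketch under stated assumptions, not kernel or Cassandra code: the class `WriteBackCacheModel` and its page-granular `writePage` method are hypothetical. Writes only update an in-memory map, and `flush()` writes each dirty page once, merging adjacent pages into sequential runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model of a write-back cache; not kernel or Cassandra code.
class WriteBackCacheModel {
    // Dirty pages keyed by page number; TreeMap keeps them sorted by offset.
    private final TreeMap<Long, byte[]> dirty = new TreeMap<>();
    private int logicalWrites = 0;

    // Application write of one page: only memory is touched, no disk I/O yet.
    public void writePage(long pageNo, byte[] contents) {
        logicalWrites++;
        // Re-writing a page just replaces the cached copy (point 2).
        dirty.put(pageNo, contents.clone());
    }

    public int logicalWrites() {
        return logicalWrites;
    }

    // Writes each dirty page once; adjacent pages become one sequential run
    // (point 1). Returns {firstPage, lastPage} per run. Assumes page numbers
    // are non-negative (-1 is used as a "no run yet" sentinel).
    public List<long[]> flush() {
        List<long[]> runs = new ArrayList<>();
        long start = -1;
        long prev = -1;
        for (long page : dirty.keySet()) {
            if (start >= 0 && page == prev + 1) {
                prev = page; // extends the current sequential run
            } else {
                if (start >= 0) {
                    runs.add(new long[] {start, prev});
                }
                start = page;
                prev = page;
            }
        }
        if (start >= 0) {
            runs.add(new long[] {start, prev});
        }
        dirty.clear();
        return runs;
    }
}
```

Writing pages 5, 1, 2, 5, 3 counts five logical writes, but `flush()` returns only two runs (pages 1..3 and page 5), so page 5 reaches the disk once and pages 1..3 go out as one sequential write.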

Using direct I/O in the general case is difficult because the kernel contains a 
lot of logic to implement this caching in a way that works for general 
workloads. But with Cassandra, we:

(1) are not concerned with re-writing pages
(2) are not concerned with mixing seek-bound and streaming I/O
(3) are specifically after writing large amounts of data and we can select when 
to flush in-memory buffers

So the problem becomes easier. Still, each direct write will essentially 
behave like a write() followed by an fsync(), with the performance implications 
that has (though not necessarily exactly; e.g. an asynchronous write() followed 
by fsync() might sit waiting in an I/O queue if the fsync() doesn't heighten the 
priority of the previous write, depending on exact kernel behavior and 
whatnot).
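The "write() followed by fsync()" cost model, combined with point (3) above (Cassandra choosing when to flush large in-memory buffers), can be sketched in Java as below. This is an illustration under assumptions, not the patch attached to this issue: `SyncChunkWriter` is a hypothetical name, and it does not use O_DIRECT at all. It only approximates the blocking behaviour with `FileChannel.write` plus `FileChannel.force`, so the page cache is still involved.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: NOT O_DIRECT. It only mimics the blocking cost model
// ("write() then fsync()") of a direct write using the standard NIO API.
class SyncChunkWriter implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer buffer; // large buffer; we decide when it hits disk

    SyncChunkWriter(Path path, int chunkSize) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        this.buffer = ByteBuffer.allocate(chunkSize);
    }

    // Accepts application writes of any size; disk writes happen only in
    // whole chunks, decoupling the two sizes.
    public void write(byte[] data) throws IOException {
        int offset = 0;
        while (offset < data.length) {
            int n = Math.min(buffer.remaining(), data.length - offset);
            buffer.put(data, offset, n);
            offset += n;
            if (!buffer.hasRemaining()) {
                flushChunk();
            }
        }
    }

    // One large write followed by force(): the caller blocks until both finish.
    private void flushChunk() throws IOException {
        buffer.flip();
        while (buffer.hasRemaining()) {
            channel.write(buffer); // a write may be partial, so loop
        }
        channel.force(false); // roughly fdatasync(): file content, not metadata
        buffer.clear();
    }

    @Override
    public void close() throws IOException {
        try {
            if (buffer.position() > 0) {
                flushChunk(); // final partial chunk
            }
        } finally {
            channel.close();
        }
    }
}
```

With a 1024-byte chunk, ten 300-byte writes cause only three blocking write+force cycles (two full chunks and the remainder on `close()`), which is the sense in which large chunks keep the synchronous cost manageable.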

As far as I know, given that large chunks are being written, we really should be 
able to achieve throughput similar to that of the background writing done by 
the kernel, with one major caveat: if the writing is single-threaded, the lack 
of an asynchronous syscall API means that the thread will not be able to keep 
busy with CPU-bound activity while waiting for the actual write. So while the 
writing itself has the potential to be efficient, if one wants to remain 
CPU-bound in e.g. compaction at the same time, the writing would have to happen 
from a background thread.
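A minimal sketch of the background-thread arrangement described above, assuming a hypothetical `BackgroundChunkWriter` that wraps any `OutputStream` (in practice something performing the blocking, synchronous write). The producing thread, e.g. compaction, fills a chunk and submits it; a single-thread executor performs the write, so CPU work and I/O overlap. Only one write is in flight at a time, which bounds memory to roughly two chunks (one being filled, one being written).

```java
import java.io.OutputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Cassandra code: overlaps CPU-bound chunk
// production with blocking writes on a single background thread.
class BackgroundChunkWriter implements AutoCloseable {
    private final OutputStream out;
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private Future<?> inFlight = null;

    BackgroundChunkWriter(OutputStream out) {
        this.out = out;
    }

    // Called by the CPU-bound producer. Blocks only if the previous chunk is
    // still being written. The caller must not modify `chunk` afterwards.
    public void submit(byte[] chunk) throws Exception {
        awaitInFlight();
        inFlight = writer.submit(() -> {
            out.write(chunk); // the blocking write happens on the writer thread
            out.flush();
            return null;
        });
    }

    // Waits for the outstanding write; a failed write surfaces here as an
    // ExecutionException.
    private void awaitInFlight() throws Exception {
        if (inFlight != null) {
            inFlight.get();
            inFlight = null;
        }
    }

    @Override
    public void close() throws Exception {
        try {
            awaitInFlight();
        } finally {
            writer.shutdown();
            out.close();
        }
    }
}
```

The single in-flight write is a design choice for the sketch: it preserves write order and gives natural backpressure, at the cost of stalling the producer whenever the disk is slower than the CPU work.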

However, note that the CPU waiting is not necessarily as bad as it sounds. If 
your compaction is heavily CPU-bound, the effect will be small in relative terms 
because very little time is spent doing I/O anyway. If the compaction is 
heavily disk-bound, you don't really care anyway, since any additional time 
spent on the CPU is just going to *lessen* the negative impact of compaction by 
reducing its effect on live traffic.

The most significant effect should be seen when compaction is reasonably 
balanced between CPU and disk, and in the extreme case one should potentially 
see up to a halving of compaction speed in a situation without live traffic 
further delaying I/O.
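The "up to a halving" estimate follows from a simple model, assuming per-chunk CPU time C and disk-write time D (notation introduced here, not from the thread), perfect overlap when a background writer is used, and no live traffic:

```latex
T_{\mathrm{serial}} = C + D, \qquad T_{\mathrm{overlapped}} \approx \max(C, D)

\frac{T_{\mathrm{serial}}}{T_{\mathrm{overlapped}}} = \frac{C + D}{\max(C, D)} \le 2,
\qquad \text{with equality exactly when } C = D
```

When either term dominates, the ratio approaches 1, consistent with the argument that heavily CPU-bound or heavily disk-bound compaction is barely affected.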

I hope I'm being clear :) (And definitely do correct me if I'm overlooking 
something.) I feel a bit bad commenting all the time without actually putting 
up any code...


 use direct io for compaction
 

 Key: CASSANDRA-1470
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1470
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 0.7.1

 Attachments: 1470-v2.txt, 1470.txt, CASSANDRA-1470-for-0.6.patch, 
 CASSANDRA-1470-v10-for-0.7.patch, CASSANDRA-1470-v11-for-0.7.patch, 
 CASSANDRA-1470-v12-0.7.patch, CASSANDRA-1470-v2.patch, 
 CASSANDRA-1470-v3-0.7-with-LastErrorException-support.patch, 
 CASSANDRA-1470-v4-for-0.7.patch, CASSANDRA-1470-v5-for-0.7.patch, 
 CASSANDRA-1470-v6-for-0.7.patch, CASSANDRA-1470-v7-for-0.7.patch, 
 CASSANDRA-1470-v8-for-0.7.patch, CASSANDRA-1470-v9-for-0.7.patch, 
 CASSANDRA-1470.patch, 
 use.DirectIORandomAccessFile.for.commitlog.against.1022235.patch


 When compaction scans through a group of sstables, it forces the data in the 
 os buffer cache being used for hot reads, which can have a dramatic negative 
 effect on performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-1470) use direct io for compaction

2010-12-10 Oleg Anastasyev (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970103#action_12970103
 ] 

Oleg Anastasyev commented on CASSANDRA-1470:


Guys, maybe I'm misunderstanding something, but are we trying to make compaction 
writes faster? This is a background data maintenance process, and I think it is 
not very important how fast it is. I think what matters most is how much it 
slows down serving normal (read) requests by keeping valuable resources, 
especially memory, busy. 
Have we measured the impact of compaction on read requests with direct I/O 
writes vs. the old, normal writes?




[jira] Updated: (CASSANDRA-1470) use direct io for compaction

2010-12-10 Oleg Anastasyev (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Anastasyev updated CASSANDRA-1470:
---

Comment: was deleted

(was: Guys, maybe I'm misunderstanding something, but are we trying to make 
compaction writes faster? This is a background data maintenance process, and I 
think it is not very important how fast it is. I think what matters most is how 
much it slows down serving normal (read) requests by keeping valuable 
resources, especially memory, busy. 
Have we measured the impact of compaction on read requests with direct I/O 
writes vs. the old, normal writes?

)


[jira] Created: (CASSANDRA-1840) nodetool move caused the moved node to drop itself from 'nodetool ring' output; others think it's 'joining'

nodetool move caused the moved node to drop itself from 'nodetool ring' output; 
others think it's 'joining'
---

 Key: CASSANDRA-1840
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1840
 Project: Cassandra
  Issue Type: Bug
Reporter: Peter Schuller


I have a test cluster with three nodes on a very recent 0.7 (the 0.7 branch as 
of the last few days). It has very little data in it (so timing may be an issue, 
given how fast operations complete). It was otherwise healthy; nodetool ring was 
consistent on all nodes, and I had just run some compactions and 'repair' 
commands on all nodes repeatedly.

I had a single client doing some reads/writes of single columns; nothing 
extreme (low load).

When I did a 'nodetool move', the node exited the ring, stopped responding to 
thrift RPC, entered the ring again, and started accepting RPC requests via 
thrift. Its log reports that it has joined.

However, at this point 'nodetool ring' on the node I moved does *not* show its 
own location in the ring, and other nodes show it as 'joining' (with the new 
token, not the old token). I will include nodetool ring output and log output 
below.

The situation was un-wedged by restarting the node that I had moved. After it 
started and a few seconds passed, nodetool ring looked correct on the node in 
question and other nodes now reported it as 'up' rather than 'joining'.

Moved node said post-move (.61 in the below pastes is the node that I moved):

Address         Status State   Load            Owns    Token
                                                       110288156320304836825416347816186393502
78.31.15.204    Up     Normal  224.34 KB       61.44%  44678687293344048155696022135861768368
193.182.3.229   Up     Normal  251.84 KB       38.56%  110288156320304836825416347816186393502

And the other two:


Address         Status State   Load            Owns    Token
                                                       164957594472845753490452447750528540018
78.31.15.204    Up     Normal  224.34 KB       29.31%  44678687293344048155696022135861768368
193.182.3.229   Up     Normal  251.84 KB       38.56%  110288156320304836825416347816186393502
193.182.3.61    Up     Joining 194.76 KB       32.13%  164957594472845753490452447750528540018

Address         Status State   Load            Owns    Token
                                                       164957594472845753490452447750528540018
78.31.15.204    Up     Normal  224.34 KB       29.31%  44678687293344048155696022135861768368
193.182.3.229   Up     Normal  251.84 KB       38.56%  110288156320304836825416347816186393502
193.182.3.61    Up     Joining 194.76 KB       32.13%  164957594472845753490452447750528540018
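The Owns column can be recomputed from the tokens alone. A minimal sketch, assuming the cluster uses RandomPartitioner (token space [0, 2^127)) and that each node owns the range from the previous token up to its own token, wrapping around the ring; `RingOwnership` is a hypothetical helper, not nodetool code. With the tokens above it reproduces both the two-node view (61.44%, 38.56%) and the three-node view (29.31%, 38.56%, 32.13%).

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;
import java.util.Arrays;

// Hypothetical helper, not nodetool code. Assumes RandomPartitioner's token
// space [0, 2^127) and that a node owns (previous token, own token].
class RingOwnership {
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // Returns ownership percentages, rounded to two decimals, in sorted token order.
    static String[] owns(BigInteger... tokens) {
        BigInteger[] sorted = tokens.clone();
        Arrays.sort(sorted);
        String[] result = new String[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            BigInteger prev = sorted[(i - 1 + sorted.length) % sorted.length];
            // Distance from the previous token, modulo the ring (handles wrap-around).
            BigInteger range = sorted[i].subtract(prev).mod(RING_SIZE);
            if (range.signum() == 0) {
                range = RING_SIZE; // a single node owns the whole ring
            }
            BigDecimal pct = new BigDecimal(range)
                    .multiply(BigDecimal.valueOf(100))
                    .divide(new BigDecimal(RING_SIZE), 2, RoundingMode.HALF_UP);
            result[i] = pct.toPlainString() + "%";
        }
        return result;
    }
}
```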

I'll try reproducing a few times, and also merge latest 0.7.

Here is some system log output from the node that got moved; it looks good to 
me:

 INFO [RMI TCP Connection(32)-193.182.3.61] 2010-12-10 09:38:17,560 StorageService.java (line 455) Leaving: sleeping 3 ms for pending range setup
 INFO [RMI TCP Connection(32)-193.182.3.61] 2010-12-10 09:38:47,564 StorageService.java (line 455) Leaving: streaming data to other nodes
 INFO [StreamStage:1] 2010-12-10 09:38:47,566 StreamOut.java (line 75) Beginning transfer to /78.31.15.204
 INFO [StreamStage:1] 2010-12-10 09:38:47,566 StreamOut.java (line 98) Flushing memtables for KeyspaceSlask...
 INFO [StreamStage:1] 2010-12-10 09:38:47,567 ColumnFamilyStore.java (line 639) switching in a fresh Memtable for KeyValue at CommitLogContext(file='/var/lib/spotify-cassandra/slask/commitlog/CommitLog-1291973418600.log', position=2573610)
 INFO [StreamStage:1] 2010-12-10 09:38:47,567 ColumnFamilyStore.java (line 943) Enqueuing flush of memtable-keyva...@1131602880(370711 bytes, 5533 operations)
 INFO [FlushWriter:1] 2010-12-10 09:38:47,567 Memtable.java (line 155) Writing memtable-keyva...@1131602880(370711 bytes, 5533 operations)
 INFO [FlushWriter:1] 2010-12-10 09:38:47,599 Memtable.java (line 162) Completed flushing /var/lib/spotify-cassandra/slask/data/KeyspaceSlask/KeyValue-e-68-Data.db (15042 bytes)
 INFO [StreamStage:1] 2010-12-10 09:38:47,601 StreamOut.java (line 171) Stream context metadata [/var/lib/spotify-cassandra/slask/data/KeyspaceSlask/KeyValue-e-67-Data.db/(0,10094) progress=0/10094 - 0%, /var/lib/spotify-cassandra/slask/data/KeyspaceSlask/KeyValue-e-68-Data.db/(0,10094) progress=0/10094 - 0%], 2 sstables.
 INFO [StreamStage:1] 2010-12-10 09:38:47,601 StreamOutSession.java (line 175) Streaming to /78.31.15.204
 INFO [StreamStage:1] 2010-12-10 09:38:47,601 StreamOut.java (line 75) Beginning transfer to

[jira] Commented: (CASSANDRA-1840) nodetool move caused the moved node to drop itself from 'nodetool ring' output; others think it's 'joining'


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970115#action_12970115
 ] 

Peter Schuller commented on CASSANDRA-1840:
---

I could reproduce it consistently. I tried once per node; in each case a 
restart was required. The behavior was the same after stopping all nodes and 
starting them from scratch.

I will update to today's 0.7 branch and re-try.


[jira] Updated: (CASSANDRA-1840) nodetool move caused the moved node to drop itself from 'nodetool ring' output; others think it's 'joining'


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller updated CASSANDRA-1840:
--

Priority: Minor  (was: Major)


[jira] Commented: (CASSANDRA-1840) nodetool move caused the moved node to drop itself from 'nodetool ring' output; others think it's 'joining'


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970126#action_12970126
 ] 

Peter Schuller commented on CASSANDRA-1840:
---

I am not able to reproduce it with the latest 0.7, though I'm not sure which 
change is expected to fix this problem, so I'm not marking it resolved until 
someone else does so or weighs in.


[jira] Updated: (CASSANDRA-1838) Add ability to set TTL on columns in cassandra-cli


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-1838:
---

Attachment: CASSANDRA-1838.patch

 Add ability to set TTL on columns in cassandra-cli
 --

 Key: CASSANDRA-1838
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1838
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 0.7.0 rc 1
 Environment: Ubuntu 10.04 64bit
Reporter: Eric Tamme
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.7.1

 Attachments: CASSANDRA-1838.patch


 Currently the cassandra-cli does not have any mechanism to set the ttl 
 attribute of a column.  This would be a useful ability to have when working 
 with the cli tool.


[jira] Commented: (CASSANDRA-1838) Add ability to set TTL on columns in cassandra-cli

2010-12-10 Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970136#action_12970136
 ] 

Sylvain Lebresne commented on CASSANDRA-1838:
-

Pavel: it would be nice to actually show the ttl of a column when it has one 
(and by that I mean when you get a column)


[jira] Commented: (CASSANDRA-1838) Add ability to set TTL on columns in cassandra-cli


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970137#action_12970137
 ] 

Pavel Yaskevich commented on CASSANDRA-1838:


Sure thing. Can you please post example output here showing how you'd like it to look? 


[jira] Commented: (CASSANDRA-1838) Add ability to set TTL on columns in cassandra-cli

2010-12-10 Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970138#action_12970138
 ] 

Sylvain Lebresne commented on CASSANDRA-1838:
-

Nothing fancy, something like
{noformat}
[defa...@demo] get test2[row1]; 
 
= (column=col1, value=val1, timestamp=1291980736812000)
= (column=col2, value=val2, timestamp=1291980747061000,ttl=30)
{noformat}
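A minimal sketch of the proposed output rule, where ",ttl=N" is appended only when a TTL is set. `ColumnFormatter` and its parameters are hypothetical, not the cassandra-cli implementation; a `null` TTL stands in for "no TTL set".

```java
// Hypothetical formatter, not cassandra-cli code: appends ",ttl=N" only when set.
class ColumnFormatter {
    // ttl == null means "no TTL set" in this sketch.
    static String format(String name, String value, long timestamp, Integer ttl) {
        StringBuilder sb = new StringBuilder("= (column=")
                .append(name)
                .append(", value=").append(value)
                .append(", timestamp=").append(timestamp);
        if (ttl != null) {
            sb.append(",ttl=").append(ttl.intValue()); // same spacing as the example above
        }
        return sb.append(")").toString();
    }
}
```

Called with the values from the example, `format("col2", "val2", 1291980747061000L, 30)` produces the second output line verbatim, and passing `null` produces the first.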


[Cassandra Wiki] Trivial Update of ThomasBoose by ThomasBoose

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ThomasBoose page has been changed by ThomasBoose.
The comment on this change is: Start.
http://wiki.apache.org/cassandra/ThomasBoose

--

New page:
#format wiki
#language en
== Thomas Boose ==
Email: MailTo(thomas AT boose DOT nl) 
{{http://a0.twimg.com/profile_images/196967692/Thomas_reasonably_small.jpg}}

Hi, I'm contributing to this wiki as part of an assignment for my university. The 
assignment is to design, develop, build and implement the database backend of a 
messaging system for real-time, location-based content provided by schools, 
companies and individuals, aimed at interest groups.

We have chosen Cassandra for storage, and our task now is to convert a 
relational model into Cassandra column families. I'd therefore like to create a 
page describing how to map specific EERD modeling constructs to Cassandra. 
I've read that not everybody supports this idea, but I'd like to try it anyway.

Comments and suggestions are always welcome. Language improvements are 
appreciated, as my native tongue is not English.


CategoryHomepage

[jira] Updated: (CASSANDRA-1838) Add ability to set TTL on columns in cassandra-cli


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-1838:
---

Attachment: CASSANDRA-1838-v2.patch

 Add ability to set TTL on columns in cassandra-cli
 --

 Key: CASSANDRA-1838
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1838
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 0.7.0 rc 1
 Environment: Ubuntu 10.04 64bit
Reporter: Eric Tamme
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.7.1

 Attachments: CASSANDRA-1838-v2.patch, CASSANDRA-1838.patch


 Currently the cassandra-cli does not have any mechanism to set the ttl 
 attribute of a column.  This would be a useful ability to have when working 
 with the cli tool.

[jira] Updated: (CASSANDRA-1838) Add ability to set TTL on columns in cassandra-cli


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-1838:
---

Attachment: (was: CASSANDRA-1838-v2.patch)

 Add ability to set TTL on columns in cassandra-cli
 --

 Key: CASSANDRA-1838
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1838
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 0.7.0 rc 1
 Environment: Ubuntu 10.04 64bit
Reporter: Eric Tamme
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.7.1

 Attachments: CASSANDRA-1838.patch


 Currently the cassandra-cli does not have any mechanism to set the ttl 
 attribute of a column.  This would be a useful ability to have when working 
 with the cli tool.

[jira] Updated: (CASSANDRA-1838) Add ability to set TTL on columns in cassandra-cli


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-1838:
---

Attachment: CASSANDRA-1838-v2.patch

GET, LIST will output 'TTL' if set.

 Add ability to set TTL on columns in cassandra-cli
 --

 Key: CASSANDRA-1838
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1838
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 0.7.0 rc 1
 Environment: Ubuntu 10.04 64bit
Reporter: Eric Tamme
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.7.1

 Attachments: CASSANDRA-1838-v2.patch, CASSANDRA-1838.patch


 Currently the cassandra-cli does not have any mechanism to set the ttl 
 attribute of a column.  This would be a useful ability to have when working 
 with the cli tool.

[jira] Resolved: (CASSANDRA-1840) nodetool move caused the moved node to drop itself from 'nodetool ring' output; others think it's 'joining'


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-1840.
---

Resolution: Duplicate

believe this was fixed in CASSANDRA-1829

 nodetool move caused the moved node to drop itself from 'nodetool ring' 
 output; others think it's 'joining'
 ---

 Key: CASSANDRA-1840
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1840
 Project: Cassandra
  Issue Type: Bug
Reporter: Peter Schuller
Priority: Minor

 I have a test cluster with three nodes on a very recent 0.7 (built from the 
 branch within the last few days). It has very little data in it (so maybe timing can be an issue given 
 how fast operations complete). It was otherwise healthy; nodetool ring was 
 consistent on all nodes and I had just run some compactions and 'repair' 
 commands on all nodes repeatedly.
 I had a single client doing some reads/writes of single columns; nothing 
 extreme (low load).
 When I did a 'nodetool move' the node exited the ring, stopped responding to 
 thrift RPC, entered the ring again, and started accepting RPC requests via 
 thrift. It reports in the log that it is joined.
 However, at this point 'nodetool ring' on the node I moved does *not* show 
 its own location in the ring, and other nodes show it as 'joining' (with the 
 new token, not the old token). I will include nodetool ring output and log 
 output below.
 The situation was un-wedged by restarting the node that I had moved. After it 
 started and a few seconds passed, nodetool ring looked correct on the node in 
 question and other nodes now reported it as 'up' rather than 'joining'.
 Moved node said post-move (.61 in the below pastes is the node that I moved):
 Address         Status State   Load       Owns    Token
                                                   110288156320304836825416347816186393502
 78.31.15.204    Up     Normal  224.34 KB  61.44%  44678687293344048155696022135861768368
 193.182.3.229   Up     Normal  251.84 KB  38.56%  110288156320304836825416347816186393502
 And the other two:
 Address         Status State   Load       Owns    Token
                                                   164957594472845753490452447750528540018
 78.31.15.204    Up     Normal  224.34 KB  29.31%  44678687293344048155696022135861768368
 193.182.3.229   Up     Normal  251.84 KB  38.56%  110288156320304836825416347816186393502
 193.182.3.61    Up     Joining 194.76 KB  32.13%  164957594472845753490452447750528540018
 Address         Status State   Load       Owns    Token
                                                   164957594472845753490452447750528540018
 78.31.15.204    Up     Normal  224.34 KB  29.31%  44678687293344048155696022135861768368
 193.182.3.229   Up     Normal  251.84 KB  38.56%  110288156320304836825416347816186393502
 193.182.3.61    Up     Joining 194.76 KB  32.13%  164957594472845753490452447750528540018
 I'll try reproducing a few times, and also merge latest 0.7.
 Here is some system log output from the node that got moved; it looks good to me:
  INFO [RMI TCP Connection(32)-193.182.3.61] 2010-12-10 09:38:17,560 StorageService.java (line 455) Leaving: sleeping 3 ms for pending range setup
  INFO [RMI TCP Connection(32)-193.182.3.61] 2010-12-10 09:38:47,564 StorageService.java (line 455) Leaving: streaming data to other nodes
  INFO [StreamStage:1] 2010-12-10 09:38:47,566 StreamOut.java (line 75) Beginning transfer to /78.31.15.204
  INFO [StreamStage:1] 2010-12-10 09:38:47,566 StreamOut.java (line 98) Flushing memtables for KeyspaceSlask...
  INFO [StreamStage:1] 2010-12-10 09:38:47,567 ColumnFamilyStore.java (line 639) switching in a fresh Memtable for KeyValue at CommitLogContext(file='/var/lib/spotify-cassandra/slask/commitlog/CommitLog-1291973418600.log', position=2573610)
  INFO [StreamStage:1] 2010-12-10 09:38:47,567 ColumnFamilyStore.java (line 943) Enqueuing flush of memtable-keyva...@1131602880(370711 bytes, 5533 operations)
  INFO [FlushWriter:1] 2010-12-10 09:38:47,567 Memtable.java (line 155) Writing memtable-keyva...@1131602880(370711 bytes, 5533 operations)
  INFO [FlushWriter:1] 2010-12-10 09:38:47,599 Memtable.java (line 162) Completed flushing /var/lib/spotify-cassandra/slask/data/KeyspaceSlask/KeyValue-e-68-Data.db (15042 bytes)
  INFO [StreamStage:1] 2010-12-10 09:38:47,601 StreamOut.java (line 171) Stream context metadata [/var/lib/spotify-cassandra/slask/data/KeyspaceSlask/KeyValue-e-67-Data.db/(0,10094)

[Cassandra Wiki] Update of ThomasBoose/EERD model components to Cassandra Column family's by ThomasBoose

The ThomasBoose/EERD model components to Cassandra Column family's page has 
been changed by ThomasBoose.
http://wiki.apache.org/cassandra/ThomasBoose/EERD%20model%20components%20to%20Cassandra%20Column%20family%27s

--

New page:
##master-page:HomepageReadWritePageTemplate
##master-date:Unknown-Date
#format wiki
#language en
= A way to implement EERD components in Cassandra =
== Intro ==
This page describes model transformations from EERD concepts into Cassandra 
ColumnFamily concepts. All input is welcome.

== DBMS layer ==
At several spots in this document you will find suggestions to implement trivial 
DBMS functionality by hand. At this stage, I would suggest that programmers 
implement at least four tiers when using Cassandra as a backend server: the 
database layer provided by Cassandra itself, a tier implementing DBMS rules, a 
tier for business rules, and finally an application tier.

This DBMS tier should provide functions that keep the data consistent according 
to the data rules, and it should throw exceptions when an index change or a key 
deletion would violate those rules.

If this does not make sense yet, read on.
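A minimal sketch of such a DBMS tier follows. It assumes ColumnFamilies modeled as plain Python dicts (row key -> {column name: value}) so that it runs without a cluster. The names `DbmsTier`, `DbmsRuleViolation`, `cf_Student` and `cf_School_unit` are hypothetical and used only for illustration. It enforces a foreign key when a student is written and refuses to delete a school unit that is still referenced.

```python
class DbmsRuleViolation(Exception):
    """Raised when a write would break a data rule (illustrative name)."""


class DbmsTier:
    """Hand-written referential checks between two ColumnFamilies.

    Hypothetical in-memory stand-ins: row key -> {column name: value}.
    """

    def __init__(self):
        self.cf_School_unit = {}
        self.cf_Student = {}

    def set_student(self, key, columns):
        # Enforce the foreign key: a referenced unit must already exist.
        unit = columns.get("school-unit")
        if unit is not None and unit not in self.cf_School_unit:
            raise DbmsRuleViolation(f"unknown school-unit {unit!r}")
        # Like a Cassandra insert, this merges into an existing row (upsert).
        self.cf_Student.setdefault(key, {}).update(columns)

    def delete_school_unit(self, unit):
        # Restrict: refuse the delete while any student still references it.
        for student, columns in self.cf_Student.items():
            if columns.get("school-unit") == unit:
                raise DbmsRuleViolation(
                    f"school-unit {unit!r} is still referenced by {student!r}")
        self.cf_School_unit.pop(unit, None)
```

The restrict check scans every student row. That is exactly the kind of processing cost a hand-maintained reverse index (one row per unit listing its students) would avoid.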

== Indexing ==
In order to add an index on a column other than the ColumnFamily's key, we 
need to create a second ColumnFamily. On every insert into the original 
ColumnFamily (which in Cassandra can be either an insert or an update), we 
update the corresponding index.

Think of a ColumnFamily cf_Person (examples in Python using pycassa):

{{{
cf_Person.insert('234', {'name': 'Karel', 'City': 'Haarlem'})
cfi_Person_City.insert('Haarlem', {'234': ''})
}}}

This way a row is created for every City, containing one column for the key of 
every person who lives in that City. A Cassandra row can hold a very large 
number of columns, so the index row can grow with the number of people in a 
City. Because Cassandra does not maintain this index for us, when deleting a 
person its reference in the cfi_Person_City index should be removed first. When 
updating a person, for example when they move to another City, we have to 
remove the element from cfi_Person_City first and then store it under the 
corresponding new City.
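The maintenance rules above can be sketched as follows. This is a sketch under assumptions: cf_Person and cfi_Person_City are modeled as plain dicts (row key -> {column name: value}) rather than pycassa ColumnFamily objects, and the helper names `save_person` and `delete_person` are hypothetical. With pycassa, removing one index column would roughly correspond to calling `remove` on the index row with a `columns` list.

```python
# Hypothetical in-memory stand-ins for the two ColumnFamilies.
cf_Person = {}
cfi_Person_City = {}


def _unindex(city, key):
    """Drop one person's column from a City index row."""
    row = cfi_Person_City.get(city, {})
    row.pop(key, None)
    if not row:
        # A row without columns is effectively gone; mirror that here.
        cfi_Person_City.pop(city, None)


def save_person(key, columns):
    """Insert or update a person and keep the City index in step."""
    old_city = cf_Person.get(key, {}).get("City")
    new_city = columns.get("City", old_city)
    if old_city is not None and old_city != new_city:
        # Remove the stale index entry before indexing under the new City.
        _unindex(old_city, key)
    cf_Person.setdefault(key, {}).update(columns)
    if new_city is not None:
        # The person's key becomes a column name; the value stays empty.
        cfi_Person_City.setdefault(new_city, {})[key] = ""


def delete_person(key):
    """Remove the index reference first, then the person row."""
    city = cf_Person.get(key, {}).get("City")
    if city is not None:
        _unindex(city, key)
    cf_Person.pop(key, None)
```

For example, `save_person('234', {'name': 'Karel', 'City': 'Haarlem'})` followed by `save_person('234', {'City': 'Amsterdam'})` leaves `cfi_Person_City` as `{'Amsterdam': {'234': ''}}`.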

== Relations ==
=== 1 on 1 ===
Typically you'll find three kinds of one-to-one relations in a relational model. 
I will address them one at a time.

==== Equal elements ====
Sometimes every element is part of the collections on both sides of the 
relationship. The collections are most often modeled separately because of 
security issues or functional differences. One solution in a Cassandra database 
is the same as you would use for such a relation in an RDBMS: simply share the 
same key in both ColumnFamilies. Inserting a key in one of these ColumnFamilies 
inserts the same key in the other, and vice versa. Updating an existing key in 
either ColumnFamily does not result in any change in the other. Deleting a key 
from one ColumnFamily deletes the same key from the other family as well, 
provided this is allowed.

''I'm not sure to what level of detail security rules can be applied in a 
Cassandra database. At least I know that one can create logins per cluster.''
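A minimal sketch of the shared-key rules described above follows. It assumes two hypothetical ColumnFamilies, cf_Employee and cf_Employee_Private (for example, split for security reasons), modeled as plain dicts. The function names are illustrative only.

```python
# Hypothetical in-memory stand-ins sharing one key space.
cf_Employee = {}
cf_Employee_Private = {}


def create_row(key, public_columns, private_columns):
    """A new key is written to both families so their key sets stay equal."""
    if key in cf_Employee or key in cf_Employee_Private:
        raise KeyError(f"{key} already exists")
    cf_Employee[key] = dict(public_columns)
    cf_Employee_Private[key] = dict(private_columns)


def update_row(family, key, columns):
    """Updating an existing key only touches the family being updated."""
    if key not in family:
        raise KeyError(f"{key} does not exist")
    family[key].update(columns)


def delete_row(key):
    """Deleting a key from one family deletes it from the other as well."""
    cf_Employee.pop(key, None)
    cf_Employee_Private.pop(key, None)
```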

Sometimes it is necessary to use different keys for the two collections, for 
example because no single designer gets to choose both keys. Although the 
numbers of elements are equal and the elements are related one to one, in a 
relational model the designer would pick either key and add it to the other 
collection with a unique constraint and a foreign key constraint.

In Cassandra modeling you are forced either to cross-link both keys, storing 
each key as a foreign key in the other ColumnFamily, or to create a third 
ColumnFamily in which you store both keys, each prefixed with a token indicating 
which ColumnFamily it refers to. Let's focus on the first option. Say we hand out 
phones to our employees, and we agree that every employee always has exactly one 
phone and that phones not in use are not stored in our ColumnFamily. The phone 
has a phone number as its key, whereas the employee has a social security 
number. To know which number to dial when looking for employee X, and who is 
calling given a specific phone number, we need to store each key as a foreign 
key in the other ColumnFamily.

-- CF_Employee --
|             | name | phone       | salary |
| 123-12-1234 | John | 0555-123456 | 10.000 |

|             | name | phone       | salary |
| 321-21-4321 | Jane | 0555-654321 | 12.000 |

-- CF_Phone --
|             | employee    | credit |
| 0555-123456 | 123-12-1234 | 10     |

|             | employee    | credit |
| 0555-654321 | 321-21-4321 | 5      |
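The cross-linking in the two tables above can be sketched as follows. The sketch assumes plain dicts stand in for CF_Employee and CF_Phone. The helper names are hypothetical, and the choice to raise an error when a still-assigned phone is deleted is one of the options the page allows, not the only one.

```python
# Hypothetical in-memory stand-ins: row key -> {column name: value}.
cf_Employee = {}  # keyed by social security number
cf_Phone = {}     # keyed by phone number


def assign_phone(ssn, employee_columns, number, phone_columns):
    """Cross-link an employee and a phone by storing each key in the other row."""
    owner = cf_Phone.get(number, {}).get("employee")
    if owner is not None and owner != ssn:
        raise ValueError(f"{number} already belongs to {owner}")
    old_number = cf_Employee.get(ssn, {}).get("phone")
    if old_number is not None and old_number != number:
        # Unused phones are not stored, so the old phone row goes away.
        cf_Phone.pop(old_number, None)
    cf_Employee.setdefault(ssn, {}).update(employee_columns, phone=number)
    cf_Phone.setdefault(number, {}).update(phone_columns, employee=ssn)


def number_to_dial(ssn):
    return cf_Employee[ssn]["phone"]


def who_is_calling(number):
    return cf_Phone[number]["employee"]


def delete_phone(number):
    """Every employee must keep one phone, so refuse to orphan an employee."""
    ssn = cf_Phone[number]["employee"]
    if ssn in cf_Employee:
        raise ValueError(f"{number} is still assigned to {ssn}")
    del cf_Phone[number]
```

Both lookups are single-row reads. The price is that every write must update two rows, and a DBMS tier has to keep them in step.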

Using a static columnname and requiring input in

[Cassandra Wiki] Trivial Update of ThomasBoose/EERD model components to Cassandra Column family's by ThomasBoose

The ThomasBoose/EERD model components to Cassandra Column family's page has
been changed by ThomasBoose.
http://wiki.apache.org/cassandra/ThomasBoose/EERD%20model%20components%20to%20Cassandra%20Column%20family%27s?action=diffrev1=1rev2=2

This page describes model tranformations from EERD concepts into Cassandra
ColumnFamily concepts. All input is welcome.

== DBMS layer ==
- At several spots in this document you wil find suggestions to implement
trivial DBMS functionality by hand. At this stage, I would suggest to
programmers to implement at least 4 tiers when using cassandra as a backend
server. One would be the database layer by cassandra itself, One would be a
tier implementing DBMS rules, another for business rules finishing with an
application tier.
+ At several spots in this document you wil find suggestions to implement
trivial DBMS functionality by hand. At this stage, I would suggest to
programmers to implement at least 4 tiers when using Cassandra as a backend
server. One would be the database layer by cassandra itself, One would be a
tier implementing DBMS rules, another for business rules finishing with an
application tier.

In this DBMS tier functions should be available for keeping data consistend
based on datarules and it would throw exceptions when indexes are changed or
orders are given to delete key's agains DBMS rules.

If this is not yet making sence, read on.

== Indexing ==
- In order to add an index to a column, other then the columnfamily's key, we
should to create a second columnfamily. Every insert, which can be either an
insert or update in cassandra, on the original columnfamily we will update the
corresponding index.
+ In order to add an index to a column, other then the ColumnFamily key, we
should to create a second ColumnFamily. Every insert, which can be either an
insert or update in Cassandra, on the original columnfamily we will update the
corresponding index.

- Think of a columnfamily cf_Person (examples in Python using pycassa)
+ Think of a ColumnFamily cf_Person (examples in Python using pycassa)

+ {{{
cf_Person.insert('234', {'name':'Karel','City':'Haarlem'})
- cfi_Person_City.insert (Haarlem', {'234':''})
+ cfi_Person_City.insert ('Haarlem', {'234':''})
-
+ }}}
- This way a hash will be created containing columns for every person's key
that lives in a specific City. The ColumnFamily architecture of Cassandra can
store a unlimited number of columns for each key. This meens that when deleting
a person it reference in the cfi_Person_City index should be removed first.
When updating a person, maybe moving to anothor City, we have to remove the
element from the cfi_Person_City first and then store it with the corresponding
new City.
+ This way a hash will be created containing columns for every person's key
that lives in a specific City. The ColumnFamily architecture of Cassandra can
store a unlimited number of columns for each key. This meens that when deleting
a person it's reference in the cfi_Person_City index should be removed first.
When updating a person, maybe moving to anothor City, we have to remove the
element from the cfi_Person_City first and then store it with the corresponding
new City.'' ''

== Relations ==
=== 1 on 1 ===
Typicly you'll find three kinds of 1 on 1 relations in a relational model. I
will address them one at a time.

Equal elements
- Sometimes all the elements are part of both collections on either side of the
relationship. The reasons these collections are moddeled seperately are most
often based on security issues or functional differences. One solution in a
Cassandra database would be the same as you would implement such a relation in
an RDBMS. Simply by sharing the same key in both columnfamilies. Inserting a
key in one of these columnfamily's would insert the same in the other and vise
versa. Updating an existing key in either columnfamily would not result in any
change in the other. Deleting a key from one columnfamily will result in
deleting the same key in the other family as well, providing this would be
allowed.
+ Sometimes all the elements are part of both collections on either side of the
relationship. The reasons these collections are moddeled seperately are most
often based on security issues or functional differences. One solution in a
Cassandra database would be the same as you would implement such a relation in
an RDBMS. Simply by sharing the same key in both ColumnFamily'ss. Inserting a
key in one of these ColumnFamily's would insert the same in the other and vise
versa. Updating an existing key in either ColumnFamily would not result in any
change in the other. Deleting a key from one ColumnFamily will result in
deleting the same key in the other family as well,

[Cassandra Wiki] Trivial Update of ThomasBoose/EERD model components to Cassandra Column family's by ThomasBoose

The ThomasBoose/EERD model components to Cassandra Column family's page has 
been changed by ThomasBoose.
http://wiki.apache.org/cassandra/ThomasBoose/EERD%20model%20components%20to%20Cassandra%20Column%20family%27s?action=diffrev1=2rev2=3

--

  Typicly you'll find three kinds of 1 on 1 relations in a relational model. I 
will address them one at a time.
  
   Equal elements 
- Sometimes all the elements are part of both collections on either side of the 
relationship. The reasons these collections are moddeled seperately are most 
often based on security issues or functional differences. One solution in a 
Cassandra database would be the same as you would implement such a relation in 
an RDBMS. Simply by sharing the same key in both ColumnFamily'ss. Inserting a 
key in one of these ColumnFamily's would insert the same in the other and vise 
versa. Updating an existing key in either ColumnFamily would not result in any 
change in the other. Deleting a key from one ColumnFamily will result in 
deleting the same key in the other family as well, providing this would be 
allowed.
+ Sometimes all the elements are part of both collections on either side of the 
relationship. The reasons these collections are moddeled seperately are most 
often based on security issues or functional differences. One solution in a 
Cassandra database would be the same as you would implement such a relation in 
an RDBMS. Simply by sharing the same key in both ColumnFamily's. Inserting a 
key in one of these ColumnFamily's would insert the same in the other and vise 
versa. Updating an existing key in either ColumnFamily would not result in any 
change in the other. Deleting a key from one ColumnFamily will result in 
deleting the same key in the other family as well, providing this would be 
allowed.
  
  ''I'm not sure to what detaillevel security rules can apply in a Cassandra 
database. At least I know that one can creat logins per cluster.''
  
- If it gets necessary to use different keys for both collections, sometimes it 
is not up to one designer to select both keys, although the number of element 
are equal and they are related one on one, in a relational model the designer 
gets to select either key to insert into the other collection with an unique 
and foreign key constraint.
+ If it is necessary to use different keys for both collections, sometimes it 
is not up to one designer to select both keys, although the number of element 
are equal and they are related one on one, in a relational model the designer 
gets to select either key to insert into the other collection with an unique 
and foreign key constraint.
  
- In Cassandra modeling you are forced to either croslink both key's, So you 
design both key's foreign in both columnfamily's. Or you create a third 
columnfamily in which you store both keys preceded by a token to which 
columfamily you are refering. Lets focus on the first option. Say we hand out 
phones to our employees and we agree that every employee will always have one 
phone. and phones that are not used are not stored in our columnfamily. The 
phone has a phonenumber as key where the employee has a socialsecurity number. 
In order to know which number to dial when looking for employee X and who is 
calling giving a specific phonenumber we need to store both keys foreign in 
both columnfamily's.
+ In Cassandra modeling you are forced to either croslink both key's, So you'd 
design both key's foreign in both ColumnFamily's. Or you create a third 
ColumnFamily in which you store both keys preceded by a token to which 
columfamily you are refering. Lets focus on the first option. Say we hand out 
phones to our employees and we agree that every employee will always have one 
phone. and phones that are not used are not stored in our database. The phone 
has a phonenumber as key where the employee has a social security number. In 
order to know which number to dial when looking for employee X and who is 
calling giving a specific phonenumber we need to store both keys foreign in 
both ColumnFamily's.
  
- -- CF_Employee
  
- 
- | | name | phone  | salary | | 123-12-1234 |John  | 
0555-123456| 10.000 |
+ ||<tablewidth="400px">'''CF_Employee'''||
+ ||<|2>123-12-1234||name||phone||salary||
+ ||John||0555-123456||10.000||
+ ||<|2>321-21-4321||name||phone||salary||
+ ||Jane||0555-654321||12.000||
  
- 
- | | name | phone  | salary |
  
-  * | 321-21-4321 |Jane  | 0555-654321| 12.000 |
+ ||<tablewidth="400px" tablealign="left">'''CF_Phone'''||
+ ||<|2>0555-123456||employee||credit||
+ ||123-12-1234||10||
+ ||<|2>0555-654321||employee||credit||
+ ||321-21-4321||5||
  
- 
- -- CF_Phone
  
- 
-  * | | employee | credit |
  
- | 0555-123456 | 123-12-1234  | 10 |
  
-  *
- 
-  * |

[jira] Commented: (CASSANDRA-1083) Improvement to CompactionManger's submitMinorIfNeeded


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970189#action_12970189
 ] 

Jonathan Ellis commented on CASSANDRA-1083:
---

I think this approach wastes a lot more effort than the current system, because 
once it has been going a while you see this:

{code}
1 1 1 1 121 125 125 125 125 125 125 125 
Compacting (ages): 64 3 2 1 0 
125 125 125 125 125 125 125 125 
{code}

in other words, each time we do a compaction, the common case is for it to 
compact the most recent small ones with a large one, meaning 95% of the work 
done is just re-copying the large.

 Improvement to CompactionManger's submitMinorIfNeeded
 -

 Key: CASSANDRA-1083
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1083
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 0.7.1

 Attachments: 1083-configurable-compaction-thresholds.patch, 
 compaction_simulation.rb, compaction_simulation.rb


 We've discovered that we are unable to tune compaction the way we want for 
 our production cluster. I think the current algorithm doesn't do this as well 
 as it could, since it doesn't sort the sstables by size before doing the 
 bucketing, which means the tuning parameters have unpredictable results.
 I looked at CASSANDRA-792, but it seems like overkill. Here's an alternative 
 proposal:
 config operations:
  minimumCompactionThreshold
  maximumCompactionThreshold
  targetSSTableCount
 The first two would mean what they currently mean: the bounds on how many 
 sstables to compact in one compaction operation. The 3rd is a target for how 
 many SSTables you'd like to have.
 Pseudo code algorithm for determining whether or not to do a minor compaction:
 {noformat} 
 if sstables.length + minimumCompactionThreshold -1  targetSSTableCount
   sort sstables from smallest to largest
   compact the up to maximumCompactionThreshold smallest tables
 {noformat} 
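A sketch of the proposed decision rule follows, for readers who want to experiment with it. One assumption must be stated loudly: the comparison operator between `minimumCompactionThreshold -1` and `targetSSTableCount` is missing from the archived pseudocode, and this sketch assumes it is `>`. The `SSTable` record and the function name are hypothetical and are not Cassandra classes.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    name: str
    size: int  # bytes on disk


def pick_minor_compaction(sstables, minimum_threshold, maximum_threshold,
                          target_count):
    """Return the sstables to compact, or [] if no minor compaction is due.

    ASSUMPTION: the operator lost from the archived pseudocode is '>'.
    """
    if len(sstables) + minimum_threshold - 1 > target_count:
        # Sort smallest first and take up to maximum_threshold of them.
        by_size = sorted(sstables, key=lambda s: s.size)
        return by_size[:maximum_threshold]
    return []
```

For example, with four sstables of sizes 500, 10, 20 and 5, a minimum threshold of 2, a maximum of 3 and a target of 4, the condition is 4 + 2 - 1 = 5 > 4. The three smallest tables (5, 10, 20) are then picked.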

[jira] Commented: (CASSANDRA-1838) Add ability to set TTL on columns in cassandra-cli

2010-12-10, Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970194#action_12970194
 ] 

Sylvain Lebresne commented on CASSANDRA-1838:
-

+1

 Add ability to set TTL on columns in cassandra-cli
 --

 Key: CASSANDRA-1838
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1838
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 0.7.0 rc 1
 Environment: Ubuntu 10.04 64bit
Reporter: Eric Tamme
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.7.1

 Attachments: CASSANDRA-1838-v2.patch, CASSANDRA-1838.patch


 Currently the cassandra-cli does not have any mechanism to set the ttl 
 attribute of a column.  This would be a useful ability to have when working 
 with the cli tool.

[jira] Updated: (CASSANDRA-1083) Improvement to CompactionManger's submitMinorIfNeeded


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1083:
--

Attachment: 1083-sort.txt

bq. [compaction] doesn't sort the sstables by size before doing the bucketing, 
which means the tuning parameters have unpredictable results

patch attached to fix this
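The effect of sorting before bucketing can be illustrated with a small sketch. It is modeled loosely on size-tiered bucketing of that era: a file joins a bucket when its size is strictly between half and one and a half times the bucket's average, or when both are below a "small" cutoff. These thresholds and the function name are assumptions for illustration, not a copy of the attached patch.

```python
def get_buckets(sizes, small=50 * 1024 * 1024):
    """Group sstable sizes into buckets of similar size.

    Sizes are sorted first, so the result does not depend on the order
    in which the sstables happened to be listed.
    """
    buckets = []  # each entry: [member_sizes, average_size]
    for size in sorted(sizes):
        for bucket in buckets:
            members, avg = bucket
            similar = avg / 2 < size < 3 * avg / 2
            both_small = size < small and avg < small
            if similar or both_small:
                members.append(size)
                bucket[1] = sum(members) / len(members)
                break
        else:
            # No similar bucket found: start a new one.
            buckets.append([[size], size])
    return [members for members, _ in buckets]
```

Without the `sorted()` call, the input `[20, 14, 10]` (with `small=0`) would land in a single bucket, while `[10, 14, 20]` gives `[[10, 14], [20]]`. That order dependence is what makes the tuning parameters unpredictable. With sorting, both inputs give `[[10, 14], [20]]`.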

 Improvement to CompactionManger's submitMinorIfNeeded
 -

 Key: CASSANDRA-1083
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1083
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 0.7.1

 Attachments: 1083-configurable-compaction-thresholds.patch, 
 1083-sort.txt, compaction_simulation.rb, compaction_simulation.rb


 We've discovered that we are unable to tune compaction the way we want for 
 our production cluster. I think the current algorithm doesn't do this as well 
 as it could, since it doesn't sort the sstables by size before doing the 
 bucketing, which means the tuning parameters have unpredictable results.
 I looked at CASSANDRA-792, but it seems like overkill. Here's an alternative 
 proposal:
 config operations:
  minimumCompactionThreshold
  maximumCompactionThreshold
  targetSSTableCount
 The first two would mean what they currently mean: the bounds on how many 
 sstables to compact in one compaction operation. The 3rd is a target for how 
 many SSTables you'd like to have.
 Pseudo code algorithm for determining whether or not to do a minor compaction:
 {noformat} 
 if sstables.length + minimumCompactionThreshold -1  targetSSTableCount
   sort sstables from smallest to largest
   compact the up to maximumCompactionThreshold smallest tables
 {noformat} 

[Cassandra Wiki] Update of ThomasBoose/EERD model components to Cassandra Column family's by ThomasBoose

The ThomasBoose/EERD model components to Cassandra Column family's page has 
been changed by ThomasBoose.
http://wiki.apache.org/cassandra/ThomasBoose/EERD%20model%20components%20to%20Cassandra%20Column%20family%27s?action=diffrev1=3rev2=4

--

  If it is necessary to use different keys for both collections, sometimes it 
is not up to one designer to select both keys, although the number of element 
are equal and they are related one on one, in a relational model the designer 
gets to select either key to insert into the other collection with an unique 
and foreign key constraint.
  
  In Cassandra modeling you are forced to either croslink both key's, So you'd 
design both key's foreign in both ColumnFamily's. Or you create a third 
ColumnFamily in which you store both keys preceded by a token to which 
columfamily you are refering. Lets focus on the first option. Say we hand out 
phones to our employees and we agree that every employee will always have one 
phone. and phones that are not used are not stored in our database. The phone 
has a phonenumber as key where the employee has a social security number. In 
order to know which number to dial when looking for employee X and who is 
calling giving a specific phonenumber we need to store both keys foreign in 
both ColumnFamily's.
- 
- 
- ||<tablewidth="400px">'''CF_Employee'''||
+ ||<tablewidth="400px">'''CF_Employee''' ||
- ||<|2>123-12-1234||name||phone||salary||
+ ||<style="text-align: center;" |2>123-12-1234 ||name ||phone ||salary ||
- ||John||0555-123456||10.000||
+ ||John ||0555-123456 ||10.000 ||
- ||<|2>321-21-4321||name||phone||salary||
+ ||<style="text-align: center;" |2>321-21-4321 ||name ||phone ||salary ||
- ||Jane||0555-654321||12.000||
+ ||Jane ||0555-654321 ||12.000 ||
- 
- 
- ||<tablewidth="400px" tablealign="left">'''CF_Phone'''||
- ||<|2>0555-123456||employee||credit||
- ||123-12-1234||10||
- ||<|2>0555-654321||employee||credit||
- ||321-21-4321||5||
  
  
  
- 
+ ||<tablewidth="400px" tablestyle="text-align: left;"style="text-align: center;">'''CF_Phone''' ||
+ ||<style="text-align: center;" |2>0555-123456 ||employee ||credit ||
+ ||123-12-1234 ||10 ||
+ ||<style="text-align: center;" |2>0555-654321 ||employee ||credit ||
+ ||321-21-4321 ||5 ||
  
  
  
@@ -71, +66 @@

raise error or delete specified employee
  }}}
   Subset elements 
- '' ''
+ In relational systems, one-to-one relationships where one collection is 
smaller, in fact a subset of the other collection, are solved by adding the key 
of the larger collection as a foreign key to the smaller one. Preferably one 
uses the same key values, as described above. This way we prevent null values 
that do not strictly indicate an unknown value. As we've all learned in school, 
a null value should only mean "We know there is a value, but the value is 
unknown".
  
+ As stated, we prefer the foreign key to have the same value as the key of the 
superset ColumnFamily. In every other case we'll have to introduce logic to keep 
the relation consistent. In any case you have to enforce that every key in the 
subset exists in the superset. Logic must also be provided for deleting elements 
from the superset, with respect to the related element in the subset.
+ 
+  Overlap 
+ The easiest one-to-one relation to implement is the one in which elements in 
either collection may, but need not, appear in the other. If at all possible, 
create one big superset ColumnFamily that collects all elements from both 
collections, even where there is no corresponding attribute (column). If 
absolutely necessary, you can provide keys from either ColumnFamily when the 
values are not the same but are related one to one. See above for constraint 
considerations.
+ 
+ === 1 to Many ===
+ In one-to-many relationships we add the key of the one side as a foreign key 
to the many side. So if we're modeling students studying at only one school unit 
at a time, we add the unit's key to the student as a foreign key. Since no 
foreign key logic is provided, you will have to write your own code to enforce 
that the unit exists whenever the unit attribute of a student is set, and to 
define the behaviour when a unit is deleted. Because this kind of relation is 
very common, the logic for it is best created in a separate DBMS tier.
+ 
+ Every student has only one school unit, so we enforce one static column name 
that references this unit; for instance, this column in the cf_Student 
ColumnFamily is called school-unit. In a Cassandra database this is not 
sufficient to retrieve all students within this unit. One could find answers 
to questions like these, but it would require quite a lot of processing power. 
If a ColumnFamily, the cf_School_unit family in this case, has only one of 
these relations, then one could choose to add all student

[jira] Created: (CASSANDRA-1841) cassandra-cli formatted help width

2010-12-10, Eric Evans (JIRA)

cassandra-cli formatted help width
--

 Key: CASSANDRA-1841
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1841
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 0.7.0 rc 2
Reporter: Eric Evans
Priority: Trivial
 Fix For: 0.7.0


Most of cassandra-cli's help output justifies to 81 chars, just enough to cause 
line wrap on most default sized terminals.  It would improve appearance here if 
one character of the separating white space was removed so that it justified at 
80 chars.
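Below is a small sketch of a check that would catch this kind of off-by-one. It assumes help entries are laid out as a padded command column followed by a description. The `HelpWidth` class, the column width and the sample entries are illustrative only and are not taken from `CliUserHelp`. Each line is padded with `String.format` and rejected if it is wider than 80 characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: pad help entries and enforce a maximum line width. */
class HelpWidth {
    static final int TERMINAL_WIDTH = 80;

    /** Pads each command to commandWidth columns and rejects lines wider than maxWidth. */
    static List<String> format(String[][] entries, int commandWidth, int maxWidth) {
        if (commandWidth < 1) {
            throw new IllegalArgumentException("commandWidth must be >= 1");
        }
        List<String> lines = new ArrayList<>();
        for (String[] entry : entries) {
            String line = String.format("%-" + commandWidth + "s%s", entry[0], entry[1]);
            if (line.length() > maxWidth) {
                throw new IllegalArgumentException(
                        "help line is " + line.length() + " chars, limit " + maxWidth + ": " + line);
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        String[][] help = {
            {"help;", "Display this help."},
            {"quit;", "Exit CLI."},
        };
        for (String line : format(help, 24, TERMINAL_WIDTH)) {
            System.out.println(line);
        }
    }
}
```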

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[Cassandra Wiki] Trivial Update of ThomasBoose by ThomasBoose

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ThomasBoose page has been changed by ThomasBoose.
http://wiki.apache.org/cassandra/ThomasBoose?action=diff&rev1=1&rev2=2

--

  == Thomas Boose ==
  Email: MailTo(thomas AT boose DOT nl) 
{{http://a0.twimg.com/profile_images/196967692/Thomas_reasonably_small.jpg}}
  
- Hi, I'm contributing to this wiki as part of a assignment by my university. 
The asignment is to develop, design, build and implement a messaging system's 
database backend for realtime, location based content provided by schools, 
company's and individuals aim't at intrestgroups.
+ Hi, I'm contributing to this wiki as part of an assignment by my university. 
The assignment is to develop, design, build and implement a messaging system's 
database backend for real-time, location-based content provided by schools, 
companies and individuals, aimed at interest groups.
  
- We have chosen cassandra for storage and now our task is to convert a 
relational model into Cassandra columnfamily's. Therefore I'd like to create a 
page describing how to implement specific EERD modeling parts to Cassandra. 
I've read that not everybody supports this idea but I'd like to try it anyway.
+ We have chosen Cassandra for storage and now our task is to convert a 
relational model into Cassandra ColumnFamily's. Therefore I'd like to create a 
page [[EERD model components to Cassandra Column family's|describing how to 
implement specific EERD modeling parts to Cassandra]]. I've read that not 
everybody supports this idea but I'd like to try it anyway.
  
- Comments and sugestions are alway's welcome. Language improvevents are 
appreciated as my native tonque is not english.
+ Comments and suggestions are always welcome. Language improvements are 
appreciated, as my native tongue is not English.
  
  
  CategoryHomepage

[Cassandra Wiki] Trivial Update of ThomasBoose by ThomasBoose

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ThomasBoose page has been changed by ThomasBoose.
http://wiki.apache.org/cassandra/ThomasBoose?action=diff&rev1=2&rev2=3

--

  
  Hi, I'm contributing to this wiki as part of an assignment by my university. 
The assignment is to develop, design, build and implement a messaging system's 
database backend for real-time, location-based content provided by schools, 
companies and individuals, aimed at interest groups.
  
- We have chosen Cassandra for storage and now our task is to convert a 
relational model into Cassandra ColumnFamily's. Therefore I'd like to create a 
page [[EERD model components to Cassandra Column family's|describing how to 
implement specific EERD modeling parts to Cassandra]]. I've read that not 
everybody supports this idea but I'd like to try it anyway.
+ We have chosen Cassandra for storage and now our task is to convert a 
relational model into Cassandra ColumnFamily's. Therefore I'd like to create a 
page [[ThomasBoose/EERD model components to Cassandra Column 
family's|describing how to implement specific EERD modeling parts to 
Cassandra]]. I've read that not everybody supports this idea but I'd like to 
try it anyway.
  
  Comments and suggestions are always welcome. Language improvements are 
appreciated, as my native tongue is not English.

svn commit: r1044422 - in /cassandra/branches/cassandra-0.7: ./ src/java/org/apache/cassandra/cli/ test/unit/org/apache/cassandra/cli/

2010-12-10 Thread jbellis

Author: jbellis
Date: Fri Dec 10 16:29:37 2010
New Revision: 1044422

URL: http://svn.apache.org/viewvc?rev=1044422&view=rev
Log:
add TTL support to CLI
patch by Pavel Yaskevich; reviewed by Sylvain Lebresne for CASSANDRA-1838

Modified:
cassandra/branches/cassandra-0.7/CHANGES.txt
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/Cli.g
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliClient.java
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliUserHelp.java
cassandra/branches/cassandra-0.7/test/unit/org/apache/cassandra/cli/CliTest.java

Modified: cassandra/branches/cassandra-0.7/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/CHANGES.txt?rev=1044422&r1=1044421&r2=1044422&view=diff
==
--- cassandra/branches/cassandra-0.7/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.7/CHANGES.txt Fri Dec 10 16:29:37 2010
@@ -4,6 +4,7 @@ dev
  * infer org.apache.cassandra.locator for replication strategy classes
when not otherwise specified
  * validation that generates less garbage (CASSANDRA-1814)
+ * add TTL support to CLI (CASSANDRA-1838)
 
 
 0.7.0-rc2

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/Cli.g
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/Cli.g?rev=1044422&r1=1044421&r2=1044422&view=diff
==
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/Cli.g (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/Cli.g Fri Dec 10 16:29:37 2010
@@ -240,8 +240,8 @@ typeIdentifier
 ;
 
 setStatement
-: SET columnFamilyExpr '=' value 
--> ^(NODE_THRIFT_SET columnFamilyExpr value)
+: SET columnFamilyExpr '=' objectValue=value (WITH TTL '=' ttlValue=value)?
+-> ^(NODE_THRIFT_SET columnFamilyExpr $objectValue ( $ttlValue )?)
 ;
 
 countStatement
@@ -525,6 +525,7 @@ LIST:   'LIST';
 LIMIT:  'LIMIT';
 TRUNCATE:   'TRUNCATE';
 ASSUME: 'ASSUME';
+TTL:'TTL';
 
 IP_ADDRESS 
 : IntegerLiteral '.' IntegerLiteral '.' IntegerLiteral '.' IntegerLiteral

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliClient.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliClient.java?rev=1044422&r1=1044421&r2=1044422&view=diff
==
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliClient.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliClient.java Fri Dec 10 16:29:37 2010
@@ -319,8 +319,9 @@ public class CliClient extends CliUserHe
 for (Column col : superColumn.getColumns())
 {
 validator = getValidatorForValue(cfDef, col.getName());
-sessionState.out.printf("%n (column=%s, value=%s, timestamp=%d)", formatSubcolumnName(keyspace, columnFamily, col),
-validator.getString(col.value), col.timestamp);
+sessionState.out.printf("%n (column=%s, value=%s, timestamp=%d%s)", formatSubcolumnName(keyspace, columnFamily, col),
+validator.getString(col.value), col.timestamp,
+col.isSetTtl() ? String.format(", ttl=%d", col.getTtl()) : "");
 }
 
 sessionState.out.println(")");
@@ -329,8 +330,9 @@ public class CliClient extends CliUserHe
 {
 Column column = cosc.column;
 validator = getValidatorForValue(cfDef, column.getName());
-sessionState.out.printf("=> (column=%s, value=%s, timestamp=%d)%n", formatColumnName(keyspace, columnFamily, column),
-validator.getString(column.value), column.timestamp);
+sessionState.out.printf("=> (column=%s, value=%s, timestamp=%d%s)%n", formatColumnName(keyspace, columnFamily, column),
+validator.getString(column.value), column.timestamp,
+column.isSetTtl() ? String.format(", ttl=%d", column.getTtl()) : "");
 }
 }
 
@@ -453,8 +455,9 @@ public class CliClient extends CliUserHe
 }
 
 // print results
-sessionState.out.printf("=> (column=%s, value=%s, timestamp=%d)%n",
-formatColumnName(keySpace, columnFamily, column), valueAsString, column.timestamp);
+sessionState.out.printf("=> (column=%s, value=%s, timestamp=%d%s)%n",
+formatColumnName(keySpace, columnFamily,

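The grammar change above gives `SET` an optional trailing `WITH TTL = <value>` clause. The `CliClient` hunks then append `, ttl=N` to the printed column only when a TTL is set. The following is a standalone sketch of that formatting. It assumes a hypothetical `SimpleColumn` in place of the Thrift `Column`; the patch itself calls `isSetTtl()` and `getTtl()` on the Thrift object.

```java
/** Hypothetical demo of the optional ", ttl=N" suffix added by the patch. */
class TtlSuffixDemo {
    /** Stand-in for the Thrift Column; ttl == null means "not set". */
    static final class SimpleColumn {
        final String name;
        final String value;
        final long timestamp;
        final Integer ttl;

        SimpleColumn(String name, String value, long timestamp, Integer ttl) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
            this.ttl = ttl;
        }

        boolean isSetTtl() {
            return ttl != null;
        }
    }

    /** Same shape as the patched printf: the extra %s is either ", ttl=N" or "". */
    static String format(SimpleColumn c) {
        return String.format("(column=%s, value=%s, timestamp=%d%s)",
                c.name, c.value, c.timestamp,
                c.isSetTtl() ? String.format(", ttl=%d", c.ttl) : "");
    }

    public static void main(String[] args) {
        System.out.println(format(new SimpleColumn("first", "John", 1291666461685000L, null)));
        // prints "(column=first, value=John, timestamp=1291666461685000)"
        System.out.println(format(new SimpleColumn("first", "John", 1291666461685000L, 30)));
        // prints "(column=first, value=John, timestamp=1291666461685000, ttl=30)"
    }
}
```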
[jira] Commented: (CASSANDRA-1408) nodetool drain attempts to delete a deleted file


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970221#action_12970221
 ] 

Jonathan Ellis commented on CASSANDRA-1408:
---

+1

 nodetool drain attempts to delete a deleted file
 

 Key: CASSANDRA-1408
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1408
 Project: Cassandra
  Issue Type: Bug
 Environment: sun-jdk-1.6/Ubuntu 10.04
Reporter: Jon Hermes
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.6.9, 0.7 beta 2

 Attachments: 1408-0.6.txt, 1408.txt


 Running `nodetool drain` presented me with a pretty stack-trace.
 The drain itself finished successfully and nothing showed up in the 
 system.log.
 {noformat}
 $ bin/nodetool -h 127.0.0.1 -p 8080 drain
 Exception in thread "main" java.lang.AssertionError: attempted to delete non-existing file CommitLog-1282166457787.log
   at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:40)
   at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:178)
   at org.apache.cassandra.service.StorageService.drain(StorageService.java:1653)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
   at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
   at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
   at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
   at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
   at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
   at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
   at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
   at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
   at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
   at sun.rmi.transport.Transport$1.run(Transport.java:159)
   at java.security.AccessController.doPrivileged(Native Method)
   at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
   at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)
 {noformat}


svn commit: r1044424 - in /cassandra/branches/cassandra-0.7: CHANGES.txt src/java/org/apache/cassandra/cli/CliClient.java

2010-12-10 Thread jbellis

Author: jbellis
Date: Fri Dec 10 16:37:01 2010
New Revision: 1044424

URL: http://svn.apache.org/viewvc?rev=1044424&view=rev
Log:
cli defaults to bytestype for subcomparator when creating CFS
patch by Pavel Yaskevich; reviewed by jbellis for CASSANDRA-1835

Modified:
cassandra/branches/cassandra-0.7/CHANGES.txt
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliClient.java

Modified: cassandra/branches/cassandra-0.7/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/CHANGES.txt?rev=1044424&r1=1044423&r2=1044424&view=diff
==
--- cassandra/branches/cassandra-0.7/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.7/CHANGES.txt Fri Dec 10 16:37:01 2010
@@ -5,6 +5,8 @@ dev
when not otherwise specified
  * validation that generates less garbage (CASSANDRA-1814)
  * add TTL support to CLI (CASSANDRA-1838)
+ * cli defaults to bytestype for subcomparator when creating
+   column families (CASSANDRA-1835)
 
 
 0.7.0-rc2

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliClient.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliClient.java?rev=1044424&r1=1044423&r2=1044424&view=diff
==
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliClient.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliClient.java Fri Dec 10 16:37:01 2010
@@ -1544,6 +1544,13 @@ public class CliClient extends CliUserHe
 private ByteBuffer subColumnNameAsBytes(String superColumn, CfDef columnFamilyDef) 
 {
 String comparatorClass = columnFamilyDef.subcomparator_type;
+
+if (comparatorClass == null)
+{
+sessionState.out.println(String.format("Notice: defaulting to BytesType subcomparator for '%s'", columnFamilyDef.getName()));
+comparatorClass = "BytesType";
+}
+
 return getBytesAccordingToType(superColumn, getFormatTypeForColumn(comparatorClass));   
 }

[jira] Commented: (CASSANDRA-1470) use direct io for compaction


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970225#action_12970225
 ] 

T Jake Luciani commented on CASSANDRA-1470:
---

bq. see http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html for 
why posix_fadvise won't work [for writes]

This article is talking about NOREUSE flag being a no-op but we are using 
DONTNEED which does work.


Since the true goal of this ticket is to minimize the performance impact of 
compaction I'd like to try the following:

At the BRAF (BufferedRandomAccessFile) level:
   * use fadvise(DONTNEED) instead of direct-io for writes. This will fix the 
buffering problem we now see affecting write speed.  
   * use fadvise(DONTNEED) for sstable reads to remove the need for directio 
flag altogether.
   * add a method long[] pagesInPageCache()  which uses the posix mincore() 
function to detect the offsets of pages for this file currently in page cache.

At the compaction level (a separate ticket):
   * add getActiveKeys() which uses underlying pagesInPageCache() to get the 
keys actually in the page cache
   * use getActiveKeys() to detect which SSTables being compacted are in the os 
cache and make sure the subsequent pages in the new compacted SSTable are kept 
in the page cache for these keys. This will minimize the impact of compacting a 
hot SSTable as well. 

A simpler yet similar approach is described here:   
http://insights.oetiker.ch/linux/fadvise/
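Below is a sketch of the conversion step inside the proposed `pagesInPageCache()`. It assumes the per-page vector has already been obtained from mincore(2) over an mmap of the file. That native call (e.g. via JNA) is not shown and is an assumption. On Linux, mincore fills in one byte per page and sets the least significant bit when the page is resident. The helper below only turns that vector into byte offsets; its name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper for a pagesInPageCache()-style method. */
class PageCacheOffsets {
    /**
     * Converts a mincore(2) residency vector (one byte per page; the least
     * significant bit is set when the page is resident) into byte offsets.
     */
    static long[] pagesInPageCache(byte[] mincoreVec, long pageSize) {
        int resident = 0;
        for (byte b : mincoreVec) {
            if ((b & 1) != 0) {
                resident++;
            }
        }
        long[] offsets = new long[resident];
        int i = 0;
        for (int page = 0; page < mincoreVec.length; page++) {
            if ((mincoreVec[page] & 1) != 0) {
                offsets[i++] = page * pageSize;
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Pretend mincore() reported pages 0, 2 and 3 of a 5-page mapping as resident.
        byte[] vec = {1, 0, 1, 1, 0};
        System.out.println(Arrays.toString(pagesInPageCache(vec, 4096)));
        // prints "[0, 8192, 12288]"
    }
}
```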

 use direct io for compaction
 

 Key: CASSANDRA-1470
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1470
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 0.7.1

 Attachments: 1470-v2.txt, 1470.txt, CASSANDRA-1470-for-0.6.patch, 
 CASSANDRA-1470-v10-for-0.7.patch, CASSANDRA-1470-v11-for-0.7.patch, 
 CASSANDRA-1470-v12-0.7.patch, CASSANDRA-1470-v2.patch, 
 CASSANDRA-1470-v3-0.7-with-LastErrorException-support.patch, 
 CASSANDRA-1470-v4-for-0.7.patch, CASSANDRA-1470-v5-for-0.7.patch, 
 CASSANDRA-1470-v6-for-0.7.patch, CASSANDRA-1470-v7-for-0.7.patch, 
 CASSANDRA-1470-v8-for-0.7.patch, CASSANDRA-1470-v9-for-0.7.patch, 
 CASSANDRA-1470.patch, 
 use.DirectIORandomAccessFile.for.commitlog.against.1022235.patch


 When compaction scans through a group of sstables, it forces out the data in the 
 OS buffer cache that is being used for hot reads, which can have a dramatic negative 
 effect on performance.


[jira] Updated: (CASSANDRA-1839) Keep a tombstone cache


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1839:
--

  Component/s: Core
 Priority: Minor  (was: Major)
Fix Version/s: (was: 0.7.1)

Note that this requires a full-row read on writes of row-level 
tombstones; otherwise there is a potential bug of assuming columns are 
suppressed when they actually have a higher timestamp than the tombstone you are 
writing.

I'm skeptical that this is worth doing for a single use case.

 Keep a tombstone cache
 --

 Key: CASSANDRA-1839
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1839
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Affects Versions: 0.3
Reporter: Brandon Williams
Priority: Minor

 There is a use case in production where the pattern is read-then-delete, 
 where most of the keys read will not exist, but be attempted many times.  If 
 the key has never existed, the bloom filter makes this operation cheap, 
 however if the key has existed, especially if it has been overwritten many 
 times and thus spans multiple SSTables, the merge-on-read just to end up with 
 a tombstone can be expensive.  This can be mitigated with keycache and some 
 rowcache currently, but this can be further optimized by storing a sentinel 
 value in the keycache indicating that it's a tombstone, which we can 
 invalidate on new writes to the row.
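Below is a minimal sketch of the sentinel idea. It uses hypothetical names and a plain map standing in for the key cache. It also reflects the caveat in the comment above: before caching the sentinel on a row-level delete, a full-row read must confirm that no column is newer than the tombstone. Here that read is modelled as a `maxColumnTimestamp` callback. Normal reads would cache real row positions (>= 0); that part is omitted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/** Hypothetical key cache that can remember "this row is a tombstone". */
class TombstoneKeyCache {
    /** Sentinel meaning "this row resolves to a row-level tombstone"; real positions are >= 0. */
    static final long TOMBSTONE = -1L;

    private final Map<String, Long> keyCache = new ConcurrentHashMap<>();

    /**
     * Row-level delete. maxColumnTimestamp stands in for the full-row read:
     * a column newer than the tombstone survives, so the sentinel would be wrong.
     */
    void onRowDelete(String key, long deletedAt, ToLongFunction<String> maxColumnTimestamp) {
        if (maxColumnTimestamp.applyAsLong(key) <= deletedAt) {
            keyCache.put(key, TOMBSTONE);
        } else {
            keyCache.remove(key);
        }
    }

    /** Any new write to the row invalidates the cached entry. */
    void onWrite(String key) {
        keyCache.remove(key);
    }

    /** Returns null without merging when the sentinel is cached; otherwise does the merge-on-read. */
    String read(String key, Function<String, String> mergeOnRead) {
        Long cached = keyCache.get(key);
        if (cached != null && cached == TOMBSTONE) {
            return null;
        }
        return mergeOnRead.apply(key);
    }

    public static void main(String[] args) {
        TombstoneKeyCache cache = new TombstoneKeyCache();
        Function<String, String> merge = k -> {
            System.out.println("merge-on-read for " + k);
            return "row " + k;
        };
        cache.onRowDelete("k1", 100L, k -> 50L);      // no newer column survives
        System.out.println(cache.read("k1", merge)); // prints "null" without merging
        cache.onWrite("k1");
        System.out.println(cache.read("k1", merge)); // merges, then prints "row k1"
    }
}
```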


[jira] Commented: (CASSANDRA-1470) use direct io for compaction


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970230#action_12970230
 ] 

Jonathan Ellis commented on CASSANDRA-1470:
---

bq. This article is talking about NOREUSE flag being a no-op but we are using 
DONTNEED which does work

Peter wrote a book earlier in the ticket about DONTNEED -- it sounds like it 
could work but once you handle all the corner cases it may not actually be 
simpler than direct i/o.  I'm open to giving it a try, though.

bq. add a method long[] pagesInPageCache() which uses the posix mincore() 
function to detect the offsets of pages for this file currently in page cache

bq. use getActiveKeys() to detect which SSTables being compacted are in the os 
cache and make sure the subsequent pages in the new compacted SSTable are kept 
in the page cache for these keys

Let's avoid growing the scope of this ticket and make a new one for the 
pre-heat sstables post-compaction feature.


[jira] Commented: (CASSANDRA-1470) use direct io for compaction

[
https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970232#action_12970232
]

Peter Schuller commented on CASSANDRA-1470:
---

@jake:

Pretty good idea to combine the two like this. It especially works if the new
pages written can get intelligently pulled in (or rather not dropped).

A few things:

(1) In order for DONTNEED to be effective you have to fsync() (well, fdatasync()
on Linux) first. This will have similar performance implications as direct
I/O (see my long post earlier on in this ticket too), but at least removes the
need to carefully ensure writes happen in chunks (instead, fsync() frequency
will have to be considered and traded off).

(2) Remember that DONTNEED will affect the data globally for the system,
meaning that a compaction that reads and does DONTNEED will actively evict
data from sstables being actively used. (Again, see my longer post earlier in
this issue.) So you'd have to use mincore() when reading too, in order to avoid
evicting actively used data. (Note: not doing so may be *worse* than the current
behavior, in addition to not causing an improvement, so I think this is
important.)

But provided those are eventually addressed, mincore+fadvise seems
like a pretty good combination.

One issue I can think of is that while mincore() gives you information in bulk
for many pages, posix_fadvise() does not allow the equivalent. So we'd expect
potentially quite a large number of posix_fadvise() calls, assuming in-core data
is scattered across a large file. That might be significant in some cases (e.g.
if half of the pages are in core, you may end up approaching one posix_fadvise()
per page read).


[jira] Commented: (CASSANDRA-1470) use direct io for compaction

[
https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970240#action_12970240
]

T Jake Luciani commented on CASSANDRA-1470:
---

@peter

Re (1): in the v12 patch we already split the buffer flush() from sync(), so I
imagine it working like this, with fadvise added to the sync call. That
way a number of writes would be cached, then fsync'd and discarded.
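Below is a sketch of the flush/sync split with fadvise added at sync time. It assumes sequential appends. It uses `FileChannel.force(false)` (data only, roughly fdatasync) so that the pages are written back before they are advised away. The just-synced range is then handed to a hypothetical `PageDropper` hook, because plain Java cannot call posix_fadvise and a native binding would be needed. The sketch does not address the global-eviction concern raised in (2).

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical writer: buffered writes, then sync, then drop the synced range. */
class SyncThenDrop {
    /** Stand-in for posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED). */
    interface PageDropper {
        void dontNeed(long offset, long length);
    }

    private final FileChannel channel;
    private final PageDropper dropper;
    private long syncedUpTo = 0; // end of the range already synced and advised away

    SyncThenDrop(FileChannel channel, PageDropper dropper) {
        this.channel = channel;
        this.dropper = dropper;
    }

    /** Writes only reach the page cache; nothing is dropped here. */
    void write(ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
    }

    /** Forces data to disk first, then advises away everything appended since the last sync. */
    void sync() throws IOException {
        channel.force(false); // data only, roughly fdatasync()
        long end = channel.position();
        if (end > syncedUpTo) {
            dropper.dontNeed(syncedUpTo, end - syncedUpTo);
            syncedUpTo = end;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("compacted", ".db");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            SyncThenDrop w = new SyncThenDrop(raf.getChannel(),
                    (off, len) -> System.out.println("DONTNEED offset=" + off + " length=" + len));
            w.write(ByteBuffer.wrap(new byte[8192]));
            w.write(ByteBuffer.wrap(new byte[4096]));
            w.sync(); // prints "DONTNEED offset=0 length=12288"
        }
    }
}
```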

Re (2): from what I can find, DONTNEED is global, but it can be reset by another
read (from another file handle). Also, DONTNEED doesn't discard the pages instantly;
it just tells the kernel to mark them as ready to be thrown away if it wants. So if the
data truly is hot, then a subsequent read on this file will keep them in the
cache. I need to test and see if this is the case. If not, then it would need
to work like you describe.

Also, posix_fadvise accepts a range, so you can specify a contiguous chunk.
If the cached pages truly are randomly scattered, then you are right. Maybe it
would then be better to overcommit the cache in this case?
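The call-count question can be made concrete with a sketch. It assumes a mincore() residency vector (low bit = resident) is snapshotted before compaction reads a region. After the read, the pages that were *not* already cached are coalesced into maximal contiguous runs, with one posix_fadvise(DONTNEED) range per run, while pages that were hot beforehand are left alone. Alternating hot and cold pages still degenerate to one call per cold page, as noted above. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: coalesce pages that were cold before a read into fadvise ranges. */
class DontNeedRanges {
    /** Returns {offset, length} pairs covering maximal runs of pages whose low bit was clear. */
    static List<long[]> coldRanges(byte[] residentBefore, long pageSize) {
        List<long[]> ranges = new ArrayList<>();
        int runStart = -1; // first page of the current cold run, or -1 if none
        for (int page = 0; page <= residentBefore.length; page++) {
            // The position past the last page counts as "not cold" so a trailing run is closed.
            boolean cold = page < residentBefore.length && (residentBefore[page] & 1) == 0;
            if (cold && runStart < 0) {
                runStart = page;
            } else if (!cold && runStart >= 0) {
                ranges.add(new long[] {runStart * pageSize, (page - runStart) * pageSize});
                runStart = -1;
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        byte[] before = {0, 0, 1, 0, 1, 1, 0};
        for (long[] r : coldRanges(before, 4096)) {
            System.out.println("posix_fadvise(fd, " + r[0] + ", " + r[1] + ", POSIX_FADV_DONTNEED)");
        }
        // prints three calls: (0, 8192), (12288, 4096), (24576, 4096)
    }
}
```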


[jira] Created: (CASSANDRA-1842) ColumnFamilyOutputFormat only writes the first column

2010-12-10 Thread Brandon Williams (JIRA)

ColumnFamilyOutputFormat only writes the first column
-

 Key: CASSANDRA-1842
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1842
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.0 rc 1
Reporter: Brandon Williams
 Fix For: 0.7.0


In CASSANDRA-1774 we fixed a problem where only the last column was being 
written.  However, it appears that we only write the first one now.  This is 
easy to observe in the word count example:

{noformat}
RowKey: text2
=> (column=word1, value=1, timestamp=1291666461685000)
{noformat}

is what the output for text2 looks like, but there should be another column, 
word2.  If the word count is run without CFOF it works correctly.


[jira] Assigned: (CASSANDRA-1842) ColumnFamilyOutputFormat only writes the first column

2010-12-10 Thread Jeremy Hanna (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna reassigned CASSANDRA-1842:
---

Assignee: Jeremy Hanna

 ColumnFamilyOutputFormat only writes the first column
 -

 Key: CASSANDRA-1842
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1842
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.0 rc 1
Reporter: Brandon Williams
Assignee: Jeremy Hanna
 Fix For: 0.7.0


 In CASSANDRA-1774 we fixed a problem where only the last column was being 
 written.  However, it appears that we only write the first one now.  This is 
 easy to observe in the word count example:
 {noformat}
 RowKey: text2
 => (column=word1, value=1, timestamp=1291666461685000)
 {noformat}
 is what the output for text2 looks like, but there should be another column, 
 word2.  If the word count is run without CFOF it works correctly.


svn commit: r1044450 - in /cassandra/branches/cassandra-0.6: CHANGES.txt src/java/org/apache/cassandra/db/ColumnFamilyStore.java src/java/org/apache/cassandra/service/StorageService.java

2010-12-10 Thread brandonwilliams

Author: brandonwilliams
Date: Fri Dec 10 17:35:14 2010
New Revision: 1044450

URL: http://svn.apache.org/viewvc?rev=1044450&view=rev
Log:
correct ordering of drain operations so CL.recover is no longer necessary.  
Patch by jbellis and brandonwilliams, reviewed by jbellis for CASSANDRA-1408

Modified:
cassandra/branches/cassandra-0.6/CHANGES.txt
cassandra/branches/cassandra-0.6/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
cassandra/branches/cassandra-0.6/src/java/org/apache/cassandra/service/StorageService.java

Modified: cassandra/branches/cassandra-0.6/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.6/CHANGES.txt?rev=1044450&r1=109&r2=1044450&view=diff
==
--- cassandra/branches/cassandra-0.6/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.6/CHANGES.txt Fri Dec 10 17:35:14 2010
@@ -16,6 +16,8 @@
  * reduce fat client timeout (CASSANDRA-1730)
  * cleanup smallest CFs first to increase free temp space for larger ones
(CASSANDRA-1811)
+ * correct ordering of drain operations so CL.recover is no longer necessary
+   (CASSANDRA-1408)
 
 
 0.6.8

Modified: cassandra/branches/cassandra-0.6/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.6/src/java/org/apache/cassandra/db/ColumnFamilyStore.java?rev=1044450&r1=109&r2=1044450&view=diff
==
--- cassandra/branches/cassandra-0.6/src/java/org/apache/cassandra/db/ColumnFamilyStore.java (original)
+++ cassandra/branches/cassandra-0.6/src/java/org/apache/cassandra/db/ColumnFamilyStore.java Fri Dec 10 17:35:14 2010
@@ -71,27 +71,27 @@ public class ColumnFamilyStore implement
  * For BinaryMemtable that's about all that happens.  For live Memtables there are two other things
  * that switchMemtable does (which should be the only caller of submitFlush in this case).
  * First, it puts the Memtable into memtablesPendingFlush, where it stays until the flush is complete
- * and it's been added as an SSTableReader to ssTables_.  Second, it adds an entry to commitLogUpdater
+ * and it's been added as an SSTableReader to ssTables_.  Second, it adds an entry to postFlushExecutor
  * that waits for the flush to complete, then calls onMemtableFlush.  This allows multiple flushes
  * to happen simultaneously on multicore systems, while still calling onMF in the correct order,
  * which is necessary for replay in case of a restart since CommitLog assumes that when onMF is
  * called, all data up to the given context has been persisted to SSTables.
  */
-private static ExecutorService flushSorter_
+private static final ExecutorService flushSorter
 = new JMXEnabledThreadPoolExecutor(1,
 Runtime.getRuntime().availableProcessors(),
 Integer.MAX_VALUE,
 TimeUnit.SECONDS,
 new LinkedBlockingQueue<Runnable>(Runtime.getRuntime().availableProcessors()),
 new NamedThreadFactory("FLUSH-SORTER-POOL"));
-private static ExecutorService flushWriter_
+private static final ExecutorService flushWriter
 = new JMXEnabledThreadPoolExecutor(1,
 DatabaseDescriptor.getAllDataFileLocations().length,
 Integer.MAX_VALUE,
 TimeUnit.SECONDS,
 new LinkedBlockingQueue<Runnable>(DatabaseDescriptor.getAllDataFileLocations().length),
 new NamedThreadFactory("FLUSH-WRITER-POOL"));
-private static ExecutorService commitLogUpdater_ = new JMXEnabledThreadPoolExecutor("MEMTABLE-POST-FLUSHER");
+public static final ExecutorService postFlushExecutor = new JMXEnabledThreadPoolExecutor("MEMTABLE-POST-FLUSHER");
 
 private static final int KEY_RANGE_FILE_BUFFER_SIZE = 256 * 1024;
 
@@ -480,7 +480,7 @@ public class ColumnFamilyStore implement
 memtable_ = new Memtable(this);
 // a second executor that makes sure the onMemtableFlushes get called in the right order,
 // while keeping the wait-for-flush (future.get) out of anything latency-sensitive.
-return commitLogUpdater_.submit(new WrappedRunnable()
+return postFlushExecutor.submit(new WrappedRunnable()
 {
 public void runMayThrow() throws InterruptedException, IOException
 {
@@ -747,7 +747,7 @@ public class ColumnFamilyStore implement
 {
 logger_.info("Enqueuing flush of " + flushable);
 final Condition

[jira] Resolved: (CASSANDRA-1408) nodetool drain attempts to delete a deleted file

2010-12-10 Thread Brandon Williams (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams resolved CASSANDRA-1408.
-

Resolution: Fixed

Committed.

 nodetool drain attempts to delete a deleted file
 

 Key: CASSANDRA-1408
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1408
 Project: Cassandra
  Issue Type: Bug
 Environment: sun-jdk-1.6/Ubuntu 10.04
Reporter: Jon Hermes
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.6.9, 0.7 beta 2

 Attachments: 1408-0.6.txt, 1408.txt


 Running `nodetool drain` presented me with a pretty stack-trace.
 The drain itself finished successfully and nothing showed up in the 
 system.log.
 {noformat}
 $ bin/nodetool -h 127.0.0.1 -p 8080 drain
 Exception in thread "main" java.lang.AssertionError: attempted to delete non-existing file CommitLog-1282166457787.log
   at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:40)
   at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:178)
   at org.apache.cassandra.service.StorageService.drain(StorageService.java:1653)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
   at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
   at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
   at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
   at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
   at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
   at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
   at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
   at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
   at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
   at sun.rmi.transport.Transport$1.run(Transport.java:159)
   at java.security.AccessController.doPrivileged(Native Method)
   at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
   at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-1083) Improvement to CompactionManger's submitMinorIfNeeded

2010-12-10 Thread Ryan King (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970244#action_12970244
 ] 

Ryan King commented on CASSANDRA-1083:
--

I agree. I think this idea is mostly a dead end because it's attacking the 
problem from the wrong direction.

 Improvement to CompactionManger's submitMinorIfNeeded
 -

 Key: CASSANDRA-1083
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1083
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 0.7.1

 Attachments: 1083-configurable-compaction-thresholds.patch, 
 1083-sort.txt, compaction_simulation.rb, compaction_simulation.rb


 We've discovered that we are unable to tune compaction the way we want for 
 our production cluster. I think the current algorithm doesn't do this as well 
 as it could, since it doesn't sort the sstables by size before doing the 
 bucketing, which means the tuning parameters have unpredictable results.
 I looked at CASSANDRA-792, but it seems like overkill. Here's an alternative 
 proposal:
 config operations:
  minimumCompactionThreshold
  maximumCompactionThreshold
  targetSSTableCount
 The first two would mean what they currently mean: the bounds on how many 
 sstables to compact in one compaction operation. The 3rd is a target for how 
 many SSTables you'd like to have.
 Pseudo code algorithm for determining whether or not to do a minor compaction:
 {noformat} 
 if sstables.length + minimumCompactionThreshold - 1 > targetSSTableCount
   sort sstables from smallest to largest
   compact up to maximumCompactionThreshold of the smallest tables
 {noformat} 
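The proposed pseudo code can be sketched in Java roughly as below. This is a hypothetical illustration, not CompactionManager code: sstables are reduced to their sizes, the trigger condition is taken literally from the proposal, and the extra guard against compacting fewer than minimumCompactionThreshold sstables is an assumption based on the stated meaning of the thresholds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MinorCompactionSketch {
    /**
     * Returns the sizes of the sstables to compact in one minor compaction, or an
     * empty list if no minor compaction should be submitted. Sstables are modeled
     * only by their sizes (a simplification for illustration).
     */
    static List<Long> selectForMinorCompaction(List<Long> sstableSizes,
                                               int minThreshold,
                                               int maxThreshold,
                                               int targetSSTableCount) {
        // Assumption: never compact fewer than minThreshold sstables at once.
        if (sstableSizes.size() < minThreshold)
            return Collections.emptyList();
        // Trigger condition from the proposal:
        // sstables.length + minimumCompactionThreshold - 1 > targetSSTableCount
        if (sstableSizes.size() + minThreshold - 1 <= targetSSTableCount)
            return Collections.emptyList();
        List<Long> sorted = new ArrayList<>(sstableSizes);
        Collections.sort(sorted); // smallest to largest
        // Compact up to maxThreshold of the smallest sstables.
        return new ArrayList<>(sorted.subList(0, Math.min(maxThreshold, sorted.size())));
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(100L, 5L, 40L, 10L, 70L);
        System.out.println(selectForMinorCompaction(sizes, 2, 3, 4)); // prints [5, 10, 40]
    }
}
```

Sorting before picking is what makes the outcome predictable: for a given set of sizes and thresholds, the same smallest sstables are always chosen.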


[jira] Resolved: (CASSANDRA-1381) standard counters (support: incr + decr)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-1381.
---

Resolution: Duplicate
  Assignee: (was: Kelvin Kakugawa)

thanks

 standard counters (support: incr + decr)
 

 Key: CASSANDRA-1381
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1381
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: Kelvin Kakugawa
 Attachments: CASSANDRA-1381.patch


 allow addition and subtraction.
 context tuple format:
 (node id, big-endian signed long, # of ops)
 reconciliation strategy:
 local: sum counts and # of ops
 remote: take counts based on highest # of ops
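A minimal sketch of the context tuple and the two reconciliation rules described above. Modeling node ids as strings, keying remote reconciliation by node id, and treating the counter's value as the sum of the per-node counts are assumptions made for illustration; this is not the code in CASSANDRA-1381.patch.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

class CounterContextSketch {
    /** One tuple of a counter context: (node id, count, # of ops). */
    static final class Shard {
        final String nodeId; // hypothetical: node ids modeled as strings
        final long count;
        final long ops;

        Shard(String nodeId, long count, long ops) {
            this.nodeId = nodeId;
            this.count = count;
            this.ops = ops;
        }
    }

    /** The count serialized as a big-endian signed long (ByteBuffer defaults to big-endian). */
    static byte[] encodeCount(long count) {
        return ByteBuffer.allocate(8).putLong(count).array();
    }

    /** Local reconciliation: sum counts and # of ops (assumes both tuples share a node id). */
    static Shard reconcileLocal(Shard a, Shard b) {
        return new Shard(a.nodeId, a.count + b.count, a.ops + b.ops);
    }

    /** Remote reconciliation: per node id, keep the tuple with the highest # of ops. */
    static Map<String, Shard> reconcileRemote(Map<String, Shard> x, Map<String, Shard> y) {
        Map<String, Shard> merged = new HashMap<>(x);
        for (Shard s : y.values()) {
            Shard existing = merged.get(s.nodeId);
            if (existing == null || s.ops > existing.ops)
                merged.put(s.nodeId, s);
        }
        return merged;
    }

    /** Assumed: the counter's value is the sum of the per-node counts. */
    static long total(Map<String, Shard> context) {
        long sum = 0;
        for (Shard s : context.values())
            sum += s.count;
        return sum;
    }

    public static void main(String[] args) {
        // An increment of 5 over 2 ops merged locally with a decrement of 3 over 1 op.
        Shard merged = reconcileLocal(new Shard("A", 5, 2), new Shard("A", -3, 1));
        System.out.println(merged.count + " " + merged.ops); // prints 2 3
    }
}
```

Local merges add because both sides carry distinct updates from the same node, while remote merges pick one version because replicas hold copies of the same node's running total, and the one with more ops is the more recent.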


[jira] Commented: (CASSANDRA-1470) use direct io for compaction


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970249#action_12970249
 ] 

Peter Schuller commented on CASSANDRA-1470:
---

(1) - great

(2) - I'm pretty sure it will get instantly evicted. See 
http://lxr.free-electrons.com/source/mm/fadvise.c#L118 and 
http://lxr.free-electrons.com/source/mm/truncate.c#L309. (I do agree that with 
the mythical good-enough implementation the hint would really just be that, a 
hint, but that can easily backfire, because sometimes you want instant 
eviction. In reality I think posix_fadvise() is too limited an interface: while 
you can imagine an implementation that does the right thing for a particular 
use case, it's too limited to be generally suitable for everyone...)

On posix_fadvise: Yes, I was only thinking of scattered pages as a problem. 
Contiguous ranges are fine and what one wants for fadvise purposes.

On overcommitting: mincore+fadvise with a fallback to overcommit would 
certainly still be an improvement, but my gut feeling is that lots of real-life 
cases will have very scattered hotness. That covers pretty much any use case 
where row keys are spread randomly with respect to hotness (which I believe is 
very often the case) and each row is fairly small.

I'm trying to think when one would expect it not to be pretty scattered. I 
suppose if using OPP and the row keys correspond directly to something which is 
correlated with hotness? So I guess something like time series data with OPP, 
or with RP and large rows. But it feels like a pretty narrow subset of use 
cases.

It is worth noting that for truly large data sets scattering is fine: the 
contiguous ranges to drop will be fairly large, so the cost of fadvise() per 
page read is still low. Unfortunately, I assume a lot of use cases involve data 
that is either similar to memory size or a small multiple of it (data 
significantly smaller than memory is a non-issue, since it's all in memory 
anyway with the current code).

(As an aside, and this is not a serious suggestion since Cassandra isn't in the 
business of delivering kernel patches, but the implementation seems to iterate 
over individual pages anyway. So it seems that the only thing preventing a more 
efficient fadvise() for discontiguous ranges is the interface to the kernel, 
rather than an implementation problem. At least based on a very brief look...)

 use direct io for compaction
 

 Key: CASSANDRA-1470
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1470
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 0.7.1

 Attachments: 1470-v2.txt, 1470.txt, CASSANDRA-1470-for-0.6.patch, 
 CASSANDRA-1470-v10-for-0.7.patch, CASSANDRA-1470-v11-for-0.7.patch, 
 CASSANDRA-1470-v12-0.7.patch, CASSANDRA-1470-v2.patch, 
 CASSANDRA-1470-v3-0.7-with-LastErrorException-support.patch, 
 CASSANDRA-1470-v4-for-0.7.patch, CASSANDRA-1470-v5-for-0.7.patch, 
 CASSANDRA-1470-v6-for-0.7.patch, CASSANDRA-1470-v7-for-0.7.patch, 
 CASSANDRA-1470-v8-for-0.7.patch, CASSANDRA-1470-v9-for-0.7.patch, 
 CASSANDRA-1470.patch, 
 use.DirectIORandomAccessFile.for.commitlog.against.1022235.patch


 When compaction scans through a group of sstables, it forces out the data in 
 the OS buffer cache that is being used for hot reads, which can have a 
 dramatic negative effect on performance.
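The effect described here, a compaction scan pushing hot pages out of the OS buffer cache, can be illustrated with a toy model. This sketch assumes a plain LRU page cache (the real Linux page cache is not a simple LRU), and letting the scan bypass the cache only loosely stands in for direct I/O or dropping pages with DONTNEED; all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ScanPollutionSketch {
    /** A toy page cache holding a fixed number of pages with least-recently-used eviction. */
    static final class LruPageCache extends LinkedHashMap<Long, Boolean> {
        private final int capacity;

        LruPageCache(int capacity) {
            super(16, 0.75f, true); // access order, so get() refreshes a page's recency
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
            return size() > capacity;
        }

        void read(long page) {
            if (get(page) == null)
                put(page, Boolean.TRUE); // miss: load the page, possibly evicting the LRU page
        }
    }

    /**
     * Caches the hot pages, then scans scanPages other pages as compaction would.
     * If bypassCache is true the scan does not touch the cache.
     * Returns how many hot pages are still cached afterwards.
     */
    static int hotPagesSurvivingScan(int capacity, long[] hotPages, long scanPages, boolean bypassCache) {
        LruPageCache cache = new LruPageCache(capacity);
        for (long p : hotPages)
            cache.read(p);
        if (!bypassCache) {
            for (long p = 0; p < scanPages; p++)
                cache.read(1_000_000L + p); // sstable pages, disjoint from the hot set
        }
        int survivors = 0;
        for (long p : hotPages)
            if (cache.containsKey(p))
                survivors++;
        return survivors;
    }

    public static void main(String[] args) {
        long[] hot = {1, 2, 3};
        System.out.println(hotPagesSurvivingScan(8, hot, 100, false)); // prints 0
        System.out.println(hotPagesSurvivingScan(8, hot, 100, true));  // prints 3
    }
}
```

A scan larger than the cache evicts every hot page in this model, which is why reading compaction input without populating the cache is attractive.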


[jira] Commented: (CASSANDRA-1470) use direct io for compaction


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970253#action_12970253
 ] 

Peter Schuller commented on CASSANDRA-1470:
---

Oh, and one more thing, just to bring it up again since we're back to direct 
I/O vs. posix_fadvise(): be aware that posix_fadvise() is not truly portable in 
practice even though it's POSIX. We have already established that DONTNEED is 
the only one implemented on Linux. On FreeBSD it doesn't seem to exist at all. 
Googling indicates it might exist on Solaris (but I have no idea to what extent 
or how it is implemented without checking thoroughly).

This is not necessarily a great argument against it, as long as its use is 
optional and Cassandra still runs without it, but it is at least something not 
to lose sight of.

What about Windows, anyone?



[jira] Commented: (CASSANDRA-1470) use direct io for compaction


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970254#action_12970254
 ] 

T Jake Luciani commented on CASSANDRA-1470:
---

Re (2): just following the code, it looks like it eventually hits 
http://lxr.free-electrons.com/source/mm/swap.c#L329, which does a reference 
count, so if another file has the page cached it should stick around (again, 
I'll need to test this). That would be great!


[jira] Commented: (CASSANDRA-1470) use direct io for compaction