[jira] [Commented] (CASSANDRA-3228) Add new range scan with clock

2011-09-20 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109100#comment-13109100
 ] 

Ryan King commented on CASSANDRA-3228:
--

 the ruby gem, which is actively maintained as these things go, STILL does not 
 have 2ary index query support

Not true: https://github.com/fauna/cassandra/pull/59

 Add new range scan with clock
 -

 Key: CASSANDRA-3228
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3228
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Affects Versions: 0.8.5
Reporter: Todd Nine
Priority: Minor

 Currently, it is not possible to specify a minimum clock time on columns when 
 performing range scans.  In some situations, such as custom migration or 
 batch processing, it would be helpful to allow the client to specify a 
 minimum clock time.  This would only return columns with a clock value >= the 
 specified minimum, i.e.
 range scan (rowKey, startVal, endVal, reversed, min clock)
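 As a rough illustration only (the names below are hypothetical, not an agreed API), the proposed call could take a shape like this, with the extra argument filtering out any column whose clock/timestamp is below the requested minimum:
{code:java}
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical sketch of the proposed slice-with-minimum-clock call; names are
// illustrative only and not part of any existing Cassandra API.
interface ClockFilteredSlice
{
    // Returns only columns whose (client-supplied) clock/timestamp is >= minClock,
    // scanning from startColumn to endColumn, optionally in reversed order.
    List<ByteBuffer> getSlice(ByteBuffer rowKey,
                              ByteBuffer startColumn,
                              ByteBuffer endColumn,
                              boolean reversed,
                              long minClock);
}
{code}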

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3199) Counter write protocol: have the coordinator (instead of first replica) wait for replica responses directly

2011-09-13 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103760#comment-13103760
 ] 

Ryan King commented on CASSANDRA-3199:
--

This may be out of scope for this ticket, but can we differentiate between 
exceptions in writing to the first replica and in replicating to the others? 
That might help us do some limited forms of retries with counters.

 Counter write protocol: have the coordinator (instead of first replica) wait 
 for replica responses directly
 

 Key: CASSANDRA-3199
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3199
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Priority: Minor
  Labels: counters

 The current counter write protocol is this (taking the case where the write 
 coordinator != the first replica):
   # coordinator forwards the write request to the first replica
   # first replica writes locally and replicates to the other replicas
   # first replica waits for enough answers from the other replicas to satisfy 
 the consistency level
   # first replica acks the coordinator, which completes the write to the client
 This ticket proposes to modify this protocol to:
   # coordinator forwards the write request to the first replica
   # first replica writes locally, acks the coordinator for its own write and 
 replicates to the other replicas
   # other replicas respond directly to the coordinator
   # once the coordinator has enough responses, it completes the write
 I see 2 advantages to this new protocol:
   * it should be a tad faster since it parallelizes wire transfers better
   * it would make TimeoutException a bit less likely and, more importantly, a 
 TimeoutException would much more likely mean that the write hasn't been 
 persisted. Indeed, in the current protocol, once the first replica has sent 
 the write to the other replicas, it has to wait for the replica answers and 
 answer the coordinator. If it dies during that time, we will return a 
 TimeoutException, even though the first replica died after having done its 
 main job.
 The con is that this adds a bit of complexity. In particular, the other 
 replicas would have to answer the coordinator for a query that was 
 issued by the first replica.
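 A minimal sketch (not Cassandra's actual classes) of the coordinator-side piece the proposed protocol needs: every replica, including the first/leader replica, acks the coordinator directly, and the coordinator completes the write once enough responses arrive for the consistency level.
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: counts acks from the leader's local write and from every
// other replica, and unblocks the client once the consistency level is met.
class CounterWriteResponseHandler
{
    private final CountDownLatch responses;

    CounterWriteResponseHandler(int blockFor)   // e.g. 2 for QUORUM with RF=3
    {
        this.responses = new CountDownLatch(blockFor);
    }

    // Called for the leader's local-write ack and for each other replica's ack.
    void onResponse()
    {
        responses.countDown();
    }

    void waitForCompletion(long timeoutMillis) throws TimeoutException, InterruptedException
    {
        if (!responses.await(timeoutMillis, TimeUnit.MILLISECONDS))
            throw new TimeoutException("counter write did not reach the consistency level in time");
    }
}
{code}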

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3151) CLI documentation should explain how to create column families with CompositeType's

2011-09-07 Thread Ryan King (JIRA)
CLI documentation should explain how to create column families with 
CompositeType's
---

 Key: CASSANDRA-3151
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3151
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Priority: Minor




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3096) Test RoundRobinScheduler timeouts

2011-08-31 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094728#comment-13094728
 ] 

Ryan King commented on CASSANDRA-3096:
--

looks good, +1

 Test RoundRobinScheduler timeouts
 -

 Key: CASSANDRA-3096
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3096
 Project: Cassandra
  Issue Type: Bug
  Components: API
Reporter: Stu Hood
Assignee: Stu Hood
 Fix For: 1.0

 Attachments: 
 0001-Properly-throw-timeouts-decrement-the-count-of-waiters.txt


 CASSANDRA-3079 was very hasty, and introduced two bugs that would: 1) cause 
 the scheduler to busywait after a timeout, 2) never actually throw timeouts. 
 This calls for a test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2011-08-22 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088736#comment-13088736
 ] 

Ryan King commented on CASSANDRA-2319:
--

I haven't followed that ticket closely, but I think the answer is yes. For 
wide-row use cases this patch lets you eliminate SSTables using only the info in the 
index (because we know what range(s) of columns for a row are in that file).

 Promote row index
 -

 Key: CASSANDRA-2319
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Stu Hood
  Labels: compression, index, timeseries
 Fix For: 1.0

 Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
 version-g-lzf.txt, version-g.txt


 The row index contains entries for configurably sized blocks of a wide row. 
 For a row of appreciable size, the row index ends up directing the third seek 
 (1. index, 2. row index, 3. content) to nearby the first column of a scan.
 Since the row index is always used for wide rows, and since it contains 
 information that tells us whether or not the 3rd seek is necessary (the 
 column range or name we are trying to slice may not exist in a given 
 sstable), promoting the row index into the sstable index would allow us to 
 drop the maximum number of seeks for wide rows back to 2, and, more 
 importantly, would allow sstables to be eliminated using only the index.
 An example usecase that benefits greatly from this change is time series data 
 in wide rows, where data is appended to the beginning or end of the row. Our 
 existing compaction strategy gets lucky and clusters the oldest data in the 
 oldest sstables: for queries to recently appended data, we would be able to 
 eliminate wide rows using only the sstable index, rather than needing to seek 
 into the data file to determine that it isn't interesting. For narrow rows, 
 this change would have no effect, as they will not reach the threshold for 
 indexing anyway.
 A first cut design for this change would look very similar to the file format 
 design proposed on #674: 
 http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
 column names clustered, and offsets clustered and delta encoded.
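 A rough sketch of the elimination check this enables, assuming the promoted index entry records the min/max column name the row has in that sstable (class and field names are illustrative, not Cassandra's):
{code:java}
import java.nio.ByteBuffer;
import java.util.Comparator;

// Illustrative only: with min/max column names promoted into the index entry, a slice
// query that falls entirely outside that range never needs the third seek into the data file.
class PromotedIndexEntry
{
    final ByteBuffer minColumnName;
    final ByteBuffer maxColumnName;

    PromotedIndexEntry(ByteBuffer min, ByteBuffer max)
    {
        this.minColumnName = min;
        this.maxColumnName = max;
    }

    // True if the requested slice [start, finish] could have any columns in this sstable.
    boolean mayContainSlice(ByteBuffer start, ByteBuffer finish, Comparator<ByteBuffer> comparator)
    {
        return comparator.compare(start, maxColumnName) <= 0
            && comparator.compare(finish, minColumnName) >= 0;
    }
}
{code}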

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2478) Custom protocol/transport

2011-08-22 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088918#comment-13088918
 ] 

Ryan King commented on CASSANDRA-2478:
--

If you want to use Netty, I'd suggest considering using Finagle on top of it: 
http://github.com/twitter/finagle. It's written in Scala but it's very easy to 
use from Java.

 Custom protocol/transport
 -

 Key: CASSANDRA-2478
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2478
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Eric Evans
Priority: Minor

 A custom wire protocol would give us the flexibility to optimize for our 
 specific use-cases, and eliminate a troublesome dependency (I'm referring to 
 Thrift, but none of the others would be significantly better).  Additionally, 
 RPC is a bad fit here, and we'd do better to move in the direction of something 
 that natively supports streaming.
 I don't think this is as daunting as it might seem initially.  Utilizing an 
 existing server framework like Netty, combined with some copy-and-paste of 
 bits from other FLOSS projects would probably get us 80% of the way there.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2478) Custom protocol/transport

2011-08-22 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089254#comment-13089254
 ] 

Ryan King commented on CASSANDRA-2478:
--

Finagle is a library for building protocols that *happens* to come with a few 
built-in implementations (HTTP, memcached, Thrift, etc.). It solves a lot of 
problems that you'd have to re-build on top of Netty.

 Custom protocol/transport
 -

 Key: CASSANDRA-2478
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2478
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Eric Evans
Priority: Minor

 A custom wire protocol would give us the flexibility to optimize for our 
 specific use-cases, and eliminate a troublesome dependency (I'm referring to 
 Thrift, but none of the others would be significantly better).  Additionally, 
 RPC is a bad fit here, and we'd do better to move in the direction of something 
 that natively supports streaming.
 I don't think this is as daunting as it might seem initially.  Utilizing an 
 existing server framework like Netty, combined with some copy-and-paste of 
 bits from other FLOSS projects would probably get us 80% of the way there.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2500) Ruby dbi client (for CQL) that conforms to AR:ConnectionAdapter

2011-08-15 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085176#comment-13085176
 ] 

Ryan King commented on CASSANDRA-2500:
--

What would we need to change in fauna/cassandra?

 Ruby dbi client (for CQL) that conforms to AR:ConnectionAdapter
 ---

 Key: CASSANDRA-2500
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2500
 Project: Cassandra
  Issue Type: Task
  Components: API
Reporter: Jon Hermes
Assignee: Kelley Reynolds
  Labels: cql
 Fix For: 0.8.5

 Attachments: 2500.txt, genthriftrb.txt, rbcql-0.0.0.tgz


 Create a ruby driver for CQL.
 Lacking something standard (such as py-dbapi), going with something common 
 instead -- RoR ActiveRecord Connection Adapter 
 (http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/AbstractAdapter.html).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3019) log the keyspace and CF of large rows being compacted

2011-08-11 Thread Ryan King (JIRA)
log the keyspace and CF of large rows being compacted
-

 Key: CASSANDRA-3019
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3019
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor


If you want to find the large rows it'd help to know the Keyspace and CF to 
look in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3019) log the keyspace and CF of large rows being compacted

2011-08-11 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-3019:
-

Attachment: 0001-add-keyspace-and-cf-to-large-row-compaction-logging.patch

 log the keyspace and CF of large rows being compacted
 -

 Key: CASSANDRA-3019
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3019
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor
 Attachments: 
 0001-add-keyspace-and-cf-to-large-row-compaction-logging.patch


 If you want to find the large rows it'd help to know the Keyspace and CF to 
 look in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3017) add a Message size limit

2011-08-11 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083881#comment-13083881
 ] 

Ryan King commented on CASSANDRA-3017:
--

I think fatal errors are what we're trying to avoid here. The biggest threat is 
probably malicious, not accidental (since you need to get the MAGIC and headers 
in before this length).

 add a Message size limit
 

 Key: CASSANDRA-3017
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3017
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ryan King
Priority: Minor
  Labels: lhf
 Attachments: 
 0001-use-the-thrift-max-message-size-for-inter-node-messa.patch


 We protect the server from allocating huge buffers for malformed messages with 
 the Thrift frame size (CASSANDRA-475).  But we don't have similar protection 
 for the inter-node Message objects.
 Adding this would be good to deal with malicious adversaries as well as a 
 malfunctioning cluster participant.
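 A minimal sketch of the guard being proposed (stream handling and the limit source are illustrative; the attached patch reuses the Thrift max message size as the limit):
{code:java}
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative only: validate the declared length before allocating a buffer
// for an inter-node message, so a bogus length cannot cause a huge allocation.
final class MessageSizeGuard
{
    static byte[] readBody(DataInputStream in, int maxMessageSize) throws IOException
    {
        int length = in.readInt();
        if (length < 0 || length > maxMessageSize)
            throw new IOException("invalid message length " + length + " (limit " + maxMessageSize + ")");
        byte[] body = new byte[length];
        in.readFully(body);
        return body;
    }
}
{code}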

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-05 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080008#comment-13080008
 ] 

Ryan King commented on CASSANDRA-2915:
--

Regarding realtime search, hasn't our (twitter's) realtime search branch been 
merged into lucene trunk? Whenever that's available we should get real realtime 
results.

 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
  Labels: secondary_index
 Fix For: 1.0


 Secondary indexes (of type KEYS) suffer from a number of limitations in their 
 current form:
- Multiple IndexClauses only work when there is a subset of rows under the 
 highest clause
- One new column family is created per index; this means 10 new CFs for 10 
 secondary indexes
 This ticket will use the Lucene library to implement secondary indexes as one 
 index per CF, and utilize the Lucene query engine to handle multiple index 
 clauses. Also, by using Lucene we get a highly optimized file format.
 There are a few parallels we can draw between Cassandra and Lucene.
 Lucene indexes segments in memory and then flushes them to disk, so we can sync 
 our memtable flushes to Lucene flushes. Lucene also has optimize(), which 
 correlates to our compaction process, so these can be synced as well.
 We will also need to correlate column validators to Lucene tokenizers so the 
 data can be stored properly; the big win is that once this is done we can perform 
 complex queries within a column, like wildcard searches.
 The downside of this approach is we will need to read before write, since 
 documents in Lucene are written as complete documents. For random workloads 
 with lots of indexed columns this means we need to read the document from 
 the index, update it and write it back.
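 A small sketch of that read-before-write step using the Lucene 3.x-era API (the analyzer, field options and the "key" field name are assumptions): updating one indexed column means rebuilding the row's document and replacing it wholesale with updateDocument.
{code:java}
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

// Illustrative only: Lucene replaces whole documents, so updating a single indexed
// column requires reading the row's current document, modifying it, and writing it back.
class LuceneRowIndexer
{
    private final IndexWriter writer;

    LuceneRowIndexer(IndexWriter writer)
    {
        this.writer = writer;
    }

    void updateIndexedColumn(String rowKey, Document currentDoc, String column, String value) throws IOException
    {
        currentDoc.removeFields(column);
        currentDoc.add(new Field(column, value, Field.Store.YES, Field.Index.NOT_ANALYZED));
        // Deletes any existing document for this row key and adds the new version.
        writer.updateDocument(new Term("key", rowKey), currentDoc);
    }
}
{code}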

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-04 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079449#comment-13079449
 ] 

Ryan King commented on CASSANDRA-1717:
--

I think checksums per column would be way too much overhead. We already add a 
lot of overhead to all data stored in Cassandra, we should be careful about 
adding more.

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value it can even resist being overwritten by 
 newer values.
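 A minimal illustration of the coarser-granularity option implied by the overhead concern in the comment above: checksumming fixed-size blocks of the data file with CRC32 rather than every column, which bounds the space cost. The block layout is purely illustrative.
{code:java}
import java.util.zip.CRC32;

// Illustrative only: a per-block CRC32 keeps the overhead proportional to the
// number of blocks rather than the number of columns.
final class BlockChecksum
{
    static long checksum(byte[] block, int offset, int length)
    {
        CRC32 crc = new CRC32();
        crc.update(block, offset, length);
        return crc.getValue();
    }

    static boolean verify(byte[] block, int offset, int length, long expected)
    {
        return checksum(block, offset, length) == expected;
    }
}
{code}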

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2506) Push read repair setting down to the DC-level

2011-08-01 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073634#comment-13073634
 ] 

Ryan King commented on CASSANDRA-2506:
--

It would also be nice if you could specify a different repair rate for intra-DC 
and inter-DC repairs.

 Push read repair setting down to the DC-level
 -

 Key: CASSANDRA-2506
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2506
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Brandon Williams
Assignee: Patricio Echague
 Fix For: 1.0


 Currently, read repair is a global setting.  However, when you have two DCs 
 and use one for analytics, it would be nice to turn it off only for that DC 
 so the live DC serving the application can still benefit from it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2498) Improve read performance in update-intensive workload

2011-08-01 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073633#comment-13073633
 ] 

Ryan King commented on CASSANDRA-2498:
--

Not sure what we can do about that unless we make counters idempotent, which 
may not be feasible.

 Improve read performance in update-intensive workload
 -

 Key: CASSANDRA-2498
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2498
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: ponies
 Fix For: 1.0

 Attachments: 2498-v2.txt, supersede-name-filter-collations.patch


 Read performance in an update-heavy environment relies heavily on compaction 
 to maintain good throughput. (This is not the case for workloads where rows 
 are only inserted once, because the bloom filter keeps us from having to 
 check sstables unnecessarily.)
 Very early versions of Cassandra attempted to mitigate this by checking 
 sstables in descending generation order (mostly equivalent to descending 
 mtime): once all the requested columns were found, it would not check any 
 older sstables.
 This was incorrect, because data timestamp will not correspond to sstable 
 timestamp, both because compaction has the side effect of refreshing data 
 to a newer sstable, and because hinted handoff may send us data older than 
 what we already have.
 Instead, we could create a per-sstable piece of metadata containing the most 
 recent (client-specified) timestamp for any column in the sstable.  We could 
 then sort sstables by this timestamp instead, and perform a similar 
 optimization (if the remaining sstable client-timestamps are older than the 
 oldest column found in the desired result set so far, we don't need to look 
 further). Since under almost every workload, client timestamps of data in a 
 given sstable will tend to be similar, we expect this to cut the number of 
 sstables down proportionally to how frequently each column in the row is 
 updated. (If each column is updated with each write, we only have to check a 
 single sstable.)
 This may also be useful information when deciding which SSTables to compact.
 (Note that this optimization is only appropriate for named-column queries, 
 not slice queries, since we don't know what non-overlapping columns may exist 
 in older sstables.)
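 A sketch of that short-circuit for named-column queries, assuming each sstable records the most recent client-supplied timestamp of any column it contains (the interfaces below are illustrative, not Cassandra's actual classes):
{code:java}
import java.util.Comparator;
import java.util.List;

// Illustrative only: consult sstables newest-first by their max client timestamp and
// stop once the remaining ones cannot contain anything newer than what is already found.
class TimestampOrderedCollation
{
    interface SSTableReader
    {
        long maxClientTimestamp();

        // Merges this sstable's versions of the requested columns into the result.
        void collectNamedColumns(NamedColumnResult result);
    }

    interface NamedColumnResult
    {
        boolean hasAllRequestedColumns();

        // Oldest timestamp among the column versions gathered so far.
        long oldestCollectedTimestamp();
    }

    static void collate(List<SSTableReader> sstables, NamedColumnResult result)
    {
        sstables.sort(Comparator.comparingLong(SSTableReader::maxClientTimestamp).reversed());
        for (SSTableReader sstable : sstables)
        {
            // If every requested column already has a version newer than anything this
            // sstable can contain, the remaining (even older) sstables can be skipped.
            if (result.hasAllRequestedColumns()
                && sstable.maxClientTimestamp() < result.oldestCollectedTimestamp())
                break;
            sstable.collectNamedColumns(result);
        }
    }
}
{code}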

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1379) Uncached row reads may block cached reads

2011-08-01 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073653#comment-13073653
 ] 

Ryan King commented on CASSANDRA-1379:
--

We have use cases along these lines too. We've had to resort to bumping the 
read threads up much higher (128 or 256) for highly cached workloads.

 Uncached row reads may block cached reads
 -

 Key: CASSANDRA-1379
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1379
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: David King
Assignee: Javier Canillas
Priority: Minor
 Attachments: CASSANDRA-1379.patch


 The cap on the number of concurrent reads appears to cap the *total* number 
 of concurrent reads instead of just capping the reads that are bound for 
 disk. That is, given N concurrent readers if all of them are busy waiting on 
 disk, even reads that can be served from the row cache will block waiting for 
 them.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction

2011-07-27 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071857#comment-13071857
 ] 

Ryan King commented on CASSANDRA-1608:
--

bq. bq. Is it even worth keeping bloom filters around with such a drastic 
reduction in worst-case number of sstables to check (for read path too)?

bq. I think they are absolutely worth keeping around for unleveled sstables, 
but for leveled sstables the value is certainly questionable. Perhaps having 
some kind of LRU cache where we have an upper bound on the number of bloom 
filters we keep in memory would be wise. Is it possible that we could move 
these off-heap?

I admit that I probably don't fully understand this change, but we have at 
least one workload where keeping BFs would probably be necessary: the vast 
majority of the traffic on that workload is for keys that don't exist anywhere. 
Even small bumps in BF false-positive rates greatly affect read performance.
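A minimal sketch of the bounded LRU cache of bloom filters floated in the quoted suggestion above, using LinkedHashMap's access-order eviction; the filter type and the cap are placeholders:
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: keep at most maxEntries bloom filters in memory, evicting the
// least-recently-used one when the cap is exceeded.
class LruBloomFilterCache<K, F>
{
    private final Map<K, F> cache;

    LruBloomFilterCache(final int maxEntries)
    {
        // accessOrder = true gives LRU iteration order.
        this.cache = new LinkedHashMap<K, F>(16, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, F> eldest)
            {
                return size() > maxEntries;
            }
        };
    }

    F get(K sstable)
    {
        return cache.get(sstable);
    }

    void put(K sstable, F filter)
    {
        cache.put(sstable, filter);
    }
}
{code}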

 Redesigned Compaction
 -

 Key: CASSANDRA-1608
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Goffinet
Assignee: Benjamin Coverston
 Attachments: 1608-v2.txt, 1608-v8.txt, 1609-v10.txt


 After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
 thinking on this subject that I wanted to lay out.
 I propose we redo the concept of how compaction works in Cassandra. At the 
 moment, compaction is kicked off based on a write access pattern, not read 
 access pattern. In most cases, you want the opposite. You want to be able to 
 track how well each SSTable is performing in the system. If we were to keep 
 statistics in-memory of each SSTable, prioritize them based on most accessed, 
 and bloom filter hit/miss ratios, we could intelligently group sstables that 
 are being read most often and schedule them for compaction. We could also 
 schedule lower-priority maintenance on SSTables that are not often accessed.
 I also propose we limit the size of each SSTable to a fixed size, which gives 
 us the ability to better utilize our bloom filters in a predictable manner. 
 At the moment after a certain size, the bloom filters become less reliable. 
 This would also allow us to group data most accessed. Currently the size of 
 an SSTable can grow to a point where large portions of the data might not 
 actually be accessed as often.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2897) Secondary indexes without read-before-write

2011-07-13 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13064991#comment-13064991
 ] 

Ryan King commented on CASSANDRA-2897:
--

Can't we deal with the races by properly using timestamps?

 Secondary indexes without read-before-write
 ---

 Key: CASSANDRA-2897
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2897
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Sylvain Lebresne
Priority: Minor
  Labels: secondary_index

 Currently, secondary index updates require a read-before-write to maintain 
 the index consistency. Keeping the index consistent at all time is not 
 necessary however. We could let the (secondary) index get inconsistent on 
 writes and repair those on reads. This would be easy because on reads, we 
 make sure to request the indexed columns anyway, so we can just skip the rows 
 that are not needed and repair the index at the same time.
 This does trade work on writes for work on reads. However, read-before-write 
 is sufficiently costly that it will likely be a win overall.
 There are (at least) two small technical difficulties here though:
 # If we repair on read, this will be racy with writes, so we'll probably have 
 to synchronize there.
 # We probably shouldn't only rely on read to repair and we should also have a 
 task to repair the index for things that are rarely read. It's unclear how to 
 make that low impact though.
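 A rough sketch of the timestamp comparison the comment above points at (names are illustrative, not Cassandra's classes): an index entry is only treated as stale, and therefore skipped and cleaned, when the base row holds a strictly newer value for the indexed column.
{code:java}
// Illustrative only: decide on the read path whether a secondary-index entry is stale,
// using the timestamps already carried by the index entry and the base-row column.
class IndexReadRepair
{
    static final class LiveCell
    {
        final Object value;
        final long timestamp;

        LiveCell(Object value, long timestamp)
        {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // indexedValue/indexTimestamp come from the index entry; liveCell is what the base
    // row currently holds for the indexed column (null if the column was deleted).
    static boolean indexEntryIsStale(Object indexedValue, long indexTimestamp, LiveCell liveCell)
    {
        if (liveCell == null)
            return true;
        if (liveCell.value.equals(indexedValue))
            return false;
        // Only stale if the conflicting live value is newer; an older conflicting value
        // means the write behind this index entry simply hasn't been superseded yet.
        return liveCell.timestamp > indexTimestamp;
    }
}
{code}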

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2819) Split rpc timeout for read and write ops

2011-07-05 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060119#comment-13060119
 ] 

Ryan King commented on CASSANDRA-2819:
--

Opened https://issues.apache.org/jira/browse/CASSANDRA-2858 for the followup.

 Split rpc timeout for read and write ops
 

 Key: CASSANDRA-2819
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2819
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Stu Hood
Assignee: Melvin Wang
 Fix For: 1.0

 Attachments: twttr-cassandra-0.8-counts-resync-rpc-rw-timeouts.diff


 Given the vastly different latency characteristics of reads and writes, it 
 makes sense for them to have independent rpc timeouts internally.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2858) make request dropping more accurate

2011-07-05 Thread Ryan King (JIRA)
make request dropping more accurate
---

 Key: CASSANDRA-2858
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2858
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Melvin Wang
Priority: Minor


Based on the discussion in 
https://issues.apache.org/jira/browse/CASSANDRA-2819, we can make the 
bookkeeping for request times more accurate.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2817) Expose number of threads blocked on submitting a memtable for flush

2011-06-23 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053946#comment-13053946
 ] 

Ryan King commented on CASSANDRA-2817:
--

+1

 Expose number of threads blocked on submitting a memtable for flush
 ---

 Key: CASSANDRA-2817
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2817
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.7.7

 Attachments: 
 0001-Expose-threads-blocked-on-submission-to-executor.patch


 Writes can be blocked by a thread trying to submit a memtable while the flush 
 queue is full. While this is the expected behavior (the goal being to prevent 
 OOMing), it is worth exposing when that happens so that people can monitor it 
 and modify settings accordingly if that happens too often.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2045) Simplify HH to decrease read load when nodes come back

2011-06-23 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986185#comment-12986185
 ] 

Ryan King edited comment on CASSANDRA-2045 at 6/23/11 5:18 PM:
---

I think the two approaches are suitable for different kinds of data models. The 
pointer approach is almost certainly better for narrow rows, while worse for 
large, dynamic rows.

  was (Author: kingryan):
I think the two approaches are suitable for different kinds of data models. 
The point approach is almost certainly better for narrow rows, while worse for 
large, dynamic rows.
  
 Simplify HH to decrease read load when nodes come back
 --

 Key: CASSANDRA-2045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2045
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Nicholas Telford
 Fix For: 1.0

 Attachments: 
 0001-Changed-storage-of-Hints-to-store-a-serialized-RowMu.patch, 
 0002-Refactored-HintedHandoffManager.sendRow-to-reduce-co.patch, 
 0003-Fixed-some-coding-style-issues.patch, 
 0004-Fixed-direct-usage-of-Gossiper.getEndpointStateForEn.patch, 
 0005-Removed-duplicate-failure-detection-conditionals.-It.patch, 
 0006-Removed-handling-of-old-style-hints.patch, 
 CASSANDRA-2045-simplify-hinted-handoff-001.diff, 
 CASSANDRA-2045-simplify-hinted-handoff-002.diff


 Currently when HH is enabled, hints are stored, and when a node comes back, 
 we begin sending that node data. We do a lookup on the local node for the row 
 to send. To help reduce read load (if a node is offline for a long period of 
 time) we should store the data we want to forward to the node locally instead. We 
 wouldn't have to do any lookups, just take the byte[] and send it to the destination.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2804) expose dropped messages, exceptions over JMX

2011-06-23 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2804:
-

Attachment: twttr-cassandra-0.8-counts-resync-droppedmsg-metric.diff

Funny- we have a patch we've been working on for similar things.

Attached patch only does dropped messages, but it also includes a recent 
variant in JMX, which we need for our monitoring.

 expose dropped messages, exceptions over JMX
 

 Key: CASSANDRA-2804
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2804
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.7.7, 0.8.2

 Attachments: 2804.txt, 
 twttr-cassandra-0.8-counts-resync-droppedmsg-metric.diff


 Patch against 0.7.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-47) SSTable compression

2011-06-21 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052734#comment-13052734
 ] 

Ryan King commented on CASSANDRA-47:


I think this is going to be obsoleted by CASSANDRA-674.

 SSTable compression
 ---

 Key: CASSANDRA-47
 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
  Labels: compression
 Fix For: 1.0


 We should be able to do SSTable compression which would trade CPU for I/O 
 (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-06-21 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052765#comment-13052765
 ] 

Ryan King commented on CASSANDRA-1717:
--

I know I'm starting to sound like a broken record, but CASSANDRA-674 is going 
to include checksums. And it's almost ready for reviewing.

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value it can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2781) regression: exposing cache size through MBean

2011-06-16 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050532#comment-13050532
 ] 

Ryan King commented on CASSANDRA-2781:
--

It would be nice if we had some tests around these things, but I'm +1 on this 
patch.

 regression: exposing cache size through MBean
 -

 Key: CASSANDRA-2781
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2781
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Chris Burroughs
Assignee: Chris Burroughs
Priority: Minor
 Attachments: 2781-v1.txt


 Looks like it was part of CASSANDRA-1969.  A method called size, as opposed 
 to getSize, won't be exposed through jmx.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2521) Move away from Phantom References for Compaction/Memtable

2011-06-16 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050597#comment-13050597
 ] 

Ryan King commented on CASSANDRA-2521:
--

+1 for less hacking around the GC

 Move away from Phantom References for Compaction/Memtable
 -

 Key: CASSANDRA-2521
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2521
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Goffinet
Assignee: Sylvain Lebresne
 Fix For: 1.0

 Attachments: 
 0001-Use-reference-counting-to-decide-when-a-sstable-can-.patch


 http://wiki.apache.org/cassandra/MemtableSSTable
 Let's move to using reference counting instead of relying on GC to be called 
 in StorageService.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2751) Improved Metrics collection

2011-06-09 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2751:
-

Description: 
Collecting metrics in cassandra needs to be easier. Currently the amount of 
work required to expose one new metric in the server and consume it outside the 
server is way too high.

In my mind, collecting a new metric in the server should be a single line of 
code and consuming it should be easily doable from any programming language.

There are several options for better metrics collection on the JVM:

https://github.com/twitter/ostrich
https://github.com/codahale/metrics/
https://github.com/twitter/commons/tree/master/src/java/com/twitter/common/stats

We should look at these

  was:
Collecting metrics in cassandra needs to be easier. Currently the amount of 
work required to expose one new metric in the server and consume it outside the 
server is way too high.

In my mind, collecting a new metric in the server should be a single line of 
code and consuming it should be easily doable from any programming language.

There are several options for better metrics collection on the JVM:

https://github.com/twitter/ostrich
https://github.com/codahale/metrics/

We should look at these


 Improved Metrics collection
 ---

 Key: CASSANDRA-2751
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2751
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Ryan King

 Collecting metrics in cassandra needs to be easier. Currently the amount of 
 work required to expose one new metric in the server and consume it outside 
 the server is way too high.
 In my mind, collecting a new metric in the server should be a single line of 
 code and consuming it should be easily doable from any programming language.
 There are several options for better metrics collection on the JVM:
 https://github.com/twitter/ostrich
 https://github.com/codahale/metrics/
 https://github.com/twitter/commons/tree/master/src/java/com/twitter/common/stats
 We should look at these
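 For comparison, a sketch of what the "one line per metric" could look like with the codahale metrics API (package and version are assumptions; the 2011-era artifact lived under com.yammer.metrics):
{code:java}
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

// Illustrative only: registering a metric is a single line; consumers can then read it
// via whatever reporter (JMX, Graphite, ...) is attached to the registry.
class ReadMetrics
{
    static final MetricRegistry registry = new MetricRegistry();
    static final Timer readLatency = registry.timer("ReadLatency");   // the one line

    static void timedRead(Runnable read)
    {
        Timer.Context ctx = readLatency.time();
        try
        {
            read.run();
        }
        finally
        {
            ctx.stop();
        }
    }
}
{code}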

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-06-08 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046069#comment-13046069
 ] 

Ryan King commented on CASSANDRA-2749:
--

Since each keyspace is stored in a different sub-directory of the 
DataDirectories, you can already split the storage of different keyspaces with 
some clever mount options. Maybe we could give column families the same 
treatment?

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor

 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2751) Improved Metrics collection

2011-06-08 Thread Ryan King (JIRA)
Improved Metrics collection
---

 Key: CASSANDRA-2751
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2751
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Ryan King


Collecting metrics in cassandra needs to be easier. Currently the amount of 
work required to expose one new metric in the server and consume it outside the 
server is way too high.

In my mind, collecting a new metric in the server should be a single line of 
code and consuming it should be easily doable from any programming language.

There are several options for better metrics collection on the JVM:

https://github.com/twitter/ostrich
https://github.com/codahale/metrics/

We should look at these

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2686) Distributed per row locks

2011-05-23 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038070#comment-13038070
 ] 

Ryan King commented on CASSANDRA-2686:
--

You'll likely end up reimplementing something like Paxos (what google's chubby 
uses) or ZAB (what Zookeeper uses).

 Distributed per row locks
 -

 Key: CASSANDRA-2686
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2686
 Project: Cassandra
  Issue Type: Wish
  Components: Core
 Environment: any
Reporter: LuĂ­s Ferreira
  Labels: api-addition, features

 Instead of using a centralized locking strategy like cages with zookeeper, I 
 would like to have it in a decentralized way. Even if it carries some 
 limitations. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2686) Distributed per row locks

2011-05-23 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038078#comment-13038078
 ] 

Ryan King commented on CASSANDRA-2686:
--

Those protocols are methods for reach[ing] agreement. You're basically 
describing how ZK works.

 Distributed per row locks
 -

 Key: CASSANDRA-2686
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2686
 Project: Cassandra
  Issue Type: Wish
  Components: Core
 Environment: any
Reporter: LuĂ­s Ferreira
  Labels: api-addition, features

 Instead of using a centralized locking strategy like cages with zookeeper, I 
 would like to have it in a decentralized way. Even if it carries some 
 limitations. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction

2011-05-17 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034878#comment-13034878
 ] 

Ryan King commented on CASSANDRA-1610:
--

Agreed.

 Pluggable Compaction
 

 Key: CASSANDRA-1610
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Goffinet
Assignee: Alan Liang
Priority: Minor
  Labels: compaction
 Fix For: 1.0

 Attachments: 0001-move-compaction-code-into-own-package.patch, 
 0002-Pluggable-Compaction-and-Expiration.patch


 In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
 it also makes sense to allow the ability to have pluggable compaction per CF. 
 There could be many types of workloads where this makes sense. One example we 
 had at Digg was to completely throw away certain SSTables after N days.
 The goal of this ticket is to make compaction pluggable enough to support 
 compaction based on max timestamp ordering of the sstables while satisfying 
 max sstable size, min and max compaction thresholds. Another goal is to allow 
 expiration of sstables based on a timestamp.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-47) SSTable compression

2011-05-16 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034143#comment-13034143
 ] 

Ryan King commented on CASSANDRA-47:


Stu is working on https://issues.apache.org/jira/browse/CASSANDRA-674 which 
will improve the file size dramatically.

 SSTable compression
 ---

 Key: CASSANDRA-47
 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
  Labels: compression
 Fix For: 1.0


 We should be able to do SSTable compression which would trade CPU for I/O 
 (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2657) Allow configuration of multiple types of the Thrift server

2011-05-16 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2657:
-

Summary: Allow configuration of multiple types of the Thrift server  (was: 
Allow configuration of multiple types of the Trift server)

 Allow configuration of multiple types of the Thrift server
 --

 Key: CASSANDRA-2657
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2657
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.1
 Environment: JVM 1.6
Reporter: Vijay
Assignee: Vijay
 Fix For: 0.8.0, 0.8.1


 Thrift server has multiple modes of operations specifically...
  
 1) TNonblockingServer
 2) THsHaServer
 3) TThreadPoolServer
 We should provide a configuration to enable all of the above. The client 
 library can use either the Async or the Sync API... (independent of the server side)
 This patch also might address the issue (which we were seeing) when there 
 are a large number of connections to the server (throughput reduces).
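 As a rough sketch of what picking one of those modes looks like with the libthrift Args-style builders (library version and the exact option set are assumptions; the point of the ticket is making this selection configurable):
{code:java}
import org.apache.thrift.TProcessor;
import org.apache.thrift.server.THsHaServer;
import org.apache.thrift.server.TServer;
import org.apache.thrift.transport.TNonblockingServerSocket;
import org.apache.thrift.transport.TTransportException;

// Illustrative only: building one of the server flavours listed above; a configuration
// option would choose between TNonblockingServer, THsHaServer and TThreadPoolServer here.
final class ThriftServerFactory
{
    static TServer hsha(TProcessor processor, int port) throws TTransportException
    {
        TNonblockingServerSocket socket = new TNonblockingServerSocket(port);
        THsHaServer.Args args = new THsHaServer.Args(socket);
        args.processor(processor);
        return new THsHaServer(args);
    }
}
{code}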

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2597) inconsistent implementation of 'cumulative distribution function' for Exponential Distribution

2011-05-16 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2597:
-

  Component/s: (was: Core)
   Contrib
  Description: 
As reported on the mailing list 
(http://mail-archives.apache.org/mod_mbox/cassandra-dev/201104.mbox/%3CAANLkTimdMSLE8-z0x+0kvzqp7za3AEMLaOFXvd4Z=t...@mail.gmail.com%3E),

{quote}
I just found there are two implementations of the 'cumulative distribution
function' for the Exponential Distribution and they are inconsistent:

*FailureDetector*
{code:java}
org.apache.cassandra.gms.ArrivalWindow.p(double)
   double p(double t)
   {
   double mean = mean();
   double exponent = (-1)*(t)/mean;
   return *Math.pow(Math.E, exponent)*;
   }
{code}

*DynamicEndpointSnitch*
{code:java}
org.apache.cassandra.locator.AdaptiveLatencyTracker.p(double)
   double p(double t)
   {
   double mean = mean();
   double exponent = (-1) * (t) / mean;
   return *1 - Math.pow( Math.E, exponent);*
   }
{code}

According to the Exponential Distribution cumulative distribution function
definition (http://en.wikipedia.org/wiki/Exponential_distribution#Cumulative_distribution_function),
the latter one is correct
{quote}

... however FailureDetector has been working as advertised for some time now.  
Does this mean the Snitch version is actually wrong?

  was:
As reported on the mailing list 
(http://mail-archives.apache.org/mod_mbox/cassandra-dev/201104.mbox/%3CAANLkTimdMSLE8-z0x+0kvzqp7za3AEMLaOFXvd4Z=t...@mail.gmail.com%3E),

{quote}
I just found there are two implementations of the 'cumulative distribution
function' for the Exponential Distribution and they are inconsistent:

*FailureDetector*
org.apache.cassandra.gms.ArrivalWindow.p(double)
   double p(double t)
   {
   double mean = mean();
   double exponent = (-1)*(t)/mean;
   return *Math.pow(Math.E, exponent)*;
   }

*DynamicEndpointSnitch*
org.apache.cassandra.locator.AdaptiveLatencyTracker.p(double)
   double p(double t)
   {
   double mean = mean();
   double exponent = (-1) * (t) / mean;
   return *1 - Math.pow( Math.E, exponent);*
   }

According to the Exponential Distribution cumulative distribution function
definition (http://en.wikipedia.org/wiki/Exponential_distribution#Cumulative_distribution_function),
the latter one is correct
{quote}

... however FailureDetector has been working as advertised for some time now.  
Does this mean the Snitch version is actually wrong?

Fix Version/s: (was: 0.7.7)
   0.7.6

 inconsistent implementation of 'cumulative distribution function' for 
 Exponential Distribution
 --

 Key: CASSANDRA-2597
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2597
 Project: Cassandra
  Issue Type: Bug
  Components: Contrib
Reporter: Jonathan Ellis
Assignee: paul cannon
Priority: Minor
 Fix For: 0.7.6


 As reported on the mailing list 
 (http://mail-archives.apache.org/mod_mbox/cassandra-dev/201104.mbox/%3CAANLkTimdMSLE8-z0x+0kvzqp7za3AEMLaOFXvd4Z=t...@mail.gmail.com%3E),
 {quote}
 I just found there are two implementations of the 'cumulative distribution
 function' for the Exponential Distribution and they are inconsistent:
 *FailureDetector*
 {code:java}
 org.apache.cassandra.gms.ArrivalWindow.p(double)
double p(double t)
{
double mean = mean();
double exponent = (-1)*(t)/mean;
return *Math.pow(Math.E, exponent)*;
}
 {code}
 *DynamicEndpointSnitch*
 {code:java}
 org.apache.cassandra.locator.AdaptiveLatencyTracker.p(double)
double p(double t)
{
double mean = mean();
double exponent = (-1) * (t) / mean;
return *1 - Math.pow( Math.E, exponent);*
}
 {code}
 According to the Exponential Distribution cumulative distribution function
 definition (http://en.wikipedia.org/wiki/Exponential_distribution#Cumulative_distribution_function),
 the latter one is correct
 {quote}
 ... however FailureDetector has been working as advertised for some time now. 
  Does this mean the Snitch version is actually wrong?
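 For reference (standard exponential-distribution identities, not taken from the ticket): the two formulas are the CDF and its complement, so which one is "correct" depends on whether the caller wants the probability that the value is at most t or the probability that it exceeds t, the latter being what an inter-arrival-time failure detector typically needs.
{code}
F(t)     = P(X <= t) = 1 - e^{-t/mean}    // cumulative distribution function (DynamicEndpointSnitch)
1 - F(t) = P(X >  t) = e^{-t/mean}        // survival function (FailureDetector)
{code}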

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (CASSANDRA-2003) get_range_slices test

2011-05-16 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King reassigned CASSANDRA-2003:


Assignee: Stu Hood  (was: Kelvin Kakugawa)

 get_range_slices test
 -

 Key: CASSANDRA-2003
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2003
 Project: Cassandra
  Issue Type: Test
  Components: Core
 Environment: RandomPartitioner
Reporter: Kelvin Kakugawa
Assignee: Stu Hood
Priority: Minor
 Fix For: 0.8.1

 Attachments: 0002-Assert-that-we-don-t-double-count-any-keys.txt, 
 CASSANDRA-2003-0.7-0001.patch, CASSANDRA-2003-0001.patch


 Test get_range_slices (on an RP cluster) to walk:
 * all keys on each node
 * all keys across cluster

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction

2011-05-13 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033294#comment-13033294
 ] 

Ryan King commented on CASSANDRA-1608:
--

I only read the LevelDB stuff briefly. I think there's a lot we can learn, but 
there are at least two challenges:

1) client-supplied timestamps mean that you can't know that newer files 
supersede older ones
2) the CF data model means that data for a given key in multiple sstables may 
need to be merged

 Redesigned Compaction
 -

 Key: CASSANDRA-1608
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Goffinet

 After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
 thinking on this subject that I wanted to lay out.
 I propose we redo the concept of how compaction works in Cassandra. At the 
 moment, compaction is kicked off based on a write access pattern, not read 
 access pattern. In most cases, you want the opposite. You want to be able to 
 track how well each SSTable is performing in the system. If we were to keep 
 statistics in-memory of each SSTable, prioritize them based on most accessed, 
 and bloom filter hit/miss ratios, we could intelligently group sstables that 
 are being read most often and schedule them for compaction. We could also 
 schedule lower-priority maintenance on SSTables that are not often accessed.
 I also propose we limit the size of each SSTable to a fixed size, which gives 
 us the ability to better utilize our bloom filters in a predictable manner. 
 At the moment after a certain size, the bloom filters become less reliable. 
 This would also allow us to group data most accessed. Currently the size of 
 an SSTable can grow to a point where large portions of the data might not 
 actually be accessed as often.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction

2011-05-10 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13031333#comment-13031333
 ] 

Ryan King commented on CASSANDRA-1608:
--

It's important to remember that LevelDB is key/value, not a column family data 
model, so there are concerns and constraints that apply to cassandra which do 
not apply to LevelDB.

 Redesigned Compaction
 -

 Key: CASSANDRA-1608
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Goffinet

 After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
 thinking on this subject that I wanted to lay out.
 I propose we redo the concept of how compaction works in Cassandra. At the 
 moment, compaction is kicked off based on a write access pattern, not read 
 access pattern. In most cases, you want the opposite. You want to be able to 
 track how well each SSTable is performing in the system. If we were to keep 
 statistics in-memory of each SSTable, prioritize them based on most accessed, 
 and bloom filter hit/miss ratios, we could intelligently group sstables that 
 are being read most often and schedule them for compaction. We could also 
 schedule lower-priority maintenance on SSTables that are not often accessed.
 I also propose we limit the size of each SSTable to a fixed size, which gives 
 us the ability to better utilize our bloom filters in a predictable manner. 
 At the moment after a certain size, the bloom filters become less reliable. 
 This would also allow us to group data most accessed. Currently the size of 
 an SSTable can grow to a point where large portions of the data might not 
 actually be accessed as often.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2614) create Column and CounterColumn in the same column family

2011-05-09 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13030799#comment-13030799
 ] 

Ryan King commented on CASSANDRA-2614:
--

Ah, that makes more sense.

 create Column and CounterColumn in the same column family
 -

 Key: CASSANDRA-2614
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2614
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Dave Rav
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.8.1


 create Column and CounterColumn in the same column family

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2614) create Column and CounterColumn in the same column family

2011-05-06 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13030160#comment-13030160
 ] 

Ryan King commented on CASSANDRA-2614:
--

I don't think this is feasible to do robustly. There are several problems:

1. If a column is initially created as a counter, but a non-counter insert 
comes through, what do we do? We can't give the inserter an error unless we 
introduce reads into the write path.
2. The write path is somewhat different for the two kinds of columns. Counters 
don't really respect CLs the same way normal columns do.

 create Column and CounterColumn in the same column family
 -

 Key: CASSANDRA-2614
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2614
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Dave Rav
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.8.1


 create Column and CounterColumn in the same column family

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2540) Data reads by default

2011-04-29 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027140#comment-13027140
 ] 

Ryan King commented on CASSANDRA-2540:
--

Sylvain-

I don't think it's the average performance that matters here, but the worst 
case. For our deployments we have latency targets at the 99th percentile. Some 
of those are quite low (< 10 ms), so even a small number of requests that have 
to wait for the rpc timeout makes our goals difficult, even if we lower the rpc 
timeout.

 Data reads by default
 -

 Key: CASSANDRA-2540
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2540
 Project: Cassandra
  Issue Type: Wish
Reporter: Stu Hood
Priority: Minor

 The intention of digest vs data reads is to save bandwidth in the read path 
 at the cost of latency, but I expect that this has been a premature 
 optimization.
 * Data requested by a read will often be within an order of magnitude of the 
 digest size, and a failed digest means extra roundtrips, more bandwidth
 * The [digest reads but not your data 
 read|https://issues.apache.org/jira/browse/CASSANDRA-2282?focusedCommentId=13004656page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13004656]
  problem means failing QUORUM reads because a single node is unavailable, and 
 would require eagerly re-requesting at some fraction of your timeout
 * Saving bandwidth in cross datacenter usecases comes at huge cost to 
 latency, but since both constraints change proportionally (enough), the 
 tradeoff is not clear
 Some options:
 # Add an option to use digest reads
 # Remove digest reads entirely (and/or punt and make them a runtime 
 optimization based on data size in the future)
 # Continue to use digest reads, but send them to {{N - R}} nodes for 
 (somewhat) more predictable behavior with QUORUM
 \\
 The outcome of data-reads-by-default should be significantly improved 
 latency, with a moderate increase in bandwidth usage for large reads.
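
A sketch of the digest-read mechanism being discussed (illustrative only, not Cassandra's implementation): one replica returns full data, the others return only a hash, and a mismatch forces a second, full-data round trip.

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class DigestReadSketch {
    static byte[] digest(byte[] rowBytes) throws Exception {
        return MessageDigest.getInstance("MD5").digest(rowBytes);
    }

    public static void main(String[] args) throws Exception {
        byte[] dataFromReplicaA = "row:v2".getBytes(StandardCharsets.UTF_8); // full data read
        byte[] dataFromReplicaB = "row:v1".getBytes(StandardCharsets.UTF_8); // stale replica, digest read

        boolean digestsMatch = Arrays.equals(digest(dataFromReplicaA), digest(dataFromReplicaB));
        if (digestsMatch) {
            System.out.println("digests agree: answer the client with replica A's data");
        } else {
            // The bandwidth saved by sending a 16-byte digest is lost here:
            // an extra round trip is needed for replica B's full data, plus reconciliation.
            System.out.println("digest mismatch: fetch full data from replica B and reconcile");
        }
    }
}
{code}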

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2558) Add concurrent_compactions configuration

2011-04-28 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026392#comment-13026392
 ] 

Ryan King commented on CASSANDRA-2558:
--

I believe Terje had compaction turned off during a bulk import. Those 
compactions happened when compaction was reactivated.

 Add concurrent_compactions configuration
 --

 Key: CASSANDRA-2558
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2558
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8 beta 1
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
 Fix For: 0.8.1

 Attachments: 0001-Make-compaction-thread-number-configurable.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 We should expose a way to configure the max number of threads to use when 
 multi_threaded compaction is turned on. So far, it uses nb_of_processors 
 threads, which, if you have many cores, may be unreasonably high (as far as 
 random IO is concerned, and thus independently of compaction throttling)... at 
 least unless you have SSDs.
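
A tiny sketch of the proposed knob, with an invented setting name (the actual configuration name is not specified here): clamp the compaction thread count to a configured maximum instead of always using every core.

{code:java}
public class CompactionThreadsSketch {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        Integer concurrentCompactions = 4;   // hypothetical config value; null would mean "use all cores"
        int threads = (concurrentCompactions == null) ? cores : Math.min(cores, concurrentCompactions);
        System.out.println("compaction threads: " + threads + " of " + cores + " cores");
    }
}
{code}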

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2498) Improve read performance in update-intensive workload

2011-04-18 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021247#comment-13021247
 ] 

Ryan King commented on CASSANDRA-2498:
--

In addition to update-heavy operations, column families with wide rows need 
some love on the latency side too.

 Improve read performance in update-intensive workload
 -

 Key: CASSANDRA-2498
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2498
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
  Labels: ponies
 Fix For: 1.0


 Read performance in an update-heavy environment relies heavily on compaction 
 to maintain good throughput. (This is not the case for workloads where rows 
 are only inserted once, because the bloom filter keeps us from having to 
 check sstables unnecessarily.)
 Very early versions of Cassandra attempted to mitigate this by checking 
 sstables in descending generation order (mostly equivalent to descending 
 mtime): once all the requested columns were found, it would not check any 
 older sstables.
 This was incorrect, because data timestamps will not correspond to sstable 
 timestamps, both because compaction has the side effect of refreshing data 
 to a newer sstable, and because hinted handoff may send us data older than 
 what we already have.
 Instead, we could create a per-sstable piece of metadata containing the most 
 recent (client-specified) timestamp for any column in the sstable.  We could 
 then sort sstables by this timestamp instead, and perform a similar 
 optimization (if the remaining sstable client-timestamps are older than the 
 oldest column found in the desired result set so far, we don't need to look 
 further). Since under almost every workload, client timestamps of data in a 
 given sstable will tend to be similar, we expect this to cut the number of 
 sstables down proportionally to how frequently each column in the row is 
 updated. (If each column is updated with each write, we only have to check a 
 single sstable.)
 This may also be useful information when deciding which SSTables to compact.
 (Note that this optimization is only appropriate for named-column queries, 
 not slice queries, since we don't know what non-overlapping columns may exist 
 in older sstables.)
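
A sketch of the sstable-skipping rule described above, under the assumption that each sstable records the newest client timestamp it contains. The types are simplified stand-ins, not Cassandra classes.

{code:java}
import java.util.*;

public class TimestampOrderedRead {
    record Column(String name, long timestamp, String value) {}
    record SSTable(String name, long maxTimestamp, Map<String, Column> columns) {}

    static Map<String, Column> readNamedColumns(List<SSTable> sstables, Set<String> requested) {
        // Check sstables with the newest client timestamps first.
        List<SSTable> byNewest = new ArrayList<>(sstables);
        byNewest.sort(Comparator.comparingLong(SSTable::maxTimestamp).reversed());

        Map<String, Column> result = new HashMap<>();
        for (SSTable sstable : byNewest) {
            // Once every requested column has a candidate, an sstable whose newest
            // timestamp is older than the oldest candidate cannot improve the result.
            if (result.size() == requested.size()) {
                long oldestFound = result.values().stream()
                        .mapToLong(Column::timestamp).min().orElse(Long.MIN_VALUE);
                if (sstable.maxTimestamp() < oldestFound) break;
            }
            for (String name : requested) {
                Column c = sstable.columns().get(name);
                if (c != null && (!result.containsKey(name) || c.timestamp() > result.get(name).timestamp()))
                    result.put(name, c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SSTable newer = new SSTable("sst-2", 200,
                Map.of("name", new Column("name", 200, "v2")));
        SSTable older = new SSTable("sst-1", 100,
                Map.of("name", new Column("name", 100, "v1"), "age", new Column("age", 90, "30")));
        System.out.println(readNamedColumns(List.of(newer, older), Set.of("name", "age")));
    }
}
{code}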

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2502) disable cache saving on system CFs

2011-04-18 Thread Ryan King (JIRA)
disable cache saving on system CFs
--

 Key: CASSANDRA-2502
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2502
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor
 Attachments: 0001-disable-cache-saving-on-system-tables.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2502) disable cache saving on system CFs

2011-04-18 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2502:
-

Attachment: 0001-disable-cache-saving-on-system-tables.patch

 disable cache saving on system CFs
 --

 Key: CASSANDRA-2502
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2502
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor
 Attachments: 0001-disable-cache-saving-on-system-tables.patch




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-1329) make multiget take a set of keys instead of a list

2011-04-13 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019440#comment-13019440
 ] 

Ryan King commented on CASSANDRA-1329:
--

I don't know if this change is worth the breakage it causes (I assume that all 
clients will have to be updated).

 make multiget take a set of keys instead of a list
 --

 Key: CASSANDRA-1329
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1329
 Project: Cassandra
  Issue Type: Task
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Attachments: 1329-rebase.txt, 1329-stresspy-multiget.txt, 1329.txt, 
 multiget.test, multigetsmall.test


 this more correctly sets the expectation that the order of keys in that list 
 doesn't matter, and duplicates don't make sense

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2466) bloom filters should avoid huge array allocations to avoid fragmentation concerns

2011-04-13 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019457#comment-13019457
 ] 

Ryan King commented on CASSANDRA-2466:
--

Moving to smaller arrays would make the allocation easier, but wouldn't reduce 
the raw amount of memory needed for a large bloom filter.

Would it be worth moving these off-heap completely?

 bloom filters should avoid huge array allocations to avoid fragmentation 
 concerns
 -

 Key: CASSANDRA-2466
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2466
 Project: Cassandra
  Issue Type: Bug
Reporter: Peter Schuller
Priority: Minor

 The fact that bloom filters are backed by single large arrays of longs is 
 expected to interact badly with promotion of objects into old gen with CMS, 
 due to fragmentation concerns (as discussed in CASSANDRA-2463).
 It should be less of an issue than CASSANDRA-2463 in the sense that you need 
 to have a lot of rows before the array sizes become truly huge. For 
 comparison, the ~ 143 million row key limit implied by the use of 'int' in 
 BitSet prior to the switch to OpenBitSet translates roughly to 238 MB 
 (assuming the limiting factor there was the addressability of the bits with 
 a 32 bit int, which is my understanding).
 Having a preliminary look at OpenBitSet with an eye towards replacing the 
 single long[] with multiple arrays, it seems that if we're willing to drop 
 some of the functionality that is not used for bloom filter purposes, the 
 bits[i] indexing should be pretty easy to augment with modulo to address an 
 appropriate smaller array. Locality is not an issue since the bloom filter 
 case is the worst possible case for locality anyway, and it doesn't matter 
 whether it's one huge array or a number of ~ 64k arrays.
 Callers may be affected like BloomFilterSerializer which cares about the 
 underlying bit array.
 If the full functionality of OpenBitSet is to be maintained (e.g., xorCount) 
 some additional acrobatics would be necessary and presumably at a noticeable 
 performance cost if such operations were to be used in performance critical 
 places.
 An argument against touching OpenBitSet is that it seems to be pretty 
 carefully written and tested and has some non-trivial details and people have 
 seemingly benchmarked it quite carefully. On the other hand, the improvement 
 would then apply to other things as well, such as the bitsets used to keep 
 track of in-core pages (off the cuff for scale, a 64 gig sstable should imply 
 a 2 mb bit set, with one bit per 4k page).
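
A minimal sketch of the "many small arrays" idea discussed above: a bit set backed by fixed-size long[] chunks instead of one huge long[], so no single allocation grows with the number of keys. This is not derived from OpenBitSet; the class and constants are invented for illustration.

{code:java}
public class ChunkedBitSet {
    private static final int LONGS_PER_CHUNK = 1 << 16;              // 64k longs = 512 KB per chunk
    private static final long BITS_PER_CHUNK = (long) LONGS_PER_CHUNK * 64;

    private final long[][] chunks;

    public ChunkedBitSet(long numBits) {
        int numChunks = (int) ((numBits + BITS_PER_CHUNK - 1) / BITS_PER_CHUNK);
        chunks = new long[numChunks][LONGS_PER_CHUNK];
    }

    public void set(long bit) {
        // Outer index picks the chunk, inner index the long, low 6 bits the bit.
        chunks[(int) (bit / BITS_PER_CHUNK)][(int) ((bit % BITS_PER_CHUNK) >>> 6)] |= 1L << (bit & 63);
    }

    public boolean get(long bit) {
        return (chunks[(int) (bit / BITS_PER_CHUNK)][(int) ((bit % BITS_PER_CHUNK) >>> 6)]
                & (1L << (bit & 63))) != 0;
    }

    public static void main(String[] args) {
        ChunkedBitSet bits = new ChunkedBitSet(100_000_000L);  // ~12.5 MB, in 512 KB pieces
        bits.set(99_999_999L);
        System.out.println(bits.get(99_999_999L));             // true
    }
}
{code}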

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2428) Running cleanup on a node with join_ring=false removes all data

2011-04-07 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017137#comment-13017137
 ] 

Ryan King commented on CASSANDRA-2428:
--

Sylvain-

That seems like the right plan.

 Running cleanup on a node with join_ring=false removes all data
 ---

 Key: CASSANDRA-2428
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2428
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.1
Reporter: Chris Goffinet
Assignee: Sylvain Lebresne
Priority: Critical
 Fix For: 0.7.5

 Attachments: 
 0001-Don-t-allow-cleanup-when-node-hasn-t-join-the-ring.patch


 If you need to bring up a node with join_ring=false for operator maintenance, 
 and this node already has data, it will end up removing the data on the node. 
 We noticed this when we were calling cleanup on a specific CF.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (CASSANDRA-1952) Support TTLs on counter columns

2011-04-06 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King resolved CASSANDRA-1952.
--

Resolution: Duplicate

dupe of CASSANDRA-2103

 Support TTLs on counter columns
 ---

 Key: CASSANDRA-1952
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1952
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Stu Hood
Priority: Minor

 We would like to support TTLs for counter columns, with the behaviour that 
 the count is unset when the TTL expires, and that every mutation to the 
 counter updates the TTL deadline.
 This would allow for interesting rate-limiting usecases, automatic cleanup of 
 time-series data, and API consistency.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (CASSANDRA-2103) expiring counter columns

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King reassigned CASSANDRA-2103:


Assignee: Ryan King  (was: Kelvin Kakugawa)

 expiring counter columns
 

 Key: CASSANDRA-2103
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2103
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Affects Versions: 0.8
Reporter: Kelvin Kakugawa
Assignee: Ryan King
 Fix For: 0.8

 Attachments: 0001-CASSANDRA-2103-expiring-counters-logic-tests.patch


 add ttl functionality to counter columns.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1418) Automatic, online load balancing

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-1418:
-

Fix Version/s: (was: 0.8)
   1.0

 Automatic, online load balancing
 

 Key: CASSANDRA-1418
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1418
 Project: Cassandra
  Issue Type: Improvement
Reporter: Stu Hood
 Fix For: 1.0


 h2. Goal
 CASSANDRA-192 began with the intention of implementing full cluster load 
 balancing, but ended up being (wisely) limited to a manual load balancing 
 operation. This issue is an umbrella ticket for finishing the job of 
 implementing automatic, always-on load balancing.
 It is possible to implement very efficient load balancing operations with a 
 single process directing the rebalancing of all nodes, but avoiding such a 
 central process and allowing individual nodes to make their own movement 
 decisions would be ideal.
 h2. Components
 h3. Optimal movements for individual nodes
 h4. Ruhl
 One such approach is the Ruhl algorithm described on 192: 
 https://issues.apache.org/jira/browse/CASSANDRA-192#action_12713079 . But as 
 described, it performs excessive movement for large hotspots, and can take a 
 long time to reach equilibrium. Consider the following ring:
 ||token||load||
 |a|5|
 |c|5|
 |e|5|
 |f|40|
 |k|5|
 Assuming that node 'a' is the first to discover that 'f' is overloaded: it 
 will apply Case 2, and assume half of 'f's load by moving to 'i', leaving 
 both with 20 units. But this is not an optimal movement, because both 'f' and 
 'a/i' will still be holding data that they will need to give away. 
 Additionally, 'a/i' can't begin giving the data away until it has finished 
 receiving it.
 If node 'e' is the first to discover that 'f' is overloaded, it will apply 
 Case 1, and 'f' will give half of its load to 'e' by moving to 'i'. Again, 
 this is a non-optimal movement, because it will result in both 'e' and 'f/i' 
 holding data that they need to give away.
 h4. Adding load awareness to Ruhl
 Luckily, there appears to be a simple adjustment to the Ruhl algorithm that 
 solves this problem by taking advantage of the fact that Cassandra knows the 
 total load of a cluster, and can use it to calculate the average/ideal load 
 ω. Once node j has decided it should take load from node i (based on the ε 
 value in Ruhl), rather than node j taking 1/2 of the load on node i, it 
 should choose a token such that either i or j ends up with a load within ε*ω 
 of ω.
 Again considering the ring described above, and assuming ε == 1.0, the total 
 load for the 5 nodes is 60, giving a ω of 12. If node 'a' is the first to 
 discover 'f', it will choose to move to 'j' (a token that takes 12 or ω load 
 units from 'f'), leaving 'f' with a load of 28. When combined with the 
 improvement in the next section, this is closer to being an optimal movement, 
 because 'a/j' will at worst have ε*ω of load to give away, and 'f' is in a 
 position to start more movements.
 h3. Automatic load balancing
 Since the Ruhl algorithm only requires a node to make a decision based on 
 itself and one other node, it should be relatively straightforward to add a 
 timer on each node that periodically wakes up and executes the modified Ruhl 
 algorithm if it is not already in the process of moving (based on pending 
 ranges).
 Automatic balancing should probably be enabled by default, and should have a 
 configurable per-node bandwidth cap.
 h3. Allowing concurrent moves on a node
 Allowing a node to give away multiple ranges at once allows for the type of 
 quick balancing that is typically only attributed to vnodes. If a node is a 
 hotspot, such as in the example above, the node should be able to quickly 
 dump the load in a manner that causes minimal load on the rest of the 
 cluster. Rather than transferring to 1 target at 10 MB/s, a hotspot can give 
 to 5 targets at 2 MB/s each.
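
A sketch of the load-aware adjustment described above, reproducing the a/c/e/f/k example: instead of taking half of the overloaded node's load, the moving node takes only ω units from it. This is illustrative arithmetic only, not the actual balancer or the exact Ruhl trigger condition.

{code:java}
public class LoadAwareRuhlSketch {
    public static void main(String[] args) {
        double[] loads = {5, 5, 5, 40, 5};   // ring from the example: nodes a, c, e, f, k
        double epsilon = 1.0;

        double total = 0;
        for (double l : loads) total += l;
        double omega = total / loads.length;          // ideal per-node load: 60 / 5 = 12

        double heavy = 40;                            // node 'f'
        // Load-aware variant: the moving node (a -> j) takes omega units out of 'f'
        // rather than half of f's load, so one side ends up within epsilon*omega of omega.
        double taken = Math.min(omega, heavy - omega);
        System.out.printf("a/j holds %.0f, f keeps %.0f (omega=%.0f, tolerance=%.0f)%n",
                          taken, heavy - taken, omega, epsilon * omega);
        // prints: a/j holds 12, f keeps 28 (omega=12, tolerance=12)
    }
}
{code}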

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2089) Distributed test for the dynamic snitch

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2089:
-

Fix Version/s: (was: 0.8)
   1.0

 Distributed test for the dynamic snitch
 ---

 Key: CASSANDRA-2089
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2089
 Project: Cassandra
  Issue Type: Test
  Components: Core
Reporter: Stu Hood
  Labels: des
 Fix For: 1.0


 The dynamic snitch has turned into an essential component in dealing with 
 partially failed nodes: it would be great to have it fully tested before the 
 0.8 release.
 In order to implement a proper test of the snitch, it is necessary to be able 
 to flip a switch to place a node in a degraded state.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1205) Unify Partitioners and AbstractTypes

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-1205:
-

Fix Version/s: (was: 0.8)
   1.0

 Unify Partitioners and AbstractTypes
 

 Key: CASSANDRA-1205
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1205
 Project: Cassandra
  Issue Type: Improvement
Reporter: Stu Hood
Priority: Critical
 Fix For: 1.0


 There is no good reason for Partitioners to have different semantics than 
 AbstractTypes. Instead, we should probably have 2 partitioners: Random and 
 Ordered, where the Ordered partitioner requires an AbstractType to be 
 specified, defaulting to BytesType.
 One solution [suggested by 
 jbellis|https://issues.apache.org/jira/browse/CASSANDRA-767?focusedCommentId=12841565page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12841565]
  is to have AbstractType generate a collation id (essentially, a Token) for a 
 set of bytes.
 Looking forward, we should probably consider laying the groundwork to add 
 native support for compound row keys here as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2109) Improve default window size for DES

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2109:
-

Fix Version/s: (was: 0.8)
   1.0

 Improve default window size for DES
 ---

 Key: CASSANDRA-2109
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2109
 Project: Cassandra
  Issue Type: Improvement
Reporter: Stu Hood
Priority: Minor
  Labels: des
 Fix For: 1.0


 The window size for DES is currently hardcoded at 100 requests. A larger 
 window means that it takes longer to react to a suddenly slow node, but that 
 you have a smoother transition for scores.
 An example of bad behaviour: with a window of size 100, we saw a case with a 
 failing node where if enough requests could be answered quickly out of cache 
 or bloom filters, the window might be momentarily filled with 10 ms requests, 
 pushing out requests that had to go to disk and took 10 seconds.
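
A sketch of the fixed-size window behaviour described above (illustrative, not the dynamic snitch code): with a window of 100 samples, a burst of fast requests evicts the slow samples and the node looks healthy again too quickly.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

public class LatencyWindowSketch {
    private final int windowSize;
    private final Deque<Double> samples = new ArrayDeque<>();

    LatencyWindowSketch(int windowSize) { this.windowSize = windowSize; }

    void record(double latencyMs) {
        if (samples.size() == windowSize) samples.removeFirst();   // oldest sample falls out
        samples.addLast(latencyMs);
    }

    double average() {
        return samples.stream().mapToDouble(Double::doubleValue).average().orElse(0);
    }

    public static void main(String[] args) {
        LatencyWindowSketch window = new LatencyWindowSketch(100);
        for (int i = 0; i < 50; i++) window.record(10_000);  // requests that had to go to disk
        for (int i = 0; i < 100; i++) window.record(10);     // cache/bloom-filter hits
        // All slow samples have been evicted, so the score recovers immediately.
        System.out.printf("average over window: %.1f ms%n", window.average());
    }
}
{code}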

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2045) Simplify HH to decrease read load when nodes come back

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2045:
-

Fix Version/s: (was: 0.8)
   1.0

 Simplify HH to decrease read load when nodes come back
 --

 Key: CASSANDRA-2045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2045
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
 Fix For: 1.0


 Currently when HH is enabled, hints are stored, and when a node comes back, 
 we begin sending that node data. We do a lookup on the local node for the row 
 to send. To help reduce read load (if a node is offline for a long period of 
 time) we should store the data we want to forward to the node locally instead. We 
 wouldn't have to do any lookups, just take the byte[] and send it to the destination.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-674) New SSTable Format

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-674:


Fix Version/s: (was: 0.8)
   1.0

 New SSTable Format
 --

 Key: CASSANDRA-674
 URL: https://issues.apache.org/jira/browse/CASSANDRA-674
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
 Fix For: 1.0

 Attachments: 674-v1.diff, 674-v2.tgz, perf-674-v1.txt, 
 perf-trunk-2f3d2c0e4845faf62e33c191d152cb1b3fa62806.txt


 Various tickets exist due to limitations in the SSTable file format, 
 including #16, #47 and #328. Attached is a proposed design/implementation of 
 a new file format for SSTables that addresses a few of these limitations.
 This v2 implementation is not ready for serious use: see comments for 
 remaining issues. It is roughly the format described here: 
 http://wiki.apache.org/cassandra/FileFormatDesignDoc 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2319) Promote row index

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2319:
-

Fix Version/s: (was: 0.8)
   1.0

 Promote row index
 -

 Key: CASSANDRA-2319
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Stu Hood
  Labels: index, timeseries
 Fix For: 1.0


 The row index contains entries for configurably sized blocks of a wide row. 
 For a row of appreciable size, the row index ends up directing the third seek 
 (1. index, 2. row index, 3. content) to near the first column of a scan.
 Since the row index is always used for wide rows, and since it contains 
 information that tells us whether or not the 3rd seek is necessary (the 
 column range or name we are trying to slice may not exist in a given 
 sstable), promoting the row index into the sstable index would allow us to 
 drop the maximum number of seeks for wide rows back to 2, and, more 
 importantly, would allow sstables to be eliminated using only the index.
 An example usecase that benefits greatly from this change is time series data 
 in wide rows, where data is appended to the beginning or end of the row. Our 
 existing compaction strategy gets lucky and clusters the oldest data in the 
 oldest sstables: for queries to recently appended data, we would be able to 
 eliminate wide rows using only the sstable index, rather than needing to seek 
 into the data file to determine that it isn't interesting. For narrow rows, 
 this change would have no effect, as they will not reach the threshold for 
 indexing anyway.
 A first cut design for this change would look very similar to the file format 
 design proposed on #674: 
 http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
 column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1827) Batching across stages

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-1827:
-

Fix Version/s: (was: 0.8)
   1.0

 Batching across stages
 --

 Key: CASSANDRA-1827
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1827
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
 Fix For: 1.0


 We might be able to get some improvement if we start batching tasks for every 
 stage.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1601) Refactor index definitions

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-1601:
-

Fix Version/s: (was: 0.8)
   1.0

 Refactor index definitions
 --

 Key: CASSANDRA-1601
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1601
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Stu Hood
 Fix For: 1.0


 h3. Overview
 There are a few considerations for defining secondary indexes and row 
 validation that I don't think have been brought up yet. While the interface 
 is still malleable pre 0.7.0, we should attempt to make changes that allow 
 for forwards compatibility of index/validator schemas. This is an umbrella 
 ticket for suggesting/debating the changes: other tickets should be opened 
 for quick improvements that can be made before 0.7.0.
 
 h3. Index output types
 The output (queryable) data from an indexing operation is what actually goes 
 in the index. For a particular row, the output can be either _single-valued_, 
 _multi-valued_ or _compound_:
 * Single-valued
 ** Implemented in trunk (special case of multi-valued)
 * Multi-valued
 ** Multiple index values _of the same type_ can match a single row
 ** Row probably contains a list/set (perhaps in a supercolumn)
 * Compound
 ** Multiple base properties concatenated as one index entry 
 ** Different validators/comparators for each component
 ** (Given the simplicity of performing boolean operations on 1472 indexes, 
 compound local indexes are unlikely to ever be worthwhile, but compound 
 distributed indexes will be: see comments on CASSANDRA-1599)
 h3. Index input types
 The other end of indexing is selection of values from a row to be indexed. 
 Selection can correspond directly to our current {{db.filter.*}} 
 implementations, and may be best implemented by specifying the 
 validator/index using the same Thrift objects you would use for a similar 
 query:
 * Name selection
 ** Implemented in trunk, but should probably just be a special case of list 
 selection below
 ** Corresponds to db.filter.NamesQueryFilter of size 1
 * List selection
 ** Should specify a list of columns of which all values must be of the same 
 type, as defined by the Validator
 ** Corresponds to db.filter.NamesQueryFilter
 * Range (prefix?) selection
 ** Subsets of a row may be interesting for indexing
 ** Range corresponds to db.filter.SliceQueryFilter
 *** (A Prefix might actually be more useful for indexing, but is better 
 implemented by indexing an arbitrarily nested row)
 ** Open question: might the ability to index only the 'top N values' from a 
 row be useful? If so, then this selector should allow N to be specified like 
 it would be for a slice
 h3. Supercolumns/arbitrary-nesting
 Another consideration is that we should be able to support indexing and 
 validation of supercolumns (and hence, arbitrarily nested rows). Since the 
 selection of columns to index is essentially the same as the selection of 
 columns to return for a query, this can probably mirror (and suggest 
 improvements to) our query API.
 h3. UDFs
 This is obviously still an open area, but user defined indexing functions are 
 essentially a transform between the _input_ and _output_ (as defined above), 
 which would normally have equal structures. Leaving room for UDFs in our 
 index definitions makes sense, and will likely lead to a much more general 
 and elegant design.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-808) Need a way to skip corrupted data in SSTables

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-808:


Fix Version/s: (was: 0.8)
   1.0

 Need a way to skip corrupted data in SSTables
 -

 Key: CASSANDRA-808
 URL: https://issues.apache.org/jira/browse/CASSANDRA-808
 Project: Cassandra
  Issue Type: Improvement
Reporter: Stu Hood
Priority: Minor
 Fix For: 1.0


 The new SSTable format will allow for checksumming of the data file, but as 
 it stands, we don't have a better way to handle the situation than throwing 
 an Exception indicating that the data is unreadable.
 We might want to add an option (triggerable via a command line flag?) to 
 Cassandra that will allow for skipping of corrupted keys/blocks in SSTables, 
 to pretend they don't exist rather than throwing the Exception.
 An administrator could temporarily enable the option and trigger a compaction 
 to perform a local repair of data, or they could leave it enabled constantly 
 for hands-off recovery.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2364) Record dynamic snitch latencies for counter writes

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2364:
-

Fix Version/s: (was: 0.8)
   1.0

 Record dynamic snitch latencies for counter writes
 --

 Key: CASSANDRA-2364
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2364
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Priority: Minor
  Labels: counters
 Fix For: 1.0


 The counter code chooses a single replica to coordinate a write, meaning that 
 it should be subject to dynamic snitch latencies like a read would be. This 
 already works when there are reads going on, because the dynamic snitch read 
 latencies are used to pick a node to coordinate, but when there are no reads 
 going on (such as during a backfill) the latencies do not adjust.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-809) Full disk can result in being marked down

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-809:


Fix Version/s: (was: 0.8)
   1.0

 Full disk can result in being marked down
 -

 Key: CASSANDRA-809
 URL: https://issues.apache.org/jira/browse/CASSANDRA-809
 Project: Cassandra
  Issue Type: Bug
Reporter: Ryan King
Priority: Minor
 Fix For: 1.0


 We had a node fill up the disk under one of two data directories. The result 
 was that the node stopped making progress. The problem appears to be this 
 (I'll update with more details as we find them):
 When new tasks are put onto most queues in Cassandra, if there isn't a thread 
 in the pool to handle the task immediately, the task in run in the caller's 
 thread
 (org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor:69 sets the 
 caller-runs policy).  The queue in question here is the queue that manages 
 flushes, which is enqueued to from various places in our code (and therefore 
 likely from multiple threads). Assuming that the full disk meant that no 
 threads doing flushing could make progress (it appears that way), eventually 
 any thread that calls the flush code would become stalled.
 Assuming our analysis is right (and we're still looking into it) we need to 
 make a change. Here's a proposal so far:
 SHORT TERM:
 * change the ThreadPoolExecutor policy to not be caller-runs. This will let 
 other threads make progress in the event that one pool is stalled
 LONG TERM
 * It appears that there are n threads for n data directories that we flush 
 to, but they're not dedicated to a data directory. We should have a thread 
 per data directory and have that thread dedicated to that directory
 * Perhaps we could use the failure detector on disks?
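
A sketch of the failure mode described above using plain JDK classes (not Cassandra's DebuggableThreadPoolExecutor): with CallerRunsPolicy, once the flush pool is saturated, the submitting thread executes the blocking task itself and stalls too.

{code:java}
import java.util.concurrent.*;

public class CallerRunsSketch {
    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor flushPool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());   // the policy at issue

        Runnable blockedFlush = () -> {
            try { Thread.sleep(Long.MAX_VALUE); }             // stands in for a flush stuck on a full disk
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        };

        flushPool.execute(blockedFlush);   // occupies the single worker thread
        flushPool.execute(blockedFlush);   // fills the queue
        System.out.println("submitting a third flush now would run it in this thread and block it");
        // flushPool.execute(blockedFlush);   // uncommenting this stalls the caller thread itself
        flushPool.shutdownNow();
    }
}
{code}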

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-04-05 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-1610:
-

Fix Version/s: (was: 0.8)
   1.0

 Pluggable Compaction
 

 Key: CASSANDRA-1610
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Goffinet
Priority: Minor
 Fix For: 1.0


 In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
 it also makes sense to allow the ability to have pluggable compaction per CF. 
 There could be many types of workloads where this makes sense. One example we 
 had at Digg was to completely throw away certain SSTables after N days. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2156) Compaction Throttling

2011-03-31 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014298#comment-13014298
 ] 

Ryan King commented on CASSANDRA-2156:
--

This has been a big improvement for us in production. It'd be nice to get more 
eyes on it for 0.8.

 Compaction Throttling
 -

 Key: CASSANDRA-2156
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
 Project: Cassandra
  Issue Type: New Feature
Reporter: Stu Hood
 Fix For: 0.8

 Attachments: 
 0005-Throttle-total-compaction-to-a-configurable-throughput.txt, 
 for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, 
 for-0.6-0002-Make-compaction-throttling-configurable.txt


 Compaction is currently relatively bursty: we compact as fast as we can, and 
 then we wait for the next compaction to be possible (hurry up and wait).
 Instead, to properly amortize compaction, you'd like to compact exactly as 
 fast as you need to to keep the sstable count under control.
 For every new level of compaction, you need to increase the rate that you 
 compact at: a rule of thumb that we're testing on our clusters is to 
 determine the maximum number of buckets a node can support (aka, if the 15th 
 bucket holds 750 GB, we're not going to have more than 15 buckets), and then 
 multiply the flush throughput by the number of buckets to get a minimum 
 compaction throughput to maintain your sstable count.
 Full explanation: for a min compaction threshold of {{T}}, the bucket at 
 level {{N}} can contain {{SsubN = T^N}} 'units' (unit == memtable's worth of 
 data on disk). Every time a new unit is added, it has a {{1/SsubN}} chance of 
 causing the bucket at level N to fill. If the bucket at level N fills, it 
 causes {{SsubN}} units to be compacted. So, for each active level in your 
 system you have {{SsubN * 1 / SsubN}}, or {{1}} amortized unit to compact any 
 time a new unit is added.
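
A worked example of the rule of thumb above. The flush throughput figure is made up for illustration; only the bucket count comes from the example in the description.

{code:java}
public class CompactionThroughputSketch {
    public static void main(String[] args) {
        int activeBuckets = 15;               // from the example: the 15th bucket holds ~750 GB
        double flushThroughputMBs = 8.0;      // assumed memtable flush rate (illustrative)

        // Each flushed unit implies ~1 amortized unit of compaction per active level,
        // so min compaction throughput ~= bucket count x flush throughput.
        double minCompactionMBs = activeBuckets * flushThroughputMBs;
        System.out.printf("min compaction throughput ~= %.0f MB/s%n", minCompactionMBs);
    }
}
{code}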

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2281) keep a count of errors

2011-03-29 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2281:
-

Attachment: patch

update to trunk

 keep a count of errors
 --

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor
 Fix For: 0.7.5

 Attachments: patch, textmate stdin Vrj9Xa.txt


 I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
 useful for operators to keep track of the quality of cassandra without having 
 to tail and parse logs across a cluster.
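
A sketch of a JMX-exposed error counter of the kind the patch describes. The MBean name, interface, and methods here are invented for illustration; they are not the patch's actual API.

{code:java}
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.ObjectName;

interface ErrorReporterMBean {
    long getErrorCount();
}

class ErrorReporter implements ErrorReporterMBean {
    private final AtomicLong errors = new AtomicLong();
    void recordError() { errors.incrementAndGet(); }      // call from catch blocks
    public long getErrorCount() { return errors.get(); }  // readable from jconsole or any JMX poller
}

public class ErrorCounterSketch {
    public static void main(String[] args) throws Exception {
        ErrorReporter reporter = new ErrorReporter();
        ManagementFactory.getPlatformMBeanServer().registerMBean(
                reporter, new ObjectName("org.example.cassandra:type=ErrorReporter"));
        reporter.recordError();
        System.out.println("errors so far: " + reporter.getErrorCount());
    }
}
{code}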

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (CASSANDRA-2281) keep a count of errors

2011-03-08 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2281:
-

Attachment: (was: textmate stdin Vrj9Xa.txt)

 keep a count of errors
 --

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor
 Fix For: 0.7.4

 Attachments: textmate stdin Vrj9Xa.txt


 I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
 useful for operators to keep track of the quality of cassandra without having 
 to tail and parse logs across a cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (CASSANDRA-2281) keep a count of errors

2011-03-08 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2281:
-

Attachment: textmate stdin Vrj9Xa.txt

Fixes a minor bug caught by chrisg.

 keep a count of errors
 --

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor
 Fix For: 0.7.4

 Attachments: textmate stdin Vrj9Xa.txt


 I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
 useful for operators to keep track of the quality of cassandra without having 
 to tail and parse logs across a cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (CASSANDRA-2281) keep a count of errors

2011-03-08 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2281:
-

Attachment: (was: textmate stdin 5y2H5u.txt)

 keep a count of errors
 --

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor
 Fix For: 0.7.4

 Attachments: textmate stdin Vrj9Xa.txt


 I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
 useful for operators to keep track of the quality of cassandra without having 
 to tail and parse logs across a cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (CASSANDRA-2281) keep a count of errors

2011-03-08 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2281:
-

Attachment: textmate stdin Vrj9Xa.txt

 keep a count of errors
 --

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor
 Fix For: 0.7.4

 Attachments: textmate stdin Vrj9Xa.txt


 I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
 useful for operators to keep track of the quality of cassandra without having 
 to tail and parse logs across a cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Created: (CASSANDRA-2281) keep a count of errors

2011-03-07 Thread Ryan King (JIRA)
keep a count of errors
--

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Ryan King


I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
useful for operators to keep track of the quality of cassandra without having 
to tail and parse logs across a cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (CASSANDRA-2281) keep a count of errors

2011-03-07 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2281:
-

Attachment: textmate stdin c4Hh5i.txt

Patch to keep track of errors and expose via JMX. I've probably missed a few 
places where we need to track things.

 keep a count of errors
 --

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Ryan King
 Attachments: textmate stdin c4Hh5i.txt


 I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
 useful for operators to keep track of the quality of cassandra without having 
 to tail and parse logs across a cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (CASSANDRA-2281) keep a count of errors

2011-03-07 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2281:
-

Attachment: textmate stdin 5y2H5u.txt

fix some codestyle issues in ErrorReporter

 keep a count of errors
 --

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Ryan King
 Attachments: textmate stdin 5y2H5u.txt, textmate stdin c4Hh5i.txt


 I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
 useful for operators to keep track of the quality of cassandra without having 
 to tail and parse logs across a cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (CASSANDRA-2281) keep a count of errors

2011-03-07 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2281:
-

Attachment: (was: textmate stdin 5y2H5u.txt)

 keep a count of errors
 --

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Ryan King
 Attachments: textmate stdin 5y2H5u.txt


 I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
 useful for operators to keep track of the quality of cassandra without having 
 to tail and parse logs across a cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (CASSANDRA-2281) keep a count of errors

2011-03-07 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2281:
-

Attachment: textmate stdin 5y2H5u.txt

forgot to click the license on the last one.

 keep a count of errors
 --

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Ryan King
 Attachments: textmate stdin 5y2H5u.txt


 I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
 useful for operators to keep track of the quality of cassandra without having 
 to tail and parse logs across a cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2281) keep a count of errors

2011-03-07 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13003620#comment-13003620
 ] 

Ryan King commented on CASSANDRA-2281:
--

I think it might be better to go the opposite direction and have ErrorReporter 
do the logging.

 keep a count of errors
 --

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor
 Fix For: 0.7.4

 Attachments: textmate stdin 5y2H5u.txt


 I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
 useful for operators to keep track of the quality of cassandra without having 
 to tail and parse logs across a cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2281) keep a count of errors

2011-03-07 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13003715#comment-13003715
 ] 

Ryan King commented on CASSANDRA-2281:
--

That's a good point. I'm not sure how I feel about conflating logging and 
statistics gathering.

 keep a count of errors
 --

 Key: CASSANDRA-2281
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2281
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor
 Fix For: 0.7.4

 Attachments: textmate stdin 5y2H5u.txt


 I have a patch that keeps a counter (exposed via JMX) of errors. This is quite 
 useful for operators to keep track of the quality of cassandra without having 
 to tail and parse logs across a cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2229) Back off compaction after failure

2011-02-23 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12998511#comment-12998511
 ] 

Ryan King commented on CASSANDRA-2229:
--

It might be worth keeping track of the specific SSTables involved in the failed 
compaction and skipping those. It's possible we could make some progress on 
compaction in scenarios where a single sstable is corrupt.

 Back off compaction after failure
 -

 Key: CASSANDRA-2229
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2229
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.2
Reporter: Nick Bailey
Priority: Minor
 Fix For: 0.8


 When compaction fails (for one of the multitude of reasons it can fail, 
 generally some sort of 'corruption'), we should back off on attempting to 
 compact that column family.  Continuously trying to compact it will just 
 waste resources.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (CASSANDRA-1657) support in-memory column families

2011-02-01 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12989473#comment-12989473
 ] 

Ryan King commented on CASSANDRA-1657:
--

For narrow SSTables, shouldn't the row cache be enough for this?

 support in-memory column families
 -

 Key: CASSANDRA-1657
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1657
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Schuller
Priority: Minor

 Some workloads are such that you absolutely depend on column families being 
 in-memory for performance, yet you most definitely want all the things that 
 Cassandra offers in terms of replication, consistency, durability etc.
 In order to semi-deterministically ensure acceptable performance for such 
 data, Cassandra could support in-memory column families. Such an in-memory 
 column family would imply that mlock() be used on sstables for this column 
 family. On start-up and on compaction completion, they could be mmap():ed 
 with MAP_POPULATE (Linux specific) or else just mmap():ed + mlock():ed in 
 such a way as to otherwise guarantee it is in-memory (such as userland 
 traversal of the entire file).

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (CASSANDRA-2057) overflow in NodeCmd

2011-01-25 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-2057:
-

Attachment: nodetool_overflow.patch

 overflow in NodeCmd
 ---

 Key: CASSANDRA-2057
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2057
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Ryan King
Assignee: Ryan King
Priority: Minor
 Attachments: nodetool_overflow.patch


 We aggregate the long read/write counts across CFs into an int.
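
A tiny illustration of the overflow described above (the counts are made up): summing per-CF long counters into an int silently wraps, while a long accumulator does not.

{code:java}
public class NodeCmdOverflowSketch {
    public static void main(String[] args) {
        long[] perCfWriteCounts = {1_500_000_000L, 1_500_000_000L};

        int intTotal = 0;
        long longTotal = 0;
        for (long c : perCfWriteCounts) {
            intTotal += c;      // implicit narrowing cast: wraps negative
            longTotal += c;     // correct
        }
        System.out.println("int total:  " + intTotal);   // negative garbage
        System.out.println("long total: " + longTotal);  // 3000000000
    }
}
{code}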

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-2045) Simplify HH to decrease read load when nodes come back

2011-01-24 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986185#action_12986185
 ] 

Ryan King commented on CASSANDRA-2045:
--

I think the two approaches are suitable for different kinds of data models. The 
point approach is almost certainly better for narrow rows, while worse for 
large, dynamic rows.

 Simplify HH to decrease read load when nodes come back
 --

 Key: CASSANDRA-2045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2045
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
 Fix For: 0.7.2


 Currently when HH is enabled, hints are stored, and when a node comes back, 
 we begin sending that node data. We do a lookup on the local node for the row 
 to send. To help reduce read load (if a node is offline for a long period of 
 time) we should store the data we want to forward to the node locally instead. We 
 wouldn't have to do any lookups, just take the byte[] and send it to the destination.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1777) The describe_host API method is misleading in that it returns the interface associated with gossip traffic

2011-01-21 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984837#action_12984837
 ] 

Ryan King commented on CASSANDRA-1777:
--

I don't care about making it routing-aware. I just want to do discovery.

 The describe_host API method is misleading in that it returns the interface 
 associated with gossip traffic
 --

 Key: CASSANDRA-1777
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1777
 Project: Cassandra
  Issue Type: Bug
Reporter: Nate McCall
Assignee: Brandon Williams
 Fix For: 0.8

 Attachments: 1777.txt

   Original Estimate: 16h
  Remaining Estimate: 16h

 If the hardware is configured to use separate interfaces for thrift and 
 gossip, the gossip interface will be returned, given that the results eventually 
 come out of the ReplicationStrategy.
 I understand the approach, but given that this is on the API, it is effectively 
 worthless for host auto-discovery via describe_ring from a client. I actually 
 see this as the primary use case of this method - why else would I care about 
 the gossip iface from the client perspective? Its current form should be 
 relegated to JMX only. 
 At the same time, we should add port information as well. 
 describe_splits probably has similar issues.
 I see the potential cart-before-horse issues here, and this will probably 
 be non-trivial to fix, but I think "give me a set of all the hosts to which I 
 can talk" is pretty important from a client perspective.
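
For reference, the discovery pattern being described, sketched against the
0.7-era Thrift API (method names to the best of my recollection); the complaint
above is that the addresses this returns are the gossip interfaces rather than
the thrift ones, and that no port information is included.

{noformat}
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.TokenRange;

public final class RingDiscovery
{
    // Ask one seed node for the ring and collect every endpoint that owns a range
    // of the keyspace. Today these come back as gossip/internal addresses.
    public static Set<String> discoverHosts(Cassandra.Client seed, String keyspace) throws Exception
    {
        Set<String> hosts = new HashSet<String>();
        List<TokenRange> ring = seed.describe_ring(keyspace);
        for (TokenRange range : ring)
            hosts.addAll(range.getEndpoints());
        return hosts;
    }
}
{noformat}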

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1777) The describe_host API method is misleading in that it returns the interface associated with gossip traffic

2011-01-20 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984466#action_12984466
 ] 

Ryan King commented on CASSANDRA-1777:
--

Unless you have a dns server that can understand cassandra membership, RRDNS is 
actually a rough way to do this. I'd prefer to supply something for clients 
that works correctly.

 The describe_host API method is misleading in that it returns the interface 
 associated with gossip traffic
 --

 Key: CASSANDRA-1777
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1777
 Project: Cassandra
  Issue Type: Bug
Reporter: Nate McCall
Assignee: Brandon Williams
 Fix For: 0.8

 Attachments: 1777.txt

   Original Estimate: 16h
  Remaining Estimate: 16h

 If the hardware is configured to use separate interfaces for thrift and 
 gossip, the gossip interface will be returned, given that the results eventually 
 come out of the ReplicationStrategy.
 I understand the approach, but given that this is on the API, it is effectively 
 worthless for host auto-discovery via describe_ring from a client. I actually 
 see this as the primary use case of this method - why else would I care about 
 the gossip iface from the client perspective? Its current form should be 
 relegated to JMX only. 
 At the same time, we should add port information as well. 
 describe_splits probably has similar issues.
 I see the potential cart-before-horse issues here, and this will probably 
 be non-trivial to fix, but I think "give me a set of all the hosts to which I 
 can talk" is pretty important from a client perspective.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)

2011-01-18 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King resolved CASSANDRA-1932.
--

Resolution: Cannot Reproduce

 NegativeArraySizeException at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 -

 Key: CASSANDRA-1932
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1932
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Karl Mueller
Assignee: Ryan King
 Fix For: 0.7.1


 ERROR [ReadStage:30017] 2011-01-03 19:28:45,406 
 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
 java.lang.NegativeArraySizeException
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:9)
 at 
 org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:104)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:71)
 at 
 org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1219)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1081)
 at org.apache.cassandra.db.Table.getRow(Table.java:384)
 at 
 org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
 at 
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)

2011-01-18 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983509#action_12983509
 ] 

Ryan King commented on CASSANDRA-1932:
--

I'm prepared to consider this "cannot reproduce". I think it was user error. The 
fix to refuse opening future sstables should make that kind of error clearer in 
the future.

 NegativeArraySizeException at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 -

 Key: CASSANDRA-1932
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1932
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Karl Mueller
Assignee: Ryan King
 Fix For: 0.7.1


 ERROR [ReadStage:30017] 2011-01-03 19:28:45,406 
 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
 java.lang.NegativeArraySizeException
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:9)
 at 
 org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:104)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:71)
 at 
 org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1219)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1081)
 at org.apache.cassandra.db.Table.getRow(Table.java:384)
 at 
 org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
 at 
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1935) Refuse to open SSTables from the future

2011-01-14 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981904#action_12981904
 ] 

Ryan King commented on CASSANDRA-1935:
--

Not that I can think of.

 Refuse to open SSTables from the future
 ---

 Key: CASSANDRA-1935
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1935
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Priority: Minor
 Fix For: 0.8

 Attachments: CASSANDRA-1935.patch


 If somebody has rolled back to a previous version of Cassandra that is unable 
 to read an SSTable written by a future version correctly (indicated by a 
 version change), failing fast is safer than accidentally performing a 
 compaction that rewrites incorrect data and leaves you in an odd state.
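
A minimal sketch of the fail-fast check being proposed (hypothetical names; the
real patch hooks into the sstable open path): compare the on-disk version
against the newest version this build can read and refuse to open anything newer.

{noformat}
public final class VersionGuard
{
    // Newest sstable format version this build knows how to read (hypothetical constant).
    private static final String CURRENT_VERSION = "f";

    // Version strings of this era are short and ordered, so a lexicographic
    // comparison is a reasonable stand-in for "is this from the future?".
    public static void checkCanRead(String onDiskVersion, String filename)
    {
        if (onDiskVersion.compareTo(CURRENT_VERSION) > 0)
            throw new RuntimeException("Cannot open " + filename + ": sstable version '"
                                       + onDiskVersion + "' is newer than the newest supported version '"
                                       + CURRENT_VERSION + "'");
    }
}
{noformat}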

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1983) Make sstable filenames contain a UUID instead of increasing integer

2011-01-13 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981507#action_12981507
 ] 

Ryan King commented on CASSANDRA-1983:
--

Alternatively, since we'll need a host-UUID mapping for counters, we can put 
that UUID in the filename along with a serial integer (make it a long and we 
should be OK, right?)
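
Roughly what that naming scheme could look like (purely illustrative, not a
committed format): the host UUID makes names unique across the cluster, and a
long generation makes overflow a non-issue.

{noformat}
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

public final class SSTableNames
{
    // Stable per-host id, e.g. the same UUID the counter work needs anyway.
    private static final UUID HOST_ID = UUID.randomUUID();
    private static final AtomicLong GENERATION = new AtomicLong();

    // e.g. "Standard1-<host-uuid>-42-Data.db"
    public static String dataFileName(String cfName)
    {
        return String.format("%s-%s-%d-Data.db", cfName, HOST_ID, GENERATION.incrementAndGet());
    }
}
{noformat}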

 Make sstable filenames contain a UUID instead of increasing integer
 ---

 Key: CASSANDRA-1983
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1983
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: David King
Priority: Minor

 sstable filenames look like CFName-1569-Index.db, containing an integer for 
 uniqueness. This makes it possible (however unlikely) that the integer could 
 overflow, which could be a problem. It also makes it difficult to collapse 
 multiple nodes into a single one with rsync. I do this occasionally for 
 testing: I'll copy our 20-node cluster onto only 3 nodes by copying all of 
 the data files and running cleanup; at present this requires a manual step of 
 uniquifying the overlapping sstable names. Instead of an incrementing 
 integer, it would be handy if these contained a UUID or some such that 
 guarantees uniqueness across the cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CASSANDRA-1935) Refuse to open SSTables from the future

2011-01-11 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-1935:
-

Attachment: CASSANDRA-1935.patch

Here's the simplest patch that could work. I'm a bit afraid that this may cause 
problems in scenarios other than startup. Also, I'd appreciate feedback on a 
better exception to raise.

 Refuse to open SSTables from the future
 ---

 Key: CASSANDRA-1935
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1935
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Priority: Minor
 Fix For: 0.8

 Attachments: CASSANDRA-1935.patch


 If somebody has rolled back to a previous version of Cassandra that is unable 
 to read an SSTable written by a future version correctly (indicated by a 
 version change), failing fast is safer than accidentally performing a 
 compaction that rewrites incorrect data and leaves you in an odd state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1935) Refuse to open SSTables from the future

2011-01-10 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979673#action_12979673
 ] 

Ryan King commented on CASSANDRA-1935:
--

That seems like a somewhat bigger change. Perhaps we could tackle the startup 
situation now and open another ticket for making sure we don't try to stream 
incompatible sstables?

 Refuse to open SSTables from the future
 ---

 Key: CASSANDRA-1935
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1935
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Priority: Minor
 Fix For: 0.8


 If somebody has rolled back to a previous version of Cassandra that is unable 
 to read an SSTable written by a future version correctly (indicated by a 
 version change), failing fast is safer than accidentally performing a 
 compaction that rewrites incorrect data and leaves you in an odd state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1427) Optimize loadbalance/move for moves within the current range

2011-01-06 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978595#action_12978595
 ] 

Ryan King commented on CASSANDRA-1427:
--

I think we should generalize it to cover all cases.

 Optimize loadbalance/move for moves within the current range
 

 Key: CASSANDRA-1427
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1427
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Affects Versions: 0.7 beta 1
Reporter: Nick Bailey
Assignee: Brandon Williams
 Fix For: 0.8


 Currently our move/loadbalance operations only implement case 2 of the Ruhl 
 algorithm described at 
 https://issues.apache.org/jira/browse/CASSANDRA-192#action_12713079.
 We should add functionality to optimize moves that take/give ranges to a 
 node's direct neighbors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1935) Refuse to open SSTables from the future

2011-01-05 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978029#action_12978029
 ] 

Ryan King commented on CASSANDRA-1935:
--

It seems like we should probably abort in this case, but that might be a bit 
draconian.

 Refuse to open SSTables from the future
 ---

 Key: CASSANDRA-1935
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1935
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Priority: Minor
 Fix For: 0.8


 If somebody has rolled back to a previous version of Cassandra that is unable 
 to read an SSTable written by a future version correctly (indicated by a 
 version change), failing fast is safer than accidentally performing a 
 compaction that rewrites incorrect data and leaves you in an odd state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1935) Refuse to open SSTables from the future

2011-01-05 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978076#action_12978076
 ] 

Ryan King commented on CASSANDRA-1935:
--

What about scenarios outside startup, like streaming?

 Refuse to open SSTables from the future
 ---

 Key: CASSANDRA-1935
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1935
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Priority: Minor
 Fix For: 0.8


 If somebody has rolled back to a previous version of Cassandra that is unable 
 to read an SSTable written by a future version correctly (indicated by a 
 version change), failing fast is safer than accidentally performing a 
 compaction that rewrites incorrect data and leaves you in an odd state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)

2011-01-04 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977458#action_12977458
 ] 

Ryan King commented on CASSANDRA-1932:
--

What were the file names of the SSTables you set aside?

 NegativeArraySizeException at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 -

 Key: CASSANDRA-1932
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1932
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Karl Mueller
Assignee: Ryan King
 Fix For: 0.7.1


 ERROR [ReadStage:30017] 2011-01-03 19:28:45,406 
 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
 java.lang.NegativeArraySizeException
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:9)
 at 
 org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:104)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:71)
 at 
 org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1219)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1081)
 at org.apache.cassandra.db.Table.getRow(Table.java:384)
 at 
 org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
 at 
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CASSANDRA-1859) distributed test harness

2011-01-03 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-1859:
-

Attachment: 0003-add-a-test-for-one-writes-and-all-reads.txt

Add another test for writing with ONE and reading with ALL (the last of the 
strong-consistency scenarios).
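
For context, each of these pairings satisfies the usual strong-consistency
condition W + R > RF; a quick sketch of the check (nothing test-harness-specific
is assumed here):

{noformat}
public final class ConsistencyCheck
{
    // Reads overlap writes on at least one replica whenever W + R > RF.
    static boolean isStronglyConsistent(int writeReplicas, int readReplicas, int replicationFactor)
    {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args)
    {
        int rf = 3;
        System.out.println(isStronglyConsistent(rf, 1, rf));  // write ALL, read ONE -> true
        System.out.println(isStronglyConsistent(2, 2, rf));   // write QUORUM, read QUORUM -> true
        System.out.println(isStronglyConsistent(1, rf, rf));  // write ONE, read ALL -> true (this test)
    }
}
{noformat}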

 distributed test harness
 

 Key: CASSANDRA-1859
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1859
 Project: Cassandra
  Issue Type: Test
  Components: Tools
Reporter: Kelvin Kakugawa
Assignee: Kelvin Kakugawa
 Fix For: 0.8

 Attachments: 
 0001-Add-distributed-ultra-long-running-tests-using-Whirr-j.txt, 
 0002-Pull-whirr-0.3.0-incubating-SNAPSHOT-155-from-Twitter-.txt, 
 0003-add-a-test-for-one-writes-and-all-reads.txt


 Distributed Test Harness
 - deploys a cluster on a cloud provider
 - runs tests targeted at the cluster
 - tears down the cluster

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1015) Internal Messaging should be backwards compatible

2010-12-30 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976157#action_12976157
 ] 

Ryan King commented on CASSANDRA-1015:
--

I fear that in going our own way we'll end up replicating a lot of what's 
already been done in these frameworks.

Additionally, we make it much harder to write code that comprehends the message 
format in another language. I know this sounds like YAGNI, but I've found it quite 
nice to be able to decode thrift RPC interchanges captured via tcpdump.

We'd have to rebuild a lot if we go our own way.

 Internal Messaging should be backwards compatible
 -

 Key: CASSANDRA-1015
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1015
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Ryan King
Assignee: Gary Dusbabek
Priority: Critical
 Fix For: 0.8


 Currently, incompatible changes in the node-to-node communication prevent 
 rolling restarts of clusters.
 In order to fix this we should:
 1) use a framework that makes doing compatible changes easy
 2) have a policy of only making compatible changes between versions n and n+1*
 * Running multiple versions should only be supported for short periods of 
 time. Running clusters of mixed versions is not needed here.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1072) Increment counters

2010-12-20 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973335#action_12973335
 ] 

Ryan King commented on CASSANDRA-1072:
--

Committing to trunk seems like a reasonable approach. Hopefully we can 
successfully backport to 0.7 then.

 Increment counters
 --

 Key: CASSANDRA-1072
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1072
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: Johan Oskarsson
Assignee: Kelvin Kakugawa
 Attachments: CASSANDRA-1072.121710.2.patch, increment_test.py, 
 Partitionedcountersdesigndoc.pdf


 Break the increment counters out of CASSANDRA-580. Classes are shared 
 between the two features, but without the plain version-vector code the 
 changeset becomes smaller and more manageable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1083) Improvement to CompactionManger's submitMinorIfNeeded

2010-12-10 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970244#action_12970244
 ] 

Ryan King commented on CASSANDRA-1083:
--

I agree. I think this idea is mostly a dead end because it's attacking the 
problem from the wrong direction.

 Improvement to CompactionManger's submitMinorIfNeeded
 -

 Key: CASSANDRA-1083
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1083
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 0.7.1

 Attachments: 1083-configurable-compaction-thresholds.patch, 
 1083-sort.txt, compaction_simulation.rb, compaction_simulation.rb


 We've discovered that we are unable to tune compaction the way we want for 
 our production cluster. I think the current algorithm doesn't do this as well 
 as it could, since it doesn't sort the sstables by size before doing the 
 bucketing, which means the tuning parameters have unpredictable results.
 I looked at CASSANDRA-792, but it seems like overkill. Here's an alternative 
 proposal:
 config options:
  minimumCompactionThreshold
  maximumCompactionThreshold
  targetSSTableCount
 The first two would mean what they currently mean: the bounds on how many 
 sstables to compact in one compaction operation. The third is a target for how 
 many SSTables you'd like to have.
 Pseudocode for deciding whether or not to do a minor compaction:
 {noformat} 
 if sstables.length + minimumCompactionThreshold - 1 > targetSSTableCount
   sort sstables from smallest to largest
   compact up to maximumCompactionThreshold of the smallest tables
 {noformat} 
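
A sketch of that algorithm in Java, using sstable sizes as stand-ins for the
sstables themselves (illustrative only; names don't match CompactionManager):

{noformat}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class MinorCompactionPolicy
{
    // Returns the sizes of the sstables to compact, or an empty list if none.
    static List<Long> chooseSSTables(List<Long> sstableSizes,
                                     int minimumCompactionThreshold,
                                     int maximumCompactionThreshold,
                                     int targetSSTableCount)
    {
        if (sstableSizes.size() + minimumCompactionThreshold - 1 <= targetSSTableCount)
            return new ArrayList<Long>();   // already at or under the target

        List<Long> sorted = new ArrayList<Long>(sstableSizes);
        sorted.sort(Comparator.naturalOrder());               // smallest to largest
        int n = Math.min(maximumCompactionThreshold, sorted.size());
        return new ArrayList<Long>(sorted.subList(0, n));     // up to max-threshold smallest tables
    }
}
{noformat}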

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1083) Improvement to CompactionManger's submitMinorIfNeeded

2010-12-09 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969829#action_12969829
 ] 

Ryan King commented on CASSANDRA-1083:
--

To be honest, I'm not sure this is the best approach anymore. I think the 
fundamental problem is that it's driven by the write traffic, not the read 
traffic.

 Improvement to CompactionManger's submitMinorIfNeeded
 -

 Key: CASSANDRA-1083
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1083
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ryan King
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 0.7.1

 Attachments: 1083-configurable-compaction-thresholds.patch, 
 compaction_simulation.rb, compaction_simulation.rb


 We've discovered that we are unable to tune compaction the way we want for 
 our production cluster. I think the current algorithm doesn't do this as well 
 as it could, since it doesn't sort the sstables by size before doing the 
 bucketing, which means the tuning parameters have unpredictable results.
 I looked at CASSANDRA-792, but it seems like overkill. Here's an alternative 
 proposal:
 config options:
  minimumCompactionThreshold
  maximumCompactionThreshold
  targetSSTableCount
 The first two would mean what they currently mean: the bounds on how many 
 sstables to compact in one compaction operation. The third is a target for how 
 many SSTables you'd like to have.
 Pseudocode for deciding whether or not to do a minor compaction:
 {noformat} 
 if sstables.length + minimumCompactionThreshold - 1 > targetSSTableCount
   sort sstables from smallest to largest
   compact up to maximumCompactionThreshold of the smallest tables
 {noformat} 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1555) Considerations for larger bloom filters

2010-12-07 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968861#action_12968861
 ] 

Ryan King commented on CASSANDRA-1555:
--

Stu's last patch is incorporated (in spirit; I took a slightly different 
approach) in my latest.

 Considerations for larger bloom filters
 ---

 Key: CASSANDRA-1555
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1555
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Ryan King
 Fix For: 0.8

 Attachments: 1555_v5.txt, addendum-to-1555.txt, cassandra-1555.tgz, 
 CASSANDRA-1555v2.patch, CASSANDRA-1555v3.patch.gz, CASSANDRA-1555v4.patch.gz


 To (optimally) support SSTables larger than 143 million keys, we need to 
 support bloom filters larger than 2^31 bits, which java.util.BitSet can't 
 handle directly.
 A few options:
 * Switch to a BitSet class which supports 2^31 * 64 bits (Lucene's OpenBitSet)
 * Partition the java.util.BitSet behind our current BloomFilter
 ** Straightforward bit partitioning: bit N is in bitset N // 2^31
 ** Separate equally sized complete bloom filters for member ranges, which can 
 be used independently or OR'd together under memory pressure.
 All of these options require new approaches to serialization.
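
A sketch of the straightforward-bit-partitioning option (hypothetical wrapper
class, not the patch itself): bit N lives in bitset N / 2^31, at offset N % 2^31.

{noformat}
import java.util.BitSet;

public final class PartitionedBitSet
{
    private static final long PAGE_BITS = 1L << 31;   // bits addressable by one java.util.BitSet

    private final BitSet[] pages;

    public PartitionedBitSet(long numBits)
    {
        int numPages = (int) ((numBits + PAGE_BITS - 1) / PAGE_BITS);
        pages = new BitSet[numPages];
        for (int i = 0; i < numPages; i++)
            pages[i] = new BitSet();
    }

    public void set(long bitIndex)
    {
        pages[(int) (bitIndex / PAGE_BITS)].set((int) (bitIndex % PAGE_BITS));
    }

    public boolean get(long bitIndex)
    {
        return pages[(int) (bitIndex / PAGE_BITS)].get((int) (bitIndex % PAGE_BITS));
    }
}
{noformat}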

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CASSANDRA-1555) Considerations for larger bloom filters

2010-12-06 Thread Ryan King (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan King updated CASSANDRA-1555:
-

Attachment: CASSANDRA-1555v3.patch.gz

New patch with several changes based on Stu's feedback:

* renamed BloomFilter to LegacyBloomFilter and BigBloomFilter to BloomFilter
* moved maxBucketsPerElement to BloomCalculations
* removed emptybuckets
* cleaned up formatting in SSTableReader and BigBloomFilter

Finally, I changed the serialization to read and write the long[] directly, 
which saves a lot of space for small filters (the column filter for a 10-item row 
goes from 120 bytes to 16).
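
The serialization is roughly this shape (hypothetical names; the exact header
the patch writes may differ): a length prefix followed by the raw long[] words,
so a small filter costs only a few words on disk.

{noformat}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public final class BitWordSerializer
{
    // Write only the words actually backing the filter: a count, then the raw longs.
    public static void serialize(long[] words, DataOutput out) throws IOException
    {
        out.writeInt(words.length);
        for (long word : words)
            out.writeLong(word);
    }

    public static long[] deserialize(DataInput in) throws IOException
    {
        long[] words = new long[in.readInt()];
        for (int i = 0; i < words.length; i++)
            words[i] = in.readLong();
        return words;
    }
}
{noformat}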

 Considerations for larger bloom filters
 ---

 Key: CASSANDRA-1555
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1555
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Ryan King
 Fix For: 0.8

 Attachments: cassandra-1555.tgz, CASSANDRA-1555v2.patch, 
 CASSANDRA-1555v3.patch.gz


 To (optimally) support SSTables larger than 143 million keys, we need to 
 support bloom filters larger than 2^31 bits, which java.util.BitSet can't 
 handle directly.
 A few options:
 * Switch to a BitSet class which supports 2^31 * 64 bits (Lucene's OpenBitSet)
 * Partition the java.util.BitSet behind our current BloomFilter
 ** Straightforward bit partitioning: bit N is in bitset N // 2^31
 ** Separate equally sized complete bloom filters for member ranges, which can 
 be used independently or OR'd together under memory pressure.
 All of these options require new approaches to serialization.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


