Re: roadmap and 0.4
> Am I missing anything important?

Option to allow fsync on the commitlog before ack-ing the client.

On Tue, Jul 28, 2009 at 2:05 PM, Jonathan Ellis wrote:
> Tentative changelog for 0.4 is being worked on over in
> https://issues.apache.org/jira/browse/CASSANDRA-321. For convenience,
> here is the text so far:
>
> * On-disk data format has changed to allow billions of keys/rows per
>   node instead of only millions
> * Scan all sstables for all queries to avoid situations where
>   different types of operation on the same ColumnFamily could
>   disagree on what data was present
> * Configurable LRU cache for key lookups
> * Multi-keyspace support
> * Thrift API has changed a _lot_:
>   - removed time-sorted CFs; instead, user-defined comparators
>     may be defined on the column names, which are now byte arrays.
>     Default comparators are provided for UTF8, Bytes, Ascii, Long (i64),
>     and UUID types.
>   - removed colon-delimited strings in thrift api in favor of explicit
>     structs such as ColumnPath, ColumnParent, etc. Also normalized
>     thrift struct and argument naming.
>   - Added columnFamily argument to get_key_range.
>   - Change signature of get_slice and get_slice_super to accept
>     starting and ending columns as well as an offset. (This allows use
>     of indexes.) Added "ascending" flag to allow reasonably-efficient
>     reverse scans as well. Removed get_slice_by_range as redundant.
>   - Similarly, changed signature of get_slice_super.
>   - get_key_range operates on one CF at a time
>   - changed `block` boolean on insert methods to ConsistencyLevel enum,
>     with options of NONE, ONE, QUORUM, and ALL.
>   - added similar consistency_level parameter to read methods
> * Removed the web interface. Node information can now be obtained by
>   using the newly introduced nodeprobe utility.
> * More JMX stats
> * Remove magic values from internals (e.g. special key to indicate
>   when to flush memtables)
> * Rename configuration "table" to "keyspace"
> * Moved to crash-only design; no more shutdown (just kill the process)
> * Lots of bug fixes
>
> Am I missing anything important?
>
> -Jonathan
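A minimal sketch of what the fsync-before-ack option suggested above could look like. This is not Cassandra's commit log code: the `CommitLog` class, the `sync_before_ack` flag, and the length-prefixed record format are all hypothetical, chosen only to show the difference between acknowledging after a buffered write and acknowledging after `os.fsync` returns.

```python
import os
import tempfile


class CommitLog:
    """Toy append-only log (hypothetical; not Cassandra's implementation)."""

    def __init__(self, path, sync_before_ack=False):
        # sync_before_ack is an assumed option name for illustration.
        self.sync_before_ack = sync_before_ack
        self._file = open(path, "ab")

    def append(self, record: bytes) -> bool:
        """Write one record; the True return value stands in for the client ack."""
        # Length-prefixed framing so records can be read back later.
        self._file.write(len(record).to_bytes(4, "big") + record)
        self._file.flush()  # hand Python's buffer to the operating system
        if self.sync_before_ack:
            # Ask the OS to push the data to stable storage before we ack.
            os.fsync(self._file.fileno())
        return True

    def close(self):
        self._file.close()


def read_records(path):
    """Replay the log and return every complete record in write order."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            size = int.from_bytes(header, "big")
            records.append(f.read(size))
    return records


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "commit.log")
    log = CommitLog(path, sync_before_ack=True)
    log.append(b"insert key1")
    log.close()
    print(read_records(path))
```

With `sync_before_ack=False` the write is acknowledged once it reaches the OS page cache, which is faster but means an acknowledged write can still be lost on power failure.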
Re: roadmap and 0.4
Tentative changelog for 0.4 is being worked on over in
https://issues.apache.org/jira/browse/CASSANDRA-321. For convenience,
here is the text so far:

* On-disk data format has changed to allow billions of keys/rows per
  node instead of only millions
* Scan all sstables for all queries to avoid situations where
  different types of operation on the same ColumnFamily could
  disagree on what data was present
* Configurable LRU cache for key lookups
* Multi-keyspace support
* Thrift API has changed a _lot_:
  - removed time-sorted CFs; instead, user-defined comparators
    may be defined on the column names, which are now byte arrays.
    Default comparators are provided for UTF8, Bytes, Ascii, Long (i64),
    and UUID types.
  - removed colon-delimited strings in thrift api in favor of explicit
    structs such as ColumnPath, ColumnParent, etc. Also normalized
    thrift struct and argument naming.
  - Added columnFamily argument to get_key_range.
  - Change signature of get_slice and get_slice_super to accept
    starting and ending columns as well as an offset. (This allows use
    of indexes.) Added "ascending" flag to allow reasonably-efficient
    reverse scans as well. Removed get_slice_by_range as redundant.
  - Similarly, changed signature of get_slice_super.
  - get_key_range operates on one CF at a time
  - changed `block` boolean on insert methods to ConsistencyLevel enum,
    with options of NONE, ONE, QUORUM, and ALL.
  - added similar consistency_level parameter to read methods
* Removed the web interface. Node information can now be obtained by
  using the newly introduced nodeprobe utility.
* More JMX stats
* Remove magic values from internals (e.g. special key to indicate
  when to flush memtables)
* Rename configuration "table" to "keyspace"
* Moved to crash-only design; no more shutdown (just kill the process)
* Lots of bug fixes

Am I missing anything important?

-Jonathan
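To make the get_slice and comparator bullets concrete, here is a rough sketch of slicing one row's columns by a starting and ending column name, with an `ascending` flag, an offset, and a pluggable comparator. It is an illustration only, not the Thrift signature: the name `get_slice` is borrowed from the changelog, but the parameter names (`start`, `finish`, `offset`, `count`, `sort_key`), the inclusive bounds, and the in-memory dict standing in for a row are all assumptions.

```python
def bytes_key(name: bytes) -> bytes:
    """Stand-in for a Bytes comparator: plain lexicographic byte order."""
    return name


def long_key(name: bytes) -> int:
    """Stand-in for a Long (i64) comparator: order names as signed 64-bit ints."""
    return int.from_bytes(name, "big", signed=True)


def pack_long(value: int) -> bytes:
    """Encode an integer as an 8-byte big-endian column name."""
    return value.to_bytes(8, "big", signed=True)


def get_slice(columns, start=None, finish=None, ascending=True,
              offset=0, count=100, sort_key=bytes_key):
    """Return up to `count` (name, value) pairs with start <= name <= finish.

    `columns` maps byte-array column names to values. A bound of None means
    "unbounded". Ordering and bound checks both go through `sort_key`.
    """
    lo = None if start is None else sort_key(start)
    hi = None if finish is None else sort_key(finish)
    ordered = sorted(columns, key=sort_key)
    in_range = [
        name for name in ordered
        if (lo is None or sort_key(name) >= lo)
        and (hi is None or sort_key(name) <= hi)
    ]
    if not ascending:
        in_range.reverse()  # reverse scan: newest/largest names first
    page = in_range[offset:offset + count]
    return [(name, columns[name]) for name in page]


if __name__ == "__main__":
    row = {b"a": 1, b"c": 3, b"b": 2, b"d": 4}
    print(get_slice(row, start=b"b", finish=b"c"))
    print(get_slice(row, ascending=False, count=2))
```

The comparator matters because the same byte arrays sort differently under different interpretations: an 8-byte encoding of -1 sorts last as raw bytes but first as a signed long.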
Re: roadmap and 0.4
Also possibly of interest: issues fixed in 0.4
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&&pid=12310865&fixfor=12313862&resolution=1&sorter/field=issuekey&sorter/order=DESC

On Tue, Jul 21, 2009 at 6:05 PM, Jonathan Ellis wrote:
> To recap: our mission with 0.3, as I see it, was to add features that
> allow people to start modeling their app correctly on cassandra (range
> queries and delete support) and file off the worst of the rough edges
> from the initial code import. Mission accomplished.
>
> But 0.3 is already obsolete in a lot of ways so I think we need to
> follow up with a relatively quick 0.4 so people don't write too much
> code against an obsolete API or put too much data in a disk format
> that has changed.
>
> Here are the issues I think remain for 0.4:
>
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313862
>
> I think we can get all these done in a couple weeks, no problem.
> Bootstrap is the only one that might be problematic and I am willing
> to push that to 0.5 if it looks *really* bad.
>
> Any other Gotta Have tickets for 0.4?
>
> -Jonathan
roadmap and 0.4
To recap: our mission with 0.3, as I see it, was to add features that
allow people to start modeling their app correctly on cassandra (range
queries and delete support) and file off the worst of the rough edges
from the initial code import. Mission accomplished.

But 0.3 is already obsolete in a lot of ways so I think we need to
follow up with a relatively quick 0.4 so people don't write too much
code against an obsolete API or put too much data in a disk format
that has changed.

Here are the issues I think remain for 0.4:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313862

I think we can get all these done in a couple weeks, no problem.
Bootstrap is the only one that might be problematic and I am willing
to push that to 0.5 if it looks *really* bad.

Any other Gotta Have tickets for 0.4?

-Jonathan
Re: Roadmap
Range queries aren't going to block us. (The code is already written;
I just need to rebase it and I'm waiting on #65 for that.)

But in principle I agree.

-Jonathan

On Apr 16, 2009, at 1:42 AM, Per Mellqvist wrote:
> Great to see a target for a release! Personally I think the momentum
> of the project would benefit more from having a release to refer to,
> than any (other) new feature or improvement.
>
> I understand range queries are a priority for you Jonathan. I still
> wonder if it would not be better to limit 0.3 to only bug fixes
> (priority major or above)?
>
> // Per
>
> On Thu, Apr 16, 2009 at 12:02 AM, Jonathan Ellis wrote:
>> I went all Enterprise on our jira and assigned issues to version "0.3"
>> that I'd like to get done in the relatively near future for our first
>> official release.
>>
>> The list of issues is here:
>> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313861
>>
>> Note that many issues are marked Patch Available which means we just
>> need to complete the review process for those.
>>
>> If you want to grab one of the unassigned ones that would be awesome.
>> If you want to grab one of the ones I assigned to myself, that's
>> awesome too, but give me a heads up first so I don't duplicate your
>> effort. :)
>>
>> Also, if there's other issues that you think should be on the 0.3 list
>> feel free to add them. (Correctness issues especially.) But IMO we
>> should not let scope creep too much for our first Apache release.
>>
>> -Jonathan
>>
>> On Thu, Apr 2, 2009 at 12:51 PM, Jonathan Ellis wrote:
>>> Someone asked on IRC if there is a roadmap for Cassandra. This is a
>>> good discussion to have. :)
>>>
>>> Personally my priority list looks like this:
>>>
>>> High priority:
>>> 1. range queries [which requires the partitioner changes we've been
>>>    discussing]
>>> 2. make cassandra not allow itself to run out of memory during
>>>    sustained inserts
>>> 3. fix distributed remove issues
>>> 4. Support unicode keys
>>>
>>> Medium priority:
>>> 5. pre-emptive repair (what the dynamo paper calls anti-entropy)
>>> 6. load balancing
>>>
>>> (1) is substantially done but will probably need some tweaking during
>>> code review. And then the client api will probably need some fleshing
>>> out (right now you just get a list of keys back, so that's not very
>>> efficient if you want to get columns for each of those too.)
>>>
>>> (2) has workarounds like binarymemtable but I'd really like to get the
>>> main insert path able to handle large insert volume without falling
>>> over. My co-worker is just starting to look into this. I'm hoping
>>> there will be some straightforward improvements to make here.
>>>
>>> I outlined an approach to (3) that I think will work here:
>>> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200903.mbox/%3ce06563880903301519h922840ds72ef6f9a8d95e...@mail.gmail.com%3e
>>>
>>> I'm waiting for Avinash's feedback but as outlined it is not much code.
>>>
>>> (4) is a thrift issue, not Cassandra per se. (see
>>> https://issues.apache.org/jira/browse/THRIFT-395) but it is on my
>>> plate so I thought I'd throw that out there.
>>>
>>> I have not started (5) or (6). There are some stubs for load
>>> balancing in the code which is why I said in another thread that the
>>> Facebook developers have probably thought more about this.
>>>
>>> I know Avinash is currently finishing up multiget support. Hopefully
>>> he will chime in about what his and Prashant's plans are next.
>>>
>>> -Jonathan
Re: Roadmap
I don't think it's a problem to squeeze in some new features as well,
but I feel we should set a feature freeze date for 0.3 so that we know
when to stop. Perhaps a couple of months from now?

After that date trunk would be branched into 0.3 and all non blocking
issues would be moved from 0.3 to 0.4 in Jira. Then we'd fix the
remaining blocking bugs and roll a release candidate.

/Johan

Per Mellqvist wrote:
> Great to see a target for a release! Personally I think the momentum
> of the project would benefit more from having a release to refer to,
> than any (other) new feature or improvement.
>
> I understand range queries are a priority for you Jonathan. I still
> wonder if it would not be better to limit 0.3 to only bug fixes
> (priority major or above)?
>
> // Per
>
> On Thu, Apr 16, 2009 at 12:02 AM, Jonathan Ellis wrote:
>> I went all Enterprise on our jira and assigned issues to version "0.3"
>> that I'd like to get done in the relatively near future for our first
>> official release.
>>
>> The list of issues is here:
>> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313861
>>
>> Note that many issues are marked Patch Available which means we just
>> need to complete the review process for those.
>>
>> If you want to grab one of the unassigned ones that would be awesome.
>> If you want to grab one of the ones I assigned to myself, that's
>> awesome too, but give me a heads up first so I don't duplicate your
>> effort. :)
>>
>> Also, if there's other issues that you think should be on the 0.3 list
>> feel free to add them. (Correctness issues especially.) But IMO we
>> should not let scope creep too much for our first Apache release.
>>
>> -Jonathan
>>
>> On Thu, Apr 2, 2009 at 12:51 PM, Jonathan Ellis wrote:
>>> Someone asked on IRC if there is a roadmap for Cassandra. This is a
>>> good discussion to have. :)
>>>
>>> Personally my priority list looks like this:
>>>
>>> High priority:
>>> 1. range queries [which requires the partitioner changes we've been
>>>    discussing]
>>> 2. make cassandra not allow itself to run out of memory during
>>>    sustained inserts
>>> 3. fix distributed remove issues
>>> 4. Support unicode keys
>>>
>>> Medium priority:
>>> 5. pre-emptive repair (what the dynamo paper calls anti-entropy)
>>> 6. load balancing
>>>
>>> (1) is substantially done but will probably need some tweaking during
>>> code review. And then the client api will probably need some fleshing
>>> out (right now you just get a list of keys back, so that's not very
>>> efficient if you want to get columns for each of those too.)
>>>
>>> (2) has workarounds like binarymemtable but I'd really like to get the
>>> main insert path able to handle large insert volume without falling
>>> over. My co-worker is just starting to look into this. I'm hoping
>>> there will be some straightforward improvements to make here.
>>>
>>> I outlined an approach to (3) that I think will work here:
>>> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200903.mbox/%3ce06563880903301519h922840ds72ef6f9a8d95e...@mail.gmail.com%3e
>>>
>>> I'm waiting for Avinash's feedback but as outlined it is not much code.
>>>
>>> (4) is a thrift issue, not Cassandra per se. (see
>>> https://issues.apache.org/jira/browse/THRIFT-395) but it is on my
>>> plate so I thought I'd throw that out there.
>>>
>>> I have not started (5) or (6). There are some stubs for load
>>> balancing in the code which is why I said in another thread that the
>>> Facebook developers have probably thought more about this.
>>>
>>> I know Avinash is currently finishing up multiget support. Hopefully
>>> he will chime in about what his and Prashant's plans are next.
>>>
>>> -Jonathan
Re: Roadmap
Great to see a target for a release! Personally I think the momentum
of the project would benefit more from having a release to refer to,
than any (other) new feature or improvement.

I understand range queries are a priority for you Jonathan. I still
wonder if it would not be better to limit 0.3 to only bug fixes
(priority major or above)?

// Per

On Thu, Apr 16, 2009 at 12:02 AM, Jonathan Ellis wrote:
> I went all Enterprise on our jira and assigned issues to version "0.3"
> that I'd like to get done in the relatively near future for our first
> official release.
>
> The list of issues is here:
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313861
>
> Note that many issues are marked Patch Available which means we just
> need to complete the review process for those.
>
> If you want to grab one of the unassigned ones that would be awesome.
> If you want to grab one of the ones I assigned to myself, that's
> awesome too, but give me a heads up first so I don't duplicate your
> effort. :)
>
> Also, if there's other issues that you think should be on the 0.3 list
> feel free to add them. (Correctness issues especially.) But IMO we
> should not let scope creep too much for our first Apache release.
>
> -Jonathan
>
> On Thu, Apr 2, 2009 at 12:51 PM, Jonathan Ellis wrote:
>> Someone asked on IRC if there is a roadmap for Cassandra. This is a
>> good discussion to have. :)
>>
>> Personally my priority list looks like this:
>>
>> High priority:
>> 1. range queries [which requires the partitioner changes we've been
>>    discussing]
>> 2. make cassandra not allow itself to run out of memory during
>>    sustained inserts
>> 3. fix distributed remove issues
>> 4. Support unicode keys
>>
>> Medium priority:
>> 5. pre-emptive repair (what the dynamo paper calls anti-entropy)
>> 6. load balancing
>>
>> (1) is substantially done but will probably need some tweaking during
>> code review. And then the client api will probably need some fleshing
>> out (right now you just get a list of keys back, so that's not very
>> efficient if you want to get columns for each of those too.)
>>
>> (2) has workarounds like binarymemtable but I'd really like to get the
>> main insert path able to handle large insert volume without falling
>> over. My co-worker is just starting to look into this. I'm hoping
>> there will be some straightforward improvements to make here.
>>
>> I outlined an approach to (3) that I think will work here:
>> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200903.mbox/%3ce06563880903301519h922840ds72ef6f9a8d95e...@mail.gmail.com%3e
>>
>> I'm waiting for Avinash's feedback but as outlined it is not much code.
>>
>> (4) is a thrift issue, not Cassandra per se. (see
>> https://issues.apache.org/jira/browse/THRIFT-395) but it is on my
>> plate so I thought I'd throw that out there.
>>
>> I have not started (5) or (6). There are some stubs for load
>> balancing in the code which is why I said in another thread that the
>> Facebook developers have probably thought more about this.
>>
>> I know Avinash is currently finishing up multiget support. Hopefully
>> he will chime in about what his and Prashant's plans are next.
>>
>> -Jonathan
Re: Roadmap
I went all Enterprise on our jira and assigned issues to version "0.3"
that I'd like to get done in the relatively near future for our first
official release.

The list of issues is here:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313861

Note that many issues are marked Patch Available which means we just
need to complete the review process for those.

If you want to grab one of the unassigned ones that would be awesome.
If you want to grab one of the ones I assigned to myself, that's
awesome too, but give me a heads up first so I don't duplicate your
effort. :)

Also, if there's other issues that you think should be on the 0.3 list
feel free to add them. (Correctness issues especially.) But IMO we
should not let scope creep too much for our first Apache release.

-Jonathan

On Thu, Apr 2, 2009 at 12:51 PM, Jonathan Ellis wrote:
> Someone asked on IRC if there is a roadmap for Cassandra. This is a
> good discussion to have. :)
>
> Personally my priority list looks like this:
>
> High priority:
> 1. range queries [which requires the partitioner changes we've been
>    discussing]
> 2. make cassandra not allow itself to run out of memory during
>    sustained inserts
> 3. fix distributed remove issues
> 4. Support unicode keys
>
> Medium priority:
> 5. pre-emptive repair (what the dynamo paper calls anti-entropy)
> 6. load balancing
>
> (1) is substantially done but will probably need some tweaking during
> code review. And then the client api will probably need some fleshing
> out (right now you just get a list of keys back, so that's not very
> efficient if you want to get columns for each of those too.)
>
> (2) has workarounds like binarymemtable but I'd really like to get the
> main insert path able to handle large insert volume without falling
> over. My co-worker is just starting to look into this. I'm hoping
> there will be some straightforward improvements to make here.
>
> I outlined an approach to (3) that I think will work here:
> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200903.mbox/%3ce06563880903301519h922840ds72ef6f9a8d95e...@mail.gmail.com%3e
>
> I'm waiting for Avinash's feedback but as outlined it is not much code.
>
> (4) is a thrift issue, not Cassandra per se. (see
> https://issues.apache.org/jira/browse/THRIFT-395) but it is on my
> plate so I thought I'd throw that out there.
>
> I have not started (5) or (6). There are some stubs for load
> balancing in the code which is why I said in another thread that the
> Facebook developers have probably thought more about this.
>
> I know Avinash is currently finishing up multiget support. Hopefully
> he will chime in about what his and Prashant's plans are next.
>
> -Jonathan
Roadmap
Someone asked on IRC if there is a roadmap for Cassandra. This is a
good discussion to have. :)

Personally my priority list looks like this:

High priority:
1. range queries [which requires the partitioner changes we've been
   discussing]
2. make cassandra not allow itself to run out of memory during
   sustained inserts
3. fix distributed remove issues
4. Support unicode keys

Medium priority:
5. pre-emptive repair (what the dynamo paper calls anti-entropy)
6. load balancing

(1) is substantially done but will probably need some tweaking during
code review. And then the client api will probably need some fleshing
out (right now you just get a list of keys back, so that's not very
efficient if you want to get columns for each of those too.)

(2) has workarounds like binarymemtable but I'd really like to get the
main insert path able to handle large insert volume without falling
over. My co-worker is just starting to look into this. I'm hoping
there will be some straightforward improvements to make here.

I outlined an approach to (3) that I think will work here:
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200903.mbox/%3ce06563880903301519h922840ds72ef6f9a8d95e...@mail.gmail.com%3e

I'm waiting for Avinash's feedback but as outlined it is not much code.

(4) is a thrift issue, not Cassandra per se. (see
https://issues.apache.org/jira/browse/THRIFT-395) but it is on my
plate so I thought I'd throw that out there.

I have not started (5) or (6). There are some stubs for load
balancing in the code which is why I said in another thread that the
Facebook developers have probably thought more about this.

I know Avinash is currently finishing up multiget support. Hopefully
he will chime in about what his and Prashant's plans are next.

-Jonathan
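Item 1 ties range queries to the partitioner. One way to see the connection, sketched here under assumptions not stated in the thread: if keys are placed by a hash, token order is unrelated to key order, so a key range does not map to a contiguous token range; if the token preserves key order, it does. The function names below (`hashed_token`, `order_preserving_token`, `range_scan`) are hypothetical, and MD5 is only an example hash.

```python
import hashlib


def hashed_token(key: str) -> int:
    """Token from an MD5 hash of the key; scatters keys and scrambles their order."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def order_preserving_token(key: str) -> str:
    """Token is the key itself, so token order equals key order."""
    return key


def range_scan(keys, token_fn, start, end):
    """Walk keys in token order; keep those with token in [token(start), token(end)]."""
    lo, hi = token_fn(start), token_fn(end)
    return [k for k in sorted(keys, key=token_fn) if lo <= token_fn(k) <= hi]


if __name__ == "__main__":
    keys = ["apple", "banana", "cherry", "date", "elder"]
    # Order-preserving tokens: the scan returns exactly the keys between "b" and "d".
    print(range_scan(keys, order_preserving_token, "b", "d"))
    # Hashed tokens: the same scan returns whatever happens to hash into that
    # token interval, which has no relation to the key range asked for.
    print(range_scan(keys, hashed_token, "b", "d"))
```

In this toy model the scan is only meaningful as a key-range query when `token_fn` preserves key order, which is the property the sketch assumes a range-query-friendly partitioner would need.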