Re: roadmap and 0.4

2009-07-28 Thread Sandeep Tata
> Am I missing anything important?

Option to allow fsync on the commitlog before ack-ing the client.
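
For readers unfamiliar with the request, a minimal sketch of "fsync before ack" follows. It is not Cassandra's commitlog code: the function name, the file layout, and the length-prefix framing are all assumptions made only for illustration.

```python
import os


def append_durably(log_path: str, record: bytes) -> bool:
    """Append one record, then fsync before reporting success.

    Hypothetical helper; the 4-byte length prefix is an assumed framing,
    not the real commitlog format.
    """
    with open(log_path, "ab") as log:
        log.write(len(record).to_bytes(4, "big"))  # assumed length prefix
        log.write(record)
        log.flush()             # move data from Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to flush its cache to the device
    return True                 # only at this point would the client be acked
```

The trade-off is latency: every acknowledged write waits for the disk, which is presumably why it is proposed as an option rather than the default.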

On Tue, Jul 28, 2009 at 2:05 PM, Jonathan Ellis wrote:
> Tentative changelog for 0.4 is being worked on over in
> https://issues.apache.org/jira/browse/CASSANDRA-321.  For convenience,
> here is the text so far:
>
>  * On-disk data format has changed to allow billions of keys/rows per
>   node instead of only millions
>  * Scan all sstables for all queries to avoid situations where
>   different types of operation on the same ColumnFamily could
>   disagree on what data was present
>  * Configurable LRU cache for key lookups
>  * Multi-keyspace support
>  * Thrift API has changed a _lot_:
>    - removed time-sorted CFs; instead, user-defined comparators
>      may be defined on the column names, which are now byte arrays.
>      Default comparators are provided for UTF8, Bytes, Ascii, Long (i64),
>      and UUID types.
>    - removed colon-delimited strings in thrift api in favor of explicit
>      structs such as ColumnPath, ColumnParent, etc.  Also normalized
>      thrift struct and argument naming.
>    - Added columnFamily argument to get_key_range.
>    - Change signature of get_slice and get_slice_super to accept
>      starting and ending columns as well as an offset.  (This allows use
>      of indexes.)  Added "ascending" flag to allow reasonably-efficient
>      reverse scans as well.  Removed get_slice_by_range as redundant.
>    - Similarly, changed signature of get_slice_super.
>    - get_key_range operates on one CF at a time
>    - changed `block` boolean on insert methods to ConsistencyLevel enum,
>      with options of NONE, ONE, QUORUM, and ALL.
>    - added similar consistency_level parameter to read methods
>  * Removed the web interface. Node information can now be obtained by
>   using the newly introduced nodeprobe utility.
>  * More JMX stats
>  * Remove magic values from internals (e.g. special key to indicate
>   when to flush memtables)
>  * Rename configuration "table" to "keyspace"
>  * Moved to crash-only design; no more shutdown (just kill the process)
>  * Lots of bug fixes
>
> Am I missing anything important?
>
> -Jonathan
>


Re: roadmap and 0.4

2009-07-28 Thread Jonathan Ellis
Tentative changelog for 0.4 is being worked on over in
https://issues.apache.org/jira/browse/CASSANDRA-321.  For convenience,
here is the text so far:

 * On-disk data format has changed to allow billions of keys/rows per
   node instead of only millions
 * Scan all sstables for all queries to avoid situations where
   different types of operation on the same ColumnFamily could
   disagree on what data was present
 * Configurable LRU cache for key lookups
 * Multi-keyspace support
 * Thrift API has changed a _lot_:
   - removed time-sorted CFs; instead, user-defined comparators
     may be defined on the column names, which are now byte arrays.
     Default comparators are provided for UTF8, Bytes, Ascii, Long (i64),
     and UUID types.
   - removed colon-delimited strings in thrift api in favor of explicit
     structs such as ColumnPath, ColumnParent, etc.  Also normalized
     thrift struct and argument naming.
   - Added columnFamily argument to get_key_range.
   - Change signature of get_slice and get_slice_super to accept
     starting and ending columns as well as an offset.  (This allows use
     of indexes.)  Added "ascending" flag to allow reasonably-efficient
     reverse scans as well.  Removed get_slice_by_range as redundant.
   - Similarly, changed signature of get_slice_super.
   - get_key_range operates on one CF at a time
   - changed `block` boolean on insert methods to ConsistencyLevel enum,
     with options of NONE, ONE, QUORUM, and ALL.
   - added similar consistency_level parameter to read methods
 * Removed the web interface. Node information can now be obtained by
   using the newly introduced nodeprobe utility.
 * More JMX stats
 * Remove magic values from internals (e.g. special key to indicate
   when to flush memtables)
 * Rename configuration "table" to "keyspace"
 * Moved to crash-only design; no more shutdown (just kill the process)
 * Lots of bug fixes

Am I missing anything important?

-Jonathan
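
Two of the Thrift changes listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Thrift interface: the four level names come from the changelog, but the numeric values, the `required_acks` helper, the `count` parameter, the inclusive bounds, and the rule that a descending slice runs from the higher name to the lower one are all hypothetical.

```python
from enum import Enum
from typing import List, Optional


class ConsistencyLevel(Enum):
    # Names from the changelog; the numeric values are arbitrary here.
    NONE = 0
    ONE = 1
    QUORUM = 2
    ALL = 3


def required_acks(level: ConsistencyLevel, replication_factor: int) -> int:
    """Replica responses to wait for (assumed semantics, majority quorum)."""
    if level is ConsistencyLevel.NONE:
        return 0
    if level is ConsistencyLevel.ONE:
        return 1
    if level is ConsistencyLevel.QUORUM:
        return replication_factor // 2 + 1
    return replication_factor


def get_slice(columns: List[bytes], start: Optional[bytes],
              finish: Optional[bytes], ascending: bool = True,
              offset: int = 0, count: int = 100) -> List[bytes]:
    """Slice column names by range; None means unbounded on that side."""
    ordered = sorted(columns)  # stand-in for a byte-order comparator
    if not ascending:
        ordered.reverse()
    # In a descending scan, `start` is the upper bound and `finish` the lower.
    low, high = (start, finish) if ascending else (finish, start)
    selected = [name for name in ordered
                if (low is None or name >= low)
                and (high is None or name <= high)]
    return selected[offset:offset + count]
```

With five columns named a through e, `get_slice(cols, b"d", b"b", ascending=False)` walks the same range as the ascending call but in reverse order, which is the case the "ascending" flag is meant to make cheap.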


Re: roadmap and 0.4

2009-07-22 Thread Jonathan Ellis
Also possibly of interest: issues fixed in 0.4

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&&pid=12310865&fixfor=12313862&resolution=1&sorter/field=issuekey&sorter/order=DESC

On Tue, Jul 21, 2009 at 6:05 PM, Jonathan Ellis wrote:
> To recap: our mission with 0.3, as I see it, was to add features that
> allow people to start modeling their app correctly on cassandra (range
> queries and delete support) and file off the worst of the rough edges
> from the initial code import.  Mission accomplished.
>
> But 0.3 is already obsolete in a lot of ways so I think we need to
> follow up with a relatively quick 0.4 so people don't write too much
> code against an obsolete API or put too much data in a disk format
> that has changed.
>
> Here are the issues I think remain for 0.4:
>
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313862
>
> I think we can get all these done in a couple weeks, no problem.
> Bootstrap is the only one that might be problematic and I am willing
> to push that to 0.5 if it looks *really* bad.
>
> Any other Gotta Have tickets for 0.4?
>
> -Jonathan
>


roadmap and 0.4

2009-07-21 Thread Jonathan Ellis
To recap: our mission with 0.3, as I see it, was to add features that
allow people to start modeling their app correctly on cassandra (range
queries and delete support) and file off the worst of the rough edges
from the initial code import.  Mission accomplished.

But 0.3 is already obsolete in a lot of ways so I think we need to
follow up with a relatively quick 0.4 so people don't write too much
code against an obsolete API or put too much data in a disk format
that has changed.

Here are the issues I think remain for 0.4:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313862

I think we can get all these done in a couple weeks, no problem.
Bootstrap is the only one that might be problematic and I am willing
to push that to 0.5 if it looks *really* bad.

Any other Gotta Have tickets for 0.4?

-Jonathan


Re: Roadmap

2009-04-16 Thread Jonathan Ellis
Range queries aren't going to block us.  (The code is already written;
I just need to rebase it and I'm waiting on #65 for that.)


But in principle I agree.

-Jonathan

On Apr 16, 2009, at 1:42 AM, Per Mellqvist  wrote:


Great to see a target for a release!

Personally I think the momentum of the project would benefit more from
having a release to refer to, than any (other) new feature or
improvement. I understand range queries are a priority for you
Jonathan. I still wonder if it would not be better to limit 0.3 to
only bug fixes (priority major or above)?

// Per

On Thu, Apr 16, 2009 at 12:02 AM, Jonathan Ellis wrote:
I went all Enterprise on our jira and assigned issues to version "0.3"
that I'd like to get done in the relatively near future for our first
official release.

The list of issues is here:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313861

Note that many issues are marked Patch Available which means we just
need to complete the review process for those.

If you want to grab one of the unassigned ones that would be awesome.
If you want to grab one of the ones I assigned to myself, that's
awesome too, but give me a heads up first so I don't duplicate your
effort. :)

Also, if there's other issues that you think should be on the 0.3 list
feel free to add them.  (Correctness issues especially.)  But IMO we
should not let scope creep too much for our first Apache release.

-Jonathan

On Thu, Apr 2, 2009 at 12:51 PM, Jonathan Ellis wrote:

Someone asked on IRC if there is a roadmap for Cassandra.  This is a
good discussion to have. :)

Personally my priority list looks like this:

High priority:
 1. range queries [which requires the partitioner changes we've been discussing]
 2. make cassandra not allow itself to run out of memory during
sustained inserts
 3. fix distributed remove issues
 4. Support unicode keys

Medium priority:
 5. pre-emptive repair (what the dynamo paper calls anti-entropy)
 6. load balancing

(1) is substantially done but will probably need some tweaking during
code review.  And then the client api will probably need some fleshing
out (right now you just get a list of keys back, so that's not very
efficient if you want to get columns for each of those too.)

(2) has workarounds like binarymemtable but I'd really like to get the
main insert path able to handle large insert volume without falling
over.  My co-worker is just starting to look into this.  I'm hoping
there will be some straightforward improvements to make here.

I outlined an approach to (3) that I think will work here:
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200903.mbox/%3ce06563880903301519h922840ds72ef6f9a8d95e...@mail.gmail.com%3e

I'm waiting for Avinash's feedback but as outlined it is not much code.

(4) is a thrift issue, not Cassandra per se.  (see
https://issues.apache.org/jira/browse/THRIFT-395) but it is on my
plate so I thought I'd throw that out there.

I have not started (5) or (6).  There are some stubs for load
balancing in the code which is why I said in another thread that the
Facebook developers have probably thought more about this.

I know Avinash is currently finishing up multiget support.  Hopefully
he will chime in about what his and Prashant's plans are next.

-Jonathan





Re: Roadmap

2009-04-16 Thread Johan Oskarsson
I don't think it's a problem to squeeze in some new features as well,
but I feel we should set a feature freeze date for 0.3 so that we know
when to stop. Perhaps a couple of months from now?

After that date trunk would be branched into 0.3 and all non-blocking
issues would be moved from 0.3 to 0.4 in Jira. Then we'd fix the
remaining blocking bugs and roll a release candidate.


/Johan

Per Mellqvist wrote:

Great to see a target for a release!

Personally I think the momentum of the project would benefit more from
having a release to refer to, than any (other) new feature or
improvement. I understand range queries are a priority for you
Jonathan. I still wonder if it would not be better to limit 0.3 to
only bug fixes (priority major or above)?

// Per

On Thu, Apr 16, 2009 at 12:02 AM, Jonathan Ellis  wrote:

I went all Enterprise on our jira and assigned issues to version "0.3"
that I'd like to get done in the relatively near future for our first
official release.

The list of issues is here:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313861

Note that many issues are marked Patch Available which means we just
need to complete the review process for those.

If you want to grab one of the unassigned ones that would be awesome.
If you want to grab one of the ones I assigned to myself, that's
awesome too, but give me a heads up first so I don't duplicate your
effort. :)

Also, if there's other issues that you think should be on the 0.3 list
feel free to add them.  (Correctness issues especially.)  But IMO we
should not let scope creep too much for our first Apache release.

-Jonathan

On Thu, Apr 2, 2009 at 12:51 PM, Jonathan Ellis  wrote:

Someone asked on IRC if there is a roadmap for Cassandra.  This is a
good discussion to have. :)

Personally my priority list looks like this:

High priority:
 1. range queries [which requires the partitioner changes we've been discussing]
 2. make cassandra not allow itself to run out of memory during
sustained inserts
 3. fix distributed remove issues
 4. Support unicode keys

Medium priority:
 5. pre-emptive repair (what the dynamo paper calls anti-entropy)
 6. load balancing

(1) is substantially done but will probably need some tweaking during
code review.  And then the client api will probably need some fleshing
out (right now you just get a list of keys back, so that's not very
efficient if you want to get columns for each of those too.)

(2) has workarounds like binarymemtable but I'd really like to get the
main insert path able to handle large insert volume without falling
over.  My co-worker is just starting to look into this.  I'm hoping
there will be some straightforward improvements to make here.

I outlined an approach to (3) that I think will work here:
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200903.mbox/%3ce06563880903301519h922840ds72ef6f9a8d95e...@mail.gmail.com%3e

I'm waiting for Avinash's feedback but as outlined it is not much code.

(4) is a thrift issue, not Cassandra per se.  (see
https://issues.apache.org/jira/browse/THRIFT-395) but it is on my
plate so I thought I'd throw that out there.

I have not started (5) or (6).  There are some stubs for load
balancing in the code which is why I said in another thread that the
Facebook developers have probably thought more about this.

I know Avinash is currently finishing up multiget support.  Hopefully
he will chime in about what his and Prashant's plans are next.

-Jonathan





Re: Roadmap

2009-04-15 Thread Per Mellqvist
Great to see a target for a release!

Personally I think the momentum of the project would benefit more from
having a release to refer to, than any (other) new feature or
improvement. I understand range queries are a priority for you
Jonathan. I still wonder if it would not be better to limit 0.3 to
only bug fixes (priority major or above)?

// Per

On Thu, Apr 16, 2009 at 12:02 AM, Jonathan Ellis  wrote:
> I went all Enterprise on our jira and assigned issues to version "0.3"
> that I'd like to get done in the relatively near future for our first
> official release.
>
> The list of issues is here:
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313861
>
> Note that many issues are marked Patch Available which means we just
> need to complete the review process for those.
>
> If you want to grab one of the unassigned ones that would be awesome.
> If you want to grab one of the ones I assigned to myself, that's
> awesome too, but give me a heads up first so I don't duplicate your
> effort. :)
>
> Also, if there's other issues that you think should be on the 0.3 list
> feel free to add them.  (Correctness issues especially.)  But IMO we
> should not let scope creep too much for our first Apache release.
>
> -Jonathan
>
> On Thu, Apr 2, 2009 at 12:51 PM, Jonathan Ellis  wrote:
>> Someone asked on IRC if there is a roadmap for Cassandra.  This is a
>> good discussion to have. :)
>>
>> Personally my priority list looks like this:
>>
>> High priority:
>>  1. range queries [which requires the partitioner changes we've been 
>> discussing]
>>  2. make cassandra not allow itself to run out of memory during
>> sustained inserts
>>  3. fix distributed remove issues
>>  4. Support unicode keys
>>
>> Medium priority:
>>  5. pre-emptive repair (what the dynamo paper calls anti-entropy)
>>  6. load balancing
>>
>> (1) is substantially done but will probably need some tweaking during
>> code review.  And then the client api will probably need some fleshing
>> out (right now you just get a list of keys back, so that's not very
>> efficient if you want to get columns for each of those too.)
>>
>> (2) has workarounds like binarymemtable but I'd really like to get the
>> main insert path able to handle large insert volume without falling
>> over.  My co-worker is just starting to look into this.  I'm hoping
>> there will be some straightforward improvements to make here.
>>
>> I outlined an approach to (3) that I think will work here:
>> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200903.mbox/%3ce06563880903301519h922840ds72ef6f9a8d95e...@mail.gmail.com%3e
>>
>> I'm waiting for Avinash's feedback but as outlined it is not much code.
>>
>> (4) is a thrift issue, not Cassandra per se.  (see
>> https://issues.apache.org/jira/browse/THRIFT-395) but it is on my
>> plate so I thought I'd throw that out there.
>>
>> I have not started (5) or (6).  There are some stubs for load
>> balancing in the code which is why I said in another thread that the
>> Facebook developers have probably thought more about this.
>>
>> I know Avinash is currently finishing up multiget support.  Hopefully
>> he will chime in about what his and Prashant's plans are next.
>>
>> -Jonathan
>>
>


Re: Roadmap

2009-04-15 Thread Jonathan Ellis
I went all Enterprise on our jira and assigned issues to version "0.3"
that I'd like to get done in the relatively near future for our first
official release.

The list of issues is here:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313861

Note that many issues are marked Patch Available which means we just
need to complete the review process for those.

If you want to grab one of the unassigned ones that would be awesome.
If you want to grab one of the ones I assigned to myself, that's
awesome too, but give me a heads up first so I don't duplicate your
effort. :)

Also, if there's other issues that you think should be on the 0.3 list
feel free to add them.  (Correctness issues especially.)  But IMO we
should not let scope creep too much for our first Apache release.

-Jonathan

On Thu, Apr 2, 2009 at 12:51 PM, Jonathan Ellis  wrote:
> Someone asked on IRC if there is a roadmap for Cassandra.  This is a
> good discussion to have. :)
>
> Personally my priority list looks like this:
>
> High priority:
>  1. range queries [which requires the partitioner changes we've been 
> discussing]
>  2. make cassandra not allow itself to run out of memory during
> sustained inserts
>  3. fix distributed remove issues
>  4. Support unicode keys
>
> Medium priority:
>  5. pre-emptive repair (what the dynamo paper calls anti-entropy)
>  6. load balancing
>
> (1) is substantially done but will probably need some tweaking during
> code review.  And then the client api will probably need some fleshing
> out (right now you just get a list of keys back, so that's not very
> efficient if you want to get columns for each of those too.)
>
> (2) has workarounds like binarymemtable but I'd really like to get the
> main insert path able to handle large insert volume without falling
> over.  My co-worker is just starting to look into this.  I'm hoping
> there will be some straightforward improvements to make here.
>
> I outlined an approach to (3) that I think will work here:
> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200903.mbox/%3ce06563880903301519h922840ds72ef6f9a8d95e...@mail.gmail.com%3e
>
> I'm waiting for Avinash's feedback but as outlined it is not much code.
>
> (4) is a thrift issue, not Cassandra per se.  (see
> https://issues.apache.org/jira/browse/THRIFT-395) but it is on my
> plate so I thought I'd throw that out there.
>
> I have not started (5) or (6).  There are some stubs for load
> balancing in the code which is why I said in another thread that the
> Facebook developers have probably thought more about this.
>
> I know Avinash is currently finishing up multiget support.  Hopefully
> he will chime in about what his and Prashant's plans are next.
>
> -Jonathan
>


Roadmap

2009-04-02 Thread Jonathan Ellis
Someone asked on IRC if there is a roadmap for Cassandra.  This is a
good discussion to have. :)

Personally my priority list looks like this:

High priority:
 1. range queries [which requires the partitioner changes we've been discussing]
 2. make cassandra not allow itself to run out of memory during
sustained inserts
 3. fix distributed remove issues
 4. Support unicode keys

Medium priority:
 5. pre-emptive repair (what the dynamo paper calls anti-entropy)
 6. load balancing

(1) is substantially done but will probably need some tweaking during
code review.  And then the client api will probably need some fleshing
out (right now you just get a list of keys back, so that's not very
efficient if you want to get columns for each of those too.)

(2) has workarounds like binarymemtable but I'd really like to get the
main insert path able to handle large insert volume without falling
over.  My co-worker is just starting to look into this.  I'm hoping
there will be some straightforward improvements to make here.

I outlined an approach to (3) that I think will work here:
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200903.mbox/%3ce06563880903301519h922840ds72ef6f9a8d95e...@mail.gmail.com%3e

I'm waiting for Avinash's feedback but as outlined it is not much code.

(4) is a thrift issue, not Cassandra per se.  (see
https://issues.apache.org/jira/browse/THRIFT-395) but it is on my
plate so I thought I'd throw that out there.

I have not started (5) or (6).  There are some stubs for load
balancing in the code which is why I said in another thread that the
Facebook developers have probably thought more about this.

I know Avinash is currently finishing up multiget support.  Hopefully
he will chime in about what his and Prashant's plans are next.

-Jonathan
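
As background on why item (1) ties range queries to the partitioner: a hash-based partitioner places keys on the ring by digest, so keys that are adjacent in sort order generally land far apart, while an order-preserving one keeps them contiguous. The sketch below only illustrates that general idea; both token functions are hypothetical stand-ins, and the message above does not spell out what the actual partitioner changes are.

```python
import hashlib
from typing import List


def hash_token(key: str) -> int:
    """Token derived from an MD5 digest (assumed hash-partitioner stand-in)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def order_preserving_token(key: str) -> bytes:
    """Token that sorts exactly like the key itself (assumed stand-in)."""
    return key.encode("utf-8")


def ring_order(keys: List[str], token) -> List[str]:
    """Order in which keys would appear when walking the ring by token."""
    return sorted(keys, key=token)


keys = ["apple", "banana", "cherry", "date"]
by_hash = ring_order(keys, hash_token)             # digest order, not key order
by_key = ring_order(keys, order_preserving_token)  # same as sorted(keys)
```

Under the order-preserving scheme a scan from "banana" to "cherry" covers one contiguous stretch of the ring; under hashing the same scan has no such stretch to walk.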