[VOTE] Release Apache Cassandra 2.1.0-rc2

2014-06-23 Thread Sylvain Lebresne
We're almost there. I propose the following artifacts for release as
2.1.0-rc2.

sha1: e2bef02e254a9c6e37a86cab957a1fcba56214fd
Git:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.0-rc2-tentative
Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1014/org/apache/cassandra/apache-cassandra/2.1.0-rc2/
Staging repository:
https://repository.apache.org/content/repositories/orgapachecassandra-1014/

The artifacts as well as the debian package are also available here:
http://people.apache.org/~slebresne/

The vote will be open for 72 hours (longer if needed).

[1]: http://goo.gl/VeKNTk (CHANGES.txt)
[2]: http://goo.gl/HHFAJU (NEWS.txt)


static columns and TTL - wouldn't it be nice if static columns played nice with a partition whose partition keys have all (TTL) expired

2014-06-23 Thread graham sanderson
So, I was thinking about a new use case, where an ideal situation would be to 
have something like

CREATE TABLE series (
    id uuid,
    inserted timeuuid,
    small_thing blob,
    large_static_thing blob static,
    PRIMARY KEY (id, inserted)
);

So this is my first use of static columns, but I also want to use TTL (I just 
built 2.0.8 to play with)

https://issues.apache.org/jira/browse/CASSANDRA-6782 and friends are pretty 
confusing when it comes to TTL and the row marker, but from my playing, it 
seems at least you can control behavior because you can (re-)INSERT just the 
primary key values, with or without a TTL. (Side note: the docs still say UPDATE 
and INSERT are identical, which is strictly no longer true.)

So what I really want is the ability to do

INSERT INTO series (id, large_static_thing) VALUES (a, d);

then repeated 

INSERT INTO series (id, inserted, small_thing) VALUES (a, b, c) USING TTL X;

and have the partition (and the static column) disappear when the last “row” 
for the partition key is gone.

I can get this behavior if I update the large_static_thing every time along 
with inserting small_thing, but that is exactly what I don’t want to do 
because it is large and static.

It sort of seems semantically right that a special column that is “shared by 
all the rows of the same partition” should at least have an option to expire 
when all “rows” expire.

It seems like this would be technically feasible (though very much non-trivial) 
if you had a syntax, say “large_static_thing blob static autoexpiring”, to make 
the static column an ExpiringColumn, and have any row updates with TTL insert a 
new OnDiskAtom type (that contains a TTL but no value) for the static column. 
These could be reconciled/reduced/compacted or whatever with the ExpiringColumn 
during read and compaction.
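
To make the reconciliation idea a bit more concrete, here is a toy Python model 
of it. This is not Cassandra code: StaticCell, TtlMarker, reconcile and is_live 
are hypothetical names, and extending the static cell to the latest expiry seen 
is only an assumption about how such a merge might behave.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StaticCell:
    """Toy stand-in for an expiring static cell: a value plus an absolute expiry (seconds)."""
    value: bytes
    expires_at: int


@dataclass(frozen=True)
class TtlMarker:
    """Hypothetical value-less atom: carries only the expiry of a row written with a TTL."""
    expires_at: int


def reconcile(cell: StaticCell, markers: Iterable[TtlMarker]) -> StaticCell:
    """Merge markers into the cell: keep the value, extend the expiry to the latest one seen."""
    latest = max((m.expires_at for m in markers), default=cell.expires_at)
    return StaticCell(cell.value, max(cell.expires_at, latest))


def is_live(cell: StaticCell, now: int) -> bool:
    """The static value is readable only until its (possibly extended) expiry."""
    return now < cell.expires_at


# Static value written with an expiry at t=10; two later row writes carry their own expiries.
cell = StaticCell(b"large", expires_at=10)
markers = [TtlMarker(expires_at=40), TtlMarker(expires_at=25)]
merged = reconcile(cell, markers)
```

In this sketch the large value is written once, and each TTL'd row write only 
adds a small marker, so the static cell outlives the last row by nothing and is 
never rewritten.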

It all sounds a bit over-complicated… so:

1) Does this sound like a useful feature, or is it a me-only use case?
2) Can someone think of a way to model this reasonably efficiently today 
without using TTL on the static column (and thus having to rewrite it every 
time) - not that I’m trying to be abusive and I haven’t thought this thru, but 
my spider sense makes me think that maybe I can abuse an index on a small 
expiring column to quickly find empty partition keys
3) Is it actually simpler to implement than I think in the code base (This is 
the first time I’ve peeked at these areas of the code)
4) If implemented as I suggested above, does that have to be done in a major 
version?

Thanks,

Graham






Re: static columns and TTL - wouldn't it be nice if static columns played nice with a partition whose partition keys have all (TTL) expired

2014-06-23 Thread graham sanderson
Note that, as I think about it, if you had the new OnDiskAtom type with a TTL 
and no value, then you wouldn’t need anything special about static columns; 
you’d just need a CQL syntax to update/set the TTL for a column, which might be 
useful for lots of things.



