Re: Secondary Index seen as empty

2012-12-03 Thread Cyril Scetbon
Hi Jonathan,

Never mind, it was an issue with the RMI hostname. Now it works, and the issue 
comes from the following lines (KeysSearcher.java in package 
org.apache.cassandra.db.index.keys):

if (logger.isDebugEnabled())
    logger.debug(String.format("Scanning index %s starting with %s",
                               expressionString(primary),
                               index.getBaseCfs().metadata.getKeyValidator().getString(startKey)));

QueryFilter indexFilter = QueryFilter.getSliceFilter(indexKey,
                                                     new QueryPath(index.getIndexCfs().getColumnFamilyName()),
                                                     lastSeenKey,
                                                     endKey,
                                                     false,
                                                     rowsPerQuery);
ColumnFamily indexRow = index.getIndexCfs().getColumnFamily(indexFilter);   // <-- it returns null here
logger.debug("fetched {}", indexRow);
if (indexRow == null)
{
    logger.debug("no data, all done");
    return endOfData();
}

The problem is that indexFilter is a new QueryFilter instance with the value 

QueryFilter(key=DecoratedKey(2012-11-29 02:35:00+, 013b4a046420), 
path=QueryPath(columnFamilyName='syndic.mailIndex', superColumnName='null', 
columnName='null'), filter=SliceQueryFilter(start=java.nio.HeapByteBuffer[pos=0 
lim=0 cap=0], finish=java.nio.HeapByteBuffer[pos=0 lim=0 cap=0], 
reversed=false, count=1))

but then index.getIndexCfs().getColumnFamily(indexFilter) returns null!
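For what it's worth, the start/finish values in the QueryFilter dump above are zero-capacity NIO buffers, i.e. an unbounded slice over the index row. A tiny self-contained illustration (plain JDK only, no Cassandra classes involved):

```java
import java.nio.ByteBuffer;

public class EmptyBufferDemo {
    public static void main(String[] args) {
        // A zero-capacity heap buffer, printed in the same form
        // as the start/finish values in the QueryFilter dump
        ByteBuffer empty = ByteBuffer.allocate(0);
        System.out.println(empty);                // java.nio.HeapByteBuffer[pos=0 lim=0 cap=0]
        System.out.println(empty.hasRemaining()); // false: nothing to read
    }
}
```

So the filter itself is not restricting the slice; an empty start/finish pair means "the whole row".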

As said before, if we rebuild the index it works and returns values.

Tell me if you need more information

Cyril SCETBON

On Nov 30, 2012, at 10:32 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Re remote debugging, see cassandra-env.sh:
 
 # uncomment to have Cassandra JVM listen for remote
 # debuggers/profilers on port 1414
 # JVM_OPTS="$JVM_OPTS -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1414"
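 For reference, the uncommented form of that line, plus a matching attach from a workstation (this is a config fragment; the hostname below is a placeholder, and jdb is just one JDWP-capable client):

```shell
# cassandra-env.sh -- uncommented form of the template above
JVM_OPTS="$JVM_OPTS -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1414"

# then attach a debugger from another machine, e.g.:
jdb -attach <cassandra-host>:1414
```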
 
 On Sat, Dec 1, 2012 at 1:22 AM, Cyril Scetbon cyril.scet...@free.fr wrote:
 Hi,
 
 We've hit an issue here with Cassandra 1.1.6 where a secondary index seems to 
 be seen as empty. For example, if I select the first 10 values I get:
 
 cqlsh:pns_fr> select mailendwnd from syndic limit 10;
 mailendwnd
 --
 2012-11-29 23:30:00+
 2012-11-29 02:35:00+
 2012-11-29 06:35:00+
 2012-11-29 00:30:00+
 2012-11-29 07:10:00+
 2012-11-29 02:10:00+
 2012-11-29 01:55:00+
 2012-11-29 23:45:00+
 2012-11-29 07:25:00+
 2012-11-29 06:55:00+
 
 However, even though there is a secondary index on mailendwnd, we can't get any 
 record back:
 
 cqlsh:pns_fr> select mailendwnd from syndic where mailendwnd = '2012-11-29 
 02:35:00+' limit 1000;
 
 FYI, it's resolved by repairing the index, and it has happened a few times. I 
 launched Cassandra in debug mode and noticed that it thinks there is no data 
 in the index:
 
 DEBUG [Thrift:14] 2012-11-30 08:35:20,756 CassandraServer.java (line 1232) 
 execute_cql_query
 DEBUG [Thrift:14] 2012-11-30 08:35:20,758 QueryProcessor.java (line 445) CQL 
 statement type: SELECT
 DEBUG [Thrift:14] 2012-11-30 08:35:20,777 StorageProxy.java (line 842) Command/ConsistencyLevel is RangeSliceCommand{keyspace='pns_fr', column_family='syndic', super_column=null, predicate=SlicePredicate(column_names:[java.nio.HeapByteBuffer[pos=0 lim=10 cap=10]]), range=[min(-1),min(-1)], row_filter=[IndexExpression(column_name:6D 61 69 6C 65 6E 64 77 6E 64, op:EQ, value:00 00 01 3B 4A 04 64 20)], maxResults=1, maxIsColumns=false}/ONE
 DEBUG [Thrift:14] 2012-11-30 08:35:20,778 StorageProxy.java (line 1073) restricted ranges for query [min(-1),min(-1)] are [[min(-1),max(0)],
 (max(0),max(21267647932558653966460912964485513216)],
 (max(21267647932558653966460912964485513216),max(42535295865117307932921825928971026432)],
 (max(42535295865117307932921825928971026432),max(63802943797675961899382738893456539648)],
 (max(63802943797675961899382738893456539648),max(85070591730234615865843651857942052864)],
 (max(85070591730234615865843651857942052864),max(106338239662793269832304564822427566080)],
 (max(106338239662793269832304564822427566080),max(127605887595351923798765477786913079296)],
 (max(127605887595351923798765477786913079296),max(148873535527910577765226390751398592512)],
 (max(148873535527910577765226390751398592512),min(-1)]]
 DEBUG [Thrift:14] 2012-11-30 08:35:20,779 NetworkTopologyStrategy.java (line 
 125) /10.244.136.105,/10.244.137.238,/10.244.130.226 endpoints in datacenter 
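 An aside on the restricted-ranges log above: the boundaries are exact multiples of 2^127/8, consistent with an evenly balanced 8-token RandomPartitioner ring (that reading is an inference from the numbers alone, not from the cluster config). A quick check with plain JDK BigInteger:

```java
import java.math.BigInteger;

public class TokenRangeCheck {
    public static void main(String[] args) {
        // RandomPartitioner's token space is [0, 2^127)
        BigInteger span = BigInteger.ONE.shiftLeft(127);
        BigInteger eight = BigInteger.valueOf(8);
        // Multiples of 2^127/8 reproduce the boundaries from the log
        for (int i = 1; i <= 3; i++)
            System.out.println(span.multiply(BigInteger.valueOf(i)).divide(eight));
    }
}
```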

Re: Secondary Index seen as empty

2012-12-03 Thread Cyril Scetbon
I dug a little deeper, and the null value comes from the function 
removeDeletedCF in ColumnFamilyStore.java, where cf.getColumnCount() == 0 
and cf.isMarkedForDelete() == false.
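To make that condition concrete, here is a minimal stand-alone sketch (the Cf class below is a hypothetical stub, not the real ColumnFamily; only the predicate mirrors what the report above says removeDeletedCF checks):

```java
public class RemoveDeletedCfDemo {
    // Hypothetical stub standing in for org.apache.cassandra.db.ColumnFamily
    static final class Cf {
        final int columnCount;
        final boolean markedForDelete;
        Cf(int columnCount, boolean markedForDelete) {
            this.columnCount = columnCount;
            this.markedForDelete = markedForDelete;
        }
    }

    // Mirrors the reported condition: a row with zero columns and no
    // deletion marker collapses to null, which KeysSearcher sees as "no data"
    static Cf removeDeletedCF(Cf cf) {
        return cf.columnCount == 0 && !cf.markedForDelete ? null : cf;
    }

    public static void main(String[] args) {
        System.out.println(removeDeletedCF(new Cf(0, false)) == null); // true  (the bug path)
        System.out.println(removeDeletedCF(new Cf(3, false)) == null); // false (live row kept)
        System.out.println(removeDeletedCF(new Cf(0, true))  == null); // false (tombstoned row kept)
    }
}
```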

Hope it helps

Regards
Cyril SCETBON

On Dec 3, 2012, at 10:02 AM, Cyril Scetbon cyril.scet...@free.fr wrote:

 Hi Jonathan,
 
 Never mind, it was an issue with the RMI hostname. Now it works, and the issue 
 comes from the following lines (KeysSearcher.java in package 
 org.apache.cassandra.db.index.keys):
 
 [...]

Re: 2.0

2012-12-03 Thread Jason Brown
- world

Hi Jonathan,

This topic may have been discussed elsewhere (or my memory is worse off
than I thought), but what is our long-term vision for Thrift support?
Admittedly, I need to learn much more about the binary CQL protocol, and I
understand Ed's concerns as well (more acutely now) about existing
installations, but we probably wouldn't have dreamt up a new client
interface/protocol if we weren't planning, at some point, on retiring the old
one. Also, I missed the Avro debate from the past, so I'm not sure how
much that affects current and future thinking.

After raising the issue here on the dev list, it certainly seems like 2.0
is premature for a full-on switch over, and Ed raised some interesting
metrics to consider when we could declare the CQL protocol as 'accepted'.
I'm curious as to how you are seeing it roll out.

Thanks for your time,

-Jason





On Fri, Nov 30, 2012 at 2:49 PM, Jonathan Ellis jbel...@gmail.com wrote:

 As attractive as it would be to clean house, I think we owe it to our
 users to keep Thrift around for the foreseeable future rather than
 orphan all Thrift-using applications (which is virtually everyone) on
 1.2.

 On Sat, Dec 1, 2012 at 7:33 AM, Jason Brown jasedbr...@gmail.com wrote:
  Hi Jonathan,
 
  I'm in favor of paying off the technical debt, as well, and I wonder if
  there is value in removing support for thrift with 2.0? We're currently
 in
  'do as little as possible' mode with thrift, so should we aggressively
 cast
  it off and push the binary CQL protocol? Seems like a jump to '2.0',
 along
  with the other initiatives, would be a reasonable time/milestone to do
 so.
 
  Thanks,
 
  -Jason
 
 
  On Fri, Nov 30, 2012 at 12:12 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  The more I think about it, the more I think we should call 1.2-next,
  2.0.  I'd like to spend some time paying off our technical debt:
 
  - replace supercolumns with composites (CASSANDRA-3237)
  - rewrite counters (CASSANDRA-4775)
  - improve storage engine support for wide rows
  - better stage management to improve latency (disruptor? lightweight
  threads?  custom executor + queue?)
  - improved repair (CASSANDRA-3362, 2699)
 
  Of course, we're planning some new features as well:
  - triggers (CASSANDRA-1311)
  - improved query fault tolerance (CASSANDRA-4705)
  - row size limits (CASSANDRA-3929)
  - cql3 integration for hadoop (CASSANDRA-4421)
  - improved caching (CASSANDRA-1956, 2864)
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder, http://www.datastax.com
  @spyced
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: 2.0

2012-12-03 Thread Jason Brown
Oops, I meant to address this specifically to Jonathan, but I've confused
'reply' with 'forward'. My apologies for any extra noise on this topic.


On Mon, Dec 3, 2012 at 9:25 AM, Jason Brown jasedbr...@gmail.com wrote:

 [...]




[VOTE CLOSED] Release Apache Cassandra 1.2.0-rc1

2012-12-03 Thread Sylvain Lebresne
Alright, it seems we can use a beta3 before calling this an RC1.
So I'm closing this vote; I'll rebrand this as beta3 and do a short 24h vote
with that. Hopefully we'll have a true RC1 quickly after that.

Stay tuned.

--
Sylvain


On Mon, Dec 3, 2012 at 5:57 AM, Brandon Williams dri...@gmail.com wrote:

 On Sun, Dec 2, 2012 at 10:45 PM, Jonathan Ellis jbel...@gmail.com wrote:
  I'm not a fan of blocking a new rc because of bugs that are not
  regressions new in that release.  I'd also like to get more testing on
  the 1.2 fixes since b2.  But we can call it b3 instead of rc1 if you
  want.

 I agree with everything you've said.  I'm fine with calling it b3,
 though I expect we'll have that ticket closed soon and could re-roll
 an rc1 on Tuesday.

 -Brandon



[VOTE] Release Apache Cassandra 1.2.0-beta3

2012-12-03 Thread Sylvain Lebresne
So it seems we have a few things to fix before calling it a proper release
candidate, but we have still had quite a few changes since beta2, so I
propose the following artifacts for release as 1.2.0-beta3.

sha1: b86f75dcd7041815bb66eb3d1bb2c143f8ba5d58
Git:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/1.2.0-beta3-tentative
Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-108/org/apache/cassandra/apache-cassandra/1.2.0-beta3/
Staging repository:
https://repository.apache.org/content/repositories/orgapachecassandra-108/

The artifacts as well as the debian package are also available here:
http://people.apache.org/~slebresne/

Since the artifacts are basically the same ones as for the previous rc1 vote,
the vote will be a short one and will be open for 24 hours (but longer if
needed).

[1]: http://goo.gl/VUWhd (CHANGES.txt)
[2]: http://goo.gl/IGHq3 (NEWS.txt)


Re: [VOTE] Release Apache Cassandra 1.2.0-beta3

2012-12-03 Thread Brandon Williams
+1

On Mon, Dec 3, 2012 at 12:43 PM, Sylvain Lebresne sylv...@datastax.com wrote:
 [...]


Anti-Entropy Question

2012-12-03 Thread William Katsak
Hello,

I have a question that may seem strange. Assume that I have some *known* set of 
rows in a (potentially very) large data set that I know are out of consistency 
across the replicas. I can obviously bring these back into consistency by 
issuing standard reads (with read repair probability set to 1.0) and letting 
the system take care of it. Of course I could also implement new code as part 
of the system that takes this set of rows and programmatically issues internal 
row resolution requests (I have done this).

What I am thinking about now is the possibility of doing this in bulk: is it 
conceivably possible to use the anti-entropy mechanism on a targeted set of 
data? The idea would be to use the efficiency of the repair mechanism and 
associated bulk transfer without requiring a check of the entire data set.

I've been spending a lot of time in the code, but just wanted to ask if anyone 
knows the feasibility before I spend a lot of time delving into the 
anti-entropy stuff.

Thanks,
Bill Katsak



Re: Secondary Index seen as empty

2012-12-03 Thread Cyril Scetbon
Issue https://issues.apache.org/jira/browse/CASSANDRA-5024 created.

Regards
Cyril SCETBON

On Dec 3, 2012, at 10:24 AM, Cyril Scetbon cyril.scet...@free.fr wrote:

  the null value comes from the function removeDeletedCF in file 
 ColumnFamilyStore.java where cf.getColumnCount() = 0 and 
 cf.isMarkedForDelete()=false



Re: [VOTE] Release Apache Cassandra 1.2.0-beta3

2012-12-03 Thread Dave Brosius

+1

On 12/03/2012 01:43 PM, Sylvain Lebresne wrote:

[...]