[jira] [Commented] (CASSANDRA-6799) schema_version of newly bootstrapped nodes disagrees with existing nodes

2014-03-05 Thread Duncan Sands (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920631#comment-13920631
 ] 

Duncan Sands commented on CASSANDRA-6799:
-

I restarted one of the new nodes last night (neither had been restarted since 
it was bootstrapped), and now all nodes have the same schema version: not just 
the restarted node, but also the other newly bootstrapped node.

 schema_version of newly bootstrapped nodes disagrees with existing nodes
 

 Key: CASSANDRA-6799
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6799
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: x86_64 ubuntu, java version 1.7.0_45, Cassandra 2.0.5
Reporter: Duncan Sands
 Attachments: system.log.gz


 After bootstrapping new nodes 172.18.33.23 and 172.18.33.24 last weekend, I 
 noticed that they have a different schema_version from the existing nodes.  The 
 existing nodes have all been around for a while, saw some schema changes in 
 the past (e.g. timeuuid -> timestamp on a column family) but none recently, 
 and were originally running 1.2 (they were upgraded to 2.0.5).  Here you can 
 see the different schema version 0d9173d5-3947-328e-a14d-ce05239f61e0 for the 
 two new nodes:
 cqlsh> select peer, data_center, host_id, preferred_ip, rack, release_version, rpc_address, schema_version from system.peers;
  peer           | data_center | host_id                              | preferred_ip | rack | release_version | rpc_address    | schema_version
 ----------------+-------------+--------------------------------------+--------------+------+-----------------+----------------+--------------------------------------
   192.168.21.12 | rdm         | 55e4b4b6-2e64-4542-87a4-d8a8e28b5135 |         null | RAC1 |           2.0.5 |  192.168.21.12 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
    172.18.33.24 | ldn         | 6e634206-94b6-4dcf-9cf8-72bfe190feee |         null | RAC1 |           2.0.5 |   172.18.33.24 | 0d9173d5-3947-328e-a14d-ce05239f61e0
    172.18.33.22 | ldn         | 75c9c81f-b00b-4335-8483-fb7f1bc0be1e |         null | RAC1 |           2.0.5 |   172.18.33.22 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
  192.168.60.136 | adm         | c83d403f-ef0d-4c54-a844-d69730fa54d3 |         null | RAC1 |           2.0.5 | 192.168.60.136 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
  192.168.60.137 | adm         | b12e6d71-e189-4fe8-b00a-8ff2cc9848fd |         null | RAC1 |           2.0.5 | 192.168.60.137 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
   192.168.21.11 | rdm         | dd2e69cb-232f-4236-89f2-b5479669d9f7 |         null | RAC1 |           2.0.5 |  192.168.21.11 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
    172.18.33.21 | ldn         | 6942404c-e512-46b4-977a-243defa48d0f |         null | RAC1 |           2.0.5 |   172.18.33.21 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
  192.168.60.138 | adm         | a229bc0f-201b-479e-8312-66891f37ca85 |         null | RAC1 |           2.0.5 | 192.168.60.138 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
  192.168.60.134 | adm         | 7b860a54-59ea-4a92-9b47-44b52793cc70 |         null | RAC1 |           2.0.5 | 192.168.60.134 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
    172.18.33.23 | ldn         | a08bad62-55bb-492b-be64-7cf5d5073d6d |         null | RAC1 |           2.0.5 |   172.18.33.23 | 0d9173d5-3947-328e-a14d-ce05239f61e0
  192.168.60.130 | adm         | 3498b4b8-1047-4b42-b13b-bf27b3aa3177 |         null | RAC1 |           2.0.5 | 192.168.60.130 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
  192.168.60.133 | adm         | 21d3faad-5c5d-447e-bab4-ad9323bdf4c1 |         null | RAC1 |           2.0.5 | 192.168.60.133 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
  192.168.60.135 | adm         | 860ff4bb-4fcf-43ba-b270-f1844bdd3e65 |         null | RAC1 |           2.0.5 | 192.168.60.135 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
  192.168.60.131 | adm         | d8b7b0b2-d697-43ae-ad6e-982b24637865 |         null | RAC1 |           2.0.5 | 192.168.60.131 | f673ced0-8cfd-3d69-baba-4f81dc60c5b5
 (14 rows)
 I've attached the Cassandra log showing the 172.18.33.23 node bootstrapping.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6799) schema_version of newly bootstrapped nodes disagrees with existing nodes

2014-03-05 Thread Duncan Sands (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920639#comment-13920639
 ] 

Duncan Sands commented on CASSANDRA-6799:
-

To be more precise, all nodes are now at schema version 
f673ced0-8cfd-3d69-baba-4f81dc60c5b5.



[jira] [Commented] (CASSANDRA-2356) make the debian package never start by default

2014-03-05 Thread Duncan Sands (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920650#comment-13920650
 ] 

Duncan Sands commented on CASSANDRA-2356:
-

Many Debian packages use the /etc/default/cassandra scheme suggested by Brandon 
Williams.  Simple, standard - sounds good to me!  I don't understand why it was 
rejected.  For new installs it should clearly contain ENABLED=false; for people 
upgrading, the upgrade script would have to create this file with ENABLED=true 
if it didn't already exist, to preserve the previous behaviour.

Another point that came up on IRC is that shutting down a C* instance using the 
init scripts doesn't first drain the node.  As a result you get to replay all 
the commit logs when you start it up again - this can take a long time.  So 
draining the node before shutdown (including restart) can be a big win.

 make the debian package never start by default
 --

 Key: CASSANDRA-2356
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2356
 Project: Cassandra
  Issue Type: Improvement
  Components: Packaging
Reporter: Jeremy Hanna
Priority: Minor
  Labels: debian, packaging
 Attachments: 2356.txt


 Currently the debian package that installs cassandra starts cassandra by 
 default.  It sounds like that is a standard debian packaging convention.  
 However, if you want to bootstrap a new node and want to configure it before 
 it creates any sort of state information, it's a pain.  I would think that 
 the common use case would be to have it install all of the init scripts and 
 such but *not* have it start up by default.  That way an admin can configure 
 cassandra with seed, token, host, etc. information and then start it.  That 
 makes it easier to programmatically do this as well - have chef/puppet 
 install cassandra, do some configuration, then do the service start.
 With the current setup, it sounds like cassandra creates state on startup 
 that has to be cleaned before a new configuration can take effect.  So the 
 process of installing turns into:
 * install debian package
 * shutdown cassandra
 * clean out state (data/log dirs)
 * configure cassandra
 * start cassandra
 That seems suboptimal for the default case, especially when trying to 
 automate new nodes being bootstrapped.
 Another case might be when a downed node comes back up and starts by default 
 and tries to claim a token that has already been claimed by another newly 
 bootstrapped node.  Rob is more familiar with that case so I'll let him 
 explain it in the comments.





[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920720#comment-13920720
 ] 

Benedict commented on CASSANDRA-6689:
-

bq.  sort of RCU (i'm looking at you OpOrder)

What do you mean here? If you mean read-copy-update, OpOrder is nothing like 
this.

bq. I'm not sure what is to retain here if we do that copy when we send to the 
wire

Ultimately, doing this copying before sending to the wire is something I would 
like to avoid. Using the RefAction.allocateOnHeap() on top of this copying sees 
wire transfer speeds for thrift drop by about 10% in my fairly rough-and-ready 
benchmarks, so obviously copying has a cost. Possibly this cost is due to 
unavoidably copying data you don't necessarily want to serialise, but it seems 
to be there. Ultimately if we want to get in-memory read operations to 10x 
their current performance, we can't go cutting any corners.

bq. introducing separate gc

I've stated clearly what this introduces as a benefit: overwrite workloads no 
longer cause excessive flushes.

bq.  things but as we have a fixed number of threads it is going to work out 
the same way as for buffering open files in the steady system state

Your next sentence states how this is a large cause of memory consumption, so 
surely we should be using that memory if possible for other uses (returning it 
to the buffer cache, or using it internally for more caching)?

bq. Temporary memory allocated by readers is exactly what we should be managing 
at the first place because they allocate the most and it always the biggest 
concern for us

I agree we should be moving to managing this as well, however I disagree about 
how we should be managing it. In the medium term we should be bringing the 
buffer cache in process, so that we can answer some queries without handing off 
to the mutation stage (anything known to be non-blocking and fast should be 
answered immediately by the thread that processed the connection), at which 
point we will benefit from shared use of the memory pool, and concrete control 
over how much memory readers are using, and zero-copy reads from the buffer 
cache. I hope we may be able to do this for 3.0.

bq. do a simple memcpy test and see how much mb/s can you get from copying from 
one pre-allocated pool to another

Are you performing a full object tree copy, and doing this with a running 
system to see how it affects the performance of other system components? If 
not, it doesn't seem to be a useful comparison. Note that this will still 
create a tremendous amount of heap churn, as most of the memory used by objects 
right now is on-heap. So copying the records is almost certainly no better for 
young gen pressure than what we currently do - in fact, *it probably makes the 
situation worse*.

bq. it's not the memtable which creates the most of the noise and memory 
presure in the system (even tho it uses big chunk of heap) 

It may not be causing the young gen pressure you're seeing, but it certainly 
offers some benefit here by keeping more rows in memory so recent queries are 
more likely to be answered with zero allocation, so reducing young gen 
pressure; it is also a foundation for improving the row cache and introducing a 
shared page cache which could bring us closer to zero allocation reads.

It's also not clear to me how you would be managing the reclaim of the off-heap 
allocations without OpOrder, or do you mean to only use off-heap buffers for 
readers, or to ref-count any memory as you're reading it? Not using off-heap 
memory for the memtables would negate the main original point of this ticket: 
to support larger memtables, thus reducing write amplification. Ref-counting 
incurs overhead linear to the size of the result set, much like copying, and is 
also fiddly to get right (not convinced it's cleaner or neater), whereas 
OpOrder incurs overhead proportional to the number of times you reclaim. So if 
you're using OpOrder, all you're really talking about is a new RefAction: 
copyToAllocator() or something. So it doesn't notably reduce complexity, it 
just reduces the quality of the end result.
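
Since OpOrder keeps coming up, here is a minimal sketch of the guard pattern 
being argued for (based on org.apache.cassandra.utils.concurrent.OpOrder as of 
the 2.1 branch; treat the exact signatures as an assumption):

{code}
OpOrder readOrder = new OpOrder();

// Reader side: every read runs inside a Group, which is cheap to open/close.
try (OpOrder.Group op = readOrder.start())
{
    // read memtable memory here; it cannot be recycled while the group is open
}

// Reclaimer side: the cost is paid once per reclaim cycle, not per row read.
OpOrder.Barrier barrier = readOrder.newBarrier();
barrier.issue();
barrier.await(); // all groups started before the barrier have now finished
// safe to recycle the off-heap regions those earlier reads may have referenced
{code}

This is what "overhead proportional to the number of times you reclaim" means 
above: the barrier is amortised across all reads, where ref-counting pays per 
result.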


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)





[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920721#comment-13920721
 ] 

Benedict commented on CASSANDRA-6689:
-

bq.  but the reads and internode communication (especially the latter).

Also, I'd love to see some evidence for this (particularly the latter). I'm not 
disputing it, just would like to see what caused you to reach these 
conclusions. These definitely warrant separate tickets IMO, but if you have 
evidence for it, it would help direct any work.




[jira] [Commented] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination

2014-03-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920786#comment-13920786
 ] 

Piotr Kołaczkowski commented on CASSANDRA-6311:
---

org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java:275
{noformat}
Optional<SSLOptions> ssLOptions = getSSLOptions(conf);
{noformat}
typo: ssL -> ssl
--
org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java:398:
{noformat}
Optional<Integer> maxSimultaneousRequests = getInputMinSimultReqPerConnections(conf);
Optional<Integer> minSimultaneousRequests = getInputMaxSimultReqPerConnections(conf);
{noformat}
min and max swapped?   
--
org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java:549:
{noformat}
Optional<String> keystorePassword = getInputNativeSSLTruststorePassword(conf);
{noformat}
should be:
{noformat}
Optional<String> keystorePassword = getInputNativeSSLKeystorePassword(conf);
{noformat}
--
org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java:524:
{noformat}
  return new AbstractIterator<Host>()
{
protected Host computeNext()
{
return origHost;
}  
};
{noformat}
Not sure if the intent here was to create an infinite iterator returning nulls 
or the same host over and over again. According to the docs, Guava iterator 
implementations *must* invoke endOfData() to terminate iteration. Don't we need 
an iterator here that returns just one item, stickHost, and lets the driver 
handle the rest?

Also, not sure if returning nulls here is allowed at all (the driver docs 
aren't explicit on that).
I guess it is very likely to NPE if there is a connection problem, which might 
cause confusion. Probably a better solution would be to just return stickHost 
and let the driver attempt connecting, throwing a meaningful error message 
upon failure.
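For illustration, a sketch of the single-item iterator being suggested 
(stickHost as in the patch under review; the boolean guard is mine):
{noformat}
return new AbstractIterator<Host>()
{
    private boolean done = false;

    protected Host computeNext()
    {
        if (done)
            return endOfData();   // terminate, per the Guava contract
        done = true;
        return stickHost;
    }
};
{noformat}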

BTW, in the LoadBalancingPolicy implementation, having the two fields origHost 
and stickHost is redundant, and using null on one of them to mark the host as 
down/unreachable does not convey the intent clearly to me. Can't we just use 
stickHost and a boolean flag denoting whether it is reachable?

--
org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java:591:
{noformat}
private static Optional<String> getStringSetting(String parameter, Configuration conf)
{
String setting = conf.get(parameter);
if (setting == null || setting.isEmpty())
return Optional.absent();
return Optional.of(setting);  
}
{noformat}
In getStringSetting, an empty string is treated as an absent option - so it is 
not possible to have an empty-string setting (not sure if that would be useful - 
just double-checking whether it was on purpose or an omission).
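For what it's worth, a variant that keeps the empty-equals-absent behaviour but 
makes it explicit (using Guava's Strings.emptyToNull; an illustration, not the 
patch):
{noformat}
private static Optional<String> getStringSetting(String parameter, Configuration conf)
{
    // an empty string deliberately collapses to absent
    return Optional.fromNullable(Strings.emptyToNull(conf.get(parameter)));
}
{noformat}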
--
{noformat}
 *  2) where clause must include token(partition_key1 ... partition_keyn) > ? and 
 *     token(partition_key1 ... partition_keyn) <= ?
{noformat}

It would be nice to have at least some basic validation of the WHERE clause, so 
the user gets a nice error message when they screw it up.
--
org/apache/cassandra/hadoop/cql3/CqlRecordReader.java:230
{noformat}
   public RowIterator(Configuration conf)
{noformat}
conf not used
--
org/apache/cassandra/hadoop/cql3/CqlRecordReader.java:268
{noformat}
return Pair.create(Long.valueOf(keyId), row);
{noformat}
Boxing is not needed here.
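i.e., assuming keyId is a primitive long, autoboxing makes the explicit 
conversion redundant:
{noformat}
return Pair.create(keyId, row);  // keyId is autoboxed to Long
{noformat}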




 Add CqlRecordReader to take advantage of native CQL pagination
 --

 Key: CASSANDRA-6311
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6311
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Alex Liu
Assignee: Alex Liu
 Fix For: 2.0.6

 Attachments: 6311-v3-2.0-branch.txt, 6311-v4.txt, 
 6311-v5-2.0-branch.txt, 6331-2.0-branch.txt, 6331-v2-2.0-branch.txt


 Since the latest CQL pagination is done and should be more efficient, we need 
 to update CqlPagingRecordReader to use it instead of the custom Thrift 
 paging.





[jira] [Updated] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination

2014-03-05 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Kołaczkowski updated CASSANDRA-6311:
--

Reviewer: Piotr Kołaczkowski  (was: Jonathan Ellis)



[jira] [Commented] (CASSANDRA-6283) Windows 7 data files keept open / can't be deleted after compaction.

2014-03-05 Thread Andreas Schnitzerling (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920811#comment-13920811
 ] 

Andreas Schnitzerling commented on CASSANDRA-6283:
--

Hello,
since I don't know all code areas of C*, I'll describe what I tested to 
reproduce: I cleaned system.log and again used C* 2.0.5-rel with LEAK detection 
and the finalizer patch in RAR.java. After starting C* again without doing 
anything I got a lot of LEAK messages. I waited until C* finished its own work 
(mainly compacting, I think). Then I started repair -par. The result is a lot 
of LEAK messages. Here is the first one:

{panel:title=nodetool repair -par events}
ERROR [Finalizer] 2014-03-05 13:45:25,932 RandomAccessReader.java (line 394) 
LEAK finalizer had to clean up 
java.lang.Exception: RAR for 
D:\Programme\cassandra\data\events\eventsbyproject\events-eventsbyproject-jb-2002-Index.db
 allocated
at 
org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:63)
at 
org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:103)
at 
org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:90)
at 
org.apache.cassandra.io.util.BufferedPoolingSegmentedFile.createReader(BufferedPoolingSegmentedFile.java:45)
at 
org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:39)
at 
org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:162)
at 
org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:143)
at 
org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:936)
at 
org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:871)
at 
org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:783)
at 
org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1186)
at 
org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1174)
at 
org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:252)
at 
org.apache.cassandra.db.compaction.CompactionManager$ValidationCompactionIterable.<init>(CompactionManager.java:888)
at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:787)
at 
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:62)
at 
org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:397)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
{panel}
If I can run more tests, let me know. After Thursday I will be on holiday for 
3 weeks and in the office again on Mon, 03/31/2014.

 Windows 7 data files keept open / can't be deleted after compaction.
 

 Key: CASSANDRA-6283
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6283
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7 (32) / Java 1.7.0.45
Reporter: Andreas Schnitzerling
Assignee: Joshua McKenzie
  Labels: compaction
 Fix For: 2.0.6

 Attachments: 6283_StreamWriter_patch.txt, leakdetect.patch, 
 screenshot-1.jpg, system.log


 Files cannot be deleted; the patch from CASSANDRA-5383 (Win7 deleting problem) 
 doesn't help on Windows 7 on Cassandra 2.0.2. Even the 2.1 snapshot is not 
 working. The cause is that opened file handles seem to be lost and not closed 
 properly. Windows 7 complains that another process is still using the file 
 (but it's obviously Cassandra). Only a restart of the server gets the files 
 deleted. But after heavy use (changes) of tables, there are about 24K files in 
 the data folder (instead of 35 after every restart) and Cassandra crashes. I 
 experimented and found out that a finalizer fixes the problem, so after GC the 
 files get deleted (not optimal, but working fine). It has now run for 2 days 
 continuously without problems. Possible fix/test:
 I wrote the following finalizer at the end of class 
 org.apache.cassandra.io.util.RandomAccessReader:
 {code:title=RandomAccessReader.java|borderStyle=solid}
 @Override
 protected void finalize() throws Throwable {
   deallocate();
   super.finalize();
 }
 {code}
 Can somebody test / develop / patch it? Thx.





[jira] [Comment Edited] (CASSANDRA-6283) Windows 7 data files keept open / can't be deleted after compaction.

2014-03-05 Thread Andreas Schnitzerling (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920811#comment-13920811
 ] 

Andreas Schnitzerling edited comment on CASSANDRA-6283 at 3/5/14 12:59 PM:
---

Hello,
since I don't know all code areas of C*, I'll describe what I tested to 
reproduce: I cleaned system.log and again used C* 2.0.5-rel with LEAK detection 
and the finalizer patch in RAR.java. After starting C* again without doing 
anything I got a lot of LEAK messages. I waited until C* finished its own work 
(mainly compacting, I think). Then I started repair -par. The result is a lot 
of LEAK messages. Here is the first one:

{panel:title=nodetool repair -par events}
ERROR [Finalizer] 2014-03-05 13:45:25,932 RandomAccessReader.java (line 394) 
LEAK finalizer had to clean up 
java.lang.Exception: RAR for 
D:\Programme\cassandra\data\events\eventsbyproject\events-eventsbyproject-jb-2002-Index.db
 allocated
at 
org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:63)
at 
org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:103)
at 
org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:90)
at 
org.apache.cassandra.io.util.BufferedPoolingSegmentedFile.createReader(BufferedPoolingSegmentedFile.java:45)
at 
org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:39)
at 
org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:162)
at 
org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:143)
at 
org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:936)
at 
org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:871)
at 
org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:783)
at 
org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1186)
at 
org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1174)
at 
org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:252)
at 
org.apache.cassandra.db.compaction.CompactionManager$ValidationCompactionIterable.<init>(CompactionManager.java:888)
at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:787)
at 
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:62)
at 
org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:397)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
{panel}
{panel:title=neighbor node}
ERROR [Finalizer] 2014-03-05 13:50:54,061 RandomAccessReader.java (line 394) 
LEAK finalizer had to clean up 
java.lang.Exception: RAR for 
D:\Programme\cassandra\data\events\evrangesdevice\events-evrangesdevice-jb-905-Index.db
 allocated
at 
org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:63)
at 
org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:103)
at 
org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:90)
at 
org.apache.cassandra.io.util.BufferedPoolingSegmentedFile.createReader(BufferedPoolingSegmentedFile.java:45)
at 
org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:39)
at 
org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:162)
at 
org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:143)
at 
org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:936)
at 
org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:871)
at 
org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:788)
at 
org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1186)
at 
org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1174)
at 
org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:252)
at 
org.apache.cassandra.db.compaction.CompactionManager$ValidationCompactionIterable.<init>(CompactionManager.java:888)
at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:787)
at 
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:62)
at 

[3/3] git commit: add Thrift get_multi_slice call patch by Ed Capriolo; reviewed by Tyler Hobbs for CASSANDRA-6757

2014-03-05 Thread jbellis
add Thrift get_multi_slice call
patch by Ed Capriolo; reviewed by Tyler Hobbs for CASSANDRA-6757


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/60fb9230
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/60fb9230
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/60fb9230

Branch: refs/heads/trunk
Commit: 60fb923018a6fd2dabf04a1d4500f7b29a23a6f1
Parents: 630d3b9
Author: Jonathan Ellis jbel...@apache.org
Authored: Wed Mar 5 07:57:25 2014 -0600
Committer: Jonathan Ellis jbel...@apache.org
Committed: Wed Mar 5 08:02:59 2014 -0600

--
 CHANGES.txt |1 +
 interface/cassandra.thrift  |   37 +-
 .../org/apache/cassandra/thrift/Cassandra.java  | 3071 +-
 .../apache/cassandra/thrift/ColumnSlice.java|  551 
 .../cassandra/thrift/MultiSliceRequest.java | 1042 ++
 .../cassandra/thrift/cassandraConstants.java|2 +-
 .../cassandra/thrift/CassandraServer.java   |   68 +
 test/system/test_thrift_server.py   |   34 +
 .../cassandra/db/ColumnFamilyStoreTest.java |1 +
 .../apache/cassandra/thrift/MultiSliceTest.java |  149 +
 10 files changed, 4050 insertions(+), 906 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/60fb9230/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index e324225..1c0941b 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,5 +1,6 @@
 3.0
  * Remove CQL2 (CASSANDRA-5918)
+ * add Thrift get_multi_slice call (CASSANDRA-6757)
 
 
 2.1.0-beta2

http://git-wip-us.apache.org/repos/asf/cassandra/blob/60fb9230/interface/cassandra.thrift
--
diff --git a/interface/cassandra.thrift b/interface/cassandra.thrift
index e46b85e..b6b06dc 100644
--- a/interface/cassandra.thrift
+++ b/interface/cassandra.thrift
@@ -55,7 +55,7 @@ namespace rb CassandraThrift
 # An effort should be made not to break forward-client-compatibility either
 # (e.g. one should avoid removing obsolete fields from the IDL), but no
 # guarantees in this respect are made by the Cassandra project.
-const string VERSION = "20.0.0"
+const string VERSION = "20.1.0"
 
 
 #
@@ -563,6 +563,35 @@ struct CfSplit {
 3: required i64 row_count
 }
 
+/** The ColumnSlice is used to select a set of columns from inside a row. 
+ * If start or finish are unspecified they will default to the start-of
+ * or end-of value.
+ * @param start. The start of the ColumnSlice inclusive
+ * @param finish. The end of the ColumnSlice inclusive
+ */
+struct ColumnSlice {
+1: optional binary start,
+2: optional binary finish
+}
+
+/**
+ * Used to perform multiple slices on a single row key in one rpc operation
+ * @param key. The row key to be multi sliced
+ * @param column_parent. The column family (super columns are unsupported)
+ * @param column_slices. 0 to many ColumnSlice objects each will be used to 
select columns
+ * @param reversed. Direction of slice
+ * @param count. Maximum number of columns
+ * @param consistency_level. Level to perform the operation at
+ */
+struct MultiSliceRequest {
+1: optional binary key,
+2: optional ColumnParent column_parent,
+3: optional list<ColumnSlice> column_slices,
+4: optional bool reversed=false,
+5: optional i32 count=1000,
+6: optional ConsistencyLevel consistency_level=ConsistencyLevel.ONE
+}
+
 service Cassandra {
   # auth methods
   void login(1: required AuthenticationRequest auth_request) throws 
(1:AuthenticationException authnx, 2:AuthorizationException authzx),
@@ -741,7 +770,11 @@ service Cassandra {
   void truncate(1:required string cfname)
throws (1: InvalidRequestException ire, 2: UnavailableException ue, 3: 
TimedOutException te),
 
-
+  /**
+  * Select multiple slices of a key in a single RPC operation
+  */
+  list<ColumnOrSuperColumn> get_multi_slice(1:required MultiSliceRequest request)
+   throws (1:InvalidRequestException ire, 2:UnavailableException ue, 
3:TimedOutException te),
 
   // Meta-APIs -- APIs to get information about the node or cluster,
   // rather than user data.  The nodeprobe program provides usage examples.
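
For context, a minimal sketch of invoking the new call from a Thrift client 
(not part of this commit; assumes a connected Cassandra.Client named `client`, 
the standard Thrift-generated fluent setters, and an illustrative column family 
name "cf"; imports and error handling elided):

    MultiSliceRequest request = new MultiSliceRequest();
    request.setKey(ByteBufferUtil.bytes("row1"));
    request.setColumn_parent(new ColumnParent("cf"));
    // two disjoint slices of the same row, fetched in one RPC
    request.setColumn_slices(Arrays.asList(
        new ColumnSlice().setStart(ByteBufferUtil.bytes("a")).setFinish(ByteBufferUtil.bytes("c")),
        new ColumnSlice().setStart(ByteBufferUtil.bytes("x")).setFinish(ByteBufferUtil.bytes("z"))));
    request.setCount(1000);
    request.setConsistency_level(ConsistencyLevel.ONE);
    List<ColumnOrSuperColumn> columns = client.get_multi_slice(request);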



[jira] [Commented] (CASSANDRA-6283) Windows 7 data files keept open / can't be deleted after compaction.

2014-03-05 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920931#comment-13920931
 ] 

Joshua McKenzie commented on CASSANDRA-6283:


Could you attach the system.log from the root and neighbor nodes to this 
ticket?  Might help see if there's anything else going on there in the 
environment involved in this.



[jira] [Comment Edited] (CASSANDRA-6283) Windows 7 data files keept open / can't be deleted after compaction.

2014-03-05 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920931#comment-13920931
 ] 

Joshua McKenzie edited comment on CASSANDRA-6283 at 3/5/14 3:01 PM:


Could you attach the most recent system.log from the root and neighbor nodes to 
this ticket?  Might help see if there's anything else going on there in the 
environment involved in this.


was (Author: joshuamckenzie):
Could you attach the system.log from the root and neighbor nodes to this 
ticket?  Might help see if there's anything else going on there in the 
environment involved in this.



[jira] [Updated] (CASSANDRA-6800) ant codecoverage no longer works due jdk 1.7

2014-03-05 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6800:
--

   Reviewer: Jonathan Ellis
Component/s: Tests
   Priority: Minor  (was: Major)

 ant codecoverage no longer works due jdk 1.7
 

 Key: CASSANDRA-6800
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6800
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 2.1 beta2


 Code coverage does not run currently due to a Cobertura/JDK incompatibility. 
 A fix is coming. 





[jira] [Commented] (CASSANDRA-6800) ant codecoverage no longer works due jdk 1.7

2014-03-05 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920955#comment-13920955
 ] 

Jonathan Ellis commented on CASSANDRA-6800:
---

I'm getting a lot of errors even after realclean.  The first is:

{noformat}
cobertura-instrument:
[cobertura-instrument] Cobertura null - GNU GPL License (NO WARRANTY) - See 
COPYRIGHT file
[cobertura-instrument] WARN   instrumentClass, Unable to instrument file 
/Users/jbellis/projects/cassandra/git/build/classes/main/org/apache/cassandra/cli/CliClient.class
[cobertura-instrument] java.lang.RuntimeException: 
java.lang.ClassNotFoundException: org.apache.cassandra.thrift.CounterSuperColumn
[cobertura-instrument]  at 
org.objectweb.asm.ClassWriter.getCommonSuperClass(Unknown Source)
[cobertura-instrument]  at org.objectweb.asm.ClassWriter.a(Unknown Source)
[cobertura-instrument]  at org.objectweb.asm.Frame.a(Unknown Source)
[cobertura-instrument]  at org.objectweb.asm.Frame.a(Unknown Source)
[cobertura-instrument]  at org.objectweb.asm.MethodWriter.visitMaxs(Unknown 
Source)
[cobertura-instrument]  at org.objectweb.asm.MethodVisitor.visitMaxs(Unknown 
Source)
[cobertura-instrument]  at 
org.objectweb.asm.util.CheckMethodAdapter.visitMaxs(Unknown Source)
[cobertura-instrument]  at org.objectweb.asm.MethodVisitor.visitMaxs(Unknown 
Source)
[cobertura-instrument]  at 
org.objectweb.asm.commons.LocalVariablesSorter.visitMaxs(Unknown Source)
[cobertura-instrument]  at org.objectweb.asm.tree.MethodNode.accept(Unknown 
Source)
[cobertura-instrument]  at 
org.objectweb.asm.util.CheckMethodAdapter$1.visitEnd(Unknown Source)
[cobertura-instrument]  at org.objectweb.asm.MethodVisitor.visitEnd(Unknown 
Source)
[cobertura-instrument]  at 
org.objectweb.asm.util.CheckMethodAdapter.visitEnd(Unknown Source)
[cobertura-instrument]  at org.objectweb.asm.ClassReader.b(Unknown Source)
[cobertura-instrument]  at org.objectweb.asm.ClassReader.accept(Unknown Source)
[cobertura-instrument]  at org.objectweb.asm.ClassReader.accept(Unknown Source)
[cobertura-instrument]  at 
net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:204)
[cobertura-instrument]  at 
net.sourceforge.cobertura.instrument.CoberturaInstrumenter.instrumentClass(CoberturaInstrumenter.java:121)
[cobertura-instrument]  at 
net.sourceforge.cobertura.instrument.CoberturaInstrumenter.addInstrumentationToSingleClass(CoberturaInstrumenter.java:233)
[cobertura-instrument]  at 
net.sourceforge.cobertura.instrument.Main.addInstrumentationToSingleClass(Main.java:274)
[cobertura-instrument]  at 
net.sourceforge.cobertura.instrument.Main.addInstrumentation(Main.java:283)
[cobertura-instrument]  at 
net.sourceforge.cobertura.instrument.Main.parseArguments(Main.java:373)
[cobertura-instrument]  at 
net.sourceforge.cobertura.instrument.Main.main(Main.java:395)
{noformat}



[jira] [Commented] (CASSANDRA-6800) ant codecoverage no longer works due jdk 1.7

2014-03-05 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920960#comment-13920960
 ] 

Edward Capriolo commented on CASSANDRA-6800:


I noticed that. This is pretty weird. From Maven I have used the Cobertura 
plugin; it worked great. What a PITA Ant is. Maybe we should switch to Maven :)

I made it all the way through the process and it built the cobertura.ser, but 
ran into some problems with the report target. I will keep looking at it for a 
bit. 



[jira] [Commented] (CASSANDRA-6147) Break timestamp ties for thrift-ers

2014-03-05 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920986#comment-13920986
 ] 

Nate McCall commented on CASSANDRA-6147:


That would actually be very helpful and would not break anything in the wild 
(Astyanax, Hector, and (I'm pretty sure) pycassa all assert a not-null timestamp 
on egress anyhoo), so it would be unusual for someone to be relying on this as 
validation currently. 

 Break timestamp ties for thrift-ers
 ---

 Key: CASSANDRA-6147
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6147
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 2.1 beta2


 Thrift users are still forced to generate timestamps on the client side: 
 the way the thrift bindings are generated, users must supply timestamps. 
 There are two solutions I see.
 * -1 as timestamp means generate on the server side
 This is a breaking change for those using -1 as a timestamp (which should 
 effectively be no one).
 * Prepare yourself
 Our thrift signatures are wrong; you can't overload methods in thrift.
 thrift.get(byte [], byte[], ts) 
 should REALLY be changed to 
 GetRequest g = new GetRequest()
 g.setName()
 g.setValue()
 g.setTs() ///optional 
 thrift.get( g )
 I know no one is going to want to make this change because thrift is 
 quasi-dead, but it would allow us to evolve thrift in a meaningful way. We 
 could simply add these new methods under different names as well.





[jira] [Commented] (CASSANDRA-6591) un-deprecate cache recentHitRate and expose in o.a.c.metrics

2014-03-05 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920988#comment-13920988
 ] 

Chris Burroughs commented on CASSANDRA-6591:


Sorry I'm not following.  If we are getting requests but no hits (mostly 
misses), the hit rate going down is what I would expect.

 un-deprecate cache recentHitRate and expose in o.a.c.metrics
 

 Key: CASSANDRA-6591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6591
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Burroughs
Assignee: Chris Burroughs
Priority: Minor
 Attachments: j6591-1.2-v1.txt, j6591-1.2-v2.txt, j6591-1.2-v3.txt


 recentHitRate metrics were not added as part of CASSANDRA-4009 because there 
 is not an obvious way to do it with the Metrics library.  Instead hitRate was 
 added as an all-time measurement since node restart.
 This does allow changes in cache hit rate (aka production performance 
 problems) to be detected.  Ideally there would be 1/5/15-minute moving 
 averages for the hit rate, but I'm not sure how to calculate that.  Instead I 
 propose updating recentHitRate on a fixed interval and exposing that as a 
 Gauge.
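 A minimal sketch of that fixed-interval idea (the class name, counter 
 plumbing, and 60-second period are illustrative assumptions, not the attached 
 patch):
 {code}
 import java.util.concurrent.ScheduledExecutorService;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicLong;

 public class RecentHitRateSampler
 {
     private final AtomicLong hits = new AtomicLong();
     private final AtomicLong requests = new AtomicLong();
     private long lastHits, lastRequests;
     private volatile double recentHitRate = Double.NaN;

     public RecentHitRateSampler(ScheduledExecutorService scheduler)
     {
         scheduler.scheduleAtFixedRate(new Runnable()
         {
             public void run() { sample(); }
         }, 60, 60, TimeUnit.SECONDS);
     }

     public void record(boolean hit)     // called on every cache lookup
     {
         requests.incrementAndGet();
         if (hit)
             hits.incrementAndGet();
     }

     private synchronized void sample()  // runs once per interval
     {
         long h = hits.get(), r = requests.get();
         recentHitRate = (r == lastRequests) ? Double.NaN
                       : (double) (h - lastHits) / (r - lastRequests);
         lastHits = h;
         lastRequests = r;
     }

     public double getRecentHitRate()    // exposed as the Gauge's value()
     {
         return recentHitRate;
     }
 }
 {code}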





[jira] [Commented] (CASSANDRA-6147) Break timestamp ties for thrift-ers

2014-03-05 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921007#comment-13921007
 ] 

Edward Capriolo commented on CASSANDRA-6147:


I do not think this would break anything. Anything currently out there must be 
setting the timestamp explicitly; anything not setting the timestamp is just 
getting 0. Users quickly find out what happens when two inserts have the same 0 
timestamp.




[jira] [Commented] (CASSANDRA-6147) Break timestamp ties for thrift-ers

2014-03-05 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921018#comment-13921018
 ] 

Jonathan Ellis commented on CASSANDRA-6147:
---

I'm a little confused: are we changing the scope of this ticket from "break 
timestamp ties" to "allow opting in to server-side timestamps"?

nanotime is basically random, so that would break ties, but not very usefully :)



[jira] [Commented] (CASSANDRA-6147) Break timestamp ties for thrift-ers

2014-03-05 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921033#comment-13921033
 ] 

Edward Capriolo commented on CASSANDRA-6147:


[~jbellis] You are right, I kinda stole this ticket. The point of this patch is 
that if CQL can auto-timestamp things, thrift should be able to as well. Would 
you like me to open another ticket? Should the auto-timestamp be 
System.currentTimeMillis() + 1000? How does CQL arrive at its auto timestamp?
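
For reference, CQL's server-side default comes from FBUtilities.timestampMicros() 
(System.currentTimeMillis() * 1000, i.e. microsecond precision). A sketch of the 
opt-in pattern under discussion (the -1 sentinel and the Column plumbing are 
assumptions for illustration):

{code}
// Fall back to a server-generated timestamp when the client sent the sentinel.
long timestamp = (column.isSetTimestamp() && column.timestamp != -1)
               ? column.timestamp
               : FBUtilities.timestampMicros();   // millis * 1000, as CQL does
{code}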



[6/6] git commit: Merge branch 'cassandra-2.1' into trunk

2014-03-05 Thread brandonwilliams
Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f601cac0
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f601cac0
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f601cac0

Branch: refs/heads/trunk
Commit: f601cac021be203b0c4caa8375a3c9eb3ee94b70
Parents: 60fb923 7f7a9cc
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Mar 5 11:23:59 2014 -0600
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Mar 5 11:23:59 2014 -0600

--

--




[2/6] git commit: Add hadoop progressable compatibility. Patch by Ben Coverston, reviewed by brandonwilliams for CASSANDRA-5201

2014-03-05 Thread brandonwilliams
Add hadoop progressable compatibility.
Patch by Ben Coverston, reviewed by brandonwilliams for CASSANDRA-5201


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/24923083
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/24923083
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/24923083

Branch: refs/heads/cassandra-2.1
Commit: 249230834c2ce1ac169b2b3228d5d222f5ecacc2
Parents: ab2717b
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Mar 5 11:21:35 2014 -0600
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Mar 5 11:21:35 2014 -0600

--
 build.xml   |   3 -
 .../hadoop/AbstractColumnFamilyInputFormat.java |   1 -
 .../AbstractColumnFamilyOutputFormat.java   |   1 -
 .../AbstractColumnFamilyRecordWriter.java   |   2 +
 .../cassandra/hadoop/BulkOutputFormat.java  |   3 +-
 .../cassandra/hadoop/BulkRecordWriter.java  |  16 +-
 .../hadoop/ColumnFamilyInputFormat.java |   1 -
 .../hadoop/ColumnFamilyOutputFormat.java|   2 +-
 .../hadoop/ColumnFamilyRecordReader.java|   1 -
 .../hadoop/ColumnFamilyRecordWriter.java|  15 +-
 .../apache/cassandra/hadoop/HadoopCompat.java   | 309 +++
 .../apache/cassandra/hadoop/Progressable.java   |  50 ---
 .../cassandra/hadoop/cql3/CqlOutputFormat.java  |   3 +-
 .../hadoop/cql3/CqlPagingInputFormat.java   |   2 +-
 .../hadoop/cql3/CqlPagingRecordReader.java  |   2 +-
 .../cassandra/hadoop/cql3/CqlRecordWriter.java  |  12 +-
 .../cassandra/hadoop/pig/CassandraStorage.java  |   2 +-
 .../apache/cassandra/hadoop/pig/CqlStorage.java |   3 +-
 18 files changed, 346 insertions(+), 82 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/24923083/build.xml
--
diff --git a/build.xml b/build.xml
index 77b2639..9972aa2 100644
--- a/build.xml
+++ b/build.xml
@@ -367,7 +367,6 @@
   </dependency>
   <dependency groupId="org.apache.hadoop" artifactId="hadoop-core" version="1.0.3"/>
   <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster" version="1.0.3"/>
-  <dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat" version="4.3"/>
   <dependency groupId="org.apache.pig" artifactId="pig" version="0.10.0"/>
   <dependency groupId="net.java.dev.jna" artifactId="jna" version="3.2.7"/>
 
@@ -410,7 +409,6 @@
 <dependency groupId="org.apache.rat" artifactId="apache-rat"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-core"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster"/>
-<dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat"/>
 <dependency groupId="org.apache.pig" artifactId="pig"/>
 
 <dependency groupId="net.java.dev.jna" artifactId="jna"/>
@@ -474,7 +472,6 @@
 <!-- don't need hadoop classes to run, but if you use the hadoop stuff -->
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-core" optional="true"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster" optional="true"/>
-<dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat" optional="true"/>
 <dependency groupId="org.apache.pig" artifactId="pig" optional="true"/>
 
 <!-- don't need jna to run, but nice to have -->

http://git-wip-us.apache.org/repos/asf/cassandra/blob/24923083/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
--
diff --git 
a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java 
b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
index f547fd0..ba79eee 100644
--- a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
+++ b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
@@ -29,7 +29,6 @@ import java.util.concurrent.TimeUnit;
 
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
-import com.twitter.elephantbird.util.HadoopCompat;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/24923083/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
--
diff --git 
a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java 
b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
index a3c4234..3041829 100644
--- a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
+++ 

[4/6] git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

2014-03-05 Thread brandonwilliams
Merge branch 'cassandra-2.0' into cassandra-2.1


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/7f7a9cc7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/7f7a9cc7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/7f7a9cc7

Branch: refs/heads/trunk
Commit: 7f7a9cc754944cd7da19996c9e20377ecf2cfe7d
Parents: 0851fd7 2492308
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Mar 5 11:23:47 2014 -0600
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Mar 5 11:23:47 2014 -0600

--

--




[5/6] git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

2014-03-05 Thread brandonwilliams
Merge branch 'cassandra-2.0' into cassandra-2.1


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/7f7a9cc7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/7f7a9cc7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/7f7a9cc7

Branch: refs/heads/cassandra-2.1
Commit: 7f7a9cc754944cd7da19996c9e20377ecf2cfe7d
Parents: 0851fd7 2492308
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Mar 5 11:23:47 2014 -0600
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Mar 5 11:23:47 2014 -0600

--

--




[1/6] git commit: Add hadoop progressable compatibility. Patch by Ben Coverston, reviewed by brandonwilliams for CASSANDRA-5201

2014-03-05 Thread brandonwilliams
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.0 ab2717b6f -> 249230834
  refs/heads/cassandra-2.1 0851fd74b -> 7f7a9cc75
  refs/heads/trunk 60fb92301 -> f601cac02


Add hadoop progressable compatibility.
Patch by Ben Coverston, reviewed by brandonwilliams for CASSANDRA-5201


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/24923083
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/24923083
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/24923083

Branch: refs/heads/cassandra-2.0
Commit: 249230834c2ce1ac169b2b3228d5d222f5ecacc2
Parents: ab2717b
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Mar 5 11:21:35 2014 -0600
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Mar 5 11:21:35 2014 -0600

--
 build.xml   |   3 -
 .../hadoop/AbstractColumnFamilyInputFormat.java |   1 -
 .../AbstractColumnFamilyOutputFormat.java   |   1 -
 .../AbstractColumnFamilyRecordWriter.java   |   2 +
 .../cassandra/hadoop/BulkOutputFormat.java  |   3 +-
 .../cassandra/hadoop/BulkRecordWriter.java  |  16 +-
 .../hadoop/ColumnFamilyInputFormat.java |   1 -
 .../hadoop/ColumnFamilyOutputFormat.java|   2 +-
 .../hadoop/ColumnFamilyRecordReader.java|   1 -
 .../hadoop/ColumnFamilyRecordWriter.java|  15 +-
 .../apache/cassandra/hadoop/HadoopCompat.java   | 309 +++
 .../apache/cassandra/hadoop/Progressable.java   |  50 ---
 .../cassandra/hadoop/cql3/CqlOutputFormat.java  |   3 +-
 .../hadoop/cql3/CqlPagingInputFormat.java   |   2 +-
 .../hadoop/cql3/CqlPagingRecordReader.java  |   2 +-
 .../cassandra/hadoop/cql3/CqlRecordWriter.java  |  12 +-
 .../cassandra/hadoop/pig/CassandraStorage.java  |   2 +-
 .../apache/cassandra/hadoop/pig/CqlStorage.java |   3 +-
 18 files changed, 346 insertions(+), 82 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/24923083/build.xml
--
diff --git a/build.xml b/build.xml
index 77b2639..9972aa2 100644
--- a/build.xml
+++ b/build.xml
@@ -367,7 +367,6 @@
   </dependency>
   <dependency groupId="org.apache.hadoop" artifactId="hadoop-core" version="1.0.3"/>
   <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster" version="1.0.3"/>
-  <dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat" version="4.3"/>
   <dependency groupId="org.apache.pig" artifactId="pig" version="0.10.0"/>
   <dependency groupId="net.java.dev.jna" artifactId="jna" version="3.2.7"/>
 
@@ -410,7 +409,6 @@
 <dependency groupId="org.apache.rat" artifactId="apache-rat"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-core"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster"/>
-<dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat"/>
 <dependency groupId="org.apache.pig" artifactId="pig"/>
 
 <dependency groupId="net.java.dev.jna" artifactId="jna"/>
@@ -474,7 +472,6 @@
 <!-- don't need hadoop classes to run, but if you use the hadoop stuff -->
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-core" optional="true"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster" optional="true"/>
-<dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat" optional="true"/>
 <dependency groupId="org.apache.pig" artifactId="pig" optional="true"/>
 
 <!-- don't need jna to run, but nice to have -->

http://git-wip-us.apache.org/repos/asf/cassandra/blob/24923083/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
--
diff --git 
a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java 
b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
index f547fd0..ba79eee 100644
--- a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
+++ b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
@@ -29,7 +29,6 @@ import java.util.concurrent.TimeUnit;
 
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
-import com.twitter.elephantbird.util.HadoopCompat;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/24923083/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
--
diff --git 
a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java 

[3/6] git commit: Add hadoop progressable compatibility. Patch by Ben Coverston, reviewed by brandonwilliams for CASSANDRA-5201

2014-03-05 Thread brandonwilliams
Add hadoop progressable compatibility.
Patch by Ben Coverston, reviewed by brandonwilliams for CASSANDRA-5201


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/24923083
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/24923083
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/24923083

Branch: refs/heads/trunk
Commit: 249230834c2ce1ac169b2b3228d5d222f5ecacc2
Parents: ab2717b
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Mar 5 11:21:35 2014 -0600
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Mar 5 11:21:35 2014 -0600

--
 build.xml   |   3 -
 .../hadoop/AbstractColumnFamilyInputFormat.java |   1 -
 .../AbstractColumnFamilyOutputFormat.java   |   1 -
 .../AbstractColumnFamilyRecordWriter.java   |   2 +
 .../cassandra/hadoop/BulkOutputFormat.java  |   3 +-
 .../cassandra/hadoop/BulkRecordWriter.java  |  16 +-
 .../hadoop/ColumnFamilyInputFormat.java |   1 -
 .../hadoop/ColumnFamilyOutputFormat.java|   2 +-
 .../hadoop/ColumnFamilyRecordReader.java|   1 -
 .../hadoop/ColumnFamilyRecordWriter.java|  15 +-
 .../apache/cassandra/hadoop/HadoopCompat.java   | 309 +++
 .../apache/cassandra/hadoop/Progressable.java   |  50 ---
 .../cassandra/hadoop/cql3/CqlOutputFormat.java  |   3 +-
 .../hadoop/cql3/CqlPagingInputFormat.java   |   2 +-
 .../hadoop/cql3/CqlPagingRecordReader.java  |   2 +-
 .../cassandra/hadoop/cql3/CqlRecordWriter.java  |  12 +-
 .../cassandra/hadoop/pig/CassandraStorage.java  |   2 +-
 .../apache/cassandra/hadoop/pig/CqlStorage.java |   3 +-
 18 files changed, 346 insertions(+), 82 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/24923083/build.xml
--
diff --git a/build.xml b/build.xml
index 77b2639..9972aa2 100644
--- a/build.xml
+++ b/build.xml
@@ -367,7 +367,6 @@
   </dependency>
   <dependency groupId="org.apache.hadoop" artifactId="hadoop-core" version="1.0.3"/>
   <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster" version="1.0.3"/>
-  <dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat" version="4.3"/>
   <dependency groupId="org.apache.pig" artifactId="pig" version="0.10.0"/>
   <dependency groupId="net.java.dev.jna" artifactId="jna" version="3.2.7"/>
 
@@ -410,7 +409,6 @@
 <dependency groupId="org.apache.rat" artifactId="apache-rat"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-core"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster"/>
-<dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat"/>
 <dependency groupId="org.apache.pig" artifactId="pig"/>
 
 <dependency groupId="net.java.dev.jna" artifactId="jna"/>
@@ -474,7 +472,6 @@
 <!-- don't need hadoop classes to run, but if you use the hadoop stuff -->
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-core" optional="true"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster" optional="true"/>
-<dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat" optional="true"/>
 <dependency groupId="org.apache.pig" artifactId="pig" optional="true"/>
 
 <!-- don't need jna to run, but nice to have -->

http://git-wip-us.apache.org/repos/asf/cassandra/blob/24923083/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
--
diff --git 
a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java 
b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
index f547fd0..ba79eee 100644
--- a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
+++ b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
@@ -29,7 +29,6 @@ import java.util.concurrent.TimeUnit;
 
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
-import com.twitter.elephantbird.util.HadoopCompat;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/24923083/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
--
diff --git 
a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java 
b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
index a3c4234..3041829 100644
--- a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
+++ 

[jira] [Commented] (CASSANDRA-6588) Add a 'NO EMPTY RESULTS' filter to SELECT

2014-03-05 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921074#comment-13921074
 ] 

Sylvain Lebresne commented on CASSANDRA-6588:
-

It occurred to me that it's not at all impossible to optimize this all out at 
the storage engine level (and that's probably the right solution).

Let me first quickly sum up the problem we're actually trying to solve here: 
when you query just one CQL row and only select some of its columns (and 
*only* in that case), we can't use a NamesQueryFilter underneath, simply 
because if we get back no result we're not able to distinguish between "the 
row exists but has no data for the selected columns" and "the row doesn't 
exist". So instead we currently issue a SliceQueryFilter for the whole CQL 
row, which can be slower than if we were able to use a NamesQueryFilter because:
# NamesQueryFilter uses the CollationController.collectTimeOrderedData() path, 
which can potentially skip some sstables.
# NamesQueryFilter avoids sending the values of the non-selected columns of 
the CQL row to the coordinator only to have them ignored later (this doesn't 
matter so much as far as disk reading is concerned, since we don't really read 
cells from disk one by one).

So anyway, we could specialize a new RowQueryFilter (which would become the 
new NamesQueryFilter for CQL3 tables). That filter would use the 
collectTimeOrderedData() path and would only return the columns queried (plus 
the row marker), but at the sstable level it would read from the beginning of 
the CQL row, and as soon as it encounters a live column it would add the row 
marker to the result; otherwise it would skip any column that is not part of 
the selected ones. In other words, while we can't rely on the row marker being 
there due to TTL, it's not too hard when deserializing the sstable to generate 
a fake one for the purpose of the query, while avoiding any extra work 
otherwise.

As a side note, we could actually reuse that same idea for SliceQueryFilter 
(i.e. have a slice filter that only cares about a subset of the CQL row 
columns), which would improve the case for slices (when you only select a 
subset of the columns, that is).
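A rough, self-contained sketch (in Java, with a simplified stand-in Cell type; 
the real storage-engine classes differ) of the RowQueryFilter behavior 
described above:
{code}
import java.util.*;

// Sketch only: Cell is a simplified stand-in for the storage engine's cell type.
final class RowQueryFilterSketch
{
    static final class Cell
    {
        final String name; final long timestamp; final boolean live;
        Cell(String name, long timestamp, boolean live)
        { this.name = name; this.timestamp = timestamp; this.live = live; }
    }

    // Read the CQL row from its start; the first live cell proves the row
    // exists, so synthesize a fake row marker, then keep only selected columns.
    static List<Cell> filterRow(Iterable<Cell> cqlRowInOrder, Set<String> selected)
    {
        List<Cell> result = new ArrayList<>();
        boolean markerAdded = false;
        for (Cell cell : cqlRowInOrder)
        {
            if (!cell.live)
                continue; // deleted/expired cells don't prove existence
            if (!markerAdded)
            {
                result.add(new Cell("", cell.timestamp, true)); // fake row marker
                markerAdded = true;
            }
            if (selected.contains(cell.name))
                result.add(cell); // only selected columns are returned
        }
        return result;
    }
}
{code}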


 Add a 'NO EMPTY RESULTS' filter to SELECT
 -

 Key: CASSANDRA-6588
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6588
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Priority: Minor
 Fix For: 2.1 beta2


 It is the semantics of CQL that a (CQL) row exists as long as it has one 
 non-null column (including the PK columns, which, given that no PK column 
 can be null, means that it's enough to have the PK set for a row to exist). 
 This does mean that the result of
 {noformat}
 CREATE TABLE test (k int PRIMARY KEY, v1 int, v2 int);
 INSERT INTO test(k, v1) VALUES (0, 4);
 SELECT v2 FROM test;
 {noformat}
 must be (and is)
 {noformat}
  v2
 --
  null
 {noformat}
 That fact does mean, however, that when we only select a few columns of a row, 
 we still need to find rows that exist but have no values for the selected 
 columns. Long story short, given how the storage engine works, this means we 
 need to query full (CQL) rows even when only some of the columns are selected, 
 because that's the only way to distinguish between "the row exists but has no 
 value for the selected columns" and "the row doesn't exist". I'll note in 
 particular that, due to CASSANDRA-5762, we unfortunately can't rely on the 
 row marker to optimize that out.
 Now, when you select only a subset of the columns of a row, there are many 
 cases where you don't care about rows that exist but have no value for the 
 columns you requested and are happy to filter those out. So, for those cases, 
 we could provide a new SELECT filter. Outside the potential convenience (not 
 having to filter empty results client side), one interesting part is that 
 when this filter is provided, we could optimize a bit by only querying the 
 selected columns, since we wouldn't need to return rows that exist but have 
 no values for the selected columns.
 For the exact syntax, there are probably a bunch of options. For instance:
 * {{SELECT NON EMPTY(v2, v3) FROM test}}: the vague rationale for putting it 
 in the SELECT part is that such a filter is kind of in the spirit of DISTINCT.  
 Possibly a bit ugly outside of that.
 * {{SELECT v2, v3 FROM test NO EMPTY RESULTS}} or {{SELECT v2, v3 FROM test 
 NO EMPTY ROWS}} or {{SELECT v2, v3 FROM test NO EMPTY}}: the last one is 
 shorter but maybe a bit less explicit. As for {{RESULTS}} versus {{ROWS}}, 
 the only small objection to {{NO EMPTY ROWS}} could be that it might suggest 
 it is filtering non-existing rows (I mean, the fact we never ever return 
 non-existing rows should hint 

[jira] [Commented] (CASSANDRA-6591) un-deprecate cache recentHitRate and expose in o.a.c.metrics

2014-03-05 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921078#comment-13921078
 ] 

Yuki Morishita commented on CASSANDRA-6591:
---

My test, which simulates misses after hits, is here: 
https://gist.github.com/yukim/9371796

If I plot this on a graph:

!https://docs.google.com/spreadsheet/oimg?key=0AhjS79jizSXtdDJUcnBzdU9tSG9WVG5ia1N3eUx1bncoid=5zx=xnsjrj3s0of0!

The blue line is the one proposed in this ticket and the red line is the hit 
rate's one-minute rate, and I see quite a difference there.
 

 un-deprecate cache recentHitRate and expose in o.a.c.metrics
 

 Key: CASSANDRA-6591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6591
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Burroughs
Assignee: Chris Burroughs
Priority: Minor
 Attachments: j6591-1.2-v1.txt, j6591-1.2-v2.txt, j6591-1.2-v3.txt


 recentHitRate metrics were not added as part of CASSANDRA-4009 because there 
 is no obvious way to do it with the Metrics library.  Instead, hitRate was 
 added as an all-time measurement since node restart.
 This does not allow changes in cache rate (aka production performance problems) 
 to be detected.  Ideally there would be 1/5/15 moving averages for the hit 
 rate, but I'm not sure how to calculate that.  Instead I propose updating 
 recentHitRate on a fixed interval and exposing that as a Gauge.
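 A minimal sketch of that proposal, assuming the Yammer Metrics 2.x API that 
 Cassandra used at the time (the class name, metric name, and 60s interval are 
 illustrative):
 {code}
 import java.util.concurrent.Executors;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicLong;

 import com.yammer.metrics.Metrics;
 import com.yammer.metrics.core.Gauge;

 // Sketch only: recompute the hit rate over a fixed window and expose the
 // latest sample as a Gauge, as the ticket proposes.
 public final class RecentHitRateSketch
 {
     private final AtomicLong hits = new AtomicLong();
     private final AtomicLong requests = new AtomicLong();
     private volatile double recentHitRate = Double.NaN;
     private long lastHits, lastRequests; // touched only by the sampler thread

     public RecentHitRateSketch()
     {
         Metrics.newGauge(RecentHitRateSketch.class, "RecentHitRate", new Gauge<Double>()
         {
             public Double value() { return recentHitRate; }
         });
         Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(new Runnable()
         {
             public void run()
             {
                 long h = hits.get(), r = requests.get();
                 recentHitRate = r == lastRequests ? Double.NaN
                               : (double) (h - lastHits) / (r - lastRequests);
                 lastHits = h;
                 lastRequests = r;
             }
         }, 60, 60, TimeUnit.SECONDS);
     }

     public void record(boolean hit)
     {
         requests.incrementAndGet();
         if (hit) hits.incrementAndGet();
     }
 }
 {code}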



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-05 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921110#comment-13921110
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. I've stated clearly what this introduces as a benefit: overwrite workloads 
no longer cause excessive flushes

If you make a copy of the memtable buffer beforehand, you can clearly put it 
back to the allocator once it's overwritten or becomes otherwise useless, in 
the process of merging columns with previous row contents.

bq. Your next sentence states how this is a large cause of memory consumption, 
so surely we should be using that memory if possible for other uses (returning 
it to the buffer cache, or using it internally for more caching)?

It doesn't state that it is a *large cause of memory consumption*; it states 
that it has an additional cost, but in the steady state it won't be allocating 
over the limit because of the properties of the system that we have, namely 
the fixed number of threads.

bq. Are you performing a full object tree copy, and doing this with a running 
system to see how it affects the performance of other system components? If 
not, it doesn't seem to be a useful comparison. Note that this will still 
create a tremendous amount of heap churn, as most of the memory used by objects 
right now is on-heap. So copying the records is almost certainly no better for 
young gen pressure than what we currently do - in fact, it probably makes the 
situation worse.

Do you mean this? Let's say we copy a Cell (or Column object), which is 1 
level deep, so we just allocate additional space for the object headers and do 
a copy; most of the work would be spent copying the data (name/value) anyway. 
So, as we want to live inside of ParNew, see how many such allocations you can 
do in e.g. 1 second, then wipe the whole thing and do it again. We are doing 
mlockall too, which should make that even faster, as we are sure the heap is 
pre-faulted already.

bq. It may not be causing the young gen pressure you're seeing, but it 
certainly offers some benefit here by keeping more rows in memory so recent 
queries are more likely to be answered with zero allocation, so reducing young 
gen pressure; it is also a foundation for improving the row cache and 
introducing a shared page cache which could bring us closer to zero allocation 
reads. _And so on_

I'm not sure how this would help in the case of the row cache: once a 
reference is added to the row cache, it means the memtable would hang in there 
until that row is purged. So if there is a long-lived row (written once, read 
multiple times) in each of the regions (and we reclaim based on regions), 
wouldn't that keep the memtable around longer than expected?

bq. It's also not clear to me how you would be managing the reclaim of the 
off-heap allocations without OpOrder, or do you mean to only use off-heap 
buffers for readers, or to ref-count any memory as you're reading it? Not using 
off-heap memory for the memtables would negate the main original point of this 
ticket: to support larger memtables, thus reducing write amplification. 
Ref-counting incurs overhead linear to the size of the result set, much like 
copying, and is also fiddly to get right (not convinced it's cleaner or 
neater), whereas OpOrder incurs overhead proportional to the number of times 
you reclaim. So if you're using OpOrder, all you're really talking about is a 
new RefAction: copyToAllocator() or something. So it doesn't notably reduce 
complexity, it just reduces the quality of the end result.

In terms of memory usage, copying adds an additional linear cost, yes, but at 
the same time it makes the system behavior more controllable/predictable, 
which is what ops usually care about; even on the artificial stress test there 
seems to be a slowdown once the off-heap feature is enabled, which is no 
surprise once you look at how much complexity it actually adds.

bq. Also, I'd love to see some evidence for this (particularly the latter). I'm 
not disputing it, just would like to see what caused you to reach these 
conclusions. These definitely warrant separate tickets IMO, but if you have 
evidence for it, it would help direct any work.

Well, it seems like you never operated a real Cassandra cluster, did you? All 
of the problems that I have listed here are well known; you can even simulate 
this with Docker VMs by making the internal network gradually slower. There is 
no back-pressure mechanism built in, so right now Cassandra would accept a 
bunch of operations at normal speed (if the outgoing link is physically 
different from the internal one) but would suddenly just stop accepting 
anything and fail internally because of a GC storm caused by all of the 
internode buffers hanging around.


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: 

[jira] [Created] (CASSANDRA-6802) Row cache improvements

2014-03-05 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-6802:
--

 Summary: Row cache improvements
 Key: CASSANDRA-6802
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6802
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 3.0


There are a few things we could do:

* Start using the native memory constructs from CASSANDRA-6694 to avoid 
serialization/deserialization costs and to minimize the on-heap overhead.
* Stop invalidating cached rows on writes (update on write instead).




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination

2014-03-05 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921141#comment-13921141
 ] 

Alex Liu commented on CASSANDRA-6311:
-

1. Validating the input CQL query requires parsing the query, which is what we 
are trying to avoid.
2. The AbstractIterator always returns the local host (so that the task only 
reads data from the local host); it doesn't return endOfData(). It uses 
stickHost, a host name, to get the Host object, which can't be created 
directly because the class is not public. The Host object, origHost, is 
obtained from cluster-internal code. It's possible that the origHost object is 
null, in which case the stickHost is not in the cluster. In that case we don't 
want the job to run, since it's on the wrong host.
3. I cleaned up the code according to the other notes.

Attaching the v6 version.


 Add CqlRecordReader to take advantage of native CQL pagination
 --

 Key: CASSANDRA-6311
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6311
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Alex Liu
Assignee: Alex Liu
 Fix For: 2.0.6

 Attachments: 6311-v3-2.0-branch.txt, 6311-v4.txt, 
 6311-v5-2.0-branch.txt, 6331-2.0-branch.txt, 6331-v2-2.0-branch.txt


 Since the latest CQL pagination is done and should be more efficient, we 
 need to update CqlPagingRecordReader to use it instead of the custom thrift 
 paging.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination

2014-03-05 Thread Alex Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Liu updated CASSANDRA-6311:


Attachment: 6311-v6-2.0-branch.txt

 Add CqlRecordReader to take advantage of native CQL pagination
 --

 Key: CASSANDRA-6311
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6311
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Alex Liu
Assignee: Alex Liu
 Fix For: 2.0.6

 Attachments: 6311-v3-2.0-branch.txt, 6311-v4.txt, 
 6311-v5-2.0-branch.txt, 6311-v6-2.0-branch.txt, 6331-2.0-branch.txt, 
 6331-v2-2.0-branch.txt


 Since the latest CQL pagination is done and should be more efficient, we 
 need to update CqlPagingRecordReader to use it instead of the custom thrift 
 paging.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6283) Windows 7 data files kept open / can't be deleted after compaction.

2014-03-05 Thread Andreas Schnitzerling (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Schnitzerling updated CASSANDRA-6283:
-

Attachment: neighbor-log.zip
root-log.zip

Logs during nodetool repair -par events. C* 2.0.5-rel with LEAK-log and 
finalizer-patch under Win-7.

 Windows 7 data files kept open / can't be deleted after compaction.
 

 Key: CASSANDRA-6283
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6283
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7 (32) / Java 1.7.0.45
Reporter: Andreas Schnitzerling
Assignee: Joshua McKenzie
  Labels: compaction
 Fix For: 2.0.6

 Attachments: 6283_StreamWriter_patch.txt, leakdetect.patch, 
 neighbor-log.zip, root-log.zip, screenshot-1.jpg, system.log


 Files cannot be deleted; the patch from CASSANDRA-5383 (Win7 deleting problem) 
 doesn't help on Win-7 with Cassandra 2.0.2. Even the 2.1 snapshot is not 
 working. The cause: opened file handles seem to be lost and not closed 
 properly. Win 7 complains that another process is still using the file (but 
 it's obviously Cassandra). Only a restart of the server gets the files 
 deleted. But after heavy use of (changes to) tables, there are about 24K files 
 in the data folder (instead of 35 after every restart) and Cassandra crashes. 
 I experimented and found out that a finalizer fixes the problem. So after GC 
 the files will be deleted (not optimal, but working fine). It has now run for 
 2 days continuously without problems. Possible fix/test:
 I wrote the following finalizer at the end of class 
 org.apache.cassandra.io.util.RandomAccessReader:
 {code:title=RandomAccessReader.java|borderStyle=solid}
 @Override
 protected void finalize() throws Throwable {
   deallocate();
   super.finalize();
 }
 {code}
 Can somebody test / develop / patch it? Thx.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-6283) Windows 7 data files kept open / can't be deleted after compaction.

2014-03-05 Thread Andreas Schnitzerling (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921180#comment-13921180
 ] 

Andreas Schnitzerling edited comment on CASSANDRA-6283 at 3/5/14 6:40 PM:
--

I attached logs (root-log.zip and neighbor-log.zip) during nodetool repair 
-par events. C* 2.0.5-rel with LEAK-log and finalizer-patch under Win-7.


was (Author: andie78):
Logs during nodetool repair -par events. C* 2.0.5-rel with LEAK-log and 
finalizer-patch under Win-7.

 Windows 7 data files kept open / can't be deleted after compaction.
 

 Key: CASSANDRA-6283
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6283
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7 (32) / Java 1.7.0.45
Reporter: Andreas Schnitzerling
Assignee: Joshua McKenzie
  Labels: compaction
 Fix For: 2.0.6

 Attachments: 6283_StreamWriter_patch.txt, leakdetect.patch, 
 neighbor-log.zip, root-log.zip, screenshot-1.jpg, system.log


 Files cannot be deleted; the patch from CASSANDRA-5383 (Win7 deleting problem) 
 doesn't help on Win-7 with Cassandra 2.0.2. Even the 2.1 snapshot is not 
 working. The cause: opened file handles seem to be lost and not closed 
 properly. Win 7 complains that another process is still using the file (but 
 it's obviously Cassandra). Only a restart of the server gets the files 
 deleted. But after heavy use of (changes to) tables, there are about 24K files 
 in the data folder (instead of 35 after every restart) and Cassandra crashes. 
 I experimented and found out that a finalizer fixes the problem. So after GC 
 the files will be deleted (not optimal, but working fine). It has now run for 
 2 days continuously without problems. Possible fix/test:
 I wrote the following finalizer at the end of class 
 org.apache.cassandra.io.util.RandomAccessReader:
 {code:title=RandomAccessReader.java|borderStyle=solid}
 @Override
 protected void finalize() throws Throwable {
   deallocate();
   super.finalize();
 }
 {code}
 Can somebody test / develop / patch it? Thx.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-05 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921110#comment-13921110
 ] 

Pavel Yaskevich edited comment on CASSANDRA-6689 at 3/5/14 6:55 PM:


bq. I've stated clearly what this introduces as a benefit: overwrite workloads 
no longer cause excessive flushes

If you make a copy of the memtable buffer beforehand, you can clearly put it 
back to the allocator once it's overwritten or becomes otherwise useless, in 
the process of merging columns with previous row contents.

bq. Your next sentence states how this is a large cause of memory consumption, 
so surely we should be using that memory if possible for other uses (returning 
it to the buffer cache, or using it internally for more caching)?

It doesn't state that it is a *large cause of memory consumption*; it states 
that it has an additional cost, but in the steady state it won't be allocating 
over the limit because of the properties of the system that we have, namely 
the fixed number of threads.

bq. Are you performing a full object tree copy, and doing this with a running 
system to see how it affects the performance of other system components? If 
not, it doesn't seem to be a useful comparison. Note that this will still 
create a tremendous amount of heap churn, as most of the memory used by objects 
right now is on-heap. So copying the records is almost certainly no better for 
young gen pressure than what we currently do - in fact, it probably makes the 
situation worse.

Do you mean this? Let's say we copy a Cell (or Column object), which is 1 
level deep, so we just allocate additional space for the object headers and do 
a copy; most of the work would be spent copying the data (name/value) anyway. 
So, as we want to live inside of ParNew (so we can just discard already-dead 
objects), see how many such allocations you can do in e.g. 1 second, then wipe 
the whole thing (the equivalent of a ParNew that rejects dead objects and 
compacts) and do it again. We are doing mlockall too, which should make that 
even faster, as we are sure the heap is pre-faulted already.

bq. It may not be causing the young gen pressure you're seeing, but it 
certainly offers some benefit here by keeping more rows in memory so recent 
queries are more likely to be answered with zero allocation, so reducing young 
gen pressure; it is also a foundation for improving the row cache and 
introducing a shared page cache which could bring us closer to zero allocation 
reads. _And so on_

I'm not sure how this would help in the case of the row cache: once a 
reference is added to the row cache, it means the memtable would hang in there 
until that row is purged. So if there is a long-lived row (written once, read 
multiple times) in each of the regions (and we reclaim based on regions), 
wouldn't that keep the memtable around longer than expected?

bq. It's also not clear to me how you would be managing the reclaim of the 
off-heap allocations without OpOrder, or do you mean to only use off-heap 
buffers for readers, or to ref-count any memory as you're reading it? Not using 
off-heap memory for the memtables would negate the main original point of this 
ticket: to support larger memtables, thus reducing write amplification. 
Ref-counting incurs overhead linear to the size of the result set, much like 
copying, and is also fiddly to get right (not convinced it's cleaner or 
neater), whereas OpOrder incurs overhead proportional to the number of times 
you reclaim. So if you're using OpOrder, all you're really talking about is a 
new RefAction: copyToAllocator() or something. So it doesn't notably reduce 
complexity, it just reduces the quality of the end result.

In terms of memory usage, copying adds an additional linear cost, yes, but at 
the same time it makes the system behavior more controllable/predictable, 
which is what ops usually care about; even with the artificial stress test 
there seems to be a slowdown once the off-heap feature is enabled, which is no 
surprise once you look at how much complexity it actually adds.

bq. Also, I'd love to see some evidence for this (particularly the latter). I'm 
not disputing it, just would like to see what caused you to reach these 
conclusions. These definitely warrant separate tickets IMO, but if you have 
evidence for it, it would help direct any work.

Well, it seems like you never operated a real Cassandra cluster, did you? All 
of the problems that I have listed here are well known; you can even simulate 
this with Docker VMs by making the internal network gradually slower. There is 
*no* back-pressure mechanism built in, so right now Cassandra would accept a 
bunch of operations at normal speed (if the outgoing link is physically 
different from the internal one, which should always be the case) but would 
suddenly just stop accepting anything and fail internally because of a GC storm 

[jira] [Updated] (CASSANDRA-5201) Cassandra/Hadoop does not support current Hadoop releases

2014-03-05 Thread Benjamin Coverston (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Coverston updated CASSANDRA-5201:
--

Attachment: hadoop-compat-2.1-merge.patch

Patch for 2.1 branch

 Cassandra/Hadoop does not support current Hadoop releases
 -

 Key: CASSANDRA-5201
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5201
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.2.0
Reporter: Brian Jeltema
Assignee: Benjamin Coverston
 Fix For: 2.0.6

 Attachments: 5201_a.txt, hadoop-compat-2.1-merge.patch, 
 hadoopCompat.patch, hadoopcompat-trunk.patch, progressable-fix.patch, 
 progressable-wrapper.patch


 Using Hadoop 0.22.0 with Cassandra results in the stack trace below.
 It appears that version 0.21+ changed org.apache.hadoop.mapreduce.JobContext
 from a class to an interface.
 Exception in thread main java.lang.IncompatibleClassChangeError: Found 
 interface org.apache.hadoop.mapreduce.JobContext, but class was expected
   at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:103)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:445)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:462)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:357)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
   at MyHadoopApp.run(MyHadoopApp.java:163)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
   at MyHadoopApp.main(MyHadoopApp.java:82)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
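 The attached patches resolve this with a reflection-based shim (the added 
 HadoopCompat.java); a simplified, hedged sketch of that technique for a 
 single accessor (the real class covers many more methods):
 {code}
 import java.lang.reflect.Method;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.mapreduce.JobContext;

 // Sketch only: reflective dispatch defers method resolution to runtime, so
 // the same jar works whether JobContext is a class (Hadoop 0.20) or an
 // interface (0.21+). A direct call compiles to invokevirtual or
 // invokeinterface and fails with IncompatibleClassChangeError when the wrong
 // one is linked.
 public final class JobContextCompat
 {
     private static final Method GET_CONFIGURATION;

     static
     {
         try
         {
             GET_CONFIGURATION = JobContext.class.getMethod("getConfiguration");
         }
         catch (NoSuchMethodException e)
         {
             throw new ExceptionInInitializerError(e);
         }
     }

     public static Configuration getConfiguration(JobContext context)
     {
         try
         {
             return (Configuration) GET_CONFIGURATION.invoke(context);
         }
         catch (Exception e)
         {
             throw new RuntimeException(e);
         }
     }
 }
 {code}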



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[1/3] git commit: Add hadoop progressable compatibility. Patch by Ben Coverston, reviewed by brandonwilliams for CASSANDRA-5201

2014-03-05 Thread brandonwilliams
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 7f7a9cc75 -> 4cf8a8a6c
  refs/heads/trunk f601cac02 -> 7c7193769


Add hadoop progressable compatibility.
Patch by Ben Coverston, reviewed by brandonwilliams for CASSANDRA-5201


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4cf8a8a6
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4cf8a8a6
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4cf8a8a6

Branch: refs/heads/cassandra-2.1
Commit: 4cf8a8a6c356889609f9ffb74d548a68e52ec506
Parents: 7f7a9cc
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Mar 5 12:54:42 2014 -0600
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Mar 5 12:54:42 2014 -0600

--
 build.xml   |   3 -
 .../hadoop/AbstractColumnFamilyInputFormat.java |   1 -
 .../AbstractColumnFamilyOutputFormat.java   |   1 -
 .../AbstractColumnFamilyRecordWriter.java   |   2 +
 .../cassandra/hadoop/BulkOutputFormat.java  |   3 +-
 .../cassandra/hadoop/BulkRecordWriter.java  |  16 +-
 .../hadoop/ColumnFamilyInputFormat.java |   1 -
 .../hadoop/ColumnFamilyOutputFormat.java|   2 +-
 .../hadoop/ColumnFamilyRecordReader.java|   1 -
 .../hadoop/ColumnFamilyRecordWriter.java|  15 +-
 .../apache/cassandra/hadoop/HadoopCompat.java   | 309 +++
 .../apache/cassandra/hadoop/Progressable.java   |  50 ---
 .../cassandra/hadoop/cql3/CqlOutputFormat.java  |   3 +-
 .../hadoop/cql3/CqlPagingInputFormat.java   |   2 +-
 .../hadoop/cql3/CqlPagingRecordReader.java  |   2 +-
 .../cassandra/hadoop/cql3/CqlRecordWriter.java  |  12 +-
 .../cassandra/hadoop/pig/CassandraStorage.java  |   2 +-
 .../apache/cassandra/hadoop/pig/CqlStorage.java |   1 -
 18 files changed, 345 insertions(+), 81 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/4cf8a8a6/build.xml
--
diff --git a/build.xml b/build.xml
index bb8673e..304b5fe 100644
--- a/build.xml
+++ b/build.xml
@@ -374,7 +374,6 @@
         <exclusion groupId="org.mortbay.jetty" artifactId="servlet-api"/>
   </dependency>
   <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster" version="1.0.3"/>
-  <dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat" version="4.3"/>
   <dependency groupId="org.apache.pig" artifactId="pig" version="0.11.1"/>
   <dependency groupId="net.java.dev.jna" artifactId="jna" version="4.0.0"/>
 
@@ -418,7 +417,6 @@
 <dependency groupId="org.apache.rat" artifactId="apache-rat"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-core"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster"/>
-<dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat"/>
 <dependency groupId="org.apache.pig" artifactId="pig"/>
 <dependency groupId="com.google.code.findbugs" artifactId="jsr305"/>
   </artifact:pom>
@@ -485,7 +483,6 @@
 <!-- don't need hadoop classes to run, but if you use the hadoop stuff -->
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-core" optional="true"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster" optional="true"/>
-<dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat" optional="true"/>
 <dependency groupId="org.apache.pig" artifactId="pig" optional="true"/>
 
 <!-- don't need jna to run, but nice to have -->

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4cf8a8a6/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
--
diff --git 
a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java 
b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
index 760193f..cb106e9 100644
--- a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
+++ b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
@@ -29,7 +29,6 @@ import java.util.concurrent.TimeUnit;
 
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
-import com.twitter.elephantbird.util.HadoopCompat;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4cf8a8a6/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
--
diff --git 
a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java 
b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
index 

[3/3] git commit: Merge branch 'cassandra-2.1' into trunk

2014-03-05 Thread brandonwilliams
Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/7c719376
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/7c719376
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/7c719376

Branch: refs/heads/trunk
Commit: 7c7193769c5b85b8bcb38cf3f5afb3e3be0e1016
Parents: f601cac 4cf8a8a
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Mar 5 12:55:30 2014 -0600
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Mar 5 12:55:30 2014 -0600

--
 build.xml   |   3 -
 .../hadoop/AbstractColumnFamilyInputFormat.java |   1 -
 .../AbstractColumnFamilyOutputFormat.java   |   1 -
 .../AbstractColumnFamilyRecordWriter.java   |   2 +
 .../cassandra/hadoop/BulkOutputFormat.java  |   3 +-
 .../cassandra/hadoop/BulkRecordWriter.java  |  16 +-
 .../hadoop/ColumnFamilyInputFormat.java |   1 -
 .../hadoop/ColumnFamilyOutputFormat.java|   2 +-
 .../hadoop/ColumnFamilyRecordReader.java|   1 -
 .../hadoop/ColumnFamilyRecordWriter.java|  15 +-
 .../apache/cassandra/hadoop/HadoopCompat.java   | 309 +++
 .../apache/cassandra/hadoop/Progressable.java   |  50 ---
 .../cassandra/hadoop/cql3/CqlOutputFormat.java  |   3 +-
 .../hadoop/cql3/CqlPagingInputFormat.java   |   2 +-
 .../hadoop/cql3/CqlPagingRecordReader.java  |   2 +-
 .../cassandra/hadoop/cql3/CqlRecordWriter.java  |  12 +-
 .../cassandra/hadoop/pig/CassandraStorage.java  |   2 +-
 .../apache/cassandra/hadoop/pig/CqlStorage.java |   1 -
 18 files changed, 345 insertions(+), 81 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/7c719376/build.xml
--



[2/3] git commit: Add hadoop progressable compatibility. Patch by Ben Coverston, reviewed by brandonwilliams for CASSANDRA-5201

2014-03-05 Thread brandonwilliams
Add hadoop progressable compatibility.
Patch by Ben Coverston, reviewed by brandonwilliams for CASSANDRA-5201


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4cf8a8a6
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4cf8a8a6
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4cf8a8a6

Branch: refs/heads/trunk
Commit: 4cf8a8a6c356889609f9ffb74d548a68e52ec506
Parents: 7f7a9cc
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Mar 5 12:54:42 2014 -0600
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Mar 5 12:54:42 2014 -0600

--
 build.xml   |   3 -
 .../hadoop/AbstractColumnFamilyInputFormat.java |   1 -
 .../AbstractColumnFamilyOutputFormat.java   |   1 -
 .../AbstractColumnFamilyRecordWriter.java   |   2 +
 .../cassandra/hadoop/BulkOutputFormat.java  |   3 +-
 .../cassandra/hadoop/BulkRecordWriter.java  |  16 +-
 .../hadoop/ColumnFamilyInputFormat.java |   1 -
 .../hadoop/ColumnFamilyOutputFormat.java|   2 +-
 .../hadoop/ColumnFamilyRecordReader.java|   1 -
 .../hadoop/ColumnFamilyRecordWriter.java|  15 +-
 .../apache/cassandra/hadoop/HadoopCompat.java   | 309 +++
 .../apache/cassandra/hadoop/Progressable.java   |  50 ---
 .../cassandra/hadoop/cql3/CqlOutputFormat.java  |   3 +-
 .../hadoop/cql3/CqlPagingInputFormat.java   |   2 +-
 .../hadoop/cql3/CqlPagingRecordReader.java  |   2 +-
 .../cassandra/hadoop/cql3/CqlRecordWriter.java  |  12 +-
 .../cassandra/hadoop/pig/CassandraStorage.java  |   2 +-
 .../apache/cassandra/hadoop/pig/CqlStorage.java |   1 -
 18 files changed, 345 insertions(+), 81 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/4cf8a8a6/build.xml
--
diff --git a/build.xml b/build.xml
index bb8673e..304b5fe 100644
--- a/build.xml
+++ b/build.xml
@@ -374,7 +374,6 @@
         <exclusion groupId="org.mortbay.jetty" artifactId="servlet-api"/>
   </dependency>
   <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster" version="1.0.3"/>
-  <dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat" version="4.3"/>
   <dependency groupId="org.apache.pig" artifactId="pig" version="0.11.1"/>
   <dependency groupId="net.java.dev.jna" artifactId="jna" version="4.0.0"/>
 
@@ -418,7 +417,6 @@
 <dependency groupId="org.apache.rat" artifactId="apache-rat"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-core"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster"/>
-<dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat"/>
 <dependency groupId="org.apache.pig" artifactId="pig"/>
 <dependency groupId="com.google.code.findbugs" artifactId="jsr305"/>
   </artifact:pom>
@@ -485,7 +483,6 @@
 <!-- don't need hadoop classes to run, but if you use the hadoop stuff -->
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-core" optional="true"/>
 <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster" optional="true"/>
-<dependency groupId="com.twitter.elephantbird" artifactId="elephant-bird-hadoop-compat" optional="true"/>
 <dependency groupId="org.apache.pig" artifactId="pig" optional="true"/>
 
 <!-- don't need jna to run, but nice to have -->

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4cf8a8a6/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
--
diff --git 
a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java 
b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
index 760193f..cb106e9 100644
--- a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
+++ b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyInputFormat.java
@@ -29,7 +29,6 @@ import java.util.concurrent.TimeUnit;
 
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
-import com.twitter.elephantbird.util.HadoopCompat;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4cf8a8a6/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
--
diff --git 
a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java 
b/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
index a3c4234..3041829 100644
--- a/src/java/org/apache/cassandra/hadoop/AbstractColumnFamilyOutputFormat.java
+++ 

[jira] [Updated] (CASSANDRA-6778) FBUtilities.singleton() should use the CF comparator

2014-03-05 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-6778:
---

Attachment: 6778-test.txt

+1

The attached 6778-test.txt adds a unit test to exercise this patch.

As for 1.2, I agree that this is a pretty rare corner case, so it's probably 
safer to only apply this patch to 2.0.

By the way, it looks like you already did most of this work on 2.1 as part of 
CASSANDRA-5417, so make sure the conflicts get resolved properly.

 FBUtilities.singleton() should use the CF comparator
 

 Key: CASSANDRA-6778
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6778
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.0.6

 Attachments: 0001-Proper-comparison-for-singleton-sorted-set.txt, 
 0002-Use-comparator-instead-of-BB.equals.txt, 6778-test.txt


 We sometimes use FBUtilities.singleton() to create a SortedSet for 
 NamesQueryFilter. However, the set created by that method does not use the CF 
 comparator, so it uses ByteBuffer comparison/equality for methods like 
 contains(). This might not be OK if it turns out that the comparator is such 
 that 2 column names can be equal without their binary representations being 
 equal, and as it turns out at least IntegerType and DecimalType (because they 
 let you put arbitrarily many zeros in front of the binary encoding) have such 
 a property (BooleanType should also have that property, though in practice it 
 doesn't, which I think is a bug, but that's for another ticket).
 I'll note that CASSANDRA-6733 contains an example where this matters.  
 However, in practice, only a SELECT on a compact table that selects just one 
 column can ever run into this, and you'd only run into it if your client 
 inserts useless zeros in its IntegerType/DecimalType binary representation, 
 which ought to be uncommon in the first place. It's still wrong and should 
 be fixed.
 A patch is attached to include the comparator in FBUtilities.singleton. I 
 also found 2 other small places where we were using ByteBuffer.equals() where 
 the comparator should be used instead, and am attaching a 2nd patch for those.
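 For illustration, a minimal sketch of the fix's shape (an assumed, simplified 
 signature; the real patch lives in FBUtilities): building the singleton on 
 the type's comparator makes contains() honor type-level equality instead of 
 raw ByteBuffer equality.
 {code}
 import java.nio.ByteBuffer;
 import java.util.Comparator;
 import java.util.SortedSet;
 import java.util.TreeSet;

 // Sketch only: a comparator-aware singleton. With a type like IntegerType,
 // whose comparator treats 0x01 and 0x0001 as equal, contains() now matches
 // both encodings, which plain ByteBuffer.equals() would not.
 public final class SingletonSketch
 {
     public static SortedSet<ByteBuffer> singleton(ByteBuffer name, Comparator<ByteBuffer> comparator)
     {
         SortedSet<ByteBuffer> set = new TreeSet<>(comparator);
         set.add(name);
         return set;
     }
 }
 {code}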



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination

2014-03-05 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921141#comment-13921141
 ] 

Alex Liu edited comment on CASSANDRA-6311 at 3/5/14 7:08 PM:
-

1. Validating the input CQL query requires parsing the query, which is what we 
are trying to avoid.
2. AbstractIterator always returns the local host (so that the task only reads 
data from the local host). It uses stickHost, a host name, to get the Host 
object, which can't be created directly because the class is not public. I 
added liveRemoteHosts, so if the local host is down, a remote node is used.
3. I cleaned up the code according to the other notes.

Attaching the v6 version.
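
A rough sketch of the stickiness logic in point 2 (hypothetical names taken 
from this comment, stickHost and liveRemoteHosts; not the actual patch):

{code}
import java.util.Collection;

public final class HostPreference
{
    // Prefer the split's local replica so the task reads local data; if it
    // is down, fall back to any live remote host instead of failing the job.
    static String pickHost(String stickHost, boolean stickHostUp,
                           Collection<String> liveRemoteHosts)
    {
        if (stickHostUp)
            return stickHost;
        for (String remote : liveRemoteHosts)
            return remote;
        throw new IllegalStateException("no live host available for this split");
    }
}
{code}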



was (Author: alexliu68):
1. Validating the input CQL query requires parsing the query, which is what we 
are trying to avoid.
2. AbstractIterator always returns the local host (so that the task only reads 
data from the local host); it doesn't return endOfData(). It uses stickHost, a 
host name, to get the Host object, which can't be created directly because the 
class is not public. The Host object, origHost, is obtained from cluster 
internal code. It's possible that origHost is null, in which case stickHost is 
not in the cluster. In that case we don't want the job to run, for it's on the 
wrong host.
3. I cleaned up the code according to the other notes.

Attaching the v6 version.


 Add CqlRecordReader to take advantage of native CQL pagination
 --

 Key: CASSANDRA-6311
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6311
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Alex Liu
Assignee: Alex Liu
 Fix For: 2.0.6

 Attachments: 6311-v3-2.0-branch.txt, 6311-v4.txt, 
 6311-v5-2.0-branch.txt, 6311-v6-2.0-branch.txt, 6331-2.0-branch.txt, 
 6331-v2-2.0-branch.txt


 Since the latest CQL pagination is done and should be more efficient, we 
 need to update CqlPagingRecordReader to use it instead of the custom Thrift 
 paging.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination

2014-03-05 Thread Alex Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Liu updated CASSANDRA-6311:


Attachment: 6311-v6-2.0-branch.txt

 Add CqlRecordReader to take advantage of native CQL pagination
 --

 Key: CASSANDRA-6311
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6311
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Alex Liu
Assignee: Alex Liu
 Fix For: 2.0.6

 Attachments: 6311-v3-2.0-branch.txt, 6311-v4.txt, 
 6311-v5-2.0-branch.txt, 6311-v6-2.0-branch.txt, 6331-2.0-branch.txt, 
 6331-v2-2.0-branch.txt


 Since the latest CQL pagination is done and should be more efficient, we 
 need to update CqlPagingRecordReader to use it instead of the custom Thrift 
 paging.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6311) Add CqlRecordReader to take advantage of native CQL pagination

2014-03-05 Thread Alex Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Liu updated CASSANDRA-6311:


Attachment: (was: 6311-v6-2.0-branch.txt)

 Add CqlRecordReader to take advantage of native CQL pagination
 --

 Key: CASSANDRA-6311
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6311
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Alex Liu
Assignee: Alex Liu
 Fix For: 2.0.6

 Attachments: 6311-v3-2.0-branch.txt, 6311-v4.txt, 
 6311-v5-2.0-branch.txt, 6311-v6-2.0-branch.txt, 6331-2.0-branch.txt, 
 6331-v2-2.0-branch.txt


 Since the latest CQL pagination is done and should be more efficient, we 
 need to update CqlPagingRecordReader to use it instead of the custom Thrift 
 paging.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6526) CQLSSTableWriter addRow(Map<String, Object> values) does not work as documented.

2014-03-05 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921240#comment-13921240
 ] 

Tyler Hobbs commented on CASSANDRA-6526:


+1

 CQLSSTableWriter addRow(Map<String, Object> values) does not work as 
 documented.
 

 Key: CASSANDRA-6526
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6526
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Yariv Amar
Assignee: Sylvain Lebresne
 Fix For: 2.0.6

 Attachments: 6526.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 There are 2 bugs in the method
 {code}
 addRow(Map<String, Object> values)
 {code}
 First issue is that the map *must* contain all the column names as keys, 
 otherwise addRow fails (with InvalidRequestException "Invalid number of 
 arguments, expecting %d values but got %d").
 Second issue is that the keys in the map must be in lower case, otherwise 
 they may not be found in the map, which results in an NPE during decompose.
 h6. SUGGESTED SOLUTION:
 Fix the addRow method with:
 {code}
 public CQLSSTableWriter addRow(Map<String, Object> values)
 throws InvalidRequestException, IOException
 {
     int size = boundNames.size();
     Map<String, ByteBuffer> rawValues = new HashMap<>(size);
     for (int i = 0; i < size; i++)
     {
         ColumnSpecification spec = boundNames.get(i);
         String colName = spec.name.toString();
         // missing or null entries become null values rather than errors
         rawValues.put(colName, values.get(colName) == null ? null :
             ((AbstractType) spec.type).decompose(values.get(colName)));
     }
     return rawAddRow(rawValues);
 }
 {code}
 When creating the new map for the insert, we go over all bound columns and 
 apply null to the missing ones.
 Fix the method documentation by adding this line:
 {code}
  * <p>
  * Keys in the map <b>must</b> be in lower case, otherwise their value 
  * will be null.
  *
 {code}
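
 A usage sketch under the fixed behavior (assuming {{writer}} is a 
 CQLSSTableWriter already configured for a table with columns k, v1, v2; 
 hypothetical fragment, not part of the patch):
 {code}
 // assumes 'writer' is a CQLSSTableWriter built for
 // CREATE TABLE ks.t (k int PRIMARY KEY, v1 text, v2 text)
 Map<String, Object> row = new HashMap<String, Object>();
 row.put("k", 1);
 row.put("v1", "hello"); // "v2" intentionally omitted: stored as null
 writer.addRow(row);
 {code}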



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6801) INSERT with IF NOT EXISTS fails when row is an expired ttl

2014-03-05 Thread Paul Kendall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921298#comment-13921298
 ] 

Paul Kendall commented on CASSANDRA-6801:
-

This is related to CASSANDRA-6623. Although in that case only a single value 
has a TTL, in this case all values have a TTL.


 INSERT with IF NOT EXISTS fails when row is an expired ttl
 --

 Key: CASSANDRA-6801
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6801
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Adam Hattrell

 I ran this on a 2 DC cluster with 3 nodes each.  
 CREATE KEYSPACE test WITH replication = {
 'class': 'NetworkTopologyStrategy',
 'DC1': '3',
 'DC2': '3'
 };
 CREATE TABLE clusterlock (
 name text,
 hostname text,
 lockid text,
 PRIMARY KEY (name)
 ) ;
 Then add some data and flush it to ensure the sstables exist (didn't 
 reproduce in memtables for some reason).
 Then
  insert into clusterlock (name, lockid, hostname) values  ( 'adam', 'tt', 
 '111') IF NOT EXISTS USING TTL 5;
 Wait for the TTL to expire, then try again:
  insert into clusterlock (name, lockid, hostname) values  ( 'adam', 'tt', 
 '111') IF NOT EXISTS USING TTL 5;
  
 [applied]
 ---
  False
 SELECT * shows no rows in the table.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6147) Allow Thrift opt-in to server-side timestamps

2014-03-05 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6147:
--

Component/s: API
   Reviewer: Tyler Hobbs
   Priority: Minor  (was: Major)
Summary: Allow Thrift opt-in to server-side timestamps  (was: Break 
timestamp ties for thrift-ers)

WDYT [~thobbs]?

 Allow Thrift opt-in to server-side timestamps
 -

 Key: CASSANDRA-6147
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6147
 Project: Cassandra
  Issue Type: Sub-task
  Components: API
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 2.1 beta2


 Thrift users are still forced to generate timestamps on the client side: 
 the way the Thrift bindings are generated, users must supply timestamps. 
 There are two solutions I see.
 * -1 as timestamp means generate on the server side
 This is a breaking change for those using -1 as a timestamp (which should 
 effectively be no one).
 * Prepare yourself
 Our Thrift signatures are wrong; you can't overload methods in Thrift.
 thrift.get(byte[], byte[], ts) 
 should REALLY be changed to 
 GetRequest g = new GetRequest();
 g.setName();
 g.setValue();
 g.setTs(); // optional 
 thrift.get(g);
 I know no one is going to want to make this change because Thrift is 
 quasi-dead, but it would allow us to evolve Thrift in a meaningful way. We 
 could simply add these new methods under different names as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6147) Allow Thrift opt-in to server-side timestamps

2014-03-05 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921328#comment-13921328
 ] 

Nate McCall commented on CASSANDRA-6147:


[~appodictic] FBUtilities#timestampMicros() looks like the common way to do 
what you want. 
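
A minimal sketch of what that opt-in could look like server side (hypothetical 
helper; FBUtilities.timestampMicros() returns microseconds since the epoch):

{code}
// Use the client's timestamp when set, otherwise assign one server side
// in microseconds, matching what FBUtilities.timestampMicros() produces.
static long resolveTimestamp(boolean clientSetTimestamp, long clientTimestamp)
{
    return clientSetTimestamp ? clientTimestamp : System.currentTimeMillis() * 1000;
}
{code}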

 Allow Thrift opt-in to server-side timestamps
 -

 Key: CASSANDRA-6147
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6147
 Project: Cassandra
  Issue Type: Sub-task
  Components: API
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 2.1 beta2


 Thrift users are still forced to generate timestamps on the client side: 
 the way the Thrift bindings are generated, users must supply timestamps. 
 There are two solutions I see.
 * -1 as timestamp means generate on the server side
 This is a breaking change for those using -1 as a timestamp (which should 
 effectively be no one).
 * Prepare yourself
 Our Thrift signatures are wrong; you can't overload methods in Thrift.
 thrift.get(byte[], byte[], ts) 
 should REALLY be changed to 
 GetRequest g = new GetRequest();
 g.setName();
 g.setValue();
 g.setTs(); // optional 
 thrift.get(g);
 I know no one is going to want to make this change because Thrift is 
 quasi-dead, but it would allow us to evolve Thrift in a meaningful way. We 
 could simply add these new methods under different names as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-5899) Sends all interfaces in native protocol notification when rpc_address=0.0.0.0

2014-03-05 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-5899:
---

Attachment: 5899.txt

5899.txt (and [branch|https://github.com/thobbs/cassandra/tree/CASSANDRA-5899]) 
adds a broadcast_rpc_address option.

If not set, this defaults to rpc_address.  If rpc_address is 0.0.0.0, 
broadcast_rpc_address must be set, and you can never use 0.0.0.0 for 
broadcast_rpc_address.
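
A sketch of those resolution rules (hypothetical helper, not the attached 
patch):

{code}
// broadcast_rpc_address defaults to rpc_address; it must be set explicitly
// when rpc_address is 0.0.0.0, and may never be 0.0.0.0 itself.
static String resolveBroadcastRpcAddress(String rpcAddress, String broadcastRpcAddress)
{
    if ("0.0.0.0".equals(broadcastRpcAddress))
        throw new IllegalArgumentException("broadcast_rpc_address cannot be 0.0.0.0");
    if (broadcastRpcAddress != null)
        return broadcastRpcAddress;
    if ("0.0.0.0".equals(rpcAddress))
        throw new IllegalArgumentException("broadcast_rpc_address must be set when rpc_address is 0.0.0.0");
    return rpcAddress;
}
{code}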

 Sends all interfaces in native protocol notification when rpc_address=0.0.0.0
 

 Key: CASSANDRA-5899
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5899
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.1 beta2

 Attachments: 5899.txt


 For the native protocol notifications, when we send a new node notification, 
 we send the rpc_address of that new node. For this to be actually useful, 
 the address sent should be publicly accessible by the driver it is destined 
 for. 
 The problem is when rpc_address=0.0.0.0. Currently, we send the 
 listen_address, which is correct in the sense that we are indeed bound on 
 it, but it might not be accessible by client nodes.
 In fact, one of the good reasons to use a 0.0.0.0 rpc_address would be if 
 you have a private network for internode communication and another for 
 client-server communications, but still want to be able to issue queries 
 from the private network for debugging. In that case, the current behavior 
 of sending listen_address doesn't really help.
 So one suggestion would be to instead send all the addresses the (native 
 protocol) server is bound to (which would still leave the driver the task 
 of picking the right one, but at least it has something to pick from).
 That's relatively trivial to do in practice, but it does require a minor 
 binary protocol break to return a list instead of just one IP, which is why 
 I'm tentatively marking this 2.0. Maybe we can shove that tiny change into 
 the final (in protocol v2 only)? Provided we agree it's a good idea, of 
 course.
 Now to be complete, for the same reasons, we would also need to store all 
 the addresses we are bound to in the peers table. That's also fairly 
 simple, and the backward compatibility story is maybe a tad simpler: we 
 could add a new {{rpc_addresses}} column that would be a list and deprecate 
 {{rpc_address}} (to be removed in 2.1 for instance).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-5899) Sends all interfaces in native protocol notification when rpc_address=0.0.0.0

2014-03-05 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-5899:
---

Reviewer: Sylvain Lebresne

 Sends all interfaces in native protocol notification when rpc_address=0.0.0.0
 

 Key: CASSANDRA-5899
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5899
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.1 beta2

 Attachments: 5899.txt


 For the native protocol notifications, when we send a new node notification, 
 we send the rpc_address of that new node. For this to be actually useful, 
 the address sent should be publicly accessible by the driver it is destined 
 for. 
 The problem is when rpc_address=0.0.0.0. Currently, we send the 
 listen_address, which is correct in the sense that we are indeed bound on 
 it, but it might not be accessible by client nodes.
 In fact, one of the good reasons to use a 0.0.0.0 rpc_address would be if 
 you have a private network for internode communication and another for 
 client-server communications, but still want to be able to issue queries 
 from the private network for debugging. In that case, the current behavior 
 of sending listen_address doesn't really help.
 So one suggestion would be to instead send all the addresses the (native 
 protocol) server is bound to (which would still leave the driver the task 
 of picking the right one, but at least it has something to pick from).
 That's relatively trivial to do in practice, but it does require a minor 
 binary protocol break to return a list instead of just one IP, which is why 
 I'm tentatively marking this 2.0. Maybe we can shove that tiny change into 
 the final (in protocol v2 only)? Provided we agree it's a good idea, of 
 course.
 Now to be complete, for the same reasons, we would also need to store all 
 the addresses we are bound to in the peers table. That's also fairly 
 simple, and the backward compatibility story is maybe a tad simpler: we 
 could add a new {{rpc_addresses}} column that would be a list and deprecate 
 {{rpc_address}} (to be removed in 2.1 for instance).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-6803) nodetool getsstables fails with 'blob' type primary keys

2014-03-05 Thread Nate McCall (JIRA)
Nate McCall created CASSANDRA-6803:
--

 Summary: nodetool getsstables fails with 'blob' type primary keys
 Key: CASSANDRA-6803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6803
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Nate McCall
Assignee: Nate McCall
 Fix For: 2.0.6


Trivial fix: just get the ByteBuffer from the CFMetaData's key validator 
instead of calling String#getBytes (which breaks for keys of BytesType).
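
A sketch of the fix (hypothetical wrapper around Cassandra's 
CFMetaData/AbstractType APIs, not the attached patch):

{code}
import java.nio.ByteBuffer;
import org.apache.cassandra.config.CFMetaData;

public final class KeyDecoding
{
    // Decode the user-supplied key with the table's key validator: for a
    // blob key this parses the hex string, whereas String#getBytes would
    // just hand back the UTF-8 bytes of the hex text.
    static ByteBuffer decodeKey(CFMetaData metadata, String keyAsString)
    {
        return metadata.getKeyValidator().fromString(keyAsString);
    }
}
{code}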



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6803) nodetool getsstables fails with 'blob' type primary keys

2014-03-05 Thread Nate McCall (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nate McCall updated CASSANDRA-6803:
---

Attachment: sstables_for_key_blob_support.txt
sstables_for_key_blob_support_2.0.txt

Patches for 2.0 and 1.2. 

 nodetool getsstables fails with 'blob' type primary keys
 

 Key: CASSANDRA-6803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6803
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Nate McCall
Assignee: Nate McCall
 Fix For: 2.0.6

 Attachments: sstables_for_key_blob_support.txt, 
 sstables_for_key_blob_support_2.0.txt


 Trivial fix: just get the ByteBuffer from the CFMetaData's key validator 
 instead of calling String#getBytes (which breaks for keys of BytesType).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6147) Allow Thrift opt-in to server-side timestamps

2014-03-05 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921856#comment-13921856
 ] 

Edward Capriolo commented on CASSANDRA-6147:


I was considering adding this feature to deletes as well because the same logic 
holds. Here is the problem:

struct Deletion {
1: optional i64 timestamp,
2: optional binary super_column,
3: optional SlicePredicate predicate,
}

  void remove(1:required binary key,
  2:required ColumnPath column_path,
  3:required i64 timestamp,
  4:ConsistencyLevel consistency_level=ConsistencyLevel.ONE)
   throws (1:InvalidRequestException ire, 2:UnavailableException ue, 
3:TimedOutException te),


else if (!del.isSetTimestamp())
{
    throw new 
org.apache.cassandra.exceptions.InvalidRequestException("Deletion timestamp is 
not optional for non commutative column family " + metadata.cfName);
}

Because remove requires a timestamp and Deletion does not, we cannot simply 
remove the field. What we can do is make it optional, throw an exception 
server side, and then maybe later (a year out) truly allow it to be optional 
and not throw the exception.
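
A sketch of that transitional check (hypothetical helper; "commutative" stands 
for the counter case, and the exception mirrors the one quoted above):

{code}
// The IDL field becomes optional, but the server keeps rejecting a missing
// timestamp for non-commutative column families, so existing semantics are
// preserved until server-side assignment is truly allowed.
static void validateDeletionTimestamp(boolean timestampSet, boolean commutative, String cfName)
throws org.apache.cassandra.exceptions.InvalidRequestException
{
    if (!timestampSet && !commutative)
        throw new org.apache.cassandra.exceptions.InvalidRequestException(
            "Deletion timestamp is not optional for non commutative column family " + cfName);
}
{code}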

I updated my branch with Nate's change. I also modified the interface file to 
document the change.

 Allow Thrift opt-in to server-side timestamps
 -

 Key: CASSANDRA-6147
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6147
 Project: Cassandra
  Issue Type: Sub-task
  Components: API
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 2.1 beta2


 Thrift users are still forced to generate timestamps on the client side: 
 the way the Thrift bindings are generated, users must supply timestamps. 
 There are two solutions I see.
 * -1 as timestamp means generate on the server side
 This is a breaking change for those using -1 as a timestamp (which should 
 effectively be no one).
 * Prepare yourself
 Our Thrift signatures are wrong; you can't overload methods in Thrift.
 thrift.get(byte[], byte[], ts) 
 should REALLY be changed to 
 GetRequest g = new GetRequest();
 g.setName();
 g.setValue();
 g.setTs(); // optional 
 thrift.get(g);
 I know no one is going to want to make this change because Thrift is 
 quasi-dead, but it would allow us to evolve Thrift in a meaningful way. We 
 could simply add these new methods under different names as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6588) Add a 'NO EMPTY RESULTS' filter to SELECT

2014-03-05 Thread Tupshin Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921974#comment-13921974
 ] 

Tupshin Harper commented on CASSANDRA-6588:
---

I read that three times, and as long as there is no technical objection or 
problem implementing it, I'd *love* to see that as our approach.

+1

 Add a 'NO EMPTY RESULTS' filter to SELECT
 -

 Key: CASSANDRA-6588
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6588
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Priority: Minor
 Fix For: 2.1 beta2


 It is the semantics of CQL that a (CQL) row exists as long as it has one 
 non-null column (including the PK columns, which, given that no PK column 
 can be null, means that it's enough to have the PK set for a row to exist). 
 This does mean that the result of
 {noformat}
 CREATE TABLE test (k int PRIMARY KEY, v1 int, v2 int);
 INSERT INTO test(k, v1) VALUES (0, 4);
 SELECT v2 FROM test;
 {noformat}
 must be (and is)
 {noformat}
  v2
 --
  null
 {noformat}
 That fact does mean, however, that when we only select a few columns of a 
 row, we still need to find rows that exist but have no values for the 
 selected columns. Long story short, given how the storage engine works, 
 this means we need to query full (CQL) rows even when only some of the 
 columns are selected, because that's the only way to distinguish between 
 "the row exists but has no value for the selected columns" and "the row 
 doesn't exist". I'll note in particular that, due to CASSANDRA-5762, we 
 unfortunately can't rely on the row marker to optimize that out.
 Now, when you select only a subset of the columns of a row, there are many 
 cases where you don't care about rows that exist but have no value for the 
 columns you requested and are happy to filter those out. So, for those 
 cases, we could provide a new SELECT filter. Outside the potential 
 convenience (not having to filter empty results client side), one 
 interesting part is that when this filter is provided, we could optimize a 
 bit by only querying the columns selected, since we wouldn't need to return 
 rows that exist but have no values for the selected columns.
 For the exact syntax, there are probably a bunch of options. For instance:
 * {{SELECT NON EMPTY(v2, v3) FROM test}}: the vague rationale for putting 
 it in the SELECT part is that such a filter is kind of in the spirit of 
 DISTINCT. Possibly a bit ugly outside of that.
 * {{SELECT v2, v3 FROM test NO EMPTY RESULTS}} or {{SELECT v2, v3 FROM test 
 NO EMPTY ROWS}} or {{SELECT v2, v3 FROM test NO EMPTY}}: the last one is 
 shorter but maybe a bit less explicit. As for {{RESULTS}} versus {{ROWS}}, 
 the only small objection to {{NO EMPTY ROWS}} could be that it might 
 suggest it is filtering non-existing rows (I mean, the fact that we never 
 ever return non-existing rows should hint that that's not what it does, but 
 well...) while we're just filtering empty resultSet rows.
 Of course, if there is a pre-existing SQL syntax for that, it's even 
 better, though a very quick search didn't turn up anything. Other 
 suggestions welcome too.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


git commit: use junit asserts

2014-03-05 Thread dbrosius
Repository: cassandra
Updated Branches:
  refs/heads/trunk 7c7193769 -> b173ce207


use junit asserts


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b173ce20
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b173ce20
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b173ce20

Branch: refs/heads/trunk
Commit: b173ce207b311a57f288269eebf13375a2459a99
Parents: 7c71937
Author: Dave Brosius <dbros...@mebigfatguy.com>
Authored: Wed Mar 5 23:57:37 2014 -0500
Committer: Dave Brosius <dbros...@mebigfatguy.com>
Committed: Wed Mar 5 23:57:37 2014 -0500

--
 .../db/compaction/CompactionsTest.java  | 67 +---
 1 file changed, 43 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b173ce20/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
--
diff --git a/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java 
b/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
index 1497b3a..ac47bb6 100644
--- a/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
+++ b/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
@@ -18,22 +18,35 @@
 */
 package org.apache.cassandra.db.compaction;
 
-import java.io.*;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.assertNotNull;
+
+import java.io.File;
 import java.nio.ByteBuffer;
-import java.util.*;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.TimeUnit;
 
-import com.google.common.base.Function;
-import com.google.common.collect.Iterables;
-import com.google.common.collect.Sets;
-import org.junit.Test;
-import org.junit.runner.RunWith;
-
 import org.apache.cassandra.OrderedJUnit4ClassRunner;
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
-import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.ColumnFamily;
+import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.db.DataRange;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.Keyspace;
+import org.apache.cassandra.db.Mutation;
+import org.apache.cassandra.db.RangeTombstone;
+import org.apache.cassandra.db.RowPosition;
+import org.apache.cassandra.db.SuperColumns;
+import org.apache.cassandra.db.SystemKeyspace;
 import org.apache.cassandra.db.columniterator.OnDiskAtomIterator;
 import org.apache.cassandra.db.filter.QueryFilter;
 import org.apache.cassandra.dht.BytesToken;
@@ -45,8 +58,13 @@ import org.apache.cassandra.io.sstable.SSTableScanner;
 import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.Pair;
+import org.junit.Ignore;
+import org.junit.Test;
+import org.junit.runner.RunWith;
 
-import static org.junit.Assert.*;
+import com.google.common.base.Function;
+import com.google.common.collect.Iterables;
+import com.google.common.collect.Sets;
 
 @RunWith(OrderedJUnit4ClassRunner.class)
 public class CompactionsTest extends SchemaLoader
@@ -115,7 +133,7 @@ public class CompactionsTest extends SchemaLoader
 ColumnFamilyStore store = 
testSingleSSTableCompaction(LeveledCompactionStrategy.class.getCanonicalName());
 LeveledCompactionStrategy strategy = (LeveledCompactionStrategy) 
store.getCompactionStrategy();
 // tombstone removal compaction should not promote level
-assert strategy.getLevelSize(0) == 1;
+assertEquals(1, strategy.getLevelSize(0));
 }
 
 @Test
@@ -151,8 +169,8 @@ public class CompactionsTest extends SchemaLoader
 SSTableScanner scanner = 
sstable.getScanner(DataRange.forKeyRange(keyRange));
 OnDiskAtomIterator iter = scanner.next();
 assertEquals(key, iter.getKey());
-assert iter.next() instanceof RangeTombstone;
-assert !iter.hasNext();
+assertTrue(iter.next() instanceof RangeTombstone);
+assertFalse(iter.hasNext());
 }
 
 public static void assertMaxTimestamp(ColumnFamilyStore cfs, long 
maxTimestampExpected)
@@ -187,7 +205,7 @@ public class CompactionsTest extends SchemaLoader
 cfs.forceBlockingFlush();
 }
 Collection<SSTableReader> toCompact = cfs.getSSTables();
-assert toCompact.size() == 2;
+assertEquals(2, toCompact.size());
 
 // Reinserting the same keys. We will compact only the previous 
sstable, but we need those new ones
 // to make sure we use 

[jira] [Commented] (CASSANDRA-6623) Null in a cell caused by expired TTL does not work with IF clause (in CQL3)

2014-03-05 Thread Paul Kendall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922046#comment-13922046
 ] 

Paul Kendall commented on CASSANDRA-6623:
-

This problem is not fixed. I tried exactly the steps in comment #2 above and 
got exactly the same problems using the trunk version from git.

 Null in a cell caused by expired TTL does not work with IF clause (in CQL3)
 ---

 Key: CASSANDRA-6623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6623
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
 Environment: One cluster with two nodes on a Linux and a Windows 
 system. cqlsh 4.1.0 | Cassandra 2.0.4 | CQL spec 3.1.1 | Thrift protocol 
 19.39.0. CQL3 Column Family
Reporter: Csaba Seres
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 2.0.6

 Attachments: 6623.txt


 The IF onecell=null clause does not work if onecell got its null value 
 from an expired TTL. If onecell is updated with a null value (UPDATE), then 
 IF onecell=null works fine.
 This bug is not present when you create a table with the COMPACT STORAGE 
 directive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-6623) Null in a cell caused by expired TTL does not work with IF clause (in CQL3)

2014-03-05 Thread Paul Kendall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922046#comment-13922046
 ] 

Paul Kendall edited comment on CASSANDRA-6623 at 3/6/14 6:24 AM:
-

This problem is not fixed. I tried exactly the steps in comment #2 above and 
got exactly the same problems using the trunk version from git.

The expiry time of a column is in seconds.
The call time passed to isLive is in milliseconds.
The queryTimestamp is in microseconds (not seconds, as the comment in the 
patch says).
There are 3 calls to the CQL3CasConditions constructor passing the 
queryTimestamp, and the one in ModificationStatement.executeWithCondition is 
the only one that changes the scale of the time value.

From my testing, the best solution is to remove the scaling done there and 
apply a divide by 1000 in the constructor of CQL3CasConditions.
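
The unit conversions, as a sketch (hypothetical helpers, not the attached 
patch):

{code}
// Expiry is stored in seconds, isLive() works in milliseconds, and the CAS
// query timestamp arrives in microseconds; the divide by 1000 below is the
// conversion proposed above.
static boolean isLive(int localExpirationTimeSeconds, long nowMillis)
{
    return nowMillis < localExpirationTimeSeconds * 1000L;
}

static boolean isLiveAtCasCondition(int localExpirationTimeSeconds, long queryTimestampMicros)
{
    long nowMillis = queryTimestampMicros / 1000; // micros to millis
    return isLive(localExpirationTimeSeconds, nowMillis);
}
{code}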


was (Author: pkendall):
This problem is not fixed. I tried exactly the steps in comment #2 above and 
got exactly the same problems using the trunk version from git.

 Null in a cell caused by expired TTL does not work with IF clause (in CQL3)
 ---

 Key: CASSANDRA-6623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6623
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
 Environment: One cluster with two nodes on a Linux and a Windows 
 system. cqlsh 4.1.0 | Cassandra 2.0.4 | CQL spec 3.1.1 | Thrift protocol 
 19.39.0. CQL3 Column Family
Reporter: Csaba Seres
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 2.0.6

 Attachments: 6623.txt


 The IF onecell=null clause does not work if onecell got its null value 
 from an expired TTL. If onecell is updated with a null value (UPDATE), then 
 IF onecell=null works fine.
 This bug is not present when you create a table with the COMPACT STORAGE 
 directive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-6623) Null in a cell caused by expired TTL does not work with IF clause (in CQL3)

2014-03-05 Thread Paul Kendall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922046#comment-13922046
 ] 

Paul Kendall edited comment on CASSANDRA-6623 at 3/6/14 6:27 AM:
-

This problem is not fixed. I tried exactly the steps in comment #2 above and 
got exactly the same problems using the trunk version from git.

The expiry time of a column is in seconds.
The call time passed to isLive is in milliseconds.
The queryTimestamp is in microseconds (not seconds, as the comment in the 
patch says).
There are 3 calls to the CQL3CasConditions constructor passing the 
queryTimestamp, and the one in ModificationStatement.executeWithCondition is 
the only one that changes the scale of the time value.

From my testing, the best solution is to remove the scaling done there and 
apply a divide by 1000 in the constructor of CQL3CasConditions.

Attached patch [^0001-Fix-for-expiring-columns-used-in-cas-conditions.patch]


was (Author: pkendall):
This problem is not fixed. I tried exactly the steps in comment #2 above and 
got exactly the same problems using the trunk version from git.

The expiry time of a column is in seconds.
The call time passed to isLive is in milliseconds.
The queryTimestamp is in microseconds (not seconds, as the comment in the 
patch says).
There are 3 calls to the CQL3CasConditions constructor passing the 
queryTimestamp, and the one in ModificationStatement.executeWithCondition is 
the only one that changes the scale of the time value.

From my testing, the best solution is to remove the scaling done there and 
apply a divide by 1000 in the constructor of CQL3CasConditions.

 Null in a cell caused by expired TTL does not work with IF clause (in CQL3)
 ---

 Key: CASSANDRA-6623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6623
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
 Environment: One cluster with two nodes on a Linux and a Windows 
 system. cqlsh 4.1.0 | Cassandra 2.0.4 | CQL spec 3.1.1 | Thrift protocol 
 19.39.0. CQL3 Column Family
Reporter: Csaba Seres
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 2.0.6

 Attachments: 
 0001-Fix-for-expiring-columns-used-in-cas-conditions.patch, 6623.txt


 The IF onecell=null clause does not work if onecell got its null value 
 from an expired TTL. If onecell is updated with a null value (UPDATE), then 
 IF onecell=null works fine.
 This bug is not present when you create a table with the COMPACT STORAGE 
 directive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6623) Null in a cell caused by expired TTL does not work with IF clause (in CQL3)

2014-03-05 Thread Paul Kendall (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Kendall updated CASSANDRA-6623:


Attachment: 0001-Fix-for-expiring-columns-used-in-cas-conditions.patch

 Null in a cell caused by expired TTL does not work with IF clause (in CQL3)
 ---

 Key: CASSANDRA-6623
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6623
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
 Environment: One cluster with two nodes on a Linux and a Windows 
 system. cqlsh 4.1.0 | Cassandra 2.0.4 | CQL spec 3.1.1 | Thrift protocol 
 19.39.0. CQL3 Column Family
Reporter: Csaba Seres
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 2.0.6

 Attachments: 
 0001-Fix-for-expiring-columns-used-in-cas-conditions.patch, 6623.txt


 The IF onecell=null clause does not work if onecell got its null value 
 from an expired TTL. If onecell is updated with a null value (UPDATE), then 
 IF onecell=null works fine.
 This bug is not present when you create a table with the COMPACT STORAGE 
 directive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


git commit: FBUtilities.singleton() should use the CF comparator

2014-03-05 Thread slebresne
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.0 249230834 -> 773fade9a


FBUtilities.singleton() should use the CF comparator

patch by slebresne; reviewed by thobbs for CASSANDRA-6778


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/773fade9
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/773fade9
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/773fade9

Branch: refs/heads/cassandra-2.0
Commit: 773fade9aee009170c7062d174f2b78211061fce
Parents: 2492308
Author: Sylvain Lebresne <sylv...@datastax.com>
Authored: Thu Mar 6 08:54:32 2014 +0100
Committer: Sylvain Lebresne <sylv...@datastax.com>
Committed: Thu Mar 6 08:56:08 2014 +0100

--
 CHANGES.txt |  1 +
 .../cql3/statements/ColumnGroupMap.java |  4 +-
 .../cql3/statements/SelectStatement.java|  7 +-
 .../org/apache/cassandra/db/SystemKeyspace.java |  2 +-
 .../cassandra/db/filter/NamesQueryFilter.java   |  4 +-
 .../apache/cassandra/db/filter/QueryFilter.java |  8 ---
 .../org/apache/cassandra/utils/FBUtilities.java |  6 +-
 .../apache/cassandra/db/LongKeyspaceTest.java   |  3 +-
 .../unit/org/apache/cassandra/SchemaLoader.java |  3 +-
 .../org/apache/cassandra/config/DefsTest.java   |  7 +-
 .../cassandra/db/CollationControllerTest.java   |  5 +-
 .../cassandra/db/ColumnFamilyStoreTest.java | 67 +---
 .../org/apache/cassandra/db/KeyspaceTest.java   |  7 +-
 .../apache/cassandra/db/ReadMessageTest.java|  4 +-
 .../db/RecoveryManagerTruncateTest.java |  3 +-
 .../apache/cassandra/db/RemoveColumnTest.java   |  3 +-
 .../cassandra/io/sstable/LegacySSTableTest.java |  4 +-
 .../cassandra/tools/SSTableExportTest.java  |  8 ++-
 18 files changed, 102 insertions(+), 44 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/773fade9/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 19cedd8..d697e3f 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -33,6 +33,7 @@
  * Fix UPDATE updating PRIMARY KEY columns implicitly (CASSANDRA-6782)
  * Fix IllegalArgumentException when updating from 1.2 with SuperColumns
(CASSANDRA-6733)
+ * FBUtilities.singleton() should use the CF comparator (CASSANDRA-6778)
 Merged from 1.2:
  * Add CMSClassUnloadingEnabled JVM option (CASSANDRA-6541)
  * Catch memtable flush exceptions during shutdown (CASSANDRA-6735)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/773fade9/src/java/org/apache/cassandra/cql3/statements/ColumnGroupMap.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/ColumnGroupMap.java 
b/src/java/org/apache/cassandra/cql3/statements/ColumnGroupMap.java
index 5c3fcb9..1c9a346 100644
--- a/src/java/org/apache/cassandra/cql3/statements/ColumnGroupMap.java
+++ b/src/java/org/apache/cassandra/cql3/statements/ColumnGroupMap.java
@@ -25,6 +25,7 @@ import java.util.List;
 import java.util.Map;
 
 import org.apache.cassandra.db.Column;
+import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.CompositeType;
 import org.apache.cassandra.utils.Pair;
 
@@ -155,7 +156,8 @@ public class ColumnGroupMap
 {
 for (int i = 0; i < idx; i++)
 {
-if (!c[i].equals(previous[i]))
+AbstractType<?> comp = composite.types.get(i);
+if (comp.compare(c[i], previous[i]) != 0)
 return false;
 }
 return true;

http://git-wip-us.apache.org/repos/asf/cassandra/blob/773fade9/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 5a9d3d9..100383f 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -717,7 +717,7 @@ public class SelectStatement implements CQLStatement, 
MeasurableForPreparedCache
 {
 if (cfDef.isCompact)
 {
-return FBUtilities.singleton(builder.build());
+return FBUtilities.singleton(builder.build(), 
cfDef.cfm.comparator);
 }
 else
 {
@@ -994,10 +994,11 @@ public class SelectStatement implements CQLStatement, 
MeasurableForPreparedCache
 }
 else if (sliceRestriction != null)
 {
+Comparator<ByteBuffer> comp = cfDef.cfm.comparator;
 // For dynamic CF, the column could be out of the