[jira] [Commented] (CASSANDRA-6904) commitlog segments may not be archived after restart
[ https://issues.apache.org/jira/browse/CASSANDRA-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142983#comment-14142983 ]

Benedict commented on CASSANDRA-6904:
---

In 2.1 we should be using the header information to ensure we only replay each segment once. I also think this is a good opportunity in 2.1 to drop in support for native archival, with which we could easily avoid backing up twice. The current situation feels a little clunky to me, since if we have a bug or other problem causing startup to crash we might repeatedly fill up the archive disk without the operator realising. That could be a follow-up ticket, I'm fine either way, but it's really not a very challenging feature to insert and helps make this much more sane.

commitlog segments may not be archived after restart
---
Key: CASSANDRA-6904
URL: https://issues.apache.org/jira/browse/CASSANDRA-6904
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Jonathan Ellis
Assignee: Sam Tunnicliffe
Fix For: 2.0.11, 2.1.1
Attachments: 2.0-6904.txt, 2.1-6904.txt

commitlog segments are archived when they are full, so the current active segment will not be archived on restart (and its contents will not be available for pitr).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-7984) Nodetool status sometimes reports negative size
Ashic Mahtab created CASSANDRA-7984:
---

Summary: Nodetool status sometimes reports negative size
Key: CASSANDRA-7984
URL: https://issues.apache.org/jira/browse/CASSANDRA-7984
Project: Cassandra
Issue Type: Bug
Components: Core, Tools
Environment: Ubuntu server x64, 4 nodes on 4 VMware VMs, 2GB RAM on each node.
Reporter: Ashic Mahtab
Attachments: Capture.PNG, Capture2.PNG

This has been an issue for a few versions now (from the various RCs of 2.1, but possibly earlier). The issue doesn't manifest itself initially, but if the cluster is left alone for a while, slowly but surely it pops up. The issue is that nodetool status starts outputting negative values for Load. This causes the graphs in OpsCenter to start acting crazy as well. I've attached screenshots of both. Running a rolling restart (or simply restarting the affected node) resolves the issue until it arises again.
[jira] [Created] (CASSANDRA-7985) stress tool doesn't support auth
Ashic Mahtab created CASSANDRA-7985:
---

Summary: stress tool doesn't support auth
Key: CASSANDRA-7985
URL: https://issues.apache.org/jira/browse/CASSANDRA-7985
Project: Cassandra
Issue Type: Improvement
Components: Tools
Reporter: Ashic Mahtab

The stress tool in 2.1 doesn't seem to support username / password authentication (like cqlsh does).
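For context, cqlsh-style auth means accepting a username and password on the command line and handing them to the driver when connecting (with the DataStax Java driver that would end up in Cluster.Builder.withCredentials(user, password)). A hypothetical sketch of the flag parsing stress would need; the flag names and class are invented for illustration, not the actual stress option syntax:

```java
// Hypothetical sketch: parse -user / -password flags the way a stress-style
// CLI might, so the credentials can later be passed to the driver.
public class StressAuthOptions {
    public final String user;
    public final String password;

    private StressAuthOptions(String user, String password) {
        this.user = user;
        this.password = password;
    }

    /** Scans the argument list for the (hypothetical) -user / -password flags. */
    public static StressAuthOptions parse(String[] args) {
        String user = null, password = null;
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-user".equals(args[i])) user = args[++i];
            else if ("-password".equals(args[i])) password = args[++i];
        }
        return new StressAuthOptions(user, password);
    }

    /** True when both credentials were supplied and auth should be attempted. */
    public boolean authEnabled() {
        return user != null && password != null;
    }
}
```
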
[jira] [Commented] (CASSANDRA-7984) Nodetool status sometimes reports negative size
[ https://issues.apache.org/jira/browse/CASSANDRA-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143045#comment-14143045 ]

Marcus Eriksson commented on CASSANDRA-7984:
---

Are you sure you reproduced this in 2.1.0? It should be fixed by CASSANDRA-7239.
[jira] [Commented] (CASSANDRA-7984) Nodetool status sometimes reports negative size
[ https://issues.apache.org/jira/browse/CASSANDRA-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143053#comment-14143053 ]

Ashic Mahtab commented on CASSANDRA-7984:
---

Yup... definitely on 2.1.0. Reproduced on two separate clusters. The nodetool status and the OpsCenter screenshots attached are from two different clusters, both running 2.1.0. Running OpsCenter 5.0.1 as well. If it's something ridiculous like -99343439 bytes, then it's obvious. The trouble is, I just found that one node reported 9MB while the others were at ~45MB. After restarting the 9MB node, it jumped back up to ~45MB.
[jira] [Commented] (CASSANDRA-7899) SSL does not work in cassandra-cli
[ https://issues.apache.org/jira/browse/CASSANDRA-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143072#comment-14143072 ]

Zdenek Ott commented on CASSANDRA-7899:
---

The regular driver works fine, and the transport factory org.apache.cassandra.thrift.SSLTransportFactory works fine too. Thanks.

SSL does not work in cassandra-cli
---
Key: CASSANDRA-7899
URL: https://issues.apache.org/jira/browse/CASSANDRA-7899
Project: Cassandra
Issue Type: Bug
Components: Tools
Environment: Linux 2.6.32-431.20.3.el6.x86_64 #1 SMP Thu Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
[cqlsh 4.1.1 | Cassandra 2.0.10 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Reporter: Zdenek Ott
Assignee: Jason Brown
Attachments: 7899-v1.txt

When specifying the transport factory parameter '-tf org.apache.cassandra.cli.transport.SSLTransportFactory', it throws the exception below, because SSLTransportFactory extends TTransportFactory, not ITransportFactory.

Exception in thread "main" java.lang.IllegalArgumentException: Cannot create a transport factory 'org.apache.cassandra.cli.transport.SSLTransportFactory'.
at org.apache.cassandra.cli.CliOptions.validateAndSetTransportFactory(CliOptions.java:288)
at org.apache.cassandra.cli.CliOptions.processArgs(CliOptions.java:223)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:230)
Caused by: java.lang.IllegalArgumentException: transport factory 'org.apache.cassandra.cli.transport.SSLTransportFactory' not derived from ITransportFactory
at org.apache.cassandra.cli.CliOptions.validateAndSetTransportFactory(CliOptions.java:282)
... 2 more
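The failing check described above boils down to an isAssignableFrom test: the cli's old SSLTransportFactory derives from Thrift's TTransportFactory, so it can never satisfy a check against Cassandra's ITransportFactory. A simplified, self-contained sketch of that conformance test, using stand-in types rather than the real classes:

```java
// Simplified sketch of the conformance check in CliOptions that rejects the
// old cli SSLTransportFactory. The types below are stand-ins, not the real
// Cassandra/Thrift classes.
interface ITransportFactory {}                          // stand-in for org.apache.cassandra.thrift.ITransportFactory
class TTransportFactory {}                              // stand-in for Thrift's base class
class OldCliSslFactory extends TTransportFactory {}     // what cassandra-cli shipped: wrong hierarchy
class ThriftSslFactory implements ITransportFactory {}  // the factory that works

public class TransportFactoryCheck {
    /**
     * Mirrors the validation: only classes derived from ITransportFactory
     * are accepted; anything else triggers the IllegalArgumentException
     * seen in the stack trace.
     */
    static boolean isValid(Class<?> cls) {
        return ITransportFactory.class.isAssignableFrom(cls);
    }
}
```

Because OldCliSslFactory sits under TTransportFactory rather than ITransportFactory, isValid returns false for it, which is exactly the "not derived from ITransportFactory" failure in the report.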
[jira] [Commented] (CASSANDRA-6904) commitlog segments may not be archived after restart
[ https://issues.apache.org/jira/browse/CASSANDRA-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143071#comment-14143071 ]

Sam Tunnicliffe commented on CASSANDRA-6904:
---

The reason I didn't add header checking during replay is that I don't think it's actually possible in 2.1. CLA.maybeRestoreArchive uses the header of the archive file to create the destination file for restore, and if that target file already exists we throw an exception and bail on startup. The external restore script can in theory modify the destination filename, but only to something which conforms to the defined pattern; otherwise the files aren't picked up during replay. Because of this constraint, the only thing the restore script can really do is modify the part of the filename that maps to the segment id. However, if it does do that, the file will be skipped anyway, as CLR.recover(File) creates its descriptor from the (modified) filename, meaning that when it actually replays it, checksums fail due to the mismatched descriptor id. That said, it's pretty trivial to add a check for duplicate segments during recovery, so if I've missed something, let me know and I'll attach an updated patch. On the native archive, I'd rather we handle any general rework of the archival process as a separate ticket, if nobody objects.
[jira] [Commented] (CASSANDRA-6904) commitlog segments may not be archived after restart
[ https://issues.apache.org/jira/browse/CASSANDRA-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143076#comment-14143076 ]

Benedict commented on CASSANDRA-6904:
---

bq. if that target file already exists we throw an exception and bail on startup

Good point. Given we now _expect_ this situation, though, we should probably change that behaviour. But agreed, the naming of the files during restore should have already solved the problem I was highlighting.

bq. On the native archive, I'd rather we handle any general rework of the archival process as a separate ticket if nobody objects.

WFM. But rework is much too strong a word IMO - we're going to have to support the current archival option indefinitely, most likely (and really it's low cost to do so, the code is tiny) - we just want to give people a saner option to move to. Which pretty much means offering a yaml option and making the archive command a function (CLS -> Runnable), with the runnable doing a native copy (and perhaps supporting hard linking, which is slightly trickier, but still not hard and not a requisite) as well as the current option. So it's small enough I'd also feel very comfortable including it here.
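The "native archival" idea sketched in the comment above amounts to mapping each commitlog segment to an in-process copy task instead of forking a user-supplied archive_command. A minimal sketch of that shape, with all class and method names invented for illustration (this is not the actual Cassandra API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch of "native archival": a segment file is mapped to a
// Runnable that copies (or hard-links) it into the archive directory
// in-process, instead of shelling out to an archive_command.
// All names here are hypothetical, not Cassandra's real API.
public class NativeArchiver {
    private final Path archiveDir;
    private final boolean hardLink;

    public NativeArchiver(Path archiveDir, boolean hardLink) {
        this.archiveDir = archiveDir;
        this.hardLink = hardLink;
    }

    /** Returns the task that archives one segment file. */
    public Runnable archiveTask(Path segment) {
        return () -> {
            try {
                Path target = archiveDir.resolve(segment.getFileName());
                if (hardLink)
                    Files.createLink(target, segment);  // cheap, but same-filesystem only
                else
                    Files.copy(segment, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new RuntimeException("archiving " + segment + " failed", e);
            }
        };
    }

    /** Tiny self-check: archives a temp file and reports whether the copy exists. */
    public static boolean demo() {
        try {
            Path arch = Files.createTempDirectory("arch");
            Path seg = Files.createTempFile("CommitLog-demo", ".log");
            new NativeArchiver(arch, false).archiveTask(seg).run();
            return Files.exists(arch.resolve(seg.getFileName()));
        } catch (IOException e) {
            return false;
        }
    }
}
```

The Runnable shape is what makes it pluggable alongside the existing fork-a-command option: both become "a function from segment to task", selected by configuration.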
[jira] [Resolved] (CASSANDRA-7984) Nodetool status sometimes reports negative size
[ https://issues.apache.org/jira/browse/CASSANDRA-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson resolved CASSANDRA-7984.
---
Resolution: Fixed

Closing as a duplicate of CASSANDRA-7239 and reopening that; we clearly didn't fix this problem properly.
[jira] [Reopened] (CASSANDRA-7239) Nodetool Status Reports Negative Load With VNodes Disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson reopened CASSANDRA-7239:
---
Reproduced In: 2.1.0, 2.1 rc6, 2.1 beta2 (was: 2.1 beta2, 2.1 rc6)

Nodetool Status Reports Negative Load With VNodes Disabled
---
Key: CASSANDRA-7239
URL: https://issues.apache.org/jira/browse/CASSANDRA-7239
Project: Cassandra
Issue Type: Bug
Components: Tools
Environment: 1000 Nodes EC2 m1.large ubuntu 12.04
Reporter: Russell Alexander Spitzer
Assignee: Marcus Eriksson
Priority: Critical
Fix For: 2.1.0
Attachments: 0001-add-new-sstable-size-when-rewriting-v2.patch, 0001-add-new-sstable-size-when-rewriting.patch, nodetool.png, opscenter.png

When I run stress on a large cluster without vnodes (num_tokens=1, initial token set), the loads reported by nodetool status are negative, or become negative after stress is run.

{code}
UN 10.97.155.31  -447426217 bytes 1 0.2% 8d40568c-044c-4753-be26-4ab62710beba rack1
UN 10.9.132.53   -447342449 bytes 1 0.2% 58e7f255-803d-493b-a19e-58137466fb78 rack1
UN 10.37.151.202 -447298672 bytes 1 0.2% ba29b1f1-186f-45d0-9e59-6a528db8df5d rack1
{code}
[jira] [Updated] (CASSANDRA-7239) Nodetool Status Reports Negative Load With VNodes Disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-7239:
---
Reproduced In: 2.1.0, 2.1 rc6, 2.1 beta2 (was: 2.1 beta2, 2.1 rc6, 2.1.0)
Assignee: Benedict (was: Marcus Eriksson)

So, to reproduce: generate a bunch of data, wait for it to compact, and then restart the node. nodetool status should report the same amount of data, but currently it doesn't (it actually reports more). Could you have a look [~benedict]? If not, just reassign to me and I'll do it.
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143118#comment-14143118 ]

Marcus Eriksson commented on CASSANDRA-7949:
---

I think this could be fixed by CASSANDRA-7745.

LCS compaction low performance, many pending compactions, nodes are almost idle
---
Key: CASSANDRA-7949
URL: https://issues.apache.org/jira/browse/CASSANDRA-7949
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: DSE 4.5.1-1, Cassandra 2.0.8
Reporter: Nikolai Grigoriev
Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt

I've been evaluating a new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates a load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and the DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between a few dozen and a few thousand columns in each row. This data generation process was generating a massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of traffic that the system will ultimately have to deal with; it will be a mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not to mention the expensive SSD disk space. At some point while generating the data I noticed that the compactions started to pile up. I knew that I was overloading the cluster, but I still wanted the generation test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic.
However, after the storm of write requests had stopped, I noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of a few dozen compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at a time. So it looks like this:

# nodetool compactionstats
pending tasks: 3351
compaction type  keyspace  table        completed    total          unit   progress
Compaction       myks      table_list1  66499295588  1910515889913  bytes  3.48%
Active compaction remaining time : n/a

# df -h
...
/dev/sdb  1.5T 637G 854G 43% /cassandra-data/disk1
/dev/sdc  1.5T 425G 1.1T 29% /cassandra-data/disk2
/dev/sdd  1.5T 429G 1.1T 29% /cassandra-data/disk3

# find . -name "*table_list1*Data*" | grep -v snapshot | wc -l
1310

Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb); 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, and one each of 55Gb, 70Gb and 370Gb; and 263 files of various sizes, between a few dozen Kb and 160Mb. I ran the heavy load for about 1.5 days, it's been close to 3 days since then, and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations, disabling multithreaded compactions, and that seems to be helping a bit - I see some nodes have started to have fewer pending compactions. About half of the cluster, in fact. But even there I see them sitting idle most of the time, lazily compacting in one stream with CPU at ~140% and occasionally doing bursts of compaction work for a few minutes.
I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge-case scenario, where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of the two tables - the one that has about 4 times more data than the other (space-wise; the number of rows is the same). It looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly.
[jira] [Updated] (CASSANDRA-7239) Nodetool Status Reports Negative Load With VNodes Disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-7239:
---
Attachment: 7239.txt

The attached patch should solve the problem (at least as far as any contribution from the preemptive opening of compaction results is concerned).
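One plausible way a Load figure drifts negative, consistent with the preemptive-opening theory above, is the same sstable's size being subtracted from a running live-bytes counter twice (for example once when an early-opened compaction result is swapped and again when the finished compaction replaces it). A toy illustration of that failure mode and an idempotent-removal guard; this is invented for illustration, not Cassandra's actual accounting code:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of a live-size counter. A naive version that blindly does
// liveBytes -= bytes on every removal notification goes negative as soon
// as one sstable is removed twice. Tracking membership makes removal
// idempotent, so duplicate notifications are ignored.
public class LiveSizeTracker {
    private long liveBytes = 0;
    private final Set<String> live = new HashSet<>();

    public void add(String sstable, long bytes) {
        if (live.add(sstable))          // count each sstable at most once
            liveBytes += bytes;
    }

    public void remove(String sstable, long bytes) {
        if (live.remove(sstable))       // a second removal of the same table is a no-op
            liveBytes -= bytes;
    }

    public long liveBytes() {
        return liveBytes;
    }
}
```

With the guard, a duplicated removal leaves the counter at zero instead of driving it to a negative value like the -447426217 bytes in the report.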
[jira] [Updated] (CASSANDRA-7239) Nodetool Status Reports Negative Load With VNodes Disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-7239:
---
Attachment: 7239.txt
[jira] [Updated] (CASSANDRA-7239) Nodetool Status Reports Negative Load With VNodes Disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-7239:
---
Attachment: (was: 7239.txt)
[jira] [Created] (CASSANDRA-7986) The Pig tests cannot run on Cygwin on Windows
Benjamin Lerer created CASSANDRA-7986:
---

Summary: The Pig tests cannot run on Cygwin on Windows
Key: CASSANDRA-7986
URL: https://issues.apache.org/jira/browse/CASSANDRA-7986
Project: Cassandra
Issue Type: Bug
Environment: Windows 8.1, Cygwin 1.7.32
Reporter: Benjamin Lerer
Assignee: Benjamin Lerer
Priority: Minor
Fix For: 3.0

When running the Pig tests on Cygwin on Windows I run into https://issues.apache.org/jira/browse/HADOOP-7682. Ideally this issue should be properly fixed in HADOOP, but as that issue has been open since September 2011, it would be good to implement the workaround mentioned by Joshua Caplan for the Pig tests (https://issues.apache.org/jira/browse/HADOOP-7682?focusedCommentId=13440120&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13440120)
[jira] [Updated] (CASSANDRA-7986) The Pig tests cannot run on Cygwin on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-7986:
---
Attachment: CASSANDRA-7986.txt

This patch checks whether the environment is Windows and, if so, loads a WindowsLocalFileSystem instead of org.apache.hadoop.fs.LocalFileSystem. The WindowsLocalFileSystem swallows the IOException that can occur when Hadoop tries to change file permissions, thereby allowing the system to continue.
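The shape of that workaround can be sketched with stand-in types (the real patch would extend org.apache.hadoop.fs.LocalFileSystem, whose setPermission takes a Path and an FsPermission; the simplified signatures below are for illustration only):

```java
import java.io.IOException;

// Stand-in for org.apache.hadoop.fs.LocalFileSystem: under Cygwin,
// setPermission() fails because chmod-style permissions can't be applied
// (HADOOP-7682), which aborts the Pig test setup.
class LocalFileSystem {
    public void setPermission(String path, String perm) throws IOException {
        throw new IOException("Failed to set permissions of path: " + path);
    }
}

// The workaround: override setPermission and swallow the IOException so
// test setup can proceed on Windows/Cygwin, where the permission bits
// don't matter anyway.
public class WindowsLocalFileSystem extends LocalFileSystem {
    public boolean lastCallFailed = false;

    @Override
    public void setPermission(String path, String perm) {
        try {
            super.setPermission(path, perm);
        } catch (IOException e) {
            // Deliberately ignored on Windows/Cygwin; record it for visibility.
            lastCallFailed = true;
        }
    }
}
```

Note the override narrows the throws clause, which Java permits; callers compiled against the subclass never see the checked exception at all.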
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143214#comment-14143214 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
---

Update: I have completed my last data writing test; now I have enough data to start another phase. I did that last test with the compaction strategy set to STCS but disabled for the duration of the test. Once all writers had finished I re-enabled compactions. In under one day STCS completed the job on all nodes; I ended up with a few dozen (~40 or so) large sstables, with a total of about 23Tb of data on 15 nodes. I switched back to LCS this morning and immediately observed the hockey stick on the pending compaction graph. Now each node reports about 8-10K pending compactions; they are all compacting in one stream per CF and consume virtually no resources:

{code}
# nodetool compactionstats
pending tasks: 9900
compaction type  keyspace  table       completed    total          unit   progress
Compaction       testks    test_list2  26630083587  812539331642   bytes  3.28%
Compaction       testks    test_list1  24071738534  1994877844635  bytes  1.21%
Active compaction remaining time : 2h16m55s

# w
13:41:45 up 23 days, 18:13, 2 users, load average: 1.81, 2.13, 2.51
...
# iostat -mdx 5
Linux 3.8.13-44.el6uek.x86_64 (cassandra01.mydomain.com) 22/09/14 _x86_64_ (32 CPU)

Device: rrqm/s wrqm/s    r/s    w/s   rMB/s  wMB/s avgrq-sz avgqu-sz await svctm %util
sdb       0.00   5.73  88.00  13.33    5.47   5.16   214.84     0.51  5.08  0.39  3.98
sda       0.00   8.16   0.13  65.80    0.00   3.28   101.80     0.06  0.87  0.11  0.71
sdc       0.00   4.93  75.05  13.34    4.67   5.42   233.62     0.49  5.55  0.39  3.42
sdd       0.00   5.82  86.40  14.10    5.37   5.52   221.83     0.56  5.59  0.38  3.81

Device: rrqm/s wrqm/s    r/s    w/s   rMB/s  wMB/s avgrq-sz avgqu-sz await svctm %util
sdb       0.00   0.00 134.60   0.00    8.37   0.00   127.30     0.06  0.42  0.42  5.64
sda       0.00  13.00   0.00 220.40    0.00   0.96     8.94     0.01  0.05  0.01  0.32
sdc       0.00   0.00  36.40   0.00    2.27   0.00   128.00     0.01  0.41  0.41  1.50
sdd       0.00   0.00  21.20   0.00    1.32   0.00   128.00     0.00  0.19  0.19  0.40
{code}
[jira] [Updated] (CASSANDRA-7983) nodetool repair triggers OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jose Martinez Poblete updated CASSANDRA-7983:
---
Attachment: nbcqa-chc-a03_systemlog.tar.Z
            nbcqa-chc-a01_systemlog.tar.Z

The rest of the system logs

nodetool repair triggers OOM
---
Key: CASSANDRA-7983
URL: https://issues.apache.org/jira/browse/CASSANDRA-7983
Project: Cassandra
Issue Type: Bug
Components: Core
Environment:
{noformat}
INFO [main] 2014-09-16 14:23:14,621 DseDaemon.java (line 368) DSE version: 4.5.0
INFO [main] 2014-09-16 14:23:14,622 DseDaemon.java (line 369) Hadoop version: 1.0.4.13
INFO [main] 2014-09-16 14:23:14,627 DseDaemon.java (line 370) Hive version: 0.12.0.3
INFO [main] 2014-09-16 14:23:14,628 DseDaemon.java (line 371) Pig version: 0.10.1
INFO [main] 2014-09-16 14:23:14,629 DseDaemon.java (line 372) Solr version: 4.6.0.2.4
INFO [main] 2014-09-16 14:23:14,630 DseDaemon.java (line 373) Sqoop version: 1.4.4.14.1
INFO [main] 2014-09-16 14:23:14,630 DseDaemon.java (line 374) Mahout version: 0.8
INFO [main] 2014-09-16 14:23:14,631 DseDaemon.java (line 375) Appender version: 3.0.2
INFO [main] 2014-09-16 14:23:14,632 DseDaemon.java (line 376) Spark version: 0.9.1
INFO [main] 2014-09-16 14:23:14,632 DseDaemon.java (line 377) Shark version: 0.9.1.1
INFO [main] 2014-09-16 14:23:20,270 CassandraDaemon.java (line 160) JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_51
INFO [main] 2014-09-16 14:23:20,270 CassandraDaemon.java (line 188) Heap size: 6316621824/6316621824
{noformat}
Reporter: Jose Martinez Poblete
Attachments: gc.log.0, nbcqa-chc-a01_systemlog.tar.Z, nbcqa-chc-a03_systemlog.tar.Z, system.log

Customer has a 3 node cluster with 500Mb data on each node

{noformat}
[cassandra@nbcqa-chc-a02 ~]$ nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: CH2
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address  Load  Tokens  Owns  Host ID  Rack
UN 162.150.4.234 255.26 MB 256 33.2% 4ad1b6a8-8759-4920-b54a-f059126900df RAC1 UN 162.150.4.235 318.37 MB 256 32.6% 3eb0ec58-4b81-442e-bee5-4c91da447f38 RAC1 UN 162.150.4.167 243.7 MB 256 34.2% 5b2c1900-bf03-41c1-bb4e-82df1655b8d8 RAC1 [cassandra@nbcqa-chc-a02 ~]$ {noformat} After we run the repair command, the system runs into OOM after some 45 minutes. Nothing else is running {noformat} [cassandra@nbcqa-chc-a02 ~]$ date Fri Sep 19 15:55:33 UTC 2014 [cassandra@nbcqa-chc-a02 ~]$ nodetool repair -st -9220354588320251877 -et -9220354588320251873 Sep 19, 2014 4:06:08 PM ClientCommunicatorAdmin Checker-run WARNING: Failed to check the connection: java.net.SocketTimeoutException: Read timed out {noformat} Here is when we run into OOM {noformat} ERROR [ReadStage:28914] 2014-09-19 16:34:50,381 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:28914,5,main] java.lang.OutOfMemoryError: Java heap space at org.apache.cassandra.io.util.RandomAccessReader.init(RandomAccessReader.java:69) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.init(CompressedRandomAccessReader.java:76) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:43) at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createReader(CompressedPoolingSegmentedFile.java:48) at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:39) at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1195) at org.apache.cassandra.db.columniterator.SimpleSliceReader.init(SimpleSliceReader.java:57) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:42) at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:167) at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:250) at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376) at
[jira] [Updated] (CASSANDRA-7983) nodetool repair triggers OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Martinez Poblete updated CASSANDRA-7983: - Attachment: (was: nbcqa-chc-a01_systemlog.tar.Z)
[jira] [Updated] (CASSANDRA-7983) nodetool repair triggers OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Martinez Poblete updated CASSANDRA-7983: - Attachment: (was: nbcqa-chc-a03_systemlog.tar.Z)
[jira] [Issue Comment Deleted] (CASSANDRA-7983) nodetool repair triggers OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Martinez Poblete updated CASSANDRA-7983: - Comment: was deleted (was: The rest of the system logs)
[jira] [Updated] (CASSANDRA-7886) TombstoneOverwhelmingException should not wait for timeout
[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Spriegel updated CASSANDRA-7886: -- Attachment: 7886_v1.txt TombstoneOverwhelmingException should not wait for timeout -- Key: CASSANDRA-7886 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886 Project: Cassandra Issue Type: Improvement Components: Core Environment: Tested with Cassandra 2.0.8 Reporter: Christian Spriegel Priority: Minor Fix For: 3.0 Attachments: 7886_v1.txt *Issue* When you have TombstoneOverwhelmingExceptions occurring in queries, the query is simply dropped on every data node, but no response is sent back to the coordinator. Instead the coordinator waits for the specified read_request_timeout_in_ms. On the application side this can cause memory issues, since the application is waiting for the timeout interval for every request. Therefore, if our application runs into TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-( *Proposed solution* I think the data nodes should send an error message to the coordinator when they run into a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7886) TombstoneOverwhelmingException should not wait for timeout
[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143247#comment-14143247 ] Christian Spriegel commented on CASSANDRA-7886: --- [~kohlisankalp]: Thanks for your feedback. [~slebresne], [~kohlisankalp]: I attached a patch for Cassandra 2.1 where I implemented remote failure handling for reads and range reads. Using a ccm 3-node cluster, I tested remote and local read failures. Both CLI and CQLSH return instantly, instead of waiting for timeouts. Any feedback? Could this be merged into 2.1? Please let me know if the patch needs improvement. I guess the next steps would be to implement callbacks for writes, truncates, etc.
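The coordinator-side idea discussed in this ticket can be sketched as follows. This is a minimal, hypothetical illustration, not the attached 7886_v1.txt patch; the class and method names are invented. The point is that once a replica actively reports a failure such as TombstoneOverwhelmingException, the coordinator can complete immediately instead of blocking for the full read_request_timeout_in_ms.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class FailFastReadCallback {
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile boolean failed;

    // Called when a replica answers normally.
    public void onResponse() { latch.countDown(); }

    // Called when a replica reports a failure (e.g. TombstoneOverwhelmingException)
    // instead of silently dropping the query.
    public void onFailure() { failed = true; latch.countDown(); }

    // Returns true on success; throws immediately if a failure was reported;
    // only times out if replicas stay completely silent.
    public boolean await(long timeoutMs) throws InterruptedException {
        boolean done = latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        if (failed) throw new RuntimeException("replica reported failure");
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        FailFastReadCallback cb = new FailFastReadCallback();
        cb.onFailure(); // replica reports the error to the coordinator
        try {
            cb.await(10_000); // returns instantly, no 10-second wait
        } catch (RuntimeException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

Without the failure callback, the only way for await() to return in the failing case would be the timeout expiring, which is exactly the behavior the ticket wants to eliminate.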
[jira] [Commented] (CASSANDRA-5902) Dealing with hints after a topology change
[ https://issues.apache.org/jira/browse/CASSANDRA-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143320#comment-14143320 ] Branimir Lambov commented on CASSANDRA-5902: As this appears to be the same process as a normal write would take, I created a new version of the patch (at the same github branch, https://github.com/blambov/cassandra/compare/handoff-topology) which relies on StorageProxy.sendToHintedEndpoints to do the replication and write the new hints as necessary. As a side benefit, messages to other datacentres will now be combined. A special WriteOrHintResponseHandler is provided to ensure the hint is only deleted after all endpoints have either responded or have been hinted. The previous version offered fine-grained rate control, which is much more difficult to implement now. The new version will still obey the rate in the longer term, but will send all copies of the hint in a single burst. Dealing with hints after a topology change -- Key: CASSANDRA-5902 URL: https://issues.apache.org/jira/browse/CASSANDRA-5902 Project: Cassandra Issue Type: Bug Reporter: Jonathan Ellis Assignee: Branimir Lambov Priority: Minor Fix For: 2.1.1 Hints are stored and delivered by destination node id. This allows them to survive IP changes in the target, while making a scan of all the hints for a given destination an efficient operation. However, we do not detect and handle a new node assuming responsibility for the hinted row via bootstrap before it can be delivered. I think we have to take a performance hit in this case -- we need to deliver such a hint to *all* replicas, since we don't know which is the new one. This happens infrequently enough, however -- requiring first the target node to be down to create the hint, then the hint owner to be down long enough for the target to both recover and stream to a new node -- that this should be okay.
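The bookkeeping the WriteOrHintResponseHandler performs can be sketched as below. This is a hypothetical illustration with invented names, not code from the handoff-topology branch: the original hint becomes deletable only once every target endpoint has either acknowledged the write or had a replacement hint stored for it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WriteOrHintHandler {
    private final AtomicInteger outstanding;
    private volatile boolean hintDeletable;

    public WriteOrHintHandler(int endpointCount) {
        this.outstanding = new AtomicInteger(endpointCount);
    }

    // Either outcome accounts for one endpoint; when all endpoints are
    // accounted for, the original hint is safe to delete.
    private void complete() {
        if (outstanding.decrementAndGet() == 0)
            hintDeletable = true;
    }

    public void onAck()      { complete(); } // endpoint responded to the write
    public void onRehinted() { complete(); } // endpoint down, new hint stored for it

    public boolean hintDeletable() { return hintDeletable; }

    public static void main(String[] args) {
        WriteOrHintHandler h = new WriteOrHintHandler(3);
        h.onAck();
        h.onRehinted();
        System.out.println(h.hintDeletable()); // false: one endpoint still pending
        h.onAck();
        System.out.println(h.hintDeletable()); // true: all endpoints accounted for
    }
}
```

Deleting the hint any earlier would risk losing the write for an endpoint that neither responded nor got a replacement hint.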
[4/6] git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ce357d91 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ce357d91 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ce357d91 Branch: refs/heads/trunk Commit: ce357d9158fdb74c02a33207a86f235bb16016a1 Parents: eecc034 d96485f Author: Yuki Morishita yu...@apache.org Authored: Mon Sep 22 11:12:17 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Sep 22 11:12:17 2014 -0500 -- src/java/org/apache/cassandra/service/ClientState.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/ce357d91/src/java/org/apache/cassandra/service/ClientState.java --
[2/6] git commit: fix typo
fix typo Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d96485ff Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d96485ff Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d96485ff Branch: refs/heads/cassandra-2.1 Commit: d96485ff16d8b90173007a8d6601aba8d105b8f0 Parents: d143487 Author: Yuki Morishita yu...@apache.org Authored: Mon Sep 22 11:12:12 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Sep 22 11:12:12 2014 -0500 -- src/java/org/apache/cassandra/service/ClientState.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/d96485ff/src/java/org/apache/cassandra/service/ClientState.java -- diff --git a/src/java/org/apache/cassandra/service/ClientState.java b/src/java/org/apache/cassandra/service/ClientState.java index 38c56da..7611a14 100644 --- a/src/java/org/apache/cassandra/service/ClientState.java +++ b/src/java/org/apache/cassandra/service/ClientState.java @@ -148,7 +148,7 @@ public class ClientState public String getKeyspace() throws InvalidRequestException { if (keyspace == null) -throw new InvalidRequestException(No keyspace has been specified. USE a keyspace, or explicity specify keyspace.tablename); +throw new InvalidRequestException(No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename); return keyspace; }
[6/6] git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f1bd50ce Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f1bd50ce Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f1bd50ce Branch: refs/heads/trunk Commit: f1bd50ce5f0fc81877b99231db018010675eba1e Parents: 7762518 ce357d9 Author: Yuki Morishita yu...@apache.org Authored: Mon Sep 22 11:12:26 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Sep 22 11:12:26 2014 -0500 -- src/java/org/apache/cassandra/service/ClientState.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/f1bd50ce/src/java/org/apache/cassandra/service/ClientState.java --
[5/6] git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ce357d91 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ce357d91 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ce357d91 Branch: refs/heads/cassandra-2.1 Commit: ce357d9158fdb74c02a33207a86f235bb16016a1 Parents: eecc034 d96485f Author: Yuki Morishita yu...@apache.org Authored: Mon Sep 22 11:12:17 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Sep 22 11:12:17 2014 -0500 -- src/java/org/apache/cassandra/service/ClientState.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/ce357d91/src/java/org/apache/cassandra/service/ClientState.java --
[3/6] git commit: fix typo
fix typo Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d96485ff Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d96485ff Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d96485ff Branch: refs/heads/trunk Commit: d96485ff16d8b90173007a8d6601aba8d105b8f0 Parents: d143487 Author: Yuki Morishita yu...@apache.org Authored: Mon Sep 22 11:12:12 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Sep 22 11:12:12 2014 -0500 -- src/java/org/apache/cassandra/service/ClientState.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/d96485ff/src/java/org/apache/cassandra/service/ClientState.java -- diff --git a/src/java/org/apache/cassandra/service/ClientState.java b/src/java/org/apache/cassandra/service/ClientState.java index 38c56da..7611a14 100644 --- a/src/java/org/apache/cassandra/service/ClientState.java +++ b/src/java/org/apache/cassandra/service/ClientState.java @@ -148,7 +148,7 @@ public class ClientState public String getKeyspace() throws InvalidRequestException { if (keyspace == null) -throw new InvalidRequestException(No keyspace has been specified. USE a keyspace, or explicity specify keyspace.tablename); +throw new InvalidRequestException(No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename); return keyspace; }
[1/6] git commit: fix typo
Repository: cassandra Updated Branches: refs/heads/cassandra-2.0 d143487cb - d96485ff1 refs/heads/cassandra-2.1 eecc034b6 - ce357d915 refs/heads/trunk 776251874 - f1bd50ce5 fix typo Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d96485ff Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d96485ff Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d96485ff Branch: refs/heads/cassandra-2.0 Commit: d96485ff16d8b90173007a8d6601aba8d105b8f0 Parents: d143487 Author: Yuki Morishita yu...@apache.org Authored: Mon Sep 22 11:12:12 2014 -0500 Committer: Yuki Morishita yu...@apache.org Committed: Mon Sep 22 11:12:12 2014 -0500 -- src/java/org/apache/cassandra/service/ClientState.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/d96485ff/src/java/org/apache/cassandra/service/ClientState.java -- diff --git a/src/java/org/apache/cassandra/service/ClientState.java b/src/java/org/apache/cassandra/service/ClientState.java index 38c56da..7611a14 100644 --- a/src/java/org/apache/cassandra/service/ClientState.java +++ b/src/java/org/apache/cassandra/service/ClientState.java @@ -148,7 +148,7 @@ public class ClientState public String getKeyspace() throws InvalidRequestException { if (keyspace == null) -throw new InvalidRequestException(No keyspace has been specified. USE a keyspace, or explicity specify keyspace.tablename); +throw new InvalidRequestException(No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename); return keyspace; }
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143357#comment-14143357 ] Marcus Eriksson commented on CASSANDRA-7949: so, if you switch to STCS and let it compact, you are bound to do a lot of L0 to L1 compaction in the beginning since all sstables are in level 0 and need to pass through L1 before making it to the higher levels. L0 to L1 compactions usually include _all_ L1 sstables, this means that only one can proceed at a time. Looking at your compactionstats, you have one 2TB compaction going on, probably between L0 and L1, that needs to finish before it can continue doing higher level compactions LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. 
At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the generation test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests had been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at a time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction type keyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb 1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc 1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd 1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sizes - between a few dozen Kb and 160Mb I've been running the heavy load for about 1.5 days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. 
But even there I see them sitting idle most of the time, lazily compacting in one stream with CPU at ~140% and occasionally doing bursts of compaction work for a few minutes. I am wondering if this is really a bug, or something in the LCS logic that manifests itself only in such an edge-case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of the two tables - the one that has about 4 times more data than the other (space-wise; the number of rows is the same). It looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
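Marcus's point above — that an L0→L1 compaction includes _all_ L1 sstables, so two such compactions always conflict and only one can run at a time — can be illustrated with a toy scheduling model. This is not Cassandra code; all names here are invented for the sketch:

```python
# Toy model of why L0 -> L1 compactions serialize under LCS.
# Assumption (from the comment above): an L0 -> L1 compaction must claim
# every L1 sstable, so any two such compactions conflict on L1.

class ToyLCS:
    def __init__(self, l0, l1):
        self.l0 = set(l0)    # sstables waiting in level 0
        self.l1 = set(l1)    # sstables in level 1
        self.busy = set()    # sstables claimed by running compactions

    def try_start_l0_to_l1(self, batch_size=4):
        """Start an L0 -> L1 compaction if all of its inputs are free."""
        inputs = set(list(self.l0)[:batch_size]) | self.l1  # *all* of L1
        if inputs & self.busy:
            return None      # conflicts with a running compaction
        self.busy |= inputs
        return inputs

lcs = ToyLCS(l0={f"l0-{i}" for i in range(100)}, l1={"l1-a", "l1-b"})
first = lcs.try_start_l0_to_l1()   # starts, claiming all of L1
second = lcs.try_start_l0_to_l1()  # blocked: L1 is already claimed
```

However many sstables pile up in L0, the second call cannot start until the first compaction releases L1 — which matches the single 2TB compaction dominating the compactionstats output above.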
[jira] [Comment Edited] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143357#comment-14143357 ] Marcus Eriksson edited comment on CASSANDRA-7949 at 9/22/14 4:20 PM: - So, if you switch to STCS and then back to LCS and let it compact, you are bound to do a lot of L0 to L1 compaction in the beginning, since all sstables are in level 0 and need to pass through L1 before making it to the higher levels. L0 to L1 compactions usually include _all_ L1 sstables, which means that only one can proceed at a time. Looking at your compactionstats, you have one 2TB compaction going on, probably between L0 and L1, that needs to finish before it can continue doing higher level compactations. was (Author: krummas): So, if you switch to STCS and let it compact, you are bound to do a lot of L0 to L1 compaction in the beginning, since all sstables are in level 0 and need to pass through L1 before making it to the higher levels. L0 to L1 compactions usually include _all_ L1 sstables, which means that only one can proceed at a time. Looking at your compactionstats, you have one 2TB compaction going on, probably between L0 and L1, that needs to finish before it can continue doing higher level compactions. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143365#comment-14143365 ] Benedict commented on CASSANDRA-7949: - That sounds like fairly suboptimal behaviour still. But it sounds like CASSANDRA-6696 should help to address it. When there is some time, we should also reintroduce a more functional multi-threaded compaction. It should be quite achievable to build one that is correct, safe and faster for these scenarios. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-7247) Provide top ten most frequent keys per column family
[ https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Lohfink resolved CASSANDRA-7247. -- Resolution: Duplicate Brandon's approach of enabling this for a duration makes more sense, so marking this as a duplicate of it. Provide top ten most frequent keys per column family Key: CASSANDRA-7247 URL: https://issues.apache.org/jira/browse/CASSANDRA-7247 Project: Cassandra Issue Type: Improvement Reporter: Chris Lohfink Assignee: Chris Lohfink Priority: Minor Attachments: cassandra-2.1-7247.txt, jconsole.png, patch.txt Since we already have the nice addthis stream library, we can use it to keep track of the most frequent DecoratedKeys that come through the system using StreamSummaries ([nice explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]). Then provide a new metric to access them via JMX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
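The StreamSummary mentioned above is stream-lib's implementation of the Space-Saving algorithm from the linked post. A minimal sketch of the idea (not the library's actual code): track at most `capacity` counters, and when a new key arrives with all slots full, evict the minimum counter and let the newcomer inherit its count as a bounded overestimate.

```python
# Minimal Space-Saving sketch: approximate top-k keys in bounded memory.
# Illustrative only; stream-lib's StreamSummary is the real implementation.

class SpaceSaving:
    def __init__(self, capacity):
        self.capacity = capacity
        self.counters = {}  # key -> [count, max_overestimate]

    def offer(self, key):
        if key in self.counters:
            self.counters[key][0] += 1
        elif len(self.counters) < self.capacity:
            self.counters[key] = [1, 0]
        else:
            # Evict the minimum counter; the newcomer inherits its count,
            # so its true count is overestimated by at most that amount.
            victim = min(self.counters, key=lambda k: self.counters[k][0])
            count, _ = self.counters.pop(victim)
            self.counters[key] = [count + 1, count]

    def top(self, k):
        return sorted(self.counters, key=lambda key: -self.counters[key][0])[:k]

ss = SpaceSaving(capacity=10)
for key in ["hot"] * 50 + ["warm"] * 20 + [f"cold-{i}" for i in range(30)]:
    ss.offer(key)
# Genuinely frequent keys ("hot", "warm") dominate the tracked set
```

Memory stays bounded at `capacity` counters regardless of how many distinct keys flow through, which is what makes it suitable for exposing per-column-family hot keys over JMX.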
[jira] [Commented] (CASSANDRA-5902) Dealing with hints after a topology change
[ https://issues.apache.org/jira/browse/CASSANDRA-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143474#comment-14143474 ] Benedict commented on CASSANDRA-5902: - I don't think this behaves as you expect right now; it looks like no new hinting will be done under any circumstance, and the original hint will not be deleted in the event that any endpoint fails to respond. It's possible I'm missing something obvious though. Take a look at... Hint writing: WriteCallbackInfo.shouldHint(), MessagingService.expiringMap Hint deletion: CallbackInfo.isFailureCallback(), IAsyncCallbackWithFailure, MessagingService.expiringMap It seems what's necessary is a new IAsyncCallbackWithFailure that both hints and decrements the callback count, so that the deletion is definitely called eventually. Separately, it's not clear to me that we should be stopping hint replay to the target if one of these extra hints fails to be delivered, since they're unrelated. This could unnecessarily cause hints not to be delivered before their TTL expires, which would be bad for consistency. Dealing with hints after a topology change -- Key: CASSANDRA-5902 URL: https://issues.apache.org/jira/browse/CASSANDRA-5902 Project: Cassandra Issue Type: Bug Reporter: Jonathan Ellis Assignee: Branimir Lambov Priority: Minor Fix For: 2.1.1 Hints are stored and delivered by destination node id. This allows them to survive IP changes in the target, while making scanning all the hints for a given destination an efficient operation. However, we do not detect and handle the case of a new node assuming responsibility for the hinted row via bootstrap before it can be delivered. I think we have to take a performance hit in this case -- we need to deliver such a hint to *all* replicas, since we don't know which is the new one.
This happens infrequently enough, however -- requiring first the target node to be down to create the hint, then the hint owner to be down long enough for the target to both recover and stream to a new node -- that this should be okay. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-7987) Better defaults for CQL tables with clustering keys
T Jake Luciani created CASSANDRA-7987: - Summary: Better defaults for CQL tables with clustering keys Key: CASSANDRA-7987 URL: https://issues.apache.org/jira/browse/CASSANDRA-7987 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Priority: Minor Fix For: 3.0 We currently default to STCS regardless. If a user creates a table with clustering keys (maybe specifically types with likely high cardinality?), we should set compaction to LCS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6904) commitlog segments may not be archived after restart
[ https://issues.apache.org/jira/browse/CASSANDRA-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143552#comment-14143552 ] Jonathan Ellis commented on CASSANDRA-6904: --- +1 separate ticket commitlog segments may not be archived after restart Key: CASSANDRA-6904 URL: https://issues.apache.org/jira/browse/CASSANDRA-6904 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Sam Tunnicliffe Fix For: 2.0.11, 2.1.1 Attachments: 2.0-6904.txt, 2.1-6904.txt commitlog segments are archived when they are full, so the current active segment will not be archived on restart (and its contents will not be available for pitr). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7523) add date and time types
[ https://issues.apache.org/jira/browse/CASSANDRA-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143572#comment-14143572 ] Joshua McKenzie commented on CASSANDRA-7523: Updated branches available: [Cassandra changes|https://github.com/josh-mckenzie/cassandra/compare/7523_squashed] [Python driver changes|https://github.com/josh-mckenzie/python-driver/compare/7523_squashed] I've converted both types to support byte-order comparability. They also no longer accept empty strings and don't validate 0-byte inputs. The TimeType was trivial, as it was already bounded to byte-order-comparable ranges anyway, but the SimpleDateType change deserves a bit of explanation. I went back and forth offline w/ Benedict about the SimpleDateType change - this implementation uses an unsigned integer with the epoch at 2^31 as our date range, which requires some shifting and reliance on arithmetic overflow in Java thanks to the lack of a first-class unsigned integer type. On top of that, the defined range differs from the epoch-at-0 that most people might expect. We could use the drivers to mask this and shift values to an epoch at zero (which I didn't do in the attached python driver changes); I dislike implementation details of our internal treatment of dates pressuring non-idiomatic external treatments of data in this way, but it only goes as far as the drivers, where they have the freedom to implement as they see fit. It's something I'm willing to accept for the benefits it gives us. I've also added more unit tests surrounding the types and their comparison, and updated the cqlshlib unit tests and python driver unit tests to conform to the new range expectations.
add date and time types --- Key: CASSANDRA-7523 URL: https://issues.apache.org/jira/browse/CASSANDRA-7523 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Joshua McKenzie Priority: Minor Fix For: 2.1.1, 3.0 http://www.postgresql.org/docs/9.1/static/datatype-datetime.html (we already have timestamp; interval is out of scope for now, and see CASSANDRA-6350 for discussion on timestamp-with-time-zone. but date/time should be pretty easy to add.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
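The SimpleDateType scheme described above — days-from-epoch stored as an unsigned 32-bit integer with the epoch shifted to 2^31, so that unsigned byte-order comparison matches chronological order — can be sketched as follows. This is an illustration of the encoding idea, not Cassandra's actual serialization code:

```python
# Sketch of an epoch-at-2^31 unsigned date encoding, as described in the
# comment above. Python's big ints let us shift explicitly; in Java the
# same effect relies on arithmetic overflow (no unsigned int type).
import struct

EPOCH_OFFSET = 2 ** 31  # 1970-01-01 maps to the middle of the range

def serialize_days(days_from_epoch: int) -> bytes:
    raw = days_from_epoch + EPOCH_OFFSET
    if not 0 <= raw < 2 ** 32:
        raise ValueError("date out of representable range")
    return struct.pack(">I", raw)  # big-endian unsigned 32-bit

def deserialize_days(data: bytes) -> int:
    return struct.unpack(">I", data)[0] - EPOCH_OFFSET

# Byte-order comparison of the serialized form agrees with chronological
# order, including for dates before the epoch:
assert serialize_days(-365) < serialize_days(0) < serialize_days(365)
```

A signed encoding would break this property: negative day counts serialize with the sign bit set and would sort *after* positive ones under unsigned byte comparison, which is exactly why the offset is needed.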
[jira] [Commented] (CASSANDRA-7886) TombstoneOverwhelmingException should not wait for timeout
[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143581#comment-14143581 ] Jonathan Ellis commented on CASSANDRA-7886: --- I'm not thrilled about adding complexity to handle a situation where the real answer is: stop doing that. In this case, it sounds like just bounding the number of connections would be sufficient to keep the cluster from falling over in the meantime. TombstoneOverwhelmingException should not wait for timeout -- Key: CASSANDRA-7886 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886 Project: Cassandra Issue Type: Improvement Components: Core Environment: Tested with Cassandra 2.0.8 Reporter: Christian Spriegel Assignee: Christian Spriegel Priority: Minor Fix For: 2.1.1 Attachments: 7886_v1.txt *Issue* When TombstoneOverwhelmingExceptions occur in queries, the query is simply dropped on every data node, but no response is sent back to the coordinator. Instead the coordinator waits for the specified read_request_timeout_in_ms. On the application side this can cause memory issues, since the application is waiting for the timeout interval for every request. Therefore, if our application runs into TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-( *Proposed solution* I think the data nodes should send an error message to the coordinator when they run into a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7987) Better defaults for CQL tables with clustering keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143606#comment-14143606 ] Jonathan Ellis commented on CASSANDRA-7987: --- Is that really a good rule of thumb? E.g. if my slice size is smaller than a single sstable's worth, then LCS is a lot of extra work for little gain. Better defaults for CQL tables with clustering keys --- Key: CASSANDRA-7987 URL: https://issues.apache.org/jira/browse/CASSANDRA-7987 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Priority: Minor Labels: lhf Fix For: 3.0 We currently default to STCS regardless. If a user creates a table with clustering keys (maybe specifically types with likely high cardinality?), we should set compaction to LCS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7523) add date and time types
[ https://issues.apache.org/jira/browse/CASSANDRA-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143616#comment-14143616 ] Benedict commented on CASSANDRA-7523: - To clarify, all we are doing is defining dates as an unsigned integer, with the epoch being the _minimum_ representable date as opposed to the _middle_. This means any language without unsigned integer support has to do some minor fiddling, but this is not particularly obtuse for writing a network client, and I wouldn't call it foisting internal treatment of dates upon the drivers in any unreasonable way. add date and time types --- Key: CASSANDRA-7523 URL: https://issues.apache.org/jira/browse/CASSANDRA-7523 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Joshua McKenzie Priority: Minor Fix For: 2.1.1, 3.0 http://www.postgresql.org/docs/9.1/static/datatype-datetime.html (we already have timestamp; interval is out of scope for now, and see CASSANDRA-6350 for discussion on timestamp-with-time-zone. but date/time should be pretty easy to add.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7886) TombstoneOverwhelmingException should not wait for timeout
[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143623#comment-14143623 ] Christian Spriegel commented on CASSANDRA-7886: --- [~jbellis]: Don't get me wrong: there is definitely some client-side limiting necessary in the application. But it is really not a nice situation that all queries are just sitting there and waiting. Just to clarify: the patch is not only about TOEs. It will report back any exception. Another reason why I'd like this functionality is that it makes understanding TOEs easier. Think of a developer running his query in CQLSH: with this patch the user will get a clear message that something is wrong, instead of a timeout. I know I found this confusing in the beginning, and I probably still do. We could even show the IP address of the host causing the error in the message. Then the user could see which host is responsible for the failure. Is there anything about the patch itself you don't like? IMHO it's not adding much complexity. Most of the patch is the new exception classes and logging. The actual code handling the failure is just a few lines. TombstoneOverwhelmingException should not wait for timeout -- Key: CASSANDRA-7886 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886 Project: Cassandra Issue Type: Improvement Components: Core Environment: Tested with Cassandra 2.0.8 Reporter: Christian Spriegel Assignee: Christian Spriegel Priority: Minor Fix For: 2.1.1 Attachments: 7886_v1.txt *Issue* When TombstoneOverwhelmingExceptions occur in queries, the query is simply dropped on every data node, but no response is sent back to the coordinator. Instead the coordinator waits for the specified read_request_timeout_in_ms.
On the application side this can cause memory issues, since the application is waiting for the timeout interval for every request. Therefore, if our application runs into TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-( *Proposed solution* I think the data nodes should send an error message to the coordinator when they run into a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes
[ https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143627#comment-14143627 ] Yuki Morishita commented on CASSANDRA-5220: --- I'm inclined to mark this 'later' in favor of incremental repair and internal refactoring such as CASSANDRA-6455. In particular, incremental repair should decrease the time needed for validating data, which is one of the major heavy-lifting processes of repair. Repair improvements when using vnodes - Key: CASSANDRA-5220 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.2.0 beta 1 Reporter: Brandon Williams Assignee: Yuki Morishita Labels: performance, repair Fix For: 3.0 Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2 Currently when using vnodes, repair takes much longer to complete than without them. This appears at least in part because it's using a session per range and processing them sequentially. This generates a lot of log spam with vnodes, and while being gentler and lighter on hard disk deployments, ssd-based deployments would often prefer that repair be as fast as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7766) Secondary index not working after a while
[ https://issues.apache.org/jira/browse/CASSANDRA-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Kumar updated CASSANDRA-7766: --- Description: Since 2.1.0-rc2, it appears that the secondary indexes are not always working. Immediately after the INSERT of a row, the index seems to be there. But after a while (I do not know when or why), SELECT statements based on any secondary index no longer return the corresponding row(s). I noticed that a restart of C* may have an impact (the data inserted before the restart may be seen through the index, even if it was not returned before the restart). Here is a use-case example (in order to clarify my request): {code} CREATE TABLE IF NOT EXISTS ks.cf ( k int PRIMARY KEY, ind ascii, value text); CREATE INDEX IF NOT EXISTS ks_cf_index ON ks.cf(ind); INSERT INTO ks.cf (k, ind, value) VALUES (1, 'toto', 'Hello'); SELECT * FROM ks.cf WHERE ind = 'toto'; // Returns no result after a while {code} The last SELECT statement may or may not return a row depending on the instant of the request. I experienced this with 2.1.0-rc5 through CQLSH, with clusters of one and two nodes. Since it depends on the instant of the request, I am not able to provide a way to reproduce it systematically (it appears to be linked with some scheduled job inside C*). Secondary index not working after a while - Key: CASSANDRA-7766 URL: https://issues.apache.org/jira/browse/CASSANDRA-7766 Project: Cassandra Issue Type: Bug Environment: C* 2.1.0-rc5 with small clusters (one or two nodes) Reporter: Fabrice Larcher Attachments: result-failure.txt, result-success.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7987) Better defaults for CQL tables with clustering keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143644#comment-14143644 ] T Jake Luciani commented on CASSANDRA-7987: --- It could be less pain early on, especially if, like you say, they don't really need it. But I think it's a strong assumption that your queries correspond with your sstables (range tombstones). I think it's safer to say that if you can't predict the size of your partition, then you should use LCS. Better defaults for CQL tables with clustering keys --- Key: CASSANDRA-7987 URL: https://issues.apache.org/jira/browse/CASSANDRA-7987 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Priority: Minor Labels: lhf Fix For: 3.0 We currently default to STCS regardless. If a user creates a table with clustering keys (maybe specifically types with likely high cardinality?), we should set compaction to LCS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7523) add date and time types
[ https://issues.apache.org/jira/browse/CASSANDRA-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143647#comment-14143647 ] Joshua McKenzie commented on CASSANDRA-7523: The current implementation retains the same range of dates as the previous one: 4 bytes' worth of days with the epoch at the center. Before, the center was 0 on a signed integer; now the epoch is shifted by Integer.MIN_VALUE so we can treat the data type as unsigned for purposes of comparability. I'd prefer we have a date range that extends to before the epoch, rather than having the epoch as our minimum representable date. add date and time types --- Key: CASSANDRA-7523 URL: https://issues.apache.org/jira/browse/CASSANDRA-7523 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Joshua McKenzie Priority: Minor Fix For: 2.1.1, 3.0 http://www.postgresql.org/docs/9.1/static/datatype-datetime.html (we already have timestamp; interval is out of scope for now, and see CASSANDRA-6350 for discussion on timestamp-with-time-zone. but date/time should be pretty easy to add.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7978) CrcCheckChance should not be stored as part of an sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani updated CASSANDRA-7978: -- Attachment: 7978v2.txt v2 makes the change live by passing in the live metadata, test added. CrcCheckChance should not be stored as part of an sstable - Key: CASSANDRA-7978 URL: https://issues.apache.org/jira/browse/CASSANDRA-7978 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: T Jake Luciani Priority: Minor Fix For: 2.0.11 Attachments: 7978.txt, 7978v2.txt CrcCheckChance is stored with compression parameters in the sstable. The only way to change it is to do upgrade sstable. I don't see why it should not be a hot property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
git commit: Ninja fix debian rules to handle versions with -N package convention
Repository: cassandra Updated Branches: refs/heads/cassandra-2.1 ce357d915 - 192468f7a Ninja fix debian rules to handle versions with -N package convention Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/192468f7 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/192468f7 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/192468f7 Branch: refs/heads/cassandra-2.1 Commit: 192468f7a28b312330be213e38a284bbb2bfa890 Parents: ce357d9 Author: Jake Luciani j...@apache.org Authored: Mon Sep 22 15:32:36 2014 -0400 Committer: Jake Luciani j...@apache.org Committed: Mon Sep 22 15:32:36 2014 -0400 -- debian/rules | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/192468f7/debian/rules -- diff --git a/debian/rules b/debian/rules index ca303f1..31fd0c0 100755 --- a/debian/rules +++ b/debian/rules @@ -6,7 +6,7 @@ include /usr/share/dpatch/dpatch.make ANT = /usr/bin/ant -VERSION = $(shell dpkg-parsechangelog | sed -ne 's/^Version: \(.*\)/\1/p') +VERSION = $(shell dpkg-parsechangelog | sed -ne 's/^Version: \([^-]*\).*/\1/p') test: dh_testdir
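The one-line change in the diff above swaps the sed capture `\(.*\)` for `\([^-]*\).*`, so a Debian version string like `2.1.1-1` yields only the upstream part `2.1.1`. The effect of the two patterns can be illustrated with equivalent Python regexes (this demo is not part of the commit):

```python
# Demonstrating the debian/rules version-extraction fix: the old greedy
# capture keeps the "-N" Debian package revision, the new one stops at
# the first '-'. Python re used here as an equivalent to the sed BREs.
import re

old_pattern = re.compile(r"^Version: (.*)")       # sed: \(.*\)
new_pattern = re.compile(r"^Version: ([^-]*).*")  # sed: \([^-]*\).*

line = "Version: 2.1.1-1"
assert old_pattern.match(line).group(1) == "2.1.1-1"  # revision leaks through
assert new_pattern.match(line).group(1) == "2.1.1"    # upstream version only

# Versions without a package revision are unaffected by the change:
assert new_pattern.match("Version: 2.1.1").group(1) == "2.1.1"
```

Since `[^-]*` matches greedily up to but excluding the first hyphen, the trailing `.*` simply discards the revision suffix, which is exactly what the build needs when dpkg versions follow the `-N` convention.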
[2/2] git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3f181f06 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3f181f06 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3f181f06 Branch: refs/heads/trunk Commit: 3f181f06385fa5ffa8219bae5380fc0767eac55d Parents: f1bd50c 192468f Author: Jake Luciani j...@apache.org Authored: Mon Sep 22 15:33:20 2014 -0400 Committer: Jake Luciani j...@apache.org Committed: Mon Sep 22 15:33:20 2014 -0400 -- debian/rules | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --
[1/2] git commit: Ninja fix debian rules to handle versions with -N package convention
Repository: cassandra Updated Branches: refs/heads/trunk f1bd50ce5 - 3f181f063 Ninja fix debian rules to handle versions with -N package convention Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/192468f7 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/192468f7 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/192468f7 Branch: refs/heads/trunk Commit: 192468f7a28b312330be213e38a284bbb2bfa890 Parents: ce357d9 Author: Jake Luciani j...@apache.org Authored: Mon Sep 22 15:32:36 2014 -0400 Committer: Jake Luciani j...@apache.org Committed: Mon Sep 22 15:32:36 2014 -0400 -- debian/rules | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/192468f7/debian/rules -- diff --git a/debian/rules b/debian/rules index ca303f1..31fd0c0 100755 --- a/debian/rules +++ b/debian/rules @@ -6,7 +6,7 @@ include /usr/share/dpatch/dpatch.make ANT = /usr/bin/ant -VERSION = $(shell dpkg-parsechangelog | sed -ne 's/^Version: \(.*\)/\1/p') +VERSION = $(shell dpkg-parsechangelog | sed -ne 's/^Version: \([^-]*\).*/\1/p') test: dh_testdir
[jira] [Commented] (CASSANDRA-7978) CrcCheckChance should not be stored as part of an sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143741#comment-14143741 ] Jason Brown commented on CASSANDRA-7978: Reviewed v2, and I wonder if it's better to update the CP.crcCheckChance as it's a non-final field (and volatile, at that) rather than retain a reference to the CFMD. CrcCheckChance should not be stored as part of an sstable - Key: CASSANDRA-7978 URL: https://issues.apache.org/jira/browse/CASSANDRA-7978 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: T Jake Luciani Priority: Minor Fix For: 2.0.11 Attachments: 7978.txt, 7978v2.txt CrcCheckChance is stored with compression parameters in the sstable. The only way to change it is to do upgrade sstable. I don't see why it should not be a hot property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7978) CrcCheckChance should not be stored as part of an sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143747#comment-14143747 ] T Jake Luciani commented on CASSANDRA-7978: --- When you merge the CFMD it replaces the entire CP, so it needed to be higher level than CP. CrcCheckChance should not be stored as part of an sstable - Key: CASSANDRA-7978 URL: https://issues.apache.org/jira/browse/CASSANDRA-7978 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: T Jake Luciani Priority: Minor Fix For: 2.0.11 Attachments: 7978.txt, 7978v2.txt CrcCheckChance is stored with compression parameters in the sstable. The only way to change it is to do upgrade sstable. I don't see why it should not be a hot property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
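The "hot property" idea the reviewers are discussing can be sketched independently of Cassandra's actual classes. In this hypothetical example (all names are illustrative, not the real CompressionParameters API), the chance lives in a volatile field that a runtime schema change overwrites in place, so reader threads see the new value without any sstable rewrite:

```java
// Hedged sketch of a live-updatable ("hot") property; names are illustrative.
public class CompressionSettings {
    // volatile: a write by the schema-update thread is immediately visible
    // to reader threads that consult the chance on every checksum decision
    private volatile double crcCheckChance;

    public CompressionSettings(double initial) {
        this.crcCheckChance = initial;
    }

    public double getCrcCheckChance() {
        return crcCheckChance;
    }

    // called when the table's metadata is altered at runtime
    public void setCrcCheckChance(double chance) {
        if (chance < 0.0 || chance > 1.0)
            throw new IllegalArgumentException("chance must be in [0,1]");
        this.crcCheckChance = chance;
    }
}
```

The wrinkle raised in the comments is that if a schema merge replaces the whole settings object rather than mutating the field, readers holding the old reference never see the update, which is why the patch threads the live metadata through instead.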
[jira] [Commented] (CASSANDRA-7130) Make sstable checksum type configurable and optional
[ https://issues.apache.org/jira/browse/CASSANDRA-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143759#comment-14143759 ] T Jake Luciani commented on CASSANDRA-7130: --- Is the plan to put this in 2.1 over CASSANDRA-7928? the latter seems pretty messy Make sstable checksum type configurable and optional Key: CASSANDRA-7130 URL: https://issues.apache.org/jira/browse/CASSANDRA-7130 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Jason Brown Priority: Minor Labels: performance Fix For: 3.0 A lot of our users are becoming bottlenecked on CPU rather than IO, and whilst Adler32 is faster than CRC, it isn't anything like as fast as xxhash (used by LZ4), which can push Gb/s. I propose making the checksum type configurable so that users who want speed can shift to xxhash, and those who want security can use Adler or CRC. It's worth noting that at some point in the future (JDK8?) optimised implementations using latest intel crc instructions will be added, though it's not clear from the mailing list discussion if/when that will materialise: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-May/010775.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7978) CrcCheckChance should not be stored as part of an sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143781#comment-14143781 ] Jason Brown commented on CASSANDRA-7978: I just found this ticket, #CASSANDRA-5053, which exposes CFS.setCrcCheckChance() as an mbean value (configurable at runtime). Would that be sufficient for [~kohlisankalp]'s needs, of being able to adjust the CRC check chance dynamically (without restart/upgradesstables/etc)? CrcCheckChance should not be stored as part of an sstable - Key: CASSANDRA-7978 URL: https://issues.apache.org/jira/browse/CASSANDRA-7978 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: T Jake Luciani Priority: Minor Fix For: 2.0.11 Attachments: 7978.txt, 7978v2.txt CrcCheckChance is stored with compression parameters in the sstable. The only way to change it is to do upgrade sstable. I don't see why it should not be a hot property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7878) Fix wrong progress reporting when streaming uncompressed SSTable w/ CRC check
[ https://issues.apache.org/jira/browse/CASSANDRA-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143804#comment-14143804 ] Joshua McKenzie commented on CASSANDRA-7878: Did you also sneak in a fix to the # of bytes we're requesting from the rate limiter? {code} limiter.acquire(toTransfer - start); {code} Either way: +1. LGTM. Fix wrong progress reporting when streaming uncompressed SSTable w/ CRC check - Key: CASSANDRA-7878 URL: https://issues.apache.org/jira/browse/CASSANDRA-7878 Project: Cassandra Issue Type: Bug Reporter: Yuki Morishita Assignee: Yuki Morishita Priority: Trivial Fix For: 2.0.11 Attachments: 0001-Fix-wrong-progress-when-streaming-uncompressed.patch Streaming uncompressed SSTable w/ CRC validation calculates progress wrong. It shows transfer bytes as the sum of all read bytes for CRC validation. So netstats shows progress way over 100%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7978) CrcCheckChance should not be stored as part of an sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143802#comment-14143802 ] T Jake Luciani commented on CASSANDRA-7978: --- I thought doing things on 1000's of nodes was a pain :) also that is an ephemeral change. The v2 patch doesn't require any restart/upgradesstables/etc, it updates as soon as the change propagates. CrcCheckChance should not be stored as part of an sstable - Key: CASSANDRA-7978 URL: https://issues.apache.org/jira/browse/CASSANDRA-7978 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: T Jake Luciani Priority: Minor Fix For: 2.0.11 Attachments: 7978.txt, 7978v2.txt CrcCheckChance is stored with compression parameters in the sstable. The only way to change it is to do upgrade sstable. I don't see why it should not be a hot property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7986) The Pig tests cannot run on Cygwin on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-7986: --- Reviewer: Joshua McKenzie The Pig tests cannot run on Cygwin on Windows - Key: CASSANDRA-7986 URL: https://issues.apache.org/jira/browse/CASSANDRA-7986 Project: Cassandra Issue Type: Bug Environment: Windows 8.1, Cygwin 1.7.32 Reporter: Benjamin Lerer Assignee: Benjamin Lerer Priority: Minor Fix For: 3.0 Attachments: CASSANDRA-7986.txt When running the Pig-Tests on Cygwin Windows I run into https://issues.apache.org/jira/browse/HADOOP-7682. Ideally this issue should be properly fixed in HADOOP, but as the issue has been open since September 2011 it would be good if we implemented the workaround mentioned by Joshua Caplan for the Pig-Tests (https://issues.apache.org/jira/browse/HADOOP-7682?focusedCommentId=13440120page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13440120) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-7988) 2.1 broke cqlsh for IPv6
Josh Wright created CASSANDRA-7988: -- Summary: 2.1 broke cqlsh for IPv6 Key: CASSANDRA-7988 URL: https://issues.apache.org/jira/browse/CASSANDRA-7988 Project: Cassandra Issue Type: New Feature Reporter: Josh Wright cqlsh in 2.1 switched to the cassandra-driver Python library, which only recently added IPv6 support. The version bundled with 2.1.0 does not include a sufficiently recent version, so cqlsh is unusable for those of us running IPv6 (us? me...?) The fix is to simply upgrade the bundled version of the Python cassandra-driver to at least version 2.1.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7978) CrcCheckChance should not be stored as part of an sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143893#comment-14143893 ] Jason Brown commented on CASSANDRA-7978: bq. that is an ephemeral change d'oh! yes, you are correct. Then, I'm +1 on the patch. The only nit I would have (and it's rather trivial) is to perhaps store a reference to an override CompressionParameters rather than the whole CFMD (as live endpoints) in the updated CompressionParameters. [~benedict] the reason why FileCacheService.invalidate() wouldn't work is that FCS holds RAR/CRAR instances, and CRAR gets its CompressionMetadata from a CompressedPoolingSegmentedFile wrapper (derived from SegmentedFile), and CPSF is what SSTR holds onto for its lifetime. Thus, CPSF never gets 'updated' or recycled (via FCS) in the way that CRAR does, so calling FileCacheService.invalidate() won't do the trick. CrcCheckChance should not be stored as part of an sstable - Key: CASSANDRA-7978 URL: https://issues.apache.org/jira/browse/CASSANDRA-7978 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: T Jake Luciani Priority: Minor Fix For: 2.0.11 Attachments: 7978.txt, 7978v2.txt CrcCheckChance is stored with compression parameters in the sstable. The only way to change it is to do upgrade sstable. I don't see why it should not be a hot property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7130) Make sstable checksum type configurable and optional
[ https://issues.apache.org/jira/browse/CASSANDRA-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143910#comment-14143910 ] Jason Brown commented on CASSANDRA-7130: [~tjake] Yes. And agreed about 7928 being ugly. Make sstable checksum type configurable and optional Key: CASSANDRA-7130 URL: https://issues.apache.org/jira/browse/CASSANDRA-7130 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Jason Brown Priority: Minor Labels: performance Fix For: 3.0 A lot of our users are becoming bottlenecked on CPU rather than IO, and whilst Adler32 is faster than CRC, it isn't anything like as fast as xxhash (used by LZ4), which can push Gb/s. I propose making the checksum type configurable so that users who want speed can shift to xxhash, and those who want security can use Adler or CRC. It's worth noting that at some point in the future (JDK8?) optimised implementations using latest intel crc instructions will be added, though it's not clear from the mailing list discussion if/when that will materialise: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-May/010775.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7406) Reset version when closing incoming socket in IncomingTcpConnection should be done atomically
[ https://issues.apache.org/jira/browse/CASSANDRA-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143972#comment-14143972 ] Shawn Kumar commented on CASSANDRA-7406: Looks like this particular issue was also brought up and is being looked at in 7734. It would be greatly appreciated if you could note the version you noticed this in. Reset version when closing incoming socket in IncomingTcpConnection should be done atomically - Key: CASSANDRA-7406 URL: https://issues.apache.org/jira/browse/CASSANDRA-7406 Project: Cassandra Issue Type: Bug Components: Core Environment: CentOS release 5.5 (Tikanga) Reporter: Ray Chen When closing incoming socket, the close() method will call MessagingService.resetVersion(), this behavior may clear version which is set by another thread. This could cause MessagingService.knowsVersion(endpoint) test results as false (expect true here). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144164#comment-14144164 ] Gregory Burd commented on CASSANDRA-7075: - The paper Aether: A Scalable Approach to Logging (http://pandis.net/resources/vldb10aether.pdf) has a great many insights into how and when an ARIES/WAL can be optimized. I know that's a bit different from the commitlog/ss-table/memtable used in Cassandra, but there are many ideas which overlap and might carry over or at least inspire you. Add the ability to automatically distribute your commitlogs across all data volumes --- Key: CASSANDRA-7075 URL: https://issues.apache.org/jira/browse/CASSANDRA-7075 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Tupshin Harper Assignee: Branimir Lambov Priority: Minor Labels: performance Fix For: 3.0 given the prevalence of ssds (no need to separate commitlog and data), and improved jbod support, along with CASSANDRA-3578, it seems like we should have an option to have one commitlog per data volume, to even the load. I've been seeing more and more cases where there isn't an obvious extra volume to put the commitlog on, and sticking it on only one of the jbodded ssd volumes leads to IO imbalance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes
[ https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144179#comment-14144179 ] Jonathan Ellis commented on CASSANDRA-7075: --- Thanks, Gregory! Add the ability to automatically distribute your commitlogs across all data volumes --- Key: CASSANDRA-7075 URL: https://issues.apache.org/jira/browse/CASSANDRA-7075 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Tupshin Harper Assignee: Branimir Lambov Priority: Minor Labels: performance Fix For: 3.0 given the prevalance of ssds (no need to separate commitlog and data), and improved jbod support, along with CASSANDRA-3578, it seems like we should have an option to have one commitlog per data volume, to even the load. i've been seeing more and more cases where there isn't an obvious extra volume to put the commitlog on, and sticking it on only one of the jbodded ssd volumes leads to IO imbalance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7978) CrcCheckChance should not be stored as part of an sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144196#comment-14144196 ] T Jake Luciani commented on CASSANDRA-7978: --- I would need to create a wrapper class to hold the compression parameters otherwise the compressed reader won't see the live updates. That is exactly what CFMD is doing here. I don't see a reason to not use it. CrcCheckChance should not be stored as part of an sstable - Key: CASSANDRA-7978 URL: https://issues.apache.org/jira/browse/CASSANDRA-7978 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: T Jake Luciani Priority: Minor Fix For: 2.0.11 Attachments: 7978.txt, 7978v2.txt CrcCheckChance is stored with compression parameters in the sstable. The only way to change it is to do upgrade sstable. I don't see why it should not be a hot property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7969) Properly track min/max timestamps and maxLocalDeletionTimes for range and row tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144208#comment-14144208 ] T Jake Luciani commented on CASSANDRA-7969: --- Running the tests I see the following failures: {code} [junit] [junit] Testcase: testTrackTimesRowTombstoneWithData(org.apache.cassandra.db.RangeTombstoneTest): FAILED [junit] expected:999 but was:2 [junit] junit.framework.AssertionFailedError: expected:999 but was:2 [junit] at org.apache.cassandra.db.RangeTombstoneTest.assertTimes(RangeTombstoneTest.java:200) [junit] at org.apache.cassandra.db.RangeTombstoneTest.testTrackTimesRowTombstoneWithData(RangeTombstoneTest.java:150) [junit] [junit] [junit] Testcase: testTrackTimesRangeTombstone(org.apache.cassandra.db.RangeTombstoneTest): FAILED [junit] expected:1000 but was:2 [junit] junit.framework.AssertionFailedError: expected:1000 but was:2 [junit] at org.apache.cassandra.db.RangeTombstoneTest.assertTimes(RangeTombstoneTest.java:200) [junit] at org.apache.cassandra.db.RangeTombstoneTest.testTrackTimesRangeTombstone(RangeTombstoneTest.java:168) [junit] [junit] [junit] Testcase: testTrackTimesRangeTombstoneWithData(org.apache.cassandra.db.RangeTombstoneTest): FAILED [junit] expected:999 but was:0 [junit] junit.framework.AssertionFailedError: expected:999 but was:0 [junit] at org.apache.cassandra.db.RangeTombstoneTest.assertTimes(RangeTombstoneTest.java:200) [junit] at org.apache.cassandra.db.RangeTombstoneTest.testTrackTimesRangeTombstoneWithData(RangeTombstoneTest.java:191) [junit] {code} Properly track min/max timestamps and maxLocalDeletionTimes for range and row tombstones Key: CASSANDRA-7969 URL: https://issues.apache.org/jira/browse/CASSANDRA-7969 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Assignee: Marcus Eriksson Fix For: 2.0.11 Attachments: 0001-track-min-max-timestamps-and-maxLocalDeletionTime-co.patch First problem is that when we have only row or range tombstones in an sstable 
we don't update the maxLocalDeletionTime for the sstable. Second problem is that if we have a range tombstone in an sstable, minTimestamp will always be Long.MIN_VALUE for flushed sstables due to how we set the default values for the variables -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7939) checkForEndpointCollision should ignore joining nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144236#comment-14144236 ] T Jake Luciani commented on CASSANDRA-7939: --- Based on the description I'm not clear why the patch is to check if it's not a fatClient? nit: in the 2.1 patch you should use RangeStreamer.useStrictConsistency vs re-parsing the property checkForEndpointCollision should ignore joining nodes - Key: CASSANDRA-7939 URL: https://issues.apache.org/jira/browse/CASSANDRA-7939 Project: Cassandra Issue Type: Bug Components: Core Reporter: Brandon Williams Assignee: Brandon Williams Priority: Minor Fix For: 2.0.11, 2.1.1 Attachments: 7939-2.1.txt, 7939.txt If you fail a bootstrap, then immediately retry it, cFEC erroneously tells you to replace it: {noformat} ERROR 00:04:50 Exception encountered during startup java.lang.RuntimeException: A node with address bw-3/10.208.8.63 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node. at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:453) ~[main/:na] at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:666) ~[main/:na] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:614) ~[main/:na] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:507) ~[main/:na] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338) [main/:na] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:457) [main/:na] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:546) [main/:na] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7406) Reset version when closing incoming socket in IncomingTcpConnection should be done atomically
[ https://issues.apache.org/jira/browse/CASSANDRA-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144238#comment-14144238 ] Ray Chen edited comment on CASSANDRA-7406 at 9/23/14 2:56 AM: -- The related version of Cassandra is 2.0.6 was (Author: oldsharp): [~shawn.kumar] The related version of Cassandra is 2.0.6 Reset version when closing incoming socket in IncomingTcpConnection should be done atomically - Key: CASSANDRA-7406 URL: https://issues.apache.org/jira/browse/CASSANDRA-7406 Project: Cassandra Issue Type: Bug Components: Core Environment: CentOS release 5.5 (Tikanga) Reporter: Ray Chen When closing incoming socket, the close() method will call MessagingService.resetVersion(), this behavior may clear version which is set by another thread. This could cause MessagingService.knowsVersion(endpoint) test results as false (expect true here). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7406) Reset version when closing incoming socket in IncomingTcpConnection should be done atomically
[ https://issues.apache.org/jira/browse/CASSANDRA-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144238#comment-14144238 ] Ray Chen commented on CASSANDRA-7406: - [~shawn.kumar] The related version of Cassandra is 2.0.6 Reset version when closing incoming socket in IncomingTcpConnection should be done atomically - Key: CASSANDRA-7406 URL: https://issues.apache.org/jira/browse/CASSANDRA-7406 Project: Cassandra Issue Type: Bug Components: Core Environment: CentOS release 5.5 (Tikanga) Reporter: Ray Chen When closing incoming socket, the close() method will call MessagingService.resetVersion(), this behavior may clear version which is set by another thread. This could cause MessagingService.knowsVersion(endpoint) test results as false (expect true here). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
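The race described in CASSANDRA-7406 (close() unconditionally resetting a version that another thread has since re-negotiated) can be avoided with a conditional removal. This is a hedged sketch, not the real MessagingService API; the class and method names are hypothetical:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative version tracker: resetVersion() only clears the mapping if it
// still holds the version this closing connection observed, so it cannot
// erase a newer version set concurrently by another incoming connection.
public class VersionTracker {
    private final ConcurrentMap<String, Integer> versions = new ConcurrentHashMap<>();

    public void setVersion(String endpoint, int version) {
        versions.put(endpoint, version);
    }

    // conditional removal via ConcurrentMap.remove(key, value):
    // succeeds only when the stored version equals expectedVersion
    public boolean resetVersion(String endpoint, int expectedVersion) {
        return versions.remove(endpoint, expectedVersion);
    }

    public boolean knowsVersion(String endpoint) {
        return versions.containsKey(endpoint);
    }
}
```

With an unconditional remove, a stale close would make knowsVersion() report false even though a live connection had just set the version, which is the symptom the reporter describes.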
[jira] [Updated] (CASSANDRA-7988) 2.1 broke cqlsh for IPv6
[ https://issues.apache.org/jira/browse/CASSANDRA-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Stepura updated CASSANDRA-7988: --- Fix Version/s: 2.1.1 2.1 broke cqlsh for IPv6 - Key: CASSANDRA-7988 URL: https://issues.apache.org/jira/browse/CASSANDRA-7988 Project: Cassandra Issue Type: New Feature Reporter: Josh Wright Fix For: 2.1.1 cqlsh in 2.1 switched to the cassandra-driver Python library, which only recently added IPv6 support. The version bundled with 2.1.0 does not include a sufficiently recent version, so cqlsh is unusable for those of us running IPv6 (us? me...?) The fix is to simply upgrade the bundled version of the Python cassandra-driver to at least version 2.1.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-7989) nodetool repair goes to infinite repair, continue beyond the number of tokens handled in a single repair.
Andrew Johnson created CASSANDRA-7989: - Summary: nodetool repair goes to infinite repair, continue beyond the number of tokens handled in a single repair. Key: CASSANDRA-7989 URL: https://issues.apache.org/jira/browse/CASSANDRA-7989 Project: Cassandra Issue Type: Bug Components: Core Reporter: Andrew Johnson Priority: Minor Fix For: 2.0.9 DSE 4.0.3 with patch (Cassandra 2.0.9.61) Percentage reported stays at 99.999% We are computing % complete by [ (Current token processed - initial token processed) / (# of token handled by this node) ] * 100 = xx.xx % In this case, it goes beyond 100%, meaning the numerator (repaired number of tokens in this session) is greater than the number of tokens handled by this node (5624 in this node); we caught this and report 99.999% AntiEntropySession increments and there are no visible errors nor exceptions in the log once it was stabilized, also a sub process neither terminated nor finished. (Note, when this session started, there were many exceptions - snapshot creation - causing a sub process to be terminated and restarted about 5 times within an hour, but once it stabilized, it kept going since Aug 22.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
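The percentage arithmetic described in the report can be written out directly. The following is an illustrative reconstruction (not the actual DSE/Cassandra code), showing where the 99.999% clamp kicks in when the processed-token count overruns the node's total:

```java
// Hypothetical sketch of the % complete formula from the ticket:
// [ (current - initial) / totalTokens ] * 100, capped at 99.999% when
// the numerator exceeds the denominator (the buggy, never-ending session).
public class RepairProgress {
    public static double percentComplete(long currentToken, long initialToken, long totalTokens) {
        double pct = ((double) (currentToken - initialToken) / totalTokens) * 100.0;
        // in the reported sessions pct exceeds 100%, so the reported
        // figure is clamped rather than shown as >100%
        return Math.min(pct, 99.999);
    }
}
```

With the node's 5624 tokens, any session that processes more than 5624 tokens pins the display at 99.999% indefinitely, which matches the observed behavior.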
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144260#comment-14144260 ] Vijay commented on CASSANDRA-7438: -- Hi [~rst...@pironet-ndh.com], I don't see a problem in copying the code or rewriting the code; once you complete the rest of the review we can see what we can do. I am guessing you were not waiting for my response :) Thanks! Serializing Row cache alternative (Fully off heap) -- Key: CASSANDRA-7438 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438 Project: Cassandra Issue Type: Improvement Components: Core Environment: Linux Reporter: Vijay Assignee: Vijay Labels: performance Fix For: 3.0 Attachments: 0001-CASSANDRA-7438.patch Currently SerializingCache is partially off heap, keys are still stored in JVM heap as BB, * There are higher GC costs for a reasonably big cache. * Some users have used the row cache efficiently in production for better results, but this requires careful tuning. * Overhead in memory for the cache entries is relatively high. So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with cache. We might want to ensure that the new implementation match the existing API's (ICache), and the implementation needs to have safe memory access, low overhead in memory and less memcpy's (As much as possible). We might also want to make this cache configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7978) CrcCheckChance should not be stored as part of an sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144324#comment-14144324 ] Jason Brown commented on CASSANDRA-7978: I was slightly concerned about the unexpected appearance of the CFMD that's only used in one place, and hoped we could maybe get around that (as well as reducing the scope of CP's dependencies). You are correct that we need some wrapper to get those updates, and I just now spent time trying to see if it's just easier to pass the CFMD instance into the CP constructor (and always have a reference to that CFMD, rather than waiting for SSTR.getCompressionMetadata()). Unfortunately, it's not simple as CP has 4 constructors, called from different paths, and being able to pass the correct CFMD instance to those is not overly friendly. So, I'll stop making noise now and let you commit the patch so we can move on :). CrcCheckChance should not be stored as part of an sstable - Key: CASSANDRA-7978 URL: https://issues.apache.org/jira/browse/CASSANDRA-7978 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: T Jake Luciani Priority: Minor Fix For: 2.0.11 Attachments: 7978.txt, 7978v2.txt CrcCheckChance is stored with compression parameters in the sstable. The only way to change it is to do upgrade sstable. I don't see why it should not be a hot property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7978) CrcCheckChance should not be stored as part of an sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144324#comment-14144324 ] Jason Brown edited comment on CASSANDRA-7978 at 9/23/14 4:44 AM: - I was slightly concerned about the unexpected appearance of the CFMD that's only used in one place, and hoped we could maybe get around that (as well as reducing the scope of CP's dependencies). You are correct that we need some wrapper to get those updates, and I just now spent time trying to see if it's just easier to pass the CFMD instance into the CP constructor (and always have a reference to that CFMD, rather than waiting for SSTR.getCompressionMetadata() to pass it in). Unfortunately, it's not simple as CP has 4 constructors, called from different paths, and being able to pass the correct CFMD instance to those is not overly friendly. So, I'll stop making noise now and let you commit the patch so we can move on :). was (Author: jasobrown): I was slightly concerned about the unexpected appearance of the CFMD that's only used in one place, and hoped we could maybe get around that (as well as reducing the scope of CP's dependencies). You are correct that we need some wrapper to get those updates, and I just now spent time trying to see if it's just easier to pass the CFMD instance into the CP constructor (and always have a reference to that CFMD, rather than waiting for SSTR.getCompressionMetadata()). Unfortunately, it's not simple as CP has 4 constructors, called from different paths, and being able to pass the correct CFMD instance to those is not overly friendly. So, I'll stop making noise now and let you commit the patch so we can move on :). 
CrcCheckChance should not be stored as part of an sstable - Key: CASSANDRA-7978 URL: https://issues.apache.org/jira/browse/CASSANDRA-7978 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Assignee: T Jake Luciani Priority: Minor Fix For: 2.0.11 Attachments: 7978.txt, 7978v2.txt CrcCheckChance is stored with compression parameters in the sstable. The only way to change it is to do upgrade sstable. I don't see why it should not be a hot property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7969) Properly track min/max timestamps and maxLocalDeletionTimes for range and row tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-7969: --- Attachment: 0001-track-min-max-timestamps-and-maxLocalDeletionTime-v2.patch attaching v2, which truncates the cfs before running the tests Properly track min/max timestamps and maxLocalDeletionTimes for range and row tombstones Key: CASSANDRA-7969 URL: https://issues.apache.org/jira/browse/CASSANDRA-7969 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Assignee: Marcus Eriksson Fix For: 2.0.11 Attachments: 0001-track-min-max-timestamps-and-maxLocalDeletionTime-co.patch, 0001-track-min-max-timestamps-and-maxLocalDeletionTime-v2.patch First problem is that when we have only row or range tombstones in an sstable we don't update the maxLocalDeletionTime for the sstable. Second problem is that if we have a range tombstone in an sstable, minTimestamp will always be Long.MIN_VALUE for flushed sstables due to how we set the default values for the variables -- This message was sent by Atlassian JIRA (v6.3.4#6332)