Re: Would warnings about overlapping SStables explain high pending compactions?

2014-09-25 Thread Marcus Eriksson
Not really

What version are you on? Do you have pending compactions and no ongoing
compactions?

/Marcus

On Wed, Sep 24, 2014 at 11:35 PM, Donald Smith 
donald.sm...@audiencescience.com wrote:

  On one of our nodes we have lots of pending compactions (499). In the
 past we’ve seen pending compactions go up to 2400 and all the way back down
 again.



 Investigating, I saw warnings such as the following in the logs about
 overlapping SStables and about needing to run “nodetool scrub” on a table.
 Would the overlapping SStables explain the pending compactions?



 WARN [RMI TCP Connection(2)-10.5.50.30] 2014-09-24 09:14:11,207
 LeveledManifest.java (line 154) At level 1,
 SSTableReader(path='/data/data/XYZ/ABC/XYZ-ABC-jb-388233-Data.db')
 [DecoratedKey(-6112875836465333229,
 3366636664393031646263356234663832383264616561666430383739383738),
 DecoratedKey(-4509284829153070912,
 3366336562386339376664376633353635333432636662373739626465393636)]
 overlaps
 SSTableReader(path='/data/data/XYZ/ABC/XYZ-ABC_blob-jb-388150-Data.db')
 [DecoratedKey(-4834684725563291584,
 336633623334363664363632666365303664333936336337343566373838),
 DecoratedKey(-4136919579566299218,
 3366613535646662343235336335633862666530316164323232643765323934)].  This
 could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact
 that you have dropped sstables from another node into the data directory.
 Sending back to L0.  If you didn't drop in sstables, and have not yet run
 scrub, you should do so since you may also have rows out-of-order within an
 sstable
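
 For reference, a minimal sketch of what running scrub on that table could
 look like (XYZ and ABC here are just the keyspace and table names as they
 appear in the SSTable paths above):

   # online scrub via nodetool; rewrites the table's SSTables and can take a while
   nodetool scrub XYZ ABC
   # offline alternative shipped with Cassandra (run with the node stopped):
   # sstablescrub XYZ ABC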



 Thanks



 *Donald A. Smith* | Senior Software Engineer
 P: 425.201.3900 x 3866
 C: (206) 819-5965
 F: (646) 443-2333
 dona...@audiencescience.com


 [image: AudienceScience]





Re: node keeps dying

2014-09-25 Thread Vivek Mishra
Increase the heap size for Cassandra and try.
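
A hedged sketch of what that could look like in conf/cassandra-env.sh
(values are purely illustrative for a 7 GB, 2-core box, not a recommendation):

  # conf/cassandra-env.sh -- illustrative values only; tune for your workload
  MAX_HEAP_SIZE="3G"     # up from the ~2 GB currently assigned to Cassandra
  HEAP_NEWSIZE="200M"    # cassandra-env.sh rule of thumb: ~100 MB per CPU core
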
On 25/09/2014 3:02 am, Prem Yadav ipremya...@gmail.com wrote:

 BTW, thanks Michael.
 I am surprised I didn't search for Cassandra OOM before.
 I got some good links that discuss it. Will try to optimize and see how
it goes.


 On Wed, Sep 24, 2014 at 10:27 PM, Prem Yadav ipremya...@gmail.com wrote:

 Well, it's not the Linux OOM killer. The system is running with all
default settings.

 Total memory is 7 GB; Cassandra gets assigned 2 GB.
 2 core processors.
 Two rings with 3 nodes in each ring.

 On Wed, Sep 24, 2014 at 9:53 PM, Michael Shuler mich...@pbandjelly.org
wrote:

 On 09/24/2014 11:32 AM, Prem Yadav wrote:

 this is an issue that has happened a few times. We are using DSE 4.0


 I believe this is Apache Cassandra 2.0.5, which is better info for this
list.

 One of the Cassandra nodes is detected as dead by OpsCenter even
 though I can see the process is up.

 the logs show heap space error:

   INFO [RMI TCP Connection(18270)-172.31.49.189] 2014-09-24
08:31:05,340
 StorageService.java (line 2538) Starting repair command #30766,
 repairing 1 ranges for keyspace keyspace
 ERROR [BatchlogTasks:1] 2014-09-24 08:48:54,780 CassandraDaemon.java
 (line 196) Exception in thread Thread[BatchlogTasks:1,5,main]
 java.lang.OutOfMemoryError: Java heap space
  at java.util.ArrayList.<init>(Unknown Source)


 OOM.

 System environment and configuration modification details might be
helpful for others to give you advice. Searching for cassandra oom gave
me a few good links to read, and knowing some details about your nodes
might be really helpful. Additionally, CASSANDRA-7507 [0] suggests that an
OOM leaving the process running in an unclean state is not desired, and the
process should be killed.

 Several of the search links provide details on how to capture and dig
around a heap dump to aid in troubleshooting.

 [0] https://issues.apache.org/jira/browse/CASSANDRA-7507
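
 A hedged sketch of one way to capture such a dump automatically, via JVM
 flags in conf/cassandra-env.sh (the path is illustrative, and recent
 cassandra-env.sh files may already set the OOM flag):

   JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
   JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/lib/cassandra/heapdump"

 The resulting .hprof file can then be opened in a heap analyzer such as
 Eclipse MAT.
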
 --
 Kind regards,
 Michael





Difference in retrieving data from cassandra

2014-09-25 Thread Umang Shah
Hi All,

I am using Cassandra with Pentaho PDI (Kettle). I have installed Cassandra on
an Amazon EC2 instance and on my local machine. When I retrieve data from the
local machine using Pentaho PDI it takes a few seconds (not more than 20), but
the same retrieval against the production database takes almost 3 minutes for
the same amount of data, which is a huge difference.

Can anybody suggest what I should check, or how I can narrow down this
difference?

The local machine and the production server have the same amount of RAM.
The local machine runs Windows and production runs Linux.

-- 
Regards,
Umang V.Shah
BI-ETL Developer


Re: using dynamic cell names in CQL 3

2014-09-25 Thread shahab
Thanks,
It seems that I was not clear in my question. I would like to store values
in the column name: for example, the column name would be the event name
(e.g. temperature) and the column content would be the respective value
(e.g. 40.5). I need to know how the schema should look in CQL 3.

best,
/Shahab
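
A minimal CQL 3 sketch of the clustering-column approach (the keyspace, table,
and column names below are hypothetical) that captures this kind of model:

  # run against a node; assumes a keyspace "my_ks" already exists
  cqlsh <<'EOF'
  CREATE TABLE my_ks.sensor_events (
      sensor_id  text,    -- partition key: the "row key" in the Thrift view
      event_name text,    -- clustering column: becomes part of the cell name on disk
      value      double,  -- e.g. 40.5 when event_name = 'temperature'
      PRIMARY KEY (sensor_id, event_name)
  );
  EOF

Each partition can then hold one cell per event_name, which is the CQL 3
equivalent of dynamic column names in Thrift.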


On Wed, Sep 24, 2014 at 1:49 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Dynamic thing in Thrift ≈ clustering columns in CQL

 Can you give more details about your data model ?

 On Wed, Sep 24, 2014 at 1:11 PM, shahab shahab.mok...@gmail.com wrote:

 Hi,

 I would like to define a schema for a table where the column (cell) names
 are defined dynamically. Apparently there is a way to do this in Thrift (
 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
 )

 but I couldn't find how to do the same using CQL.

 Any resource/example that I can look at?


 best,
 /Shahab





Re: Difference in retrieving data from cassandra

2014-09-25 Thread Jonathan Haddad
You'll need to provide a bit of information.  To start, a query trace
would be helpful.

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html
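
A minimal sketch of capturing such a trace from cqlsh (the host, keyspace, and
table names below are placeholders):

  $ cqlsh prod-node.example.com
  cqlsh> TRACING ON;
  cqlsh> SELECT * FROM my_ks.my_table WHERE id = 'some-key';
  cqlsh> TRACING OFF;

The trace printed after the rows shows how long each step took on each
replica, which usually points at where the extra time is going.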

(self promo) You may want to read over my blog post regarding
diagnosing problems in production.  I've covered diagnosing slow
queries: 
http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/


On Thu, Sep 25, 2014 at 4:21 AM, Umang Shah shahuma...@gmail.com wrote:
 Hi All,

 I am using Cassandra with Pentaho PDI (Kettle). I have installed Cassandra on
 an Amazon EC2 instance and on my local machine. When I retrieve data from the
 local machine using Pentaho PDI it takes a few seconds (not more than 20), but
 the same retrieval against the production database takes almost 3 minutes for
 the same amount of data, which is a huge difference.

 Can anybody suggest what I should check, or how I can narrow down this
 difference?

 The local machine and the production server have the same amount of RAM.
 The local machine runs Windows and production runs Linux.

 --
 Regards,
 Umang V.Shah
 BI-ETL Developer



-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


RE: Would warnings about overlapping SStables explain high pending compactions?

2014-09-25 Thread Donald Smith
Version 2.0.9.   We have 11 ongoing compactions on that node.

From: Marcus Eriksson [mailto:krum...@gmail.com]
Sent: Thursday, September 25, 2014 12:45 AM
To: user@cassandra.apache.org
Subject: Re: Would warnings about overlapping SStables explain high pending 
compactions?

Not really

What version are you on? Do you have pending compactions and no ongoing 
compactions?

/Marcus

On Wed, Sep 24, 2014 at 11:35 PM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:
On one of our nodes we have lots of pending compactions (499). In the past 
we’ve seen pending compactions go up to 2400 and all the way back down again.

Investigating, I saw warnings such as the following in the logs about 
overlapping SStables and about needing to run “nodetool scrub” on a table.  
Would the overlapping SStables explain the pending compactions?

WARN [RMI TCP Connection(2)-10.5.50.30] 2014-09-24 09:14:11,207 
LeveledManifest.java (line 154) At level 1, 
SSTableReader(path='/data/data/XYZ/ABC/XYZ-ABC-jb-388233-Data.db') 
[DecoratedKey(-6112875836465333229, 
3366636664393031646263356234663832383264616561666430383739383738), 
DecoratedKey(-4509284829153070912, 
3366336562386339376664376633353635333432636662373739626465393636)] overlaps 
SSTableReader(path='/data/data/XYZ/ABC/XYZ-ABC_blob-jb-388150-Data.db') 
[DecoratedKey(-4834684725563291584, 
336633623334363664363632666365303664333936336337343566373838), 
DecoratedKey(-4136919579566299218, 
3366613535646662343235336335633862666530316164323232643765323934)].  This could 
be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have 
dropped sstables from another node into the data directory. Sending back to L0. 
 If you didn't drop in sstables, and have not yet run scrub, you should do so 
since you may also have rows out-of-order within an sstable

Thanks

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]




Experience with multihoming cassandra?

2014-09-25 Thread Donald Smith
We have large boxes with 256G of RAM and SSDs.  From iostat, top, and sar
we think the system has excess capacity.  Anyone have recommendations about
multihoming (http://en.wikipedia.org/wiki/Multihoming) cassandra on such a node
(connecting it to multiple IPs and running multiple cassandras simultaneously)?
I'm skeptical, since Cassandra already has built-in multi-threading, and
because if the box went down multiple Cassandra nodes would disappear at once.
We're using C* version 2.0.9.

A Google/Bing search for multihoming cassandra doesn't turn up much.

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



Re: Experience with multihoming cassandra?

2014-09-25 Thread Jared Biel
Doing this seems counter-productive to Cassandra's design/use-cases. It's
most at home running on a large number of smaller servers rather than a
small number of large servers. Also, as you said, you won't get any of the
high availability benefits that it offers if you run multiple copies of
Cassandra on the same box.


On 25 September 2014 16:58, Donald Smith donald.sm...@audiencescience.com
wrote:

  We have large boxes with 256G of RAM and SSDs.  From iostat, top,
 and sar we think the system has excess capacity.  Anyone have
  recommendations about multihoming (http://en.wikipedia.org/wiki/Multihoming)
  cassandra on such a node (connecting it to multiple IPs and running multiple
  cassandras simultaneously)?  I’m skeptical, since Cassandra already has
  built-in multi-threading, and because if the box went down multiple Cassandra
  nodes would disappear at once.  We’re using C* version 2.0.9.



  A Google/Bing search for multihoming cassandra doesn’t turn up much.



 *Donald A. Smith* | Senior Software Engineer
 P: 425.201.3900 x 3866
 C: (206) 819-5965
 F: (646) 443-2333
 dona...@audiencescience.com


 [image: AudienceScience]





Re: Adjusting readahead for SSD disk seeks

2014-09-25 Thread Kevin Burton
I’d advise keeping readahead low… or turning it off on SSDs.  Also, the noop
IO scheduler might help you on that disk.

Even if Cassandra does perform a contiguous read, readahead won’t be helpful
on an SSD.

It’s essentially obsolete now on SSDs.
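
A hedged sketch of both suggestions (the device name and the readahead value
are illustrative; check your own device names and current settings first):

  sudo blockdev --setra 8 /dev/sda                     # 8 x 512-byte sectors = 4 KB readahead
  echo noop | sudo tee /sys/block/sda/queue/scheduler  # switch that disk to the noop scheduler
  sudo blockdev --getra /dev/sda                       # verify the new readahead value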

On Wed, Sep 24, 2014 at 1:20 PM, Daniel Chia danc...@coursera.org wrote:

 Cassandra only reads a small part of each SSTable during normal operation
 (not compaction); in fact, DataStax recommends lowering readahead:
 http://www.datastax.com/documentation/cassandra/2.1/cassandra/install/installRecommendSettings.html

 There are also blogposts where people have improved their read latency
 by reducing readahead.

 Thanks,
 Daniel

 On Wed, Sep 24, 2014 at 4:15 PM, DuyHai Doan doanduy...@gmail.com wrote:

 does it typically have to read in the entire SSTable into memory
 (assuming the bloom filter said yes)? -- No, that would be a perf killer.

  On the read path, after the Bloom filter, Cassandra uses the partition
 key cache to see whether the partition it is looking for is present there.

  If yes, it gets the offset (from the beginning of the SSTable) so it can
 skip a lot of data and move the disk head directly there.
  If not, it falls back to the partition summary to move the disk head
 to the nearest known location of the sought partition.

  If compression is on (the default), there is another step before
 hitting disk: the compression offset map, a translation table that maps
 uncompressed file offsets to compressed file offsets.
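
  If it helps, a couple of commands that expose those structures on a running
 node (both exist in 2.0; output labels vary slightly between versions):

   nodetool info       # includes key cache size, hits, requests and recent hit rate
   nodetool cfstats    # per-table SSTable count, bloom filter false positives, etc.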


 On Wed, Sep 24, 2014 at 10:07 PM, Donald Smith 
 donald.sm...@audiencescience.com wrote:

  We’re using cassandra as a key-value store; our values are small.  So
  we’re thinking we don’t need much disk readahead (e.g., “blockdev --getra
 /dev/sda”).   We’re using SSDs.



 When cassandra does disk seeks to satisfy read requests does it
 typically have to read in the entire SStable into memory (assuming the
 bloom filter said yes)?  If cassandra needs to read in lots of blocks
 anyway or if it needs to read the entire file during compaction then I'd
 expect we might as well have a big readahead.   Perhaps there’s a tradeoff
 between read latency and compaction time.



 Any feedback welcome.


 Thanks



 *Donald A. Smith* | Senior Software Engineer
 P: 425.201.3900 x 3866
 C: (206) 819-5965
 F: (646) 443-2333
 dona...@audiencescience.com


 [image: AudienceScience]








-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com