Sorry, bad typo... 300G is what I meant.

The general Cassandra guidance is to stay under 1T per node or you run into 
serious trouble, and most people stay under 500G per node.

Later,
Dean

From: Jayadev Jayaraman <jdisal...@gmail.com>
Reply-To: user@cassandra.apache.org
Date: Wednesday, September 18, 2013 1:30 PM
To: user@cassandra.apache.org
Subject: Re: What is the ideal value for sstable_size_in_mb when using 
LeveledCompactionStrategy ?

Thanks for the quick reply. We've already raised the ulimit as high as our 
Linux distro allows (around 1.8 million).

I have a follow-up question. The individual nodes in your use case are quite 
massive. Does the safe value vary widely with the underlying hardware, or 
would you say from experience that something around 50M (with raised 
file-descriptor limits) is safe for most medium-sized (1-5 TB per node) to 
high-end (hundreds of TB per node) deployments?


On Wed, Sep 18, 2013 at 3:15 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
 1.  Always raise your file descriptor limits on Linux; even back in 0.7 that 
was the recommendation, so Cassandra can open lots of files (a minimal sketch 
follows below this list).
 2.  We use 50M for our LCS with no performance issues.  We previously had it 
at 10M with no issues either, though of course that meant a huge number of 
files with our 300T per node.
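
A minimal sketch of raising the limit persistently on Linux (the user name 
and the 100000 value are illustrative assumptions; adjust for your distro 
and workload):

# append limits for the user running Cassandra
# (user name and limit value are assumptions, not a recommendation)
echo 'cassandra  soft  nofile  100000' >> /etc/security/limits.conf
echo 'cassandra  hard  nofile  100000' >> /etc/security/limits.conf
# verify from a fresh login shell for that user:
su - cassandra -c 'ulimit -n'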

Dean

From: Jayadev Jayaraman <jdisal...@gmail.com>
Reply-To: user@cassandra.apache.org
Date: Wednesday, September 18, 2013 1:02 PM
To: user@cassandra.apache.org
Subject: What is the ideal value for sstable_size_in_mb when using 
LeveledCompactionStrategy ?

We have set up a 24-node Cassandra cluster (m1.xlarge instances, 1.7 TB per 
node) on Amazon EC2:

version=1.2.9
replication factor = 2
snitch=EC2Snitch
placement_strategy=NetworkTopologyStrategy (12 nodes in each of 2 
availability zones)
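
In CQL3 terms this corresponds to something like the following (the keyspace 
name is a placeholder; with EC2Snitch the region, e.g. 'us-east', becomes the 
datacenter name and the availability zone becomes the rack):

cqlsh <<'EOF'
-- keyspace name is a placeholder; the DC name must match what EC2Snitch reports
CREATE KEYSPACE analytics
  WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 2};
EOF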

Background on our use case:

We plan on using Hadoop with sstableloader to load 10GB+ of analytics data per 
day (100 million+ row keys, 5 or so columns per row on average). We have 
chosen LeveledCompactionStrategy in the hope that it constrains the number of 
SSTables that must be read to retrieve a slice predicate for a row. We don't 
want the Cassandra JVM holding too many (> 1000) open file descriptors to 
SSTables, as this has caused us network / unreachability issues before. We hit 
this on Cassandra 0.8.9 with SizeTieredCompactionStrategy, and to mitigate it 
we ran minor compaction daily and major compaction semi-regularly to keep the 
number of SSTable files on disk as low as possible.

If we use LeveledCompactionStrategy with a small value for sstable_size_in_mb 
(default = 5 MB), wouldn't that result in a very large number of SSTable files 
on disk? How does that affect the number of open file descriptors (reading the 
docs, I get the impression that the number of SSTable seeks per query is 
reduced by a large margin)? But if we use a larger value for 
sstable_size_in_mb, say around 200 MB, there will be up to 800 MB of small 
uncompacted SSTables on disk per column family, to which file descriptors will 
inevitably be held open. A back-of-envelope calculation follows below.
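
To put rough numbers on the file-count side (shell arithmetic, purely 
illustrative; real counts depend on levels and compaction state):

# back-of-envelope sstable counts for ~1.7 TB of data per node
echo $(( 1700 * 1024 / 5 ))    #   5 MB sstables -> 348160 files
echo $(( 1700 * 1024 / 200 ))  # 200 MB sstables -> 8704 files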

All in all, can someone help us figure out what we should set 
sstable_size_in_mb to? I figure it's not a very good idea to set it to a 
larger value, but I don't know how things perform if we set it to a small 
value. Do we have to run major compaction regularly in this case too?
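
For reference, this is how we'd apply whatever value we settle on, per table 
in CQL3 on 1.2 (keyspace/table names are hypothetical and the 160 MB figure is 
illustrative only, not a recommendation):

cqlsh <<'EOF'
-- keyspace/table names are hypothetical; 160 MB is illustrative only
ALTER TABLE analytics.events
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': 160};
EOF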

Thanks
Jayadev


