Re: any ways to have compaction use less disk space?

2012-09-26 Thread Sylvain Lebresne
On Wed, Sep 26, 2012 at 2:35 AM, Rob Coli rc...@palominodb.com wrote:
 150,000 sstables seem highly unlikely to be performant. As a simple
 example of why, on the read path the bloom filter for every sstable
 must be consulted...

Unfortunately that's a bad example since that's not true.

Leveled compaction keeps sstables in levels of non-overlapping key
ranges, meaning that a read only has to check one sstable per level (a
little more than that, to be precise, since it also has to include all of
level 0, but provided your node is not lagging too far behind, that is
still a small number of sstables). I'm too lazy to do the exact math,
but I believe that for 700GB you'll have 8 levels.
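
For anyone who wants to redo that math, here is a rough sketch in Python,
assuming the usual LCS sizing rule that level N holds about
sstable_size * 10^N of data (L1 is roughly ten sstables, each level ten
times the one below); the exact constants depend on the version and on
sstable_size_in_mb, so treat the output as a ballpark:

    # Ballpark only: assumes level N holds about sstable_size * 10**N of data.
    def levels_needed(total_gb, sstable_mb):
        remaining = total_gb * 1024           # work in MB
        level, capacity = 1, sstable_mb * 10  # L1 ~ 10 sstables
        while remaining > capacity:
            remaining -= capacity
            level += 1
            capacity *= 10
        return level

    for size_mb in (5, 512):
        print(size_mb, "MB sstables ->", levels_needed(700, size_mb), "levels for 700GB")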

--
Sylvain


Re: any ways to have compaction use less disk space?

2012-09-26 Thread Rob Coli
On Wed, Sep 26, 2012 at 6:05 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 On Wed, Sep 26, 2012 at 2:35 AM, Rob Coli rc...@palominodb.com wrote:
 150,000 sstables seem highly unlikely to be performant. As a simple
 example of why, on the read path the bloom filter for every sstable
 must be consulted...

 Unfortunately that's a bad example since that's not true.

You learn something new every day. Thanks for the clarification.

I reduce my claim to "a huge number of SSTables is unlikely to be
performant." :)

=Rob

-- 
=Robert Coli
AIM & GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: any ways to have compaction use less disk space?

2012-09-25 Thread Віталій Тимчишин
See my comments inline

2012/9/25 Aaron Turner synfina...@gmail.com

 On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин tiv...@gmail.com
 wrote:
  Why so?
  What are the pluses and minuses?
  As for me, I am looking at the number of files in the directory.
  700GB/512MB*5 (files per SSTable) = 7000 files, which is OK from my point of view.
  700GB/5MB*5 = 700,000 files, which is too much for a single directory, too much
  memory used for per-SSTable data, and too large a compaction queue (which leads to
  strange pauses, I suppose because the compactor is deciding what to compact next), ...


 Not sure why a lot of files is a problem... modern filesystems deal
 with that pretty well.


Maybe. Maybe it's not the filesystem, but Cassandra. I've seen compaction
slow down when the compaction queue is too large, and it can be too large
if you have a lot of SSTables. Note that each SSTable costs both FS metadata
(and the FS metadata cache can be limited) and Cassandra in-memory data.
Anyway, as for me, a performance test would be great in this area; otherwise
it's all speculation.



 Really large sstables mean that compactions take a lot more
 disk IO and time to complete.


As for me, this point is valid only when your flushes are small. Otherwise
you still need to compact the whole key range a flush covers, no matter
whether it is one large file or multiple small ones. One large file can even
be cheaper to compact.


 Remember, Leveled Compaction is more
 disk IO intensive, so using large sstables makes that even worse.
 This is a big reason why the default is 5MB. Also, each level is 10x
 the size of the previous level, and for leveled compaction you need
 10x the sstable size worth of free space to do compactions. So now
 you need 5GB of free disk, vs 50MB of free disk.


I really don't think 5GB of free space is too much :)



 Also, if you're doing deletes in those CFs, that old, deleted data is
 going to stick around a LOT longer with 512MB files, because it can't
 get deleted until you have 10x512MB files to compact to level 2.
 Heaven forbid it doesn't get deleted then, because each level is 10x
 bigger, so you end up waiting a LOT longer to actually delete that data
 from disk.


But if I have small SSTables, all my data goes to the high levels (the 4th
for me when I had a 128MB setting), and it also takes time for updates to
reach that level. I am not sure which way is faster.



 Now, if you're using SSDs then larger sstables are probably doable,
 but even then I'd guesstimate 50MB is far more reasonable than 512MB.


I don't think SSDs are great for writes/compaction. Cassandra does these in
a streaming fashion, and regular HDDs are faster than SSDs for linear
read/write. SSDs are good for random access, which for Cassandra means reads.

P.S. I still think my way is better, yet it would be great to perform some
real tests.


 -Aaron


  2012/9/23 Aaron Turner synfina...@gmail.com
 
  On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com
  wrote:
   If you think about space, use Leveled compaction! It will not only allow
   you to fill more space, but will also shrink your data much faster in case
   of updates. Size-tiered compaction can give you 3x-4x more space used than
   there is live data. Consider the following (simplified) scenario of ours:
   1) The data is updated weekly.
   2) Each week a large SSTable is written (say, 300GB) after full update
   processing.
   3) After the 4th weekly update you will have 1.2TB of data in 4 large
   SSTables.
   4) Only then will they all be compacted into one 300GB SSTable.

   Leveled compaction has tamed space for us. Note that you should set
   sstable_size_in_mb to a reasonably high value (it is 512 for us with
   ~700GB per node) to prevent creating a lot of small files.

  512MB per sstable?  Wow, that's freaking huge.  From my conversations
  with various developers, 5-10MB seems far more reasonable.  I guess it
  really depends on your usage patterns, but that seems excessive to me,
  especially as sstables are promoted.
 
 
  --
  Best regards,
   Vitalii Tymchyshyn



 --
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
 -- Benjamin Franklin
 carpe diem quam minimum credula postero




-- 
Best regards,
 Vitalii Tymchyshyn


Re: any ways to have compaction use less disk space?

2012-09-25 Thread Aaron Turner
On Tue, Sep 25, 2012 at 10:36 AM, Віталій Тимчишин tiv...@gmail.com wrote:
 See my comments inline

 2012/9/25 Aaron Turner synfina...@gmail.com

 On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин tiv...@gmail.com
 wrote:
  Why so?
  What are the pluses and minuses?
  As for me, I am looking at the number of files in the directory.
  700GB/512MB*5 (files per SSTable) = 7000 files, which is OK from my point of view.
  700GB/5MB*5 = 700,000 files, which is too much for a single directory, too much
  memory used for per-SSTable data, and too large a compaction queue (which leads to
  strange pauses, I suppose because the compactor is deciding what to compact next), ...


 Not sure why a lot of files is a problem... modern filesystems deal
 with that pretty well.


 Maybe. Maybe it's not the filesystem, but Cassandra. I've seen compaction
 slow down when the compaction queue is too large, and it can be too large
 if you have a lot of SSTables. Note that each SSTable costs both FS metadata
 (and the FS metadata cache can be limited) and Cassandra in-memory data.
 Anyway, as for me, a performance test would be great in this area; otherwise
 it's all speculation.

Agreed... I guess my thought is that the default is 5MB and the
recommendation from the developers is not to stray too far from that.
So unless you've done the performance benchmarks to prove otherwise,
I'm not sure why you chose a value about 100x that?

Also, I notice you're talking about 700GB/node?  That's roughly double
the recommended maximum of 300-400GB per node. I notice a lot of
people are trying to push this number, because while disk is
relatively cheap, computers are not.


-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


Re: any ways to have compaction use less disk space?

2012-09-25 Thread Rob Coli
On Sun, Sep 23, 2012 at 12:24 PM, Aaron Turner synfina...@gmail.com wrote:
 Leveled compaction has tamed space for us. Note that you should set
 sstable_size_in_mb to a reasonably high value (it is 512 for us with ~700GB
 per node) to prevent creating a lot of small files.

 512MB per sstable?  Wow, that's freaking huge.  From my conversations
 with various developers, 5-10MB seems far more reasonable.  I guess it
 really depends on your usage patterns, but that seems excessive to me,
 especially as sstables are promoted.

700GB = 716,800MB; 716,800MB / 5MB = 143,360 sstables

150,000 sstables seem highly unlikely to be performant. As a simple
example of why, on the read path the bloom filter for every sstable
must be consulted...

=Rob

-- 
=Robert Coli
AIM & GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: any ways to have compaction use less disk space?

2012-09-24 Thread Віталій Тимчишин
Why so?
What are the pluses and minuses?
As for me, I am looking at the number of files in the directory.
700GB/512MB*5 (files per SSTable) = 7000 files, which is OK from my point of view.
700GB/5MB*5 = 700,000 files, which is too much for a single directory, too much
memory used for per-SSTable data, and too large a compaction queue (which leads to
strange pauses, I suppose because the compactor is deciding what to compact next), ...
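
Here is the same back-of-the-envelope calculation as a small Python sketch,
assuming roughly five on-disk components per sstable (Data, Index, Filter,
Statistics and so on; the exact set varies by version):

    # Rough file-count estimate; ~5 on-disk components per sstable is an assumption.
    FILES_PER_SSTABLE = 5

    def files_for(total_gb, sstable_mb):
        sstables = (total_gb * 1024) // sstable_mb
        return sstables * FILES_PER_SSTABLE

    for size_mb in (512, 5):
        print(f"{size_mb:>3} MB sstables: ~{files_for(700, size_mb):,} files")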

2012/9/23 Aaron Turner synfina...@gmail.com

 On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com
 wrote:
  If you think about space, use Leveled compaction! It will not only allow
  you to fill more space, but will also shrink your data much faster in case
  of updates. Size-tiered compaction can give you 3x-4x more space used than
  there is live data. Consider the following (simplified) scenario of ours:
  1) The data is updated weekly.
  2) Each week a large SSTable is written (say, 300GB) after full update
  processing.
  3) After the 4th weekly update you will have 1.2TB of data in 4 large
  SSTables.
  4) Only then will they all be compacted into one 300GB SSTable.

  Leveled compaction has tamed space for us. Note that you should set
  sstable_size_in_mb to a reasonably high value (it is 512 for us with ~700GB
  per node) to prevent creating a lot of small files.

 512MB per sstable?  Wow, that's freaking huge.  From my conversations
 with various developers, 5-10MB seems far more reasonable.  I guess it
 really depends on your usage patterns, but that seems excessive to me,
 especially as sstables are promoted.


-- 
Best regards,
 Vitalii Tymchyshyn


Re: any ways to have compaction use less disk space?

2012-09-24 Thread Aaron Turner
On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин tiv...@gmail.com wrote:
 Why so?
 What are the pluses and minuses?
 As for me, I am looking at the number of files in the directory.
 700GB/512MB*5 (files per SSTable) = 7000 files, which is OK from my point of view.
 700GB/5MB*5 = 700,000 files, which is too much for a single directory, too much
 memory used for per-SSTable data, and too large a compaction queue (which leads to
 strange pauses, I suppose because the compactor is deciding what to compact next), ...


Not sure why a lot of files is a problem... modern filesystems deal
with that pretty well.

Really large sstables mean that compactions take a lot more disk IO
and time to complete.  Remember, Leveled Compaction is more
disk IO intensive, so using large sstables makes that even worse.
This is a big reason why the default is 5MB. Also, each level is 10x
the size of the previous level, and for leveled compaction you need
10x the sstable size worth of free space to do compactions. So now
you need 5GB of free disk, vs 50MB of free disk.
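
If it helps, a quick sketch of those numbers; the 10x fan-out per level and
the roughly-ten-sstables-per-compaction headroom are the assumptions here:

    # Assumed constants: each level holds ~10x the one below it (L1 ~ 10 sstables),
    # and a single leveled compaction merges roughly 10 sstables at once.
    def level_capacity_mb(sstable_mb, level):
        return sstable_mb * 10 ** level

    for size_mb in (5, 512):
        caps = [level_capacity_mb(size_mb, n) for n in range(1, 5)]
        print(f"{size_mb} MB sstables: L1-L4 capacities (MB) = {caps}; "
              f"free space needed per compaction ~{10 * size_mb} MB")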

Also, if you're doing deletes in those CFs, that old, deleted data is
going to stick around a LOT longer with 512MB files, because it can't
get deleted until you have 10x512MB files to compact to level 2.
Heaven forbid it doesn't get deleted then, because each level is 10x
bigger, so you end up waiting a LOT longer to actually delete that data
from disk.

Now, if you're using SSDs then larger sstables are probably doable,
but even then I'd guesstimate 50MB is far more reasonable than 512MB.

-Aaron


 2012/9/23 Aaron Turner synfina...@gmail.com

 On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com
 wrote:
  If you think about space, use Leveled compaction! It will not only allow
  you to fill more space, but will also shrink your data much faster in case
  of updates. Size-tiered compaction can give you 3x-4x more space used than
  there is live data. Consider the following (simplified) scenario of ours:
  1) The data is updated weekly.
  2) Each week a large SSTable is written (say, 300GB) after full update
  processing.
  3) After the 4th weekly update you will have 1.2TB of data in 4 large
  SSTables.
  4) Only then will they all be compacted into one 300GB SSTable.

  Leveled compaction has tamed space for us. Note that you should set
  sstable_size_in_mb to a reasonably high value (it is 512 for us with ~700GB
  per node) to prevent creating a lot of small files.

 512MB per sstable?  Wow, that's freaking huge.  From my conversations
 with various developers, 5-10MB seems far more reasonable.  I guess it
 really depends on your usage patterns, but that seems excessive to me,
 especially as sstables are promoted.


 --
 Best regards,
  Vitalii Tymchyshyn



-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


Re: any ways to have compaction use less disk space?

2012-09-24 Thread Edward Capriolo
If you are using ext3 there is a hard limit of 32K on the number of files
in a directory.  ext4 has a much higher limit (I can't remember the exact
number offhand). So it's true that having many files is not a problem for
the filesystem, though your VFS cache could be less efficient since you
would have a higher inode-to-data ratio.
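
If you want to see how close your data directories actually are to such
limits, a few lines of Python will count files per directory (the path is
just the packaged-install default; adjust it to your data_file_directories):

    import os

    DATA_DIR = "/var/lib/cassandra/data"  # default for packaged installs; adjust as needed

    # Collect a per-directory file count and print it, largest first.
    counts = []
    for dirpath, dirnames, filenames in os.walk(DATA_DIR):
        if filenames:
            counts.append((len(filenames), dirpath))

    for n, path in sorted(counts, reverse=True):
        print(f"{n:>8} files in {path}")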

Edward

On Mon, Sep 24, 2012 at 7:03 PM, Aaron Turner synfina...@gmail.com wrote:
 On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин tiv...@gmail.com wrote:
 Why so?
 What are the pluses and minuses?
 As for me, I am looking at the number of files in the directory.
 700GB/512MB*5 (files per SSTable) = 7000 files, which is OK from my point of view.
 700GB/5MB*5 = 700,000 files, which is too much for a single directory, too much
 memory used for per-SSTable data, and too large a compaction queue (which leads to
 strange pauses, I suppose because the compactor is deciding what to compact next), ...


 Not sure why a lot of files is a problem... modern filesystems deal
 with that pretty well.

 Really large sstables mean that compactions take a lot more disk IO
 and time to complete.  Remember, Leveled Compaction is more
 disk IO intensive, so using large sstables makes that even worse.
 This is a big reason why the default is 5MB. Also, each level is 10x
 the size of the previous level, and for leveled compaction you need
 10x the sstable size worth of free space to do compactions. So now
 you need 5GB of free disk, vs 50MB of free disk.

 Also, if you're doing deletes in those CFs, that old, deleted data is
 going to stick around a LOT longer with 512MB files, because it can't
 get deleted until you have 10x512MB files to compact to level 2.
 Heaven forbid it doesn't get deleted then, because each level is 10x
 bigger, so you end up waiting a LOT longer to actually delete that data
 from disk.

 Now, if you're using SSDs then larger sstables are probably doable,
 but even then I'd guesstimate 50MB is far more reasonable than 512MB.

 -Aaron


 2012/9/23 Aaron Turner synfina...@gmail.com

 On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com
 wrote:
  If you think about space, use Leveled compaction! It will not only allow
  you to fill more space, but will also shrink your data much faster in case
  of updates. Size-tiered compaction can give you 3x-4x more space used than
  there is live data. Consider the following (simplified) scenario of ours:
  1) The data is updated weekly.
  2) Each week a large SSTable is written (say, 300GB) after full update
  processing.
  3) After the 4th weekly update you will have 1.2TB of data in 4 large
  SSTables.
  4) Only then will they all be compacted into one 300GB SSTable.

  Leveled compaction has tamed space for us. Note that you should set
  sstable_size_in_mb to a reasonably high value (it is 512 for us with ~700GB
  per node) to prevent creating a lot of small files.

 512MB per sstable?  Wow, that's freaking huge.  From my conversations
 with various developers, 5-10MB seems far more reasonable.  I guess it
 really depends on your usage patterns, but that seems excessive to me,
 especially as sstables are promoted.


 --
 Best regards,
  Vitalii Tymchyshyn



 --
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
 -- Benjamin Franklin
 carpe diem quam minimum credula postero


Re: any ways to have compaction use less disk space?

2012-09-23 Thread Віталій Тимчишин
If you think about space, use Leveled compaction! It will not only allow you
to fill more space, but will also shrink your data much faster in case of
updates. Size-tiered compaction can give you 3x-4x more space used than there
is live data. Consider the following (simplified) scenario of ours:
1) The data is updated weekly.
2) Each week a large SSTable is written (say, 300GB) after full update
processing.
3) After the 4th weekly update you will have 1.2TB of data in 4 large SSTables.
4) Only then will they all be compacted into one 300GB SSTable.

Leveled compaction has tamed space for us. Note that you should set
sstable_size_in_mb to a reasonably high value (it is 512 for us with ~700GB
per node) to prevent creating a lot of small files.
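
For concreteness, here is a tiny simulation of that weekly scenario, assuming
the size-tiered default of compacting once four similarly-sized sstables have
accumulated and that each weekly update rewrites the same data:

    # Size-tiered scenario sketch: 4 similarly-sized sstables trigger a compaction
    # (the default min_compaction_threshold), and updates overwrite existing data.
    MIN_THRESHOLD = 4
    WEEKLY_SSTABLE_GB = 300

    sstables = []
    for week in range(1, 7):
        sstables.append(WEEKLY_SSTABLE_GB)      # the weekly rewrite lands as one big sstable
        peak = sum(sstables)
        if len(sstables) >= MIN_THRESHOLD:      # similar sizes -> compact them into one
            sstables = [WEEKLY_SSTABLE_GB]      # live data stays ~300GB after compaction
        print(f"week {week}: peak ~{peak} GB on disk for ~{WEEKLY_SSTABLE_GB} GB of live data")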

Best regards, Vitalii Tymchyshyn.

2012/9/20 Hiller, Dean dean.hil...@nrel.gov

 While disk space is cheap, nodes are not that cheap, and usually systems
 have a 1TB limit on each node, which means we would really love to not add
 more nodes until we hit 70% disk space instead of the normal 50% that we
 have read about due to compaction.

 Is there any way to use less disk space during compactions?
 Is there any work being done so that compactions take less space in the
 future, meaning we can buy fewer nodes?

 Thanks,
 Dean




-- 
Best regards,
 Vitalii Tymchyshyn


Re: any ways to have compaction use less disk space?

2012-09-23 Thread Aaron Turner
On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com wrote:
 If you think about space, use Leveled compaction! It will not only allow you
 to fill more space, but will also shrink your data much faster in case of
 updates. Size-tiered compaction can give you 3x-4x more space used than there
 is live data. Consider the following (simplified) scenario of ours:
 1) The data is updated weekly.
 2) Each week a large SSTable is written (say, 300GB) after full update
 processing.
 3) After the 4th weekly update you will have 1.2TB of data in 4 large SSTables.
 4) Only then will they all be compacted into one 300GB SSTable.

 Leveled compaction has tamed space for us. Note that you should set
 sstable_size_in_mb to a reasonably high value (it is 512 for us with ~700GB
 per node) to prevent creating a lot of small files.

512MB per sstable?  Wow, that's freaking huge.  From my conversations
with various developers, 5-10MB seems far more reasonable.  I guess it
really depends on your usage patterns, but that seems excessive to me,
especially as sstables are promoted.



-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


any ways to have compaction use less disk space?

2012-09-20 Thread Hiller, Dean
While disk space is cheap, nodes are not that cheap, and usually systems have a
1TB limit on each node, which means we would really love to not add more nodes
until we hit 70% disk space instead of the normal 50% that we have read about
due to compaction.

Is there any way to use less disk space during compactions?
Is there any work being done so that compactions take less space in the future,
meaning we can buy fewer nodes?

Thanks,
Dean


Re: any ways to have compaction use less disk space?

2012-09-20 Thread Aaron Turner
1. Use compression

2. Use Leveled Compaction

Also, 1TB/node is a lot larger than the normal recommendation...
generally speaking it's more in the 300-400GB range.

On Thu, Sep 20, 2012 at 8:10 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
 While disk space is cheap, nodes are not that cheap, and usually systems have
 a 1TB limit on each node, which means we would really love to not add more
 nodes until we hit 70% disk space instead of the normal 50% that we have read
 about due to compaction.

 Is there any way to use less disk space during compactions?
 Is there any work being done so that compactions take less space in the
 future, meaning we can buy fewer nodes?

 Thanks,
 Dean



-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero