RE: Compaction Strategy guidance

2014-11-22 Thread Servando Muñoz G.
ABUSE

 

I DON'T WANT ANY MORE MAILS, I AM FROM MEXICO

 

From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
Sent: Saturday, November 22, 2014, 07:13 p.m.
To: user@cassandra.apache.org
Subject: Re: Compaction Strategy guidance
Importance: High

 



Re: Compaction Strategy guidance

2014-11-22 Thread Nikolai Grigoriev
Stephane,

Like everything good, LCS comes at a certain price.

LCS will put more load on your I/O system (if you use spindles you may
need to be careful about that) and on the CPU. Also, LCS (by default) may
fall back to STCS if it is falling behind (which is quite possible with
heavy write activity), and this will result in higher disk space usage.
LCS also has a certain limitation I discovered recently: sometimes it
cannot use all of your node's resources (an algorithmic limitation), which
reduces the overall compaction throughput. This can happen if you have a
large column family with lots of data per node. STCS does not have this
limitation.

By the way, the primary goal of LCS is to reduce the number of sstables
C* has to look at to find your data. With LCS functioning properly, that
number will most likely be between 1 and 3 for most reads. But if you do
few reads and are not concerned about latency today, LCS will most likely
only save you some disk space.
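(An aside on the math behind that 1-to-3 figure: each LCS level is roughly
10x larger than the previous one, so the number of levels, and with it the
worst-case number of sstables a point read must check, grows only
logarithmically with data size. A back-of-the-envelope sketch in Python;
the 160 MB sstable size and 10x fanout are the LCS defaults, and the
function is purely illustrative, not any driver API.)

```python
def lcs_levels(data_size_mb, sstable_size_mb=160, fanout=10):
    """Estimate how many LCS levels are needed to hold the data.

    Level 1 holds ~fanout sstables, level 2 ~fanout^2, and so on, so
    the level count grows with log10 of the data size.
    """
    level, capacity_mb = 0, 0
    while capacity_mb < data_size_mb:
        level += 1
        capacity_mb += (fanout ** level) * sstable_size_mb
    return level

# ~500 GB per node fits in 4 levels at the default settings, so a
# point read typically needs to check only a handful of sstables.
print(lcs_levels(500 * 1024))
```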

On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay 
wrote:

> Hi there,
>
> use case:
>
> - Heavy write app, few reads.
> - Lots of updates of rows / columns.
> - Current performance is fine, for both writes and reads.
> - Currently using SizeTieredCompactionStrategy
>
> We're trying to limit the amount of storage used during compaction. Should
> we switch to LeveledCompactionStrategy?
>
> Thanks
>



-- 
Nikolai Grigoriev
(514) 772-5178


Compaction Strategy guidance

2014-11-22 Thread Stephane Legay
Hi there,

use case:

- Heavy write app, few reads.
- Lots of updates of rows / columns.
- Current performance is fine, for both writes and reads.
- Currently using SizeTieredCompactionStrategy

We're trying to limit the amount of storage used during compaction. Should
we switch to LeveledCompactionStrategy?

Thanks
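(On the temporary-space question above: an STCS major compaction rewrites
its input sstables into a new one, so in the worst case it can transiently
need about as much free space as the data being compacted. LCS instead
compacts small fixed-size chunks, so its transient overhead stays roughly
constant. A rough illustrative comparison; the 160 MB sstable size is the
LCS default and the 11-sstable compaction width is an assumption, not a
measured value.)

```python
def stcs_headroom_mb(total_data_mb):
    # Worst case: a major compaction rewrites all input sstables,
    # so up to ~100% of the data size may be needed transiently.
    return total_data_mb

def lcs_headroom_mb(sstable_size_mb=160, sstables_per_compaction=11):
    # LCS compacts one sstable together with the ~10 overlapping
    # sstables in the next level, so the transient overhead is small
    # and roughly independent of total data size.
    return sstable_size_mb * sstables_per_compaction

print(stcs_headroom_mb(500 * 1024))  # ~512000 MB of headroom
print(lcs_headroom_mb())             # ~1760 MB of headroom
```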


Re: Problem with performance, memory consumption, and RLIMIT_MEMLOCK

2014-11-22 Thread Jens Rantil
Hi Dmitri,


I have not used the CPP driver, but maybe you have forgotten to set the
equivalent of the Java driver's fetch size to something sensible?




Just an idea,

Jens


—
Sent from Mailbox

On Sun, Nov 16, 2014 at 6:09 PM, Dmitri Dmitrienko 
wrote:

> Hi,
> I have a very simple table in cassandra that contains only three columns:
> id, time and a blob with data. I added 1M rows of data and now the
> database is about 12GB on disk.
> That 1M is only part of the data I want to store in the database; I need
> to synchronize this table with an external source. To do this, I have to
> read the id and time columns of all the rows, compare them with what I
> see in the external source, and insert/update/delete the rows where I see
> a difference.
> So I'm trying to fetch the id and time columns from cassandra. In 100% of
> my attempts, the server hangs for ~1 minute while using >100% CPU, then
> terminates abnormally with an error saying I have to run cassandra as
> root or increase RLIMIT_MEMLOCK.
> I increased RLIMIT_MEMLOCK to 1GB and it seems that is still not
> sufficient. It seems cassandra tries to read and lock the whole table in
> memory, ignoring the fact that I need only two tiny columns (~12MB of
> data). This is how it works when I use the latest cpp-driver.
> With cqlsh it works differently -- it shows the first page of data almost
> immediately, without any noticeable delay.
> Is there a way to have the cpp-driver work like cqlsh? I'd like to have
> data sent to the client immediately upon availability, without any
> attempts to lock huge chunks of virtual memory.
> My platform is 64-bit linux (centos) with all necessary updates
> installed, openjdk. I also tried macosx with oracle jdk. In that case I
> don't get the RLIMIT_MEMLOCK error, but a regular out-of-memory error in
> system.log, although I gave the server a sufficiently large heap, as
> recommended, 8GB.
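(The fetch-size suggestion above is about paging: instead of materializing
all 1M rows at once, the driver asks the server for one page at a time and
the client processes each page before requesting the next, which is why
cqlsh shows its first page almost immediately and why peak memory stays
bounded. A pure-Python sketch of that consumption pattern; the page source
below is a stand-in for a real driver result set, not the cpp-driver API,
whose paging-size setting is documented in the driver docs.)

```python
def paged_fetch(rows, page_size=5000):
    """Yield results one page at a time, as a paging driver would."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]

# With paging, peak memory is one page, not the whole result set.
rows = [(i, "ts") for i in range(20001)]
pages = list(paged_fetch(rows))
print(len(pages), len(pages[0]), len(pages[-1]))
```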

Re: bootstrapping node stuck in JOINING state

2014-11-22 Thread Stan Lemon
Hello,
I posted a similar issue the other day. We wound up not nuking the whole
data dir; we simply deleted the system keyspace from the data dir and then
restarted the node. This worked: our never-ending join process completed
and the node is now part of the cluster.

Stan Lemon


On Fri, Nov 21, 2014 at 1:30 PM, Robert Coli  wrote:

> On Fri, Nov 21, 2014 at 9:44 AM, Chris Hornung 
> wrote:
>
>> On bootstrapping the third node, the data streaming sessions completed
>> without issue, but bootstrapping did not finish. The node is stuck in
>> the JOINING state even 19 hours or so after data streaming completed.
>>
>
> Stop the joining node. Wipe the data dir including system keyspace.
> Re-bootstrap.
>
> =Rob
> http://twitter.com/rcolidba
>