Re: saving distinct data in cassandra result in many tombstones

2018-06-12 Thread Elliott Sims
If this is data that expires after a certain amount of time, you probably
want to look into using TWCS (TimeWindowCompactionStrategy) and TTLs to
minimize the number of tombstones.
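
A minimal sketch of the TWCS-plus-TTL suggestion. It assumes a Cassandra version that ships TimeWindowCompactionStrategy (it is built in from 3.0.8 and 3.8 onward) and a hypothetical table `my_ks.distinct_by_hour` whose rows are only needed for about two days. The one-hour window and the 172800-second TTL are illustrative values, not recommendations.

```cql
-- Hypothetical keyspace/table; window size and TTL are illustrative.
-- Group SSTables into one-hour windows and expire every row after
-- 2 days (172800 s) unless a write sets its own TTL.
ALTER TABLE my_ks.distinct_by_hour
  WITH compaction = {
         'class': 'TimeWindowCompactionStrategy',
         'compaction_window_unit': 'HOURS',
         'compaction_window_size': 1
       }
  AND default_time_to_live = 172800;

-- A TTL can also be set per write (column names and the hour
-- number are hypothetical).
INSERT INTO my_ks.distinct_by_hour (hourNumber, key, distinctValue)
VALUES (424664, 'k1', 42)
USING TTL 172800;
```

The idea is to let data expire rather than delete it: once everything in an SSTable has expired and gc_grace_seconds has passed, TWCS can drop the whole file instead of compacting tombstones away one by one.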

Decreasing gc_grace_seconds and then compacting will reduce the number of
tombstones, but at the cost of potentially resurrecting deleted data if the
table hasn't been repaired within the grace interval. You can also just
raise the tombstone thresholds (tombstone_warn_threshold and
tombstone_failure_threshold in cassandra.yaml), but the queries will still be
expensive and wasteful.
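
The gc_grace_seconds route can be sketched as follows, again on a hypothetical table `my_ks.distinct_by_hour`. The ALTER only changes the grace period; tombstones older than the new value are actually purged by a later compaction, for example a major compaction run on each node with `nodetool compact my_ks distinct_by_hour`. The one-hour value mirrors the idea in the original question and is only safe if repairs complete within that window.

```cql
-- Hypothetical table. Shorten the grace period to 1 hour (3600 s).
-- Deleted data can reappear if a replica missed the delete and is not
-- repaired before the tombstone is purged.
ALTER TABLE my_ks.distinct_by_hour WITH gc_grace_seconds = 3600;

-- After compaction has run on every node, optionally restore the
-- default of 10 days (864000 s).
ALTER TABLE my_ks.distinct_by_hour WITH gc_grace_seconds = 864000;
```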

On Tue, Jun 12, 2018 at 2:02 AM, onmstester onmstester wrote:

> Hi,
>
> I need to save the distinct values for a key in each hour. The problem with
> saving everything and computing the distinct values in memory is that there
> is too much repeated data.
> Table schema:
> Table distinct(
> hourNumber int,
> key text,
> distinctValue long,
> primary key (hourNumber)
> )
>
> I want to retrieve the distinct count of all keys in a specific hour, and with
> this data model that can be done by reading a single partition.
> The problem: I can't read from this table. system.log indicates that more
> than 100K tombstones were read and no live data. The gc_grace time is
> the default (10 days), so I thought about decreasing it to 1 hour and running
> compaction, but is this the right approach at all? I mean the whole idea of
> replacing some millions of rows in a partition, each of them 10 times, again
> and again, which creates a lot of tombstones just to achieve distinct behavior?
>
> Thanks in advance
>
> Sent using Zoho Mail 


Re: What will happen after adding another data disk

2018-06-12 Thread Jeff Jirsa
JBOD before 3.6 or so mixed data between disks in such a way that if one disk
failed, you needed to treat them all as failed and replace the host.

-- 
Jeff Jirsa


> On Jun 12, 2018, at 1:53 AM, Kyrylo Lebediev  wrote:
> 
> It is also worth noting that JBOD isn't recommended for older
> Cassandra versions, as there are known issues with data imbalance on JBOD.
> IIRC, JBOD data imbalance was fixed in some 3.x version (3.2?).
> For older versions, creating one large filesystem on top of an md or LVM
> device seems to be a better choice.
> 
> From: Eunsu Kim 
> Sent: Tuesday, June 12, 2018 9:06:07 AM
> To: user@cassandra.apache.org
> Subject: Re: What will happen after adding another data disk
>  
> In my experience, adding a new disk and restarting the Cassandra process
> slowly evens out disk usage, so the existing disks end up holding less
> data.
> 
>> On 12 Jun 2018, at 11:09 AM, wxn...@zjqunshuo.com wrote:
>> 
>> Hi,
>> I know Cassandra can make use of multiple disks. My data disk is almost full
>> and I want to add another 2TB disk. I don't know what will happen after the
>> addition.
>> 1. Will C* write to both disks until the old disk is full?
>> 2. What will happen after the old one is full? Will C* stop writing to
>> the old one and only write to the new one, which has free space?
>> 
>> Thanks!
> 


Re: What will happen after adding another data disk

2018-06-12 Thread wxn...@zjqunshuo.com
Thank you all.
I don't know whether the JBOD issue mentioned above applies to my case. My
cluster is on Aliyun Cloud and the Cassandra version is 2.2.8. Data imbalance
is not a problem for me, as long as Cassandra chooses a disk with sufficient
free space whenever a memtable is flushed to an SSTable.

Thanks,
-Simon
 



saving distinct data in cassandra result in many tombstones

2018-06-12 Thread onmstester onmstester
Hi,

I need to save the distinct values for a key in each hour. The problem with
saving everything and computing the distinct values in memory is that there
is too much repeated data.

Table schema:

Table distinct(
hourNumber int,
key text,
distinctValue long,
primary key (hourNumber)
)

I want to retrieve the distinct count of all keys in a specific hour, and with
this data model that can be done by reading a single partition.

The problem: I can't read from this table. system.log indicates that more than
100K tombstones were read and no live data. The gc_grace time is the default
(10 days), so I thought about decreasing it to 1 hour and running compaction,
but is this the right approach at all? I mean the whole idea of replacing
some millions of rows in a partition, each of them 10 times, again and again,
which creates a lot of tombstones just to achieve distinct behavior?

Thanks in advance
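
For reference, here is a sketch of the schema above written as valid CQL, plus one possible variant. With `PRIMARY KEY (hourNumber)`, each hour's partition holds a single row, so every insert for that hour overwrites the previous one. Making `key` and `distinctValue` clustering columns turns each distinct (key, value) pair into its own row. Writing the same pair again is an upsert of that same row, so duplicates collapse without issuing deletes (as long as no column is written as null, which would itself create a tombstone). The keyspace/table names and the hour number are hypothetical, and `long` becomes CQL's `bigint`.

```cql
-- The original model as valid CQL (hypothetical keyspace/table names).
-- One row per hour: every write to the same hour overwrites it.
CREATE TABLE IF NOT EXISTS my_ks.distinct_original (
    hourNumber    int,
    key           text,
    distinctValue bigint,
    PRIMARY KEY (hourNumber)
);

-- Variant: each distinct (key, distinctValue) pair is its own row in
-- the hour's partition, so re-inserting a pair just rewrites that row.
CREATE TABLE IF NOT EXISTS my_ks.distinct_by_hour (
    hourNumber    int,
    key           text,
    distinctValue bigint,
    PRIMARY KEY ((hourNumber), key, distinctValue)
);

-- Both writes land on the same row; no delete is involved.
INSERT INTO my_ks.distinct_by_hour (hourNumber, key, distinctValue)
VALUES (424664, 'k1', 42);
INSERT INTO my_ks.distinct_by_hour (hourNumber, key, distinctValue)
VALUES (424664, 'k1', 42);

-- Single-partition reads: all distinct pairs in the hour, or the
-- distinct values of one key in that hour.
SELECT COUNT(*) FROM my_ks.distinct_by_hour WHERE hourNumber = 424664;
SELECT COUNT(*) FROM my_ks.distinct_by_hour
 WHERE hourNumber = 424664 AND key = 'k1';
```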




Re: What will happen after adding another data disk

2018-06-12 Thread Kyrylo Lebediev
It is also worth noting that JBOD isn't recommended for older Cassandra
versions, as there are known issues with data imbalance on JBOD.

IIRC, JBOD data imbalance was fixed in some 3.x version (3.2?).

For older versions, creating one large filesystem on top of an md or LVM device
seems to be a better choice.



Re: What will happen after adding another data disk

2018-06-12 Thread Eunsu Kim
In my experience, adding a new disk and restarting the Cassandra process slowly
evens out disk usage, so the existing disks end up holding less data.

> On 12 Jun 2018, at 11:09 AM, wxn...@zjqunshuo.com wrote:
> 
> Hi,
> I know Cassandra can make use of multiple disks. My data disk is almost full
> and I want to add another 2TB disk. I don't know what will happen after the
> addition.
> 1. Will C* write to both disks until the old disk is full?
> 2. What will happen after the old one is full? Will C* stop writing to
> the old one and only write to the new one, which has free space?
> 
> Thanks!