Re: sstablesplit - status

2017-05-18 Thread Jan Kesten

Hi again,

and thanks for the input. I don't think it's tombstoned data; rather, over a 
really long time, many rows are inserted over and over again, but with some 
significant pauses between the inserts. I found some examples where a specific 
row (for example pk=xyz, value=123) exists in more than one or two sstables, 
with exactly the same content but different timestamps.


The largest sstables, compacted a while ago, are now 300-400G in size on 
some nodes, and it's very unlikely they will be compacted any time soon, as 
there are only one or two sstables of that size on a single node.


I think I will try re-bootstrapping a node to see if that helps. 
sstablesplit exists in 2.x, but as far as I know it is deprecated, and in my 
3.6 test cluster it was gone.
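
A rough sketch of the re-bootstrap I have in mind, using the usual 
replace-address procedure (the IP, paths and service commands are just 
placeholders for my setup, not a recipe):

sudo service cassandra stop
rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*
# in cassandra-env.sh, point the node at its own old address:
#   JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"
sudo service cassandra start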


I was trying sstabledump to have a deeper look, but that says "pre-3.0 
SSTable is not supported" (fair enough, I am on a 2.2.8 cluster).
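
If I remember right, 2.2 still ships the deprecated sstable2json tool, which 
should be able to read these files; a quick peek might look like this 
(the path is a placeholder for one of my data files, and the tool's location 
in the distribution may differ):

sstable2json /var/lib/cassandra/data/ks/tbl/lb-388151-big-Data.db | head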


Jan





Re: sstablesplit - status

2017-05-17 Thread Shalom Sagges
> If you make them all 10 GB each, they will immediately compact into the
> same size again.


The idea is actually to trigger the compaction so that the tombstones will be
removed. That's the whole purpose of the split: if the split sstable
has lots of tombstones, it'll be compacted to a much smaller size.
Also, you can always tune the compaction thresholds to suit your needs.
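
For example, a minimal sketch of lowering the minimum threshold so the split
sstables get picked up sooner (keyspace and table names are placeholders;
the defaults are 4 and 32):

nodetool setcompactionthreshold my_keyspace my_table 2 32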


Shalom Sagges
DBA



On Wed, May 17, 2017 at 8:23 PM, Nitan Kainth  wrote:

> Right, but realistically that is what happens with SizeTiered. Another
> option is to split the tables into proportional sizes, NOT the same size,
> e.g. 100 GB into 50, 25, 12, 13. If you make them all 10 GB each, they will
> immediately compact into the same size again. The motive is to get rid of
> duplicates which exist in smaller tables outside this one big table (as per
> my understanding of your email).
>
> On May 17, 2017, at 12:20 PM, Hannu Kröger wrote:
>
> Basically meaning that if you run major compaction (= nodetool compact),
> you will end up with an even bigger file, and that one is likely to never
> get compacted without running major compaction again. It is therefore not
> recommended for a production system.
>
> Hannu
>
>
> On 17 May 2017, at 19:46, Nitan Kainth  wrote:
>
> You can try running major compaction to get rid of duplicate data and
> deleted data. But it will then become a routine task for the future.
>
> On May 17, 2017, at 10:23 AM, Jan Kesten  wrote:
>
> me patt
>
>
>
>
>



Re: sstablesplit - status

2017-05-17 Thread Nitan Kainth
Right, but realistically that is what happens with SizeTiered. Another option
is to split the tables into proportional sizes, NOT the same size, e.g. 100 GB
into 50, 25, 12, 13. If you make them all 10 GB each, they will immediately
compact into the same size again. The motive is to get rid of duplicates which
exist in smaller tables outside this one big table (as per my understanding of
your email).
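
As far as I know, sstablesplit only takes one target size per run, so a
strictly proportional 50/25/12/13 split isn't directly expressible; the
closest knob is its -s option (maximum output sstable size, in MB). A sketch
with a placeholder path, run only while the node is stopped:

sstablesplit --no-snapshot -s 51200 /var/lib/cassandra/data/ks/tbl/lb-388151-big-Data.db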

> On May 17, 2017, at 12:20 PM, Hannu Kröger wrote:
> 
> Basically meaning that if you run major compaction (= nodetool compact),
> you will end up with an even bigger file, and that one is likely to never
> get compacted without running major compaction again. It is therefore not
> recommended for a production system.
> 
> Hannu
>  
>> On 17 May 2017, at 19:46, Nitan Kainth wrote:
>> 
>> You can try running major compaction to get rid of duplicate data and
>> deleted data. But it will then become a routine task for the future.
>> 
>>> On May 17, 2017, at 10:23 AM, Jan Kesten wrote:
>>> 
>>> me patt
>> 
> 



Re: sstablesplit - status

2017-05-17 Thread Hannu Kröger
Basically meaning that if you run major compaction (= nodetool compact), you
will end up with an even bigger file, and that one is likely to never get
compacted without running major compaction again. It is therefore not
recommended for a production system.

Hannu
 
> On 17 May 2017, at 19:46, Nitan Kainth  wrote:
> 
> You can try running major compaction to get rid of duplicate data and
> deleted data. But it will then become a routine task for the future.
> 
>> On May 17, 2017, at 10:23 AM, Jan Kesten wrote:
>> 
>> me patt
> 



Re: sstablesplit - status

2017-05-17 Thread Nitan Kainth
You can try running major compaction to get rid of duplicate data and deleted
data. But it will then become a routine task for the future.
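
A minimal sketch (keyspace and table names are placeholders; without
arguments, nodetool compact runs over all keyspaces):

nodetool compact my_keyspace my_table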

> On May 17, 2017, at 10:23 AM, Jan Kesten  wrote:
> 
> me patt



sstablesplit - status

2017-05-17 Thread Jan Kesten

Hi all,

I have a problem with really large sstables which don't get compacted 
anymore, and I know there are many duplicated rows in them. I thought that 
splitting the tables into smaller ones would get them compacted again, so I 
tried sstablesplit, but:


cassandra@cassandra01 /tmp/cassandra $ ./apache-cassandra-3.10/tools/bin/sstablesplit lb-388151-big-Data.db

Skipping non sstable file lb-388151-big-Data.db
No valid sstables to split
cassandra@cassandra01 /tmp/cassandra $ sstablesplit lb-388151-big-Data.db
Skipping non sstable file lb-388151-big-Data.db
No valid sstables to split

It seems that sstablesplit can't handle the "new" filename pattern anymore 
(I'm actually running 2.2.8 on those nodes).
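
One guess at a workaround: since the file is in the 2.2-era "lb" format, the
sstablesplit shipped with a matching 2.2.x distribution might accept it
(paths and the target size are placeholders; the node has to be stopped while
splitting its live sstables):

./apache-cassandra-2.2.8/tools/bin/sstablesplit --no-snapshot -s 10240 /var/lib/cassandra/data/ks/tbl/lb-388151-big-Data.db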


Any hints or other suggestions to split those sstables or get rid of them?

Thanks in advance,
Jan
