Offline compaction/merging of multiple SSTables into one

2019-06-03 Thread Alexander Shukaev (BLOOMBERG/ FRANKFURT)
Hi Everyone,

I have the following question [1]:

```
$ cd /tmp
$ cp -r /var/lib/cassandra/data/keyspace/table-6e9e81a0808811e9ace14f79cedcfbc4 .
$ nodetool compact --user-defined table-6e9e81a0808811e9ace14f79cedcfbc4/*-Data.db
```

I expected the two SSTables (the second of which contains only tombstones) to
be merged into one, equivalent to the first one minus the data masked by
tombstones from the second.

However, the last command returns exit status `0` and nothing changes in the
`table-6e9e81a0808811e9ace14f79cedcfbc4` directory (both SSTables are still
there). Any ideas on how to unconditionally merge multiple SSTables into one
offline (as above, not on SSTable files currently used by the running
cluster)?

References
--

[1] https://stackoverflow.com/q/56427498/1743860

Regards,
Alexander

Re: Collecting Latency Metrics

2019-06-03 Thread shalom sagges
Thanks a lot for your comments.
This mailing list is truly *the* definitive guide to Cassandra.
The knowledge transferred here is invaluable.
So just wanted to give a big shout out to anyone who is helping out here.

Regards,

On Thu, May 30, 2019 at 6:10 PM Jon Haddad wrote:

> Yep.  I would *never* use mean when it comes to performance to make any
> sort of decisions.  I prefer to graph all the p99 latencies as well as the
> max.
>
> Some good reading on the topic:
> https://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
>
> On Thu, May 30, 2019 at 7:35 AM Chris Lohfink wrote:
>
>> For what it is worth, I would generally recommend just using the mean rather
>> than calculating it yourself. It's a lot easier, and averages are meaningless
>> for anything besides trending anyway (which is really what this is useful
>> for: finding issues on the larger scale), especially with high-volume
>> clusters, so the loss in accuracy is kind of moot. Your average for local
>> reads/writes will almost always be sub-millisecond, but you might end up
>> having 500 millisecond requests or worse that the mean will hide.
>>
>> Chris
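
The point above about the mean hiding slow requests can be illustrated with a small sketch. The sample data is hypothetical (it is not taken from any real cluster), and the `percentile` helper is a simple nearest-rank implementation written for this example, not a Cassandra or Graphite function.

```python
import statistics

# Hypothetical sample: 9,999 fast local reads (0.4 ms) and one slow read (500 ms).
latencies_ms = [0.4] * 9999 + [500.0]


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


mean_ms = statistics.fmean(latencies_ms)
p99_ms = percentile(latencies_ms, 99)
max_ms = max(latencies_ms)

print(f"mean={mean_ms:.3f} ms  p99={p99_ms} ms  max={max_ms} ms")
```

With this sample the mean stays below one millisecond even though one request took 500 ms, and only the max shows the slow request, which is one reason to graph the max alongside the percentiles.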
>>
>> On Thu, May 30, 2019 at 6:30 AM shalom sagges wrote:
>>
>>> Thanks for your replies guys. I really appreciate it.
>>>
>>> @Alain, I use Graphite as the backend with Grafana on top. But the goal is to
>>> move from Graphite to Prometheus eventually.
>>>
>>> I tried to find a direct way of getting a specific average latency metric,
>>> and as Chris pointed out, the Mean value isn't that accurate.
>>> I do not wish to use the percentile metrics either, but a single latency
>>> metric like the *"Local read latency"* output in nodetool tablestats.
>>> Looking at the code of nodetool tablestats, it seems that C* also
>>> divides *ReadTotalLatency.Count* by *ReadLatency.Count* to get the
>>> latency result.
>>>
>>> So I guess I will have no choice but to run the calculation on my own
>>> via Graphite:
>>>
>>> divideSeries(averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadTotalLatency.Count))),averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadLatency.Count))))
>>>
>>> Does this seem right to you?
>>>
>>> Thanks!
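
The calculation in the Graphite expression above can be sketched in Python as follows. It is a minimal illustration, assuming two consecutive samples of the `ReadTotalLatency.Count` and `ReadLatency.Count` counters and assuming the total is reported in microseconds, as described for the older versions in this thread. The function name and the sample numbers are hypothetical.

```python
from typing import Optional


def interval_mean_latency_ms(
    prev_total_us: int,
    prev_count: int,
    cur_total_us: int,
    cur_count: int,
) -> Optional[float]:
    """Mean read latency (ms) for the interval between two counter samples."""
    delta_total_us = cur_total_us - prev_total_us
    delta_count = cur_count - prev_count
    # A negative delta means a counter was reset (for example after a restart);
    # skip the interval, similar in spirit to nonNegativeDerivative.
    # A zero count delta means no reads happened, so there is nothing to average.
    if delta_total_us < 0 or delta_count <= 0:
        return None
    return delta_total_us / delta_count / 1000.0


# Hypothetical samples: 1,000 reads took 450,000 microseconds in total.
latency_ms = interval_mean_latency_ms(1_000_000, 2_000, 1_450_000, 3_000)
print(latency_ms)
```

Dividing the deltas gives the average for the sampling interval only, whereas dividing the raw counters gives the average since the counters started.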
>>>
>>> On Thu, May 30, 2019 at 12:34 AM Paul Chandler wrote:
>>>
>>>> There are various attributes under
>>>> org.apache.cassandra.metrics.ClientRequest.Latency.Read; these measure the
>>>> latency in milliseconds.
>>>>
>>>> Thanks
>>>>
>>>> Paul
>>>> www.redshots.com
>>>>
>>>> > On 29 May 2019, at 15:31, shalom sagges wrote:
>>>> >
>>>> > Hi All,
>>>> >
>>>> > I'm creating a dashboard that should collect read/write latency metrics on C* 3.x.
>>>> > In older versions (e.g. 2.0) I used to divide the total read latency in microseconds by the read count.
>>>> >
>>>> > Is there a metric attribute that shows read/write latency without the need to do the math, such as in nodetool tablestats "Local read latency" output?
>>>> > I saw there's a Mean attribute in org.apache.cassandra.metrics.ReadLatency but I'm not sure this is the right one.
>>>> >
>>>> > I'd really appreciate your help on this one.
>>>> > Thanks!