Re: Many SSTables only on one node

2018-04-09 Thread kurt greaves
If there were no other messages about anti-compaction similar to:
>
> SSTable YYY (ranges) will be anticompacted on range [range]


then no anti-compaction needed to occur, and yes, it was not the cause.

On 5 April 2018 at 13:52, Dmitry Simonov  wrote:

> Hi, Evelyn!
>
> I've found the following messages:
>
> INFO RepairRunnable.java Starting repair command #41, repairing keyspace
> XXX with repair options (parallelism: parallel, primary range: false,
> incremental: false, job threads: 1, ColumnFamilies: [YYY], dataCenters: [],
> hosts: [], # of ranges: 768)
> INFO CompactionExecutor:6 CompactionManager.java Starting anticompaction
> for XXX.YYY on 5132/5846 sstables
>
> After that many similar messages go:
> SSTable BigTableReader(path='/mnt/cassandra/data/XXX/YYY-
> 4c12fd9029e611e8810ac73ddacb37d1/lb-12688-big-Data.db') fully contained
> in range (-9223372036854775808,-9223372036854775808], mutating repairedAt
> instead of anticompacting
>
> Does it mean that anti-compaction is not the cause?
>
> 2018-04-05 18:01 GMT+05:00 Evelyn Smith :
>
>> It might not be what caused it here, but check your logs for
>> anti-compactions.
>>
>>
>> On 5 Apr 2018, at 8:35 pm, Dmitry Simonov  wrote:
>>
>> Thank you!
>> I'll check this out.
>>
>> 2018-04-05 15:00 GMT+05:00 Alexander Dejanovski :
>>
>>> 40 pending compactions is pretty high and you should have way less than
>>> that most of the time, otherwise it means that compaction is not keeping up
>>> with your write rate.
>>>
>>> If you indeed have SSDs for data storage, increase your compaction
>>> throughput to 100 or 200 (depending on how the CPUs handle the load). You
>>> can experiment with compaction throughput using: nodetool
>>> setcompactionthroughput 100
>>>
>>> You can raise the number of concurrent compactors as well and set it to
>>> a value between 4 and 6 if you have at least 8 cores and CPUs aren't
>>> overwhelmed.
>>>
>>> I'm not sure why you ended up with only one node having 6k SSTables and
>>> not the others, but you should apply the above changes so that you can
>>> lower the number of pending compactions and see if it prevents the issue
>>> from happening again.
>>>
>>> Cheers,
>>>
>>>
>>> On Thu, Apr 5, 2018 at 11:33 AM Dmitry Simonov 
>>> wrote:
>>>
 Hi, Alexander!

 SizeTieredCompactionStrategy is used for all CFs in problematic
 keyspace.
 Current compaction throughput is 16 MB/s (default value).

 We always have about 40 pending and 2 active "CompactionExecutor" tasks
 in "tpstats".
 Mostly because of another (bigger) keyspace in this cluster.
 But the situation is the same on each node.

 According to "nodetool compactionhistory", compactions on this CF run
 sometimes several times per day, sometimes once per day; the last run
 was yesterday.
 We run "repair -full" regularly for this keyspace (every 24 hours on
 each node), because gc_grace_seconds is set to 24 hours.

 Should we consider increasing compaction throughput and
 "concurrent_compactors" (as recommended for SSDs) to keep
 "CompactionExecutor" pending tasks low?

 2018-04-05 14:09 GMT+05:00 Alexander Dejanovski :

> Hi Dmitry,
>
> could you tell us which compaction strategy that table is currently
> using ?
> Also, what is the compaction max throughput, and is auto-compaction
> correctly enabled on that node?
>
> Did you recently run repair?
>
> Thanks,
>
> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov 
> wrote:
>
>> Hello!
>>
>> Could you please give some ideas on the following problem?
>>
>> We have a cluster with 3 nodes, running Cassandra 2.2.11.
>>
>> We've recently discovered high CPU usage on one cluster node. After
>> some investigation we found that the number of SSTables for one CF on it is
>> very large: 5800 SSTables, versus 3 SSTables on the other nodes.
>>
>> Data size in this keyspace was not very big, ~100-200 MB per node.
>>
>> There is no such problem with other CFs of that keyspace.
>>
>> nodetool compact solved the issue as a quick-fix.
>>
>> But I'm wondering: what was the cause? How can we prevent it from repeating?
>>
>> --
>> Best Regards,
>> Dmitry Simonov
>>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



 --
 Best Regards,
 Dmitry Simonov

>>> --
>>> -
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Dmitry Simonov
>>
>>
>>
>
>
> --
> Best Regards,
> Dmitry 
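The pending-compaction check discussed in this thread can be scripted by parsing nodetool output. A minimal sketch, assuming tpstats keeps its usual columnar layout (pool name, active, pending, completed, blocked, all-time blocked); the sample output below is illustrative, not taken from the thread:

```python
def pending_compactions(tpstats_output: str) -> int:
    """Extract the pending-task count for the CompactionExecutor pool
    from `nodetool tpstats` output."""
    for line in tpstats_output.splitlines():
        # Each pool row: name, active, pending, completed, blocked, all-time blocked
        fields = line.split()
        if fields and fields[0] == "CompactionExecutor":
            return int(fields[2])
    raise ValueError("CompactionExecutor pool not found")

# Illustrative sample of tpstats output (assumed format):
sample = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0       32117164         0                 0
CompactionExecutor                2        40         916441         0                 0
"""
print(pending_compactions(sample))  # 40
```

A cron job feeding real `nodetool tpstats` output through this would let you alert when the backlog stays around 40 instead of near zero.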

Re: Can "data_file_directories" make use of multiple disks?

2018-04-09 Thread Venkata Hari Krishna Nukala
Paulo, thanks for the confirmation. I have raised a ticket for this:

https://issues.apache.org/jira/browse/CASSANDRA-14372



On Tue, Apr 10, 2018 at 2:37 AM, Paulo Motta 
wrote:

> > cassandra.yaml states that "Directories where Cassandra should store
> data on disk. Cassandra will spread data evenly across them, subject to the
> granularity of the configured compaction strategy.". I feel it is not
> correct anymore.  Is it worth updating the doc?
>
> In fact this changed after CASSANDRA-6696, but the comment in
> cassandra.yaml (which the docs are generated from) was never updated.
> Would you mind opening a ticket to fix this comment? Thanks.
>
> 2018-04-09 17:00 GMT-03:00 Venkata Hari Krishna Nukala
> :
> > I spent some time in the code (trunk) to understand it better. If I
> > understood it correctly, the DiskBoundaryManager.getDiskBoundaries()
> > method does the partitioning, and it has nothing to do with the compaction
> > strategy. Is that correct?
> >
> > cassandra.yaml states that "Directories where Cassandra should store
> data on
> > disk. Cassandra will spread data evenly across them, subject to the
> > granularity of the configured compaction strategy.". I feel it is not
> > correct anymore.  Is it worth updating the doc?
> >
> >
> >
> > On Tue, Mar 27, 2018 at 9:59 PM, Jonathan Haddad 
> wrote:
> >>
> >> In Cassandra 3.2 and later, data is partitioned by token range, which
> >> should give you even distribution of data.
> >>
> >> If you're going to go into 3.x, please use the latest 3.11, which at
> this
> >> time is 3.11.2.
> >>
> >>
> >> On Tue, Mar 27, 2018 at 8:05 AM Venkata Hari Krishna Nukala
> >>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am trying to replace machines having HDDs with slightly more powerful
> >>> machines having SSDs in production. The data present on each node is
> >>> around 300 GB. But the newer machines have 2 x 200 GB SSDs instead of a
> >>> single disk.
> >>>
> >>> "data_file_directories" looks like a multi-valued config which I can
> use.
> >>> Am I looking at the right config?
> >>>
> >>> How is the data distributed evenly? Leveled Compaction Strategy is
> >>> used for the tables.
> >>>
> >>> Thanks!
> >
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
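For intuition on what changed with CASSANDRA-6696: since 3.2, each data directory owns a contiguous slice of the node's token range, so a given SSTable stays on one disk. A simplified sketch of that idea (an illustrative model, not Cassandra's actual DiskBoundaryManager logic, which splits the node's owned ranges rather than the full ring):

```python
MIN_TOKEN = -2**63      # Murmur3Partitioner token range
MAX_TOKEN = 2**63 - 1

def disk_boundaries(num_disks: int):
    """Split the full token range into equal contiguous slices,
    returning the upper bound owned by each data directory."""
    span = MAX_TOKEN - MIN_TOKEN
    return [MIN_TOKEN + span * (i + 1) // num_disks for i in range(num_disks)]

def disk_for_token(token: int, boundaries) -> int:
    """Pick the data directory whose slice contains this token."""
    for i, upper in enumerate(boundaries):
        if token <= upper:
            return i
    return len(boundaries) - 1

bounds = disk_boundaries(2)          # two SSDs, as in the question
print(disk_for_token(-42, bounds))   # 0: lower half of the ring -> first disk
print(disk_for_token(42, bounds))    # 1: upper half -> second disk
```

Because the split is by token rather than by SSTable size, the two 200 GB disks each receive roughly half of the node's ~300 GB, independent of the compaction strategy.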


Re: Can "data_file_directories" make use of multiple disks?

2018-04-09 Thread Paulo Motta
> cassandra.yaml states that "Directories where Cassandra should store data on 
> disk. Cassandra will spread data evenly across them, subject to the 
> granularity of the configured compaction strategy.". I feel it is not correct 
> anymore.  Is it worth updating the doc?

In fact this changed after CASSANDRA-6696, but the comment in
cassandra.yaml (which the docs are generated from) was never updated.
Would you mind opening a ticket to fix this comment? Thanks.

2018-04-09 17:00 GMT-03:00 Venkata Hari Krishna Nukala
:
> I spent some time in the code (trunk) to understand it better. If I understood
> it correctly, the DiskBoundaryManager.getDiskBoundaries() method does the
> partitioning, and it has nothing to do with the compaction strategy. Is that
> correct?
>
> cassandra.yaml states that "Directories where Cassandra should store data on
> disk. Cassandra will spread data evenly across them, subject to the
> granularity of the configured compaction strategy.". I feel it is not
> correct anymore.  Is it worth updating the doc?
>
>
>
> On Tue, Mar 27, 2018 at 9:59 PM, Jonathan Haddad  wrote:
>>
>> In Cassandra 3.2 and later, data is partitioned by token range, which
>> should give you even distribution of data.
>>
>> If you're going to go into 3.x, please use the latest 3.11, which at this
>> time is 3.11.2.
>>
>>
>> On Tue, Mar 27, 2018 at 8:05 AM Venkata Hari Krishna Nukala
>>  wrote:
>>>
>>> Hi,
>>>
>>> I am trying to replace machines having HDDs with slightly more powerful
>>> machines having SSDs in production. The data present on each node is around
>>> 300 GB. But the newer machines have 2 x 200 GB SSDs instead of a single disk.
>>>
>>> "data_file_directories" looks like a multi-valued config which I can use.
>>> Am I looking at the right config?
>>>
>>> How is the data distributed evenly? Leveled Compaction Strategy is
>>> used for the tables.
>>>
>>> Thanks!
>
>




Re: Cassandra Hints file corruption

2018-04-09 Thread Vineet G H
Yes, the git log shows that we built it off f919cf4a4, which we then
used to build it locally.

I realize that using a release artifact is suggested. We even tried
3.11.1 (the official release) and were able to reproduce this issue on
the 14-node cluster.

On Mon, Apr 9, 2018 at 12:11 PM, Michael Shuler  wrote:
> On 04/09/2018 01:43 PM, Vineet G H wrote:
>> Hello All,
>>
>> We have a 14-node Cassandra 3.11.1 cluster. For some odd reason we
>> intermittently see the following error:
>>
>> ERROR [HintsDispatcher:1] 2018-04-06 16:26:44,423
>> CassandraDaemon.java:228 - Exception in thread
>> Thread[HintsDispatcher:1,1,main]
>> org.apache.cassandra.io.FSReadError: java.io.IOException: Digest
>> mismatch exception
>> at 
>> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:298)
>> ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>
> 3.11.1-SNAPSHOT? This could be any commit between the 3.11.0 and 3.11.1
> releases (usually). If you built this yourself, what commit sha is your
> SNAPSHOT jar from and does the git log show it includes commit f919cf4a4?
>
> Generally, using a release artifact is highly suggested, since everyone
> knows the code state of the release. No one but yourself can have any
> reasonable knowledge of where your cluster is running at code-wise.
>
>> The jar in question has the patch from bug
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-13696
>>
>> We are able to get past the issue by running truncatehints.
>>
>> 1. Could this be a new manifestation of the issue? It is probably not
>> related to the bug above.
>> 2. Are there any tools which dump hints file contents?
>> 3. What are the implications of truncatehints? It sounds like there could
>> be data loss, but we use quorum for writes and reads, which means we
>> should have enough replicas to reconstruct the data.
>>
>> I am gathering more evidence on the issue and would be happy to work with the devs.
>>
>> Regards,
>> Vineet
>
> --
> Michael
>




Re: Can "data_file_directories" make use of multiple disks?

2018-04-09 Thread Venkata Hari Krishna Nukala
I spent some time in the code (trunk) to understand it better. If I understood
it correctly, the DiskBoundaryManager.getDiskBoundaries() method does the
partitioning, and it has nothing to do with the compaction strategy. Is that
correct?

cassandra.yaml states that "Directories where Cassandra should store data
on disk. Cassandra will spread data evenly across them, *subject to the
granularity of the configured compaction strategy.*". I feel it is not
correct anymore.  Is it worth updating the doc?



On Tue, Mar 27, 2018 at 9:59 PM, Jonathan Haddad  wrote:

> In Cassandra 3.2 and later, data is partitioned by token range, which
> should give you even distribution of data.
>
> If you're going to go into 3.x, please use the latest 3.11, which at this
> time is 3.11.2.
>
>
> On Tue, Mar 27, 2018 at 8:05 AM Venkata Hari Krishna Nukala <
> n.v.harikrishna.apa...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to replace machines having HDDs with slightly more powerful
>> machines having SSDs in production. The data present on each node is around
>> 300 GB. But the newer machines have 2 x 200 GB SSDs instead of a single disk.
>>
>> "data_file_directories" looks like a multi-valued config which I can use.
>> Am I looking at the right config?
>>
>> How is the data distributed evenly? Leveled Compaction Strategy is
>> used for the tables.
>>
>> Thanks!
>>
>


Re: Cassandra Hints file corruption

2018-04-09 Thread Michael Shuler
On 04/09/2018 01:43 PM, Vineet G H wrote:
> Hello All,
> 
> We have a 14-node Cassandra 3.11.1 cluster. For some odd reason we
> intermittently see the following error:
> 
> ERROR [HintsDispatcher:1] 2018-04-06 16:26:44,423
> CassandraDaemon.java:228 - Exception in thread
> Thread[HintsDispatcher:1,1,main]
> org.apache.cassandra.io.FSReadError: java.io.IOException: Digest
> mismatch exception
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:298)
> ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]

3.11.1-SNAPSHOT? This could be any commit between the 3.11.0 and 3.11.1
releases (usually). If you built this yourself, what commit sha is your
SNAPSHOT jar from and does the git log show it includes commit f919cf4a4?

Generally, using a release artifact is highly suggested, since everyone
knows the code state of the release. No one but yourself can have any
reasonable knowledge of where your cluster is running at code-wise.

> The jar in question has the patch from bug
> 
> https://issues.apache.org/jira/browse/CASSANDRA-13696
> 
> We are able to get past the issue by running truncatehints.
> 
> 1. Could this be a new manifestation of the issue? It is probably not
> related to the bug above.
> 2. Are there any tools which dump hints file contents?
> 3. What are the implications of truncatehints? It sounds like there could
> be data loss, but we use quorum for writes and reads, which means we
> should have enough replicas to reconstruct the data.
> 
> I am gathering more evidence on the issue and would be happy to work with the devs.
> 
> Regards,
> Vineet

-- 
Michael




Cassandra Hints file corruption

2018-04-09 Thread Vineet G H
Hello All,

We have a 14-node Cassandra 3.11.1 cluster. For some odd reason we
intermittently see the following error:

ERROR [HintsDispatcher:1] 2018-04-06 16:26:44,423
CassandraDaemon.java:228 - Exception in thread
Thread[HintsDispatcher:1,1,main]
org.apache.cassandra.io.FSReadError: java.io.IOException: Digest
mismatch exception
at 
org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:298)
~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]

The jar in question has the patch from bug

https://issues.apache.org/jira/browse/CASSANDRA-13696

We are able to get past the issue by running truncatehints.

1. Could this be a new manifestation of the issue? It is probably not related to the bug above.
2. Are there any tools which dump hints file contents?
3. What are the implications of truncatehints? It sounds like there could
be data loss, but we use quorum for writes and reads, which means we
should have enough replicas to reconstruct the data.

I am gathering more evidence on the issue and would be happy to work with the devs.

Regards,
Vineet
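On question 3, the usual safety argument is the quorum overlap condition: with QUORUM writes and QUORUM reads, W + R > RF, so every read intersects at least one replica that acknowledged the write, and truncated hints only delay convergence of the lagging replicas (repair is still needed to re-sync them). The arithmetic, assuming RF=3 (the replication factor is not stated in the thread):

```python
def quorum(rf: int) -> int:
    """Replica count needed for a QUORUM operation."""
    return rf // 2 + 1

rf = 3                 # assumed replication factor
w = quorum(rf)         # replicas that acknowledged each write
r = quorum(rf)         # replicas consulted by each read
# W + R > RF guarantees every read overlaps at least one replica that
# saw the write, so dropped hints cannot hide an acknowledged write;
# they only leave lagging replicas to be caught up by repair.
print(w, r, w + r > rf)  # 2 2 True
```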




Re: Can I sort it as a result of group by?

2018-04-09 Thread DuyHai Doan
No, sorting by a column other than a clustering column is not possible.

On Mon, Apr 9, 2018 at 11:42 AM, Eunsu Kim  wrote:

> Hello, everyone.
>
> I am using 3.11.0 and I have the following table.
>
> CREATE TABLE summary_5m (
> service_key text,
> hash_key int,
> instance_hash int,
> collected_time timestamp,
> count int,
> PRIMARY KEY ((service_key), hash_key, instance_hash, collected_time)
> )
>
>
> And I can sum count grouping by primary key.
>
> select service_key, hash_key, instance_hash, sum(count) as count_summ
> from apm.ip_summary_5m
> where service_key='ABCED'
> group by service_key, hash_key, instance_hash;
>
>
> But what I want is to get only the top 100 rows with the highest summed value.
>
> As if the following clause were appended to the query … (a syntax error, of course):
>
> order by count_summ limit 100;
>
> Has anybody ever solved this problem?
>
> Thank you in advance.
>
>
>
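Since the server cannot ORDER BY an aggregate, the usual workaround is to pull the grouped sums and take the top N client-side. A sketch over already-fetched rows (plain Python with heapq; actually fetching the rows would go through a driver, which is outside this example):

```python
import heapq

def top_n_by_count(rows, n=100):
    """rows: iterable of (hash_key, instance_hash, count_summ) tuples,
    e.g. the result rows of the GROUP BY query in the thread.
    Returns the n rows with the highest count_summ, highest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[2])

# Toy stand-in for the query result:
rows = [(1, 10, 500), (2, 20, 900), (3, 30, 700), (4, 40, 100)]
print(top_n_by_count(rows, n=2))  # [(2, 20, 900), (3, 30, 700)]
```

heapq.nlargest is O(k log n) over the grouped rows, which stays cheap as long as the GROUP BY result (not the raw data) fits in client memory.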


Can I sort it as a result of group by?

2018-04-09 Thread Eunsu Kim
Hello, everyone.

I am using 3.11.0 and I have the following table.

CREATE TABLE summary_5m (
service_key text,
hash_key int,
instance_hash int,
collected_time timestamp,
count int,
PRIMARY KEY ((service_key), hash_key, instance_hash, collected_time)
)


And I can sum count grouping by primary key.

select service_key, hash_key, instance_hash, sum(count) as count_summ 
from apm.ip_summary_5m 
where service_key='ABCED'
group by service_key, hash_key, instance_hash;


But what I want is to get only the top 100 rows with the highest summed value.

As if the following clause were appended to the query … (a syntax error, of course):

order by count_summ limit 100;

Has anybody ever solved this problem?

Thank you in advance.




Re: write latency on single partition table

2018-04-09 Thread Alain RODRIGUEZ
Hi,

Challenging the possibility that the latency is related to the number of
records is a good guess indeed. It might be, but I don't think so, given the
max 50 MB partition size. That should allow reading a partition of this
size in probably below 1 second.

It is possible to trace a query, see how it performs throughout the
distinct internal processes, and find what takes time. There are multiple
ways to do so:

- 'TRACING ON' in cqlsh, then run a problematic query (pay attention to
the consistency level, ONE by default; use the one the application facing
the latencies actually uses).
- 'nodetool settraceprobability 0.001' (be careful with the implications
of setting this value too high: traced queries are stored inside
Cassandra, potentially generating a heavy load).


Other interesting global info:

- 'nodetool cfhistograms' (or tablehistograms) - more precise statistics,
on percentiles for example
- 'nodetool cfstats' (or tablestats) - detailed information on how the
table/queries are performing on the node
- 'nodetool tpstats' - thread pool statistics. Look for pending, dropped or
blocked tasks; generally, those are not good :).


If you suspect tombstones, you can use sstablemetadata to check the
tombstone ratio. It can also be related to poor caching, the number of
SSTables hit on disk, or inefficient bloom filters, for example. There are
other reasons for slow reads.

When it comes to the read path, multiple parts come into play, and the
global result is a bit complex to troubleshoot. Yet try to narrow down
the scope, eliminating possibilities one by one, or directly detect the
issue using tracing to find out where the latency mostly comes from.

If you find something weird but unclear to you, post here again and we will
hopefully be able to help with extra information on the part that is slow :).

C*heers!
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
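As an illustration of the sstablemetadata check mentioned above, the droppable-tombstone estimate can be extracted from its output like this (a sketch; the sample output and the exact label text are assumptions and may vary between Cassandra versions):

```python
def droppable_tombstone_ratio(metadata_output: str) -> float:
    """Find the 'Estimated droppable tombstones' line in the output of
    `sstablemetadata <sstable>` and return its value."""
    label = "Estimated droppable tombstones:"
    for line in metadata_output.splitlines():
        line = line.strip()
        if line.startswith(label):
            return float(line[len(label):].strip())
    raise ValueError("tombstone estimate not found")

# Illustrative sample (assumed format, hypothetical path):
sample = """\
SSTable: /var/lib/cassandra/data/ks/table/lb-12-big
Estimated droppable tombstones: 0.31
Minimum timestamp: 1522800000000000
"""
print(droppable_tombstone_ratio(sample))  # 0.31
```

Running this across a table's SSTables quickly shows whether reads are wading through tombstones or the latency lies elsewhere.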


2018-04-07 6:05 GMT+01:00 onmstester onmstester :

> The size is less than 50MB
>
> Sent using Zoho Mail 
>
>
> On Sat, 07 Apr 2018 09:09:41 +0430, Laxmikant Upadhyay wrote:
>
> It seems your partition size is too large. What is the size of the value field?
> Try to keep your partition size within 100 MB.
>
> On Sat, Apr 7, 2018, 9:45 AM onmstester onmstester 
> wrote:
>
>
>
> I've defained a table like this
>
> create table test (
> hours int,
> key1 int,
> value1 varchar,
> primary key (hours,key1)
> )
>
> For one hour, every input would be written to a single partition, because I
> need to group some 500K records in the partition for a report with an
> expected response time of less than 1 second, and using key1 in the partition
> key would have made 500K partitions, which would be slow on reads.
> Although this mechanism gains a < 1 second response time on reads,
> the write delay increased surprisingly: for this table the write latency
> reported by cfstats is more than 100 ms, but for other tables which access
> thousands of partitions while writing in 1 hour, the write delay is
> 0.02 ms. But I was expecting writes to the test table to be faster than to
> other tables, because only one node and one partition would ever be accessed,
> so no memtable switch happens and all writes would be local to a single node?!
> Should I add another key to my partition key to distribute data across all
> of the nodes?
>
> Sent using Zoho Mail 
>
>
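On the closing question: rather than moving key1 into the partition key (500K partitions) or keeping one partition per hour, a middle ground is a small synthetic bucket, e.g. PRIMARY KEY ((hours, bucket), key1), so each hour's writes spread over a handful of partitions while a report reads only that many partitions. A sketch of the bucketing; the bucket count and the choice of crc32 are illustrative assumptions, not anything from the thread:

```python
import zlib

NUM_BUCKETS = 16  # assumed value; tune so each bucket stays well below ~100 MB

def bucket_for(key1: int) -> int:
    # Use a stable hash (crc32) rather than Python's built-in hash(),
    # which is randomized per process for strings.
    return zlib.crc32(str(key1).encode()) % NUM_BUCKETS

# Writes for one hour now spread over NUM_BUCKETS partitions:
# (hours=h, bucket=0), (hours=h, bucket=1), ...
# and a report for hour h reads those NUM_BUCKETS partitions instead of one.
buckets = {bucket_for(k) for k in range(500_000)}
print(len(buckets))  # 16: every bucket receives a share of the keys
```

The report then issues NUM_BUCKETS partition reads in parallel and merges them client-side, trading a little read fan-out for write load that lands on all replicas instead of one hot partition per hour.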
>