Re: secondary index creation causes C* oom

2018-01-10 Thread Peng Xiao
Thanks Kurt.




------ Original message ------
From: "kurt"
Date: 2018-01-11 (Thu) 11:46
To: "User"
Subject: Re: secondary index creation causes C* oom





Re: secondary index creation causes C* oom

2018-01-10 Thread kurt greaves
> 1.not sure if secondary index creation is the same as index rebuild
>
Fairly sure they are the same.

> 2. we noticed that the memtable flush still looks to be working, not the same
> as CASSANDRA-12796 mentioned, but the compactionExecutor pending count is
> increasing.
>
Do you by chance have concurrent_compactors only set to 2? Seems like your
index builds are blocking other compactions from taking place.
Seems that maybe postflush is backed up because it's blocked on writes
generated from the rebuild? Maybe, anyway.

> 3. I'm wondering whether the block only blocks the specific table which is
> creating the secondary index?
>
If all your flush writers/post flushers are blocked I assume no other
flushes will be able to take place, regardless of table.

Seems like CASSANDRA-12796 is related but not sure why it didn't get fixed in
2.1.
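
For reference, the kind of checks that would confirm that (a rough sketch; the
cassandra.yaml path is an assumption and depends on your install):

$ nodetool tpstats | grep -i -E 'MemtableFlushWriter|MemtablePostFlush|CompactionExecutor'
$ nodetool compactionstats
$ grep -i concurrent_compactors /etc/cassandra/cassandra.yaml

Blocked MemtablePostFlush / MemtableFlushWriter threads together with a growing
pending count on CompactionExecutor would match what you are describing.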


Re: sstabledump tries to delete a file

2018-01-10 Thread Chris Lohfink
Yes, it should be read-only; please open a JIRA. It does look like it would
rebuild the summary if the fp chance changed or if the summary is missing. When
it builds the table metadata from the sstable, it could just set the properties
to match those of the sstable to prevent this.

Chris

On Wed, Jan 10, 2018 at 4:16 AM, Python_Max  wrote:

> Hello all.
>
> I have an error when trying to dump SSTable (Cassandra 3.11.1):
>
> $ sstabledump mc-56801-big-Data.db
> Exception in thread "main" FSWriteError in /var/lib/cassandra/data/<keyspace>//mc-56801-big-Summary.db
>     at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:142)
>     at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:159)
>     at org.apache.cassandra.io.sstable.format.SSTableReader.saveSummary(SSTableReader.java:935)
>     at org.apache.cassandra.io.sstable.format.SSTableReader.saveSummary(SSTableReader.java:920)
>     at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:788)
>     at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:731)
>     at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:516)
>     at org.apache.cassandra.io.sstable.format.SSTableReader.openNoValidation(SSTableReader.java:396)
>     at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:191)
> Caused by: java.nio.file.AccessDeniedException: /var/lib/cassandra/data/<keyspace>//mc-56801-big-Summary.db
>     at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>     at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
>     at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
>     at java.nio.file.Files.delete(Files.java:1126)
>     at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:136)
>     ... 8 more
>
> Seems that sstabledump tries to delete and recreate the summary file, which I
> think is risky because external modification of files that should be
> modified only by Cassandra itself can lead to unpredictable behavior.
> When I copy all the related files, change their owner to myself and run
> sstabledump in that directory, the Summary.db file is recreated, but its
> md5 is exactly the same as the original Summary.db file's.
>
> I did change bloom_filter_fp_chance a couple of months ago, so I believe
> that's the reason why SSTableReader wants to recreate the summary file.
>
> The error still happens after nodetool scrub.
>
> I have not found any issues like this in bug tracker.
> Shouldn't sstabledump be read only?
>
> --
> Best regards,
> Python_Max.
>


Re: [announce] Release of Cassandra Prometheus metrics exporter

2018-01-10 Thread Alain RODRIGUEZ
Hello Romain,

This is amazing information!

I truly believe that monitoring is still a weak point for many people, in
many companies, even though it is quite easy to share between us. I mean
all clusters have to be monitored in mostly the same way (plus some
specificities).

Because of this, I have been involved in building Datadog dashboards for
the community as well. They were released in December 2017 as the default
dashboards for Cassandra on Datadog. I hope you won't mind me sneaking this
in, but I think this information is complementary, for people who would use
Datadog instead of Prometheus:
https://www.datadoghq.com/blog/tlp-cassandra-dashboards/

Also, I know the effort it takes to build efficient charts, so thank you
(and the team behind it) for sharing and saving us this time. It looks like
the community will finally have some nice dashboards out of the box, at
least on Prometheus and Datadog, and it makes me happy :).

I will definitely have a close look.

*Note*: I heard you might unfortunately have violated a C* trademark by using
the eye logo. I am not sure, as those matters are a bit tricky for me, but you
might want to have a look.

C*heers!
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


2018-01-10 15:06 GMT+00:00 Romain Gerard :

> Hello C*,
>
> A little mail to announce that we released today our internal tool at
> Criteo to monitor Cassandra nodes with Prometheus[1].
> https://github.com/criteo/cassandra_exporter
>
> The application is production ready as we use it internally to monitor
> our > 100 Cassandra nodes.
>
> I hope it can be useful to you too !
> Feel free to send feedbacks/contributions/questions.
>
> [1] https://prometheus.io/
>
> Regards,
> Romain Gérard
>


Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-10 Thread Jon Haddad
For what it’s worth, we (TLP) just posted some results comparing pre and post 
meltdown statistics: 
http://thelastpickle.com/blog/2018/01/10/meltdown-impact-on-latency.html 


> On Jan 10, 2018, at 1:57 AM, Steinmaurer, Thomas 
>  wrote:
> 
> m4.xlarge instances do have PCID to my knowledge, but we possibly need a rather 
> new kernel, 4.14. But I fail to see how this could help anyway, because this 
> looks highly related to the Amazon Hypervisor patch and we do not have the 
> production instances patched at OS/VM level (yet).
>  
> Thomas
>  
> From: Dor Laor [mailto:d...@scylladb.com] 
> Sent: Dienstag, 09. Jänner 2018 19:30
> To: user@cassandra.apache.org
> Subject: Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?
>  
> Make sure you pick instances with the PCID CPU capability; their TLB flush 
> overhead is much smaller.
>  
> On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas 
> > 
> wrote:
> Quick follow up.
>  
> Others in AWS are reporting/seeing something similar, e.g.: 
> https://twitter.com/BenBromhead/status/950245250504601600 
>  
> So, while we have seen a relative CPU increase of ~50% since Jan 4, 2018, 
> we have now also applied a kernel update at OS/VM level on a single node 
> (loadtest and not production though), thus more or less double patched now. 
> The additional CPU impact of the OS/VM level kernel patching is more or less 
> negligible, so this looks highly Hypervisor related.
>  
> Regards,
> Thomas
>  
> From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com 
> ] 
> Sent: Freitag, 05. Jänner 2018 12:09
> To: user@cassandra.apache.org 
> Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?
>  
> Hello,
>  
> Does anybody already have some experience/results on whether a patched Linux 
> kernel (Meltdown/Spectre) is affecting the performance of Cassandra negatively?
>  
> In production, all nodes running in AWS with m4.xlarge, we see up to a 50% 
> relative (e.g. AVG CPU from 40% => 60%) CPU increase since Jan 4, 2018, most 
> likely correlating with Amazon finished patching the underlying Hypervisor 
> infrastructure …
>  
> Anybody else seeing a similar CPU increase?
>  
> Thanks,
> Thomas
>  
> The contents of this e-mail are intended for the named addressee only. It 
> contains information that may be confidential. Unless you are the named 
> addressee or an authorized designee, you may not copy or use it, or disclose 
> it to anyone else. If you received it in error please notify us immediately 
> and then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) 
> is a company registered in Linz whose registered office is at 4040 Linz, 
> Austria, Freistädterstraße 313



Error while performing repair: "Did not get positive replies from all endpoints"

2018-01-10 Thread Akshit Jain
I have a 10 node C* cluster with 4-5 keyspaces.
I tried to perform nodetool repair one by one for each keyspace.
For some keyspaces the repair passed, but for some it gave the error below.

I am not able to figure out what is causing this issue. The replica nodes
are up and I am able to ping them from this node.
Any suggestions?

[2018-01-10 12:50:08,162] Starting repair command #4539, repairing keyspace  with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 2243)
[2018-01-10 12:50:14,047] Did not get positive replies from all endpoints. List of failed endpoint(s): [a.b.c.d, e.f.g.h]
[2018-01-10 12:50:14,047] Repair command #4539 finished with error
error: Repair job has failed with the error message: [2018-01-10 12:50:14,047] Did not get positive replies from all endpoints. List of failed endpoint(s): [a.b.c.d, e.f.g.h]
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: [2018-01-10 12:50:14,047] Did not get positive replies from all endpoints. List of failed endpoint(s): [a.b.c.d, e.f.g.h]
    at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
    at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
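
For completeness, the generic first checks for this kind of failure (a hedged
sketch; <keyspace>, a.b.c.d and the log path are placeholders, not values from
this cluster):

$ nodetool status <keyspace>
$ nodetool describecluster
$ nodetool netstats | grep -v '100%'
$ grep -i repair /var/log/cassandra/system.log | tail -n 50   # also on a.b.c.d and e.f.g.h

Endpoints that fail this prepare phase are usually either seen as down by the
coordinator or unable to start the repair session locally, so their own
system.log is worth checking as well.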


Re: Question upon gracefully restarting c* node(s)

2018-01-10 Thread Jeff Jirsa
Shutdown (drain, rather) does all of those things, but it’s not very patient - 
it doesn’t sleep (and there’s no setup time like reconnecting for every 
invocation of nodetool), so things shut down quickly in rapid succession, which 
may have client-visible impact.



-- 
Jeff Jirsa


> On Jan 10, 2018, at 6:20 AM, Thakrar, Jayesh  
> wrote:
> 
> Just curious - aside from the "sleep", is this all not part of the shutdown 
> command?
> Is this an "opportunity" to improve C*?
> Having worked with RDBMSes, Hadoop and HBase, stopping communication, 
> flushing memcache (HBase), and relinquishing ownership of data (HBase) is all 
> part of the shutdown process.
>  
>  
> From: Alain RODRIGUEZ 
> Date: Wednesday, January 10, 2018 at 6:19 AM
> To: "user cassandra.apache.org" 
> Subject: Re: Question upon gracefully restarting c* node(s)
>  
> I agree with comments above. Cassandra is robust, and we are just talking 
> about optimising the process. Nothing mandatory. Going to an extreme I would 
> say you can pull and plug back the node power cable and call it a restart. It 
> should not harm if your cluster is properly tuned. Yet optimisations are 
> welcome as they reduce entropy and improve startup time. Plus we are civilized 
> operators, not barbarians, aren't we ;-)? It's just cleaner and more 
> efficient. 
> Also, historically, it was mandatory to drain when using counters to prevent 
> over-counts, as counters are not idempotent (not sure about this nowadays).
>  
> Last time I asked this very question I ended up building this command that I 
> have been using since then:
>  
> `date && nodetool disablebinary && nodetool disablegossip && sleep 10 && 
> nodetool flush && nodetool drain && sleep 10 && sudo service cassandra 
> restart`
>  
> It does the following:
>  
> - Print the date for the record
> - Stop all client transports. I never heard of a benefit to shutting 
> down the gossip protocol, and so never did it; it might be better but I can't 
> really say. This way we stop listening for clients.
> - After a small while no clients are using the node, calling the drain 
> flushes memtables and recycles the commitlog as Kurt detailed above. Here I add a 
> 'flush' because I haven't been that lucky in the past with drain, sometimes 
> not working at all, sometimes not cleaning commitlogs. I believe flushing 
> first makes this restart command more robust.
> - Finally, restart the service.
>  
> I think there is not only one good way to do this. Also, doing it wrong is 
> often not such a big deal.
>  
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>  
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>  
>  
>  
>  
>  
> 2018-01-08 3:33 GMT+00:00 Jeff Jirsa :
> The sequence does have some objective benefits - especially stopping 
> transports and then gossip, it tells everything you’re going offline before 
> you do, so requests won’t get dropped or have to speculate to other replicas. 
>  
>  
> 
> -- 
> Jeff Jirsa
>  
> 
> On Jan 7, 2018, at 7:22 PM, kurt greaves  wrote:
> 
> None are essential. Cassandra will gracefully shutdown in any scenario as 
> long as it's not killed with a SIGKILL. However, drain does have a few 
> benefits over just a normal shutdown. It will stop a few extra services 
> (batchlog, compactions) and importantly it will also force recycling of dirty 
> commitlog segments, meaning there will be less commitlog files to replay on 
> startup and reducing startup time.
>  
> A comment in the code for drain also indicates that it will wait for 
> in-progress streaming to complete, but I haven't managed to find 1. where 
> this occurs, or 2. if it actually differs to a normal shutdown. Note that 
> this is all w.r.t 2.1. In 3.0.10 and 3.10 drain and shutdown more or less do 
> the exact same thing, however drain will log some extra messages.
>  
> On 2 January 2018 at 07:07, Jing Meng  wrote:
> Hi all.
>  
> Recently we made a change to our production env c* cluster (2.1.18) - placing 
> the commit log to the same SSD where data is stored, which needs restarting 
> all nodes. 
>  
> Before restarting a cassandra node, we ran the following nodetool utils:
> $ nodetool disablethrift && sleep 5
> $ nodetool disablebinary && sleep 5
> $ nodetool disable gossip && sleep 5
> $ nodetool drain && sleep 5
>  
> It was "graceful" as expected (no significant errors found), but the process 
> is still a myth to us: are those commands used above "sufficient", and/or 
> why? The offical doc (docs.datastax.com) did not help with this operation 
> detail, though "nodetool drain" is apparently essential.
>  
>  


Re: Calling StorageService.loadNewSSTables function results in deadlock with compaction background task.

2018-01-10 Thread Jeff Jirsa
Can you open a JIRA with this info? 

-- 
Jeff Jirsa


> On Jan 10, 2018, at 2:34 AM, Desimpel, Ignace  
> wrote:
> 
> Tested on version 2.2.11 (but it seems like trunk 3.x is still the same for the 
> related code path), using nodetool refresh for restoring a snapshot.
> I guess the Cassandra committers can do something with this.
> 
> Calling the StorageService.loadNewSSTables function results in a deadlock with 
> the compaction background task, because:
> 
> From StorageService class , function public void loadNewSSTables(String 
> ksName, String cfName)
> a call is made to ColumnFamilyStore class , function public static 
> synchronized void loadNewSSTables(String ksName, String cfName)
> and then a call to Keyspace class, function public static Keyspace 
> open(String keyspaceName)
> getting to the function private static Keyspace open(String keyspaceName, 
> Schema schema, boolean loadSSTables)
> finally trying to get a lock by synchronized (Keyspace.class)
> 
> So inside the ColumnFamilyStore class lock, there is an attempt to get the 
> lock on the Keyspace.class
> 
> Now at the same time I have the thread OptionalTasks executing the 
> ColumnFamilyStore.getBackgroundCompactionTaskSubmitter() task.
> The thread task is also calling Keyspace.open function, already progressed as 
> far as getting the lock on Keyspace class.
> But then the call also initializes the column families and thus is calling on 
> class ColumnFamilyStore the public static synchronized ColumnFamilyStore 
> createColumnFamilyStore ...
> 
> So function 1 locks A and then B
> And function 2 locks B and then A
> leading to deadlock
> 
> Regards,
> Ignace
> 
> 


[announce] Release of Cassandra Prometheus metrics exporter

2018-01-10 Thread Romain Gerard
Hello C*,

A little mail to announce that we released today our internal tool at
Criteo to monitor Cassandra nodes with Prometheus[1].
https://github.com/criteo/cassandra_exporter

The application is production ready as we use it internally to monitor
our > 100 Cassandra nodes.

I hope it can be useful to you too !
Feel free to send feedbacks/contributions/questions.

[1] https://prometheus.io/

Regards,
Romain Gérard




Re: C* Logs to Kibana

2018-01-10 Thread Thakrar, Jayesh
Wondering what is the purpose - is it to get some insight into the cluster?

Besides the logs themselves, another approach that many others and I have taken 
is to pull the JMX metrics from Cassandra and push them to an appropriate 
metrics/timeseries system.

Here's one approach of getting JMX metrics out as JSON over HTTP - 
https://blog.pythian.com/two-easy-ways-poll-apache-cassandra-metrics-using-jmx-http-bridge/

I have taken the approach of using Logstash with JMX input plugin to pull 
metrics and push to OpenTSDB.

Here's a sample of the final dashboard - https://pasteboard.co/H2i5dFl.png from 
my metrics.

From: Nicolas Guyomar 
Date: Wednesday, January 10, 2018 at 4:26 AM
To: 
Subject: Re: C* Logs to Kibana

Hi,

I believe you can use Logstash to parse C* logs, using some grok patterns like 
these: https://gist.github.com/ibspoof/917a888adb08a819eab7163b97e018cb so 
that you gain some nice insight into what your cluster is really doing!

It feels more "native" than adding some jar to the C* lib in order to change 
logging behavior, and it will be easier for you to post some logs on this ML if 
you keep the original format :)

On 10 January 2018 at 11:07, shalom sagges 
> wrote:
Hi All,
I want to push the Cassandra logs (version 3.x) to Kibana.
Is there a way to configure the Cassandra logs to be in json format?
If modifying the logs to json is not an option, I came across this blog post 
from about a year ago regarding that matter:
https://medium.com/@alain.rastoul/pushing-cassandra-logs-into-elasticsearch-9be3b52af754

Is that a good way of accomplishing that?

Thanks!



Re: Too many tombstones using TTL

2018-01-10 Thread DuyHai Doan
"The question is why Cassandra creates a tombstone for every column instead
of single tombstone per row?"

--> Simply because technically it is possible to set a different TTL value on
each column of a CQL row
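
For illustration, using the example table from the post below, a single cell's
TTL can be changed independently of the others (a sketch, not from the original
thread):

cqlsh> insert into items(a,b,c1,c2,c3) values('AAA', 'BBB', 'C111', 'C222', 'C333') using ttl 60;
cqlsh> update items using ttl 3600 set c3 = 'C333-new' where a = 'AAA' and b = 'BBB';

After the update, c1 and c2 still expire after 60 seconds while c3 lives for an
hour, which is why expiration (and the resulting tombstone) has to be tracked
per cell rather than per row.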

On Wed, Jan 10, 2018 at 2:59 PM, Python_Max  wrote:

> Hello, C* users and experts.
>
> I have (one more) question about tombstones.
>
> Consider the following example:
> cqlsh> create keyspace test_ttl with replication = {'class':
> 'SimpleStrategy', 'replication_factor': '1'}; use test_ttl;
> cqlsh> create table items(a text, b text, c1 text, c2 text, c3 text,
> primary key (a, b));
> cqlsh> insert into items(a,b,c1,c2,c3) values('AAA', 'BBB', 'C111',
> 'C222', 'C333') using ttl 60;
> bash$ nodetool flush
> bash$ sleep 60
> bash$ nodetool compact test_ttl items
> bash$ sstabledump mc-2-big-Data.db
>
> [
>   {
> "partition" : {
>   "key" : [ "AAA" ],
>   "position" : 0
> },
> "rows" : [
>   {
> "type" : "row",
> "position" : 58,
> "clustering" : [ "BBB" ],
> "liveness_info" : { "tstamp" : "2018-01-10T13:29:25.777Z", "ttl" :
> 60, "expires_at" : "2018-01-10T13:30:25Z", "expired" : true },
> "cells" : [
>   { "name" : "c1", "deletion_info" : { "local_delete_time" :
> "2018-01-10T13:29:25Z" }
>   },
>   { "name" : "c2", "deletion_info" : { "local_delete_time" :
> "2018-01-10T13:29:25Z" }
>   },
>   { "name" : "c3", "deletion_info" : { "local_delete_time" :
> "2018-01-10T13:29:25Z" }
>   }
> ]
>   }
> ]
>   }
> ]
>
> The question is why Cassandra creates a tombstone for every column instead
> of a single tombstone per row?
>
> In a production environment I have a table with ~30 columns, and it gives me
> a warning for 30k tombstones and 300 live rows. That is 30 times more than it
> could be.
> Can this behavior be tuned in some way?
>
> Thanks.
>
> --
> Best regards,
> Python_Max.
>


Re: Deleted data comes back on node decommission

2018-01-10 Thread Python_Max
Thank you all for your help.

I was able to get rid of zombies (at least end users are not reporting them
anymore) using nodetool cleanup.
And the old SSTables were indeed unable to merge with each other because of
repairedAt > 0, so cassandra stop + sstablerepairedset + cassandra start in a
rolling manner did fix the issue.
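
For the record, the procedure on each node looked roughly like this (a hedged
sketch; the keyspace/table path and the service command are placeholders that
depend on the installation):

$ nodetool cleanup <keyspace>
$ sudo service cassandra stop
$ sstablerepairedset --really-set --is-unrepaired /var/lib/cassandra/data/<keyspace>/<table>/*-big-Data.db
$ sudo service cassandra start

sstablerepairedset rewrites the repairedAt field in the sstable metadata, so it
should only be run while the node is stopped.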

Thanks.

On Fri, Dec 15, 2017 at 10:47 PM, kurt greaves  wrote:

> X==5. I was meant to fill that in...
>
> On 16 Dec. 2017 07:46, "kurt greaves"  wrote:
>
>> Yep, if you don't run cleanup on all nodes (except new node) after step
>> x, when you decommissioned node 4 and 5 later on, their tokens will be
>> reclaimed by the previous owner. Suddenly the data in those SSTables is now
>> live again because the token ownership has changed and any data in those
>> SStables will be returned.
>>
>> Remember new nodes only add tokens to the ring, they don't affect other
>> nodes tokens, so if you remove those tokens everything goes back to how it
>> was before those nodes were added.
>>
>> Adding a marker would be incredibly complicated. Plus it doesn't really fit
>> the design of Cassandra. Here it's probably much easier to just follow the
>> recommended procedure when adding and removing nodes.
>>
>> On 16 Dec. 2017 01:37, "Python_Max"  wrote:
>>
>> Hello, Jeff.
>>
>>
>> Using your hint I was able to reproduce my situation on 5 VMs.
>> Simplified steps are:
>> 1) set up 3-node cluster
>> 2) create keyspace with RF=3 and table with gc_grace_seconds=60,
>> compaction_interval=10 and unchecked_tombstone_compaction=true (to force
>> compaction later)
>> 3) insert 10..20 records with different partition and clustering keys
>> (consistency 'all')
>> 4) 'nodetool flush' on all 3 nodes
>> 5) add 4th node, add 5th node
>> 6) using 'nodetool getendpoints' find key that moved to both 4th and 5th
>> node
>> 7) delete that record from table (consistency 'all')
>> 8) 'nodetool flush' on all 5 nodes, wait gc_grace_seconds, 'nodetool
>> compact' on nodes which responsible for that key, check that key and
>> tombstone gone using sstabledump
>> 9) decommission 5th node, decommission 4th node
>> 10) select data from table where key=key (consistency quorum)
>>
>> And the row is here.
>>
>> It sounds like a bug in cassandra, but since it is documented here
>> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddNodeToCluster.html
>> I suppose this counts as a feature. It would be better if data which stays
>> in an sstable after a new node is added had some marker and was never
>> returned as a result to a select query.
>>
>> Thank you very much, Jeff, for pointing me in right direction.
>>
>>
>> On 13.12.17 18:43, Jeff Jirsa wrote:
>>
>>> Did you run cleanup before you shrank the cluster?
>>>
>>>
>> --
>>
>> Best Regards,
>> Python_Max.
>>
>>
>>
>>
>>


-- 
Best regards,
Python_Max.


Re: Question upon gracefully restarting c* node(s)

2018-01-10 Thread Thakrar, Jayesh
Just curious - aside from the "sleep", is this all not part of the shutdown 
command?
Is this an "opportunity" to improve C*?
Having worked with RDBMSes, Hadoop and HBase, stopping communication, flushing 
memcache (HBase), and relinquishing ownership of data (HBase) is all part of 
the shutdown process.


From: Alain RODRIGUEZ 
Date: Wednesday, January 10, 2018 at 6:19 AM
To: "user cassandra.apache.org" 
Subject: Re: Question upon gracefully restarting c* node(s)

I agree with comments above. Cassandra is robust, and we are just talking about 
optimising the process. Nothing mandatory. Going to an extreme I would say you 
can pull and plug back the node power cable and call it a restart. It should 
not harm if your cluster is properly tuned. Yet optimisations are welcome as 
they reduce entropy and improve startup time. Plus we are civilized operators, 
not barbarians, aren't we ;-)? It's just cleaner and more efficient.
Also, historically, it was mandatory to drain when using counters to prevent 
over-counts, as counters are not idempotent (not sure about this nowadays).

Last time I asked this very question I ended up building this command that I 
have been using since then:

`date && nodetool disablebinary && nodetool disablegossip && sleep 10 && 
nodetool flush && nodetool drain && sleep 10 && sudo service cassandra restart`

It does the following:

- Print the date for the record
- Stop all client transports. I never heard of a benefit to shutting down 
the gossip protocol, and so never did it; it might be better but I can't really 
say. This way we stop listening for clients.
- After a small while no clients are using the node, calling the drain flushes 
memtables and recycles the commitlog as Kurt detailed above. Here I add a 'flush' 
because I haven't been that lucky in the past with drain, sometimes not working 
at all, sometimes not cleaning commitlogs. I believe flushing first makes this 
restart command more robust.
- Finally, restart the service.

I think there is not only one good way to do this. Also, doing it wrong is 
often not such a big deal.

C*heers,
---
Alain Rodriguez - @arodream - 
al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com





2018-01-08 3:33 GMT+00:00 Jeff Jirsa 
>:
The sequence does have some objective benefits - especially stopping transports 
and then gossip, it tells everything you’re going offline before you do, so 
requests won’t get dropped or have to speculate to other replicas.


--
Jeff Jirsa


On Jan 7, 2018, at 7:22 PM, kurt greaves 
> wrote:
None are essential. Cassandra will gracefully shutdown in any scenario as long 
as it's not killed with a SIGKILL. However, drain does have a few benefits over 
just a normal shutdown. It will stop a few extra services (batchlog, 
compactions) and importantly it will also force recycling of dirty commitlog 
segments, meaning there will be less commitlog files to replay on startup and 
reducing startup time.

A comment in the code for drain also indicates that it will wait for 
in-progress streaming to complete, but I haven't managed to find 1. where this 
occurs, or 2. if it actually differs to a normal shutdown. Note that this is 
all w.r.t 2.1. In 3.0.10 and 3.10 drain and shutdown more or less do the exact 
same thing, however drain will log some extra messages.

On 2 January 2018 at 07:07, Jing Meng 
> wrote:
Hi all.

Recently we made a change to our production env c* cluster (2.1.18) - placing 
the commit log to the same SSD where data is stored, which needs restarting all 
nodes.

Before restarting a cassandra node, we ran the following nodetool utils:
$ nodetool disablethrift && sleep 5
$ nodetool disablebinary && sleep 5
$ nodetool disable gossip && sleep 5
$ nodetool drain && sleep 5

It was "graceful" as expected (no significant errors found), but the process is 
still a myth to us: are those commands used above "sufficient", and/or why? The 
offical doc (docs.datastax.com) did not help with 
this operation detail, though "nodetool drain" is apparently essential.




Too many tombstones using TTL

2018-01-10 Thread Python_Max
Hello, C* users and experts.

I have (one more) question about tombstones.

Consider the following example:
cqlsh> create keyspace test_ttl with replication = {'class':
'SimpleStrategy', 'replication_factor': '1'}; use test_ttl;
cqlsh> create table items(a text, b text, c1 text, c2 text, c3 text,
primary key (a, b));
cqlsh> insert into items(a,b,c1,c2,c3) values('AAA', 'BBB', 'C111', 'C222',
'C333') using ttl 60;
bash$ nodetool flush
bash$ sleep 60
bash$ nodetool compact test_ttl items
bash$ sstabledump mc-2-big-Data.db

[
  {
"partition" : {
  "key" : [ "AAA" ],
  "position" : 0
},
"rows" : [
  {
"type" : "row",
"position" : 58,
"clustering" : [ "BBB" ],
"liveness_info" : { "tstamp" : "2018-01-10T13:29:25.777Z", "ttl" :
60, "expires_at" : "2018-01-10T13:30:25Z", "expired" : true },
"cells" : [
  { "name" : "c1", "deletion_info" : { "local_delete_time" :
"2018-01-10T13:29:25Z" }
  },
  { "name" : "c2", "deletion_info" : { "local_delete_time" :
"2018-01-10T13:29:25Z" }
  },
  { "name" : "c3", "deletion_info" : { "local_delete_time" :
"2018-01-10T13:29:25Z" }
  }
]
  }
]
  }
]

The question is why Cassandra creates a tombstone for every column instead
of a single tombstone per row?

In a production environment I have a table with ~30 columns, and it gives me a
warning for 30k tombstones and 300 live rows. That is 30 times more than it
could be.
Can this behavior be tuned in some way?

Thanks.

-- 
Best regards,
Python_Max.


Re: Unsubscribe

2018-01-10 Thread Alain RODRIGUEZ
Hello,

To unsubscribe send an email to user-unsubscr...@cassandra.apache.org,
sending it to Cassandra user list spams people instead :).

C*heers,

Alain

2017-12-19 19:04 GMT+00:00 Gerardo R. Blanco :

>
>


Re: unsubscribe

2018-01-10 Thread Alain RODRIGUEZ
Hello,

To unsubscribe send an email to user-unsubscr...@cassandra.apache.org,
sending it to Cassandra user list spams people instead :).

C*heers,

Alain

2017-12-19 17:27 GMT+00:00 raghavendra vutti :

>
>


Re: [EXTERNAL] Cassandra cluster add new node slowly

2018-01-10 Thread Alain RODRIGUEZ
>
> I suspect the compaction throughput has an influence on the new node
> joining. The command nodetool getcompactionthroughput says 'Current
> compaction throughput: 32 MB/s'.


I would say this guess is true, but maybe not in the way you think: the more
disk IO you use for compactions, the slower the stream will be (if the joining
machine is CPU / disk IO bound). When using vnodes, all the nodes will be
sending data to this new node. Often joining nodes are stuck at 100% CPU
or disk usage (or both). This node is considered as an extra replica until
it is up and is not read from, so it is not a big deal.

Thus it is possible to tune compaction to be faster without risk while the
node is joining, but it might have the opposite effect and slow down the
streaming process as it will use more resources that are already a
bottleneck during bootstrap.
Be aware that when the node joins you could have performance issues if
compactions are taking too many resources or if you have too many sstables
(the opposite situation, where compaction was running too slow). It's good to
find a balance between the streaming speed and what the node can cope with
so when the node joins the ring it does it in a healthy state (acceptable
state at least).

Often I observed that the streaming speed is substantially slower at the
end of the bootstrap as only a few (and eventually just one) nodes are
sending the data while other nodes are done with the streaming. In this
phase, compactions can catch up, the disk space used and the number of
SSTable is greatly reduced, allowing the node to join in good conditions.
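
If it helps, these are the knobs and checks involved while the node is joining
(a hedged sketch; the numbers are only illustrative, not recommendations):

$ nodetool getstreamthroughput                 # on the sending nodes
$ nodetool setstreamthroughput 200             # MB/s
$ nodetool getcompactionthroughput             # on the joining node
$ nodetool setcompactionthroughput 64          # MB/s, 0 disables throttling
$ nodetool compactionstats                     # pending compactions on the joining node
$ nodetool netstats | grep -v '100%'           # remaining streams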

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-01-04 6:00 GMT+00:00 Anthony Grasso :

> The speed at which compactions operate is also physically restricted by
> the speed of the disk. If the disks used on the new node are HDDs, then
> increasing the compaction throughput will be of little help. However, if
> the disks on the new node are SSDs then increasing the compaction
> throughput to at least 64MB/s should help speed up compactions.
>
> Regards,
> Anthony
>
> On 4 January 2018 at 14:13, qf zhou  wrote:
>
>> The cassandra version is 3.0.9.
>>
>> I  have changed the heap size (about  32G). Also, the streaming
>> throughput is set 800MB/sec,  and the streaming_socket_timeout_in_ms is
>> default 8640.
>> I suspect the compaction throughput has an influence on the new node
>> joining. The command nodetool getcompactionthroughput says
>> 'Current compaction throughput: 32 MB/s'.
>>
>>
>>
>>
>> 在 2018年1月4日,上午4:59,Durity, Sean R  写道:
>>
>> You don’t mention the version, but here are some general suggestions
>>
>> -  2 GB heap is very small for a node, especially with 1 TB+ of
>> data. What is the physical RAM on the host? In general, you want ½ of
>> physical RAM for the JVM. (Look in jvm.options or cassandra-env.sh)
>> -  You can change the streaming throughput from the existing
>> nodes, if it looks like the new node can handle it. Look at nodetool
>> setstreamthroughput. Default is 200 (MB/sec).
>> -  You might want to check for a streaming_socket_timeout_in_ms.
>> This has changed over the versions. Some details are at:
>> https://issues.apache.org/jira/browse/CASSANDRA-11839. 24 hours is good
>> recommendation.
>> -  If your new node can’t compact fast enough to keep disk usage
>> down, look at compactionthroughput on that node
>> -  nodetool netstats | grep -v "100%" is a good way to see what
>> is happening/if anything is stuck. Newer versions give a bit more info on
>> progress.
>> -  Don’t forget to run cleanup on existing nodes after the new
>> nodes are added.
>>
>>
>>
>> Sean Durity
>> *From:* qf zhou [mailto:zhouqf2...@gmail.com ]
>> *Sent:* Tuesday, January 02, 2018 10:30 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* [EXTERNAL] Cassandra cluster add new node slowly
>>
>> The cluster has  3 nodes,  and  the data in each node is  about 1.2 T.  I
>> want to add two new nodes to expand the cluster.
>>
>> Following the instructions from the datastax website, i.e.
>> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/operations/opsAddNodeToCluster.html ,
>>
>>
>> I tried to add one node to the cluster. However, it is too slow and
>> costs too much time. After about 24 hours, it still hadn't succeeded.
>>
>> I run the command: nodetool netstats  on the new node,  it  shows that:
>>
>> 

Re: Question upon gracefully restarting c* node(s)

2018-01-10 Thread Alain RODRIGUEZ
I agree with comments above. Cassandra is robust, and we are just talking
about optimising the process. Nothing mandatory. Going to an extreme I
would say you can pull and plug back the node power cable and call it a
restart. It should not harm if your cluster is properly tuned. Yet
optimisations are welcome as they reduce entropy and improve startup time.
Plus we are civilized operators, not barbarians, aren't we ;-)? It's just
cleaner and more efficient.
Also, historically, it was mandatory to drain when using counters to
prevent over-counts, as counters are not idempotent (not sure about this
nowadays).

Last time I asked this very question I ended up building this command that
I have been using since then:

`date && nodetool disablebinary && nodetool disablegossip && sleep 10 &&
nodetool flush && nodetool drain && sleep 10 && sudo service cassandra
restart`

It does the following:

- Print the date for the record
- Stop all client transports. I never heard of a benefit to shutting down
the gossip protocol, and so never did it; it might be better but I can't
really say. This way we stop listening for clients.
- After a small while no clients are using the node, calling the drain
flushes memtables and recycles the commitlog as Kurt detailed above. Here I
add a 'flush' because I haven't been that lucky in the past with drain,
sometimes not working at all, sometimes not cleaning commitlogs. I believe
flushing first makes this restart command more robust.
- Finally, restart the service.
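
(If you want to double check the drain before restarting, something like this
has worked for me - a hedged sketch, and the log path depends on your install:)

$ nodetool netstats | head -n 1                       # should report "Mode: DRAINED"
$ grep DRAINED /var/log/cassandra/system.log | tail -n 1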

I think there is not only one good way to do this. Also, doing it wrong is
often not such a big deal.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com





2018-01-08 3:33 GMT+00:00 Jeff Jirsa :

> The sequence does have some objective benefits - especially stopping
> transports and then gossip, it tells everything you’re going offline before
> you do, so requests won’t get dropped or have to speculate to other
> replicas.
>
>
>
> --
> Jeff Jirsa
>
>
> On Jan 7, 2018, at 7:22 PM, kurt greaves  wrote:
>
> None are essential. Cassandra will gracefully shutdown in any scenario as
> long as it's not killed with a SIGKILL. However, drain does have a few
> benefits over just a normal shutdown. It will stop a few extra services
> (batchlog, compactions) and importantly it will also force recycling of
> dirty commitlog segments, meaning there will be less commitlog files to
> replay on startup and reducing startup time.
>
> A comment in the code for drain also indicates that it will wait for
> in-progress streaming to complete, but I haven't managed to find 1. where
> this occurs, or 2. if it actually differs to a normal shutdown. Note that
> this is all w.r.t 2.1. In 3.0.10 and 3.10 drain and shutdown more or less
> do the exact same thing, however drain will log some extra messages.
>
> On 2 January 2018 at 07:07, Jing Meng  wrote:
>
>> Hi all.
>>
>> Recently we made a change to our production env c* cluster (2.1.18) -
>> placing the commit log to the same SSD where data is stored, which needs
>> restarting all nodes.
>>
>> Before restarting a cassandra node, we ran the following nodetool utils:
>> $ nodetool disablethrift && sleep 5
>> $ nodetool disablebinary && sleep 5
>> $ nodetool disable gossip && sleep 5
>> $ nodetool drain && sleep 5
>>
>> It was "graceful" as expected (no significant errors found), but the
>> process is still a myth to us: are those commands used above "sufficient",
>> and/or why? The offical doc (docs.datastax.com) did not help with this
>> operation detail, though "nodetool drain" is apparently essential.
>>
>
>


Re: Rebuild to a new DC fails every time

2018-01-10 Thread Alain RODRIGUEZ
Hello Martin.

Did you solve your issue?

I would say that this exception could indeed be due to
'streaming_socket_timeout_in_ms'. Make sure you have a large enough value, or
upgrade to a newer version that implements the keep-alive, which is an
interesting thing to try. The thing is, if you are trying to add a DC, it
might not be the best moment for an upgrade. It is clear to me that using a
keep-alive here is better, so if it is a good fit, upgrading could definitely
help.

Another reason I can think of would be a network issue of some kind, such as a
flaky cross-DC connection, or a node going down, either fully or just bouncing
because of GC or any other reason. I believe this kind of event is not
well handled by the streaming process yet.

Is the cluster healthy overall? Do you have pending / dropped messages of
some kind, GC pressure, log warnings and errors or any other troubles?
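
For example, a few generic checks along those lines (a hedged sketch; paths
depend on the installation):

$ nodetool status                                            # any DN node while the rebuild runs?
$ nodetool tpstats                                           # pending / blocked pools and the Dropped section
$ grep -i streaming_socket_timeout_in_ms /etc/cassandra/cassandra.yaml
$ grep -ci GCInspector /var/log/cassandra/system.log         # long GC pauses being logged?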

Let us know how it goes :).

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-01-08 14:31 GMT+00:00 Martin Mačura :

> None of the files is listed more than once in the logs:
>
> java.lang.RuntimeException: Transfer of file
> /fs3/cassandra/data//event_group-3b5782d08e4411e68
> 42917253f111990/mc-116042-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs0/cassandra/data//event_group-3b5782d08e4411e68
> 42917253f111990/mc-111370-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs3/cassandra/data//event_alert-13d78e3f11e6a
> 6cbe1698349da4d/mc-8659-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs4/cassandra/data//event_alert-13d78e3f11e6a
> 6cbe1698349da4d/mc-9133-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs2/cassandra/data//event_alert-13d78e3f11e6a
> 6cbe1698349da4d/mc-3997-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs1/cassandra/data///event_group-3b5782d08e4411e6
> 842917253f111990/mc-152979-big-Data.db
> already completed or aborted (perhaps session failed?).
>
>
>
>
> On Mon, Jan 8, 2018 at 2:21 AM, kurt greaves  wrote:
> > If you're on 3.9 it's likely unrelated as streaming_socket_timeout_in_ms
> is
> > 48 hours. Appears rebuild is trying to stream the same file twice. Are
> there
> > other exceptions in the logs related to the file, or can you find out if
> > it's previously been sent by the same session? Search the logs for the
> file
> > that failed and post back any exceptions.
> >
> > On 29 December 2017 at 10:18, Martin Mačura  wrote:
> >>
> >> Is this something that can be resolved by CASSANDRA-11841 ?
> >>
> >> Thanks,
> >>
> >> Martin
> >>
> >> On Thu, Dec 21, 2017 at 3:02 PM, Martin Mačura 
> wrote:
> >> > Hi all,
> >> > we are trying to add a new datacenter to the existing cluster, but the
> >> > 'nodetool rebuild' command always fails after a couple of hours.
> >> >
> >> > We're on Cassandra 3.9.
> >> >
> >> > Example 1:
> >> >
> >> > 172.24.16.169 INFO  [STREAM-IN-/172.25.16.125:55735] 2017-12-13
> >> > 23:55:38,840 StreamResultFuture.java:174 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
> >> > Receiving 0 files(0.000KiB), sending 9844 files(885.587GiB)
> >> > 172.25.16.125 INFO  [STREAM-IN-/172.24.16.169:7000] 2017-12-13
> >> > 23:55:38,858 StreamResultFuture.java:174 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
> >> > Receiving 9844 files(885.587GiB), sending 0 files(0.000KiB)
> >> >
> >> > 172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:55735] 2017-12-14
> >> > 04:28:09,064 StreamSession.java:533 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> >> > session with peer 172.25.16.125
> >> > 172.24.16.169 java.io.IOException: Connection reset by peer
> >> >
> >> > 172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:49412] 2017-12-14
> >> > 07:26:26,832 StreamSession.java:533 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> >> > session with peer 172.25.16.125
> >> > 172.24.16.169 java.lang.RuntimeException: Transfer of file
> >> > -13d78e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db
> >> > already completed or aborted (perhaps session failed?).
> >> > 172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-14
> >> > 07:26:50,004 StreamSession.java:533 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> >> > session with peer 172.24.16.169
> >> > 172.25.16.125 java.io.IOException: Connection reset by peer
> >> >
> >> > Example 2:
> >> >
> >> > 

Re: how to check C* partition size

2018-01-10 Thread Alain RODRIGUEZ
Hello,

You can also graph metrics using Datadog / Grafana or any other monitoring
tool. Look at the max / mean partition size I would say, see:
http://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics.

There is also a metric called 'EstimatedPartitionSizeHistogram' yet it is a
gauge... I am not too sure about how to use this specific metric.
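
A couple of command line ways to get the same numbers (a hedged sketch;
keyspace/table names are placeholders):

$ nodetool tablestats <keyspace>.<table> | grep -i partition
$ nodetool tablehistograms <keyspace> <table>     # percentiles for partition size and cell count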

C*heers
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-01-08 16:47 GMT+00:00 Ahmed Eljami :

> ​>Nodetool tablestats gives you a general idea.
>
> Since C* 3.X :)
>


Calling StorageService.loadNewSSTables function results in deadlock with compaction background task.

2018-01-10 Thread Desimpel, Ignace
Tested on version 2.2.11 (but it seems like trunk 3.x is still the same for the 
related code path), using nodetool refresh for restoring a snapshot.

I guess the Cassandra committers can do something with this.

Calling the StorageService.loadNewSSTables function results in a deadlock with 
the compaction background task, because:


From StorageService class , function public void loadNewSSTables(String ksName, 
String cfName)

a call is made to ColumnFamilyStore class , function public static synchronized 
void loadNewSSTables(String ksName, String cfName)

and then a call to Keyspace class, function public static Keyspace open(String 
keyspaceName)

getting to the function private static Keyspace open(String keyspaceName, 
Schema schema, boolean loadSSTables)

finally trying to get a lock by synchronized (Keyspace.class)


So inside the ColumnFamilyStore class lock, there is an attempt to get the lock 
on the Keyspace.class


Now at the same time I have the thread OptionalTasks executing the 
ColumnFamilyStore.getBackgroundCompactionTaskSubmitter() task.

The thread task is also calling Keyspace.open function, already progressed as 
far as getting the lock on Keyspace class.

But then the call also initializes the column families and thus is calling on 
class ColumnFamilyStore the public static synchronized ColumnFamilyStore 
createColumnFamilyStore ...


So function 1 locks A and then B

And function 2 locks B and then A

leading to deadlock


Regards,

Ignace
Daemon System Thread [RMI TCP Connection(19)-10.x.x.x] (Suspended)  
owns: Class (org.apache.cassandra.db.ColumnFamilyStore) (id=2814)
waiting for: Class (org.apache.cassandra.db.Keyspace) (id=3269)  
Keyspace.open(String, Schema, boolean) line: 110
Keyspace.open(String) line: 93  
ColumnFamilyStore.loadNewSSTables(String, String) line: 736 
StorageService.loadNewSSTables(String, String) line: 4378   
NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not 
available [native method]  
NativeMethodAccessorImpl.invoke(Object, Object[]) line: 62  
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43  
Method.invoke(Object, Object...) line: 498  
Trampoline.invoke(Method, Object, Object[]) line: 71
GeneratedMethodAccessor38.invoke(Object, Object[]) line: not available  
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43  
Method.invoke(Object, Object...) line: 498  
MethodUtil.invoke(Method, Object, Object[]) line: 275   
StandardMBeanIntrospector.invokeM2(Method, Object, Object[], Object) 
line: 112  
StandardMBeanIntrospector.invokeM2(Object, Object, Object[], Object) 
line: 46   
StandardMBeanIntrospector(MBeanIntrospector).invokeM(M, Object, 
Object[], Object) line: 237  
PerInterface.invoke(Object, String, Object[], String[], Object) 
line: 138
StandardMBeanSupport(MBeanSupport).invoke(String, Object[], 
String[]) line: 252  
DefaultMBeanServerInterceptor.invoke(ObjectName, String, Object[], 
String[]) line: 819  
JmxMBeanServer.invoke(ObjectName, String, Object[], String[]) line: 801 
RMIConnectionImpl.doOperation(int, Object[]) line: 1468 
RMIConnectionImpl.access$300(RMIConnectionImpl, int, Object[]) line: 76 
RMIConnectionImpl$PrivilegedOperation.run() line: 1309  
RMIConnectionImpl.doPrivilegedOperation(int, Object[], Subject) line: 
1401  
RMIConnectionImpl.invoke(ObjectName, String, MarshalledObject, 
String[], Subject) line: 829 
NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not 
available [native method]  
NativeMethodAccessorImpl.invoke(Object, Object[]) line: 62  
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43  
Method.invoke(Object, Object...) line: 498  
UnicastServerRef.dispatch(Remote, RemoteCall) line: 357 
Transport$1.run() line: 200 
Transport$1.run() line: 197 
AccessController.doPrivileged(PrivilegedExceptionAction, 
AccessControlContext) line: not available [native method]   
TCPTransport(Transport).serviceCall(RemoteCall) line: 196   
TCPTransport.handleMessages(Connection, boolean) line: 568  
TCPTransport$ConnectionHandler.run0() line: 826 
TCPTransport$ConnectionHandler.lambda$run$0() line: 683 
1690776575.run() line: not available
AccessController.doPrivileged(PrivilegedAction, 
AccessControlContext) line: not available [native method]
TCPTransport$ConnectionHandler.run() line: 682  
ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1149  
ThreadPoolExecutor$Worker.run() line: 624   
Thread.run() line: 748  


Daemon Thread [OptionalTasks:1] (Suspended) 
owns: Class 

Re: C* Logs to Kibana

2018-01-10 Thread Nicolas Guyomar
Hi,

I believe you can use Logstash to parse C* logs, using some grok patterns
like these:
https://gist.github.com/ibspoof/917a888adb08a819eab7163b97e018cb so that
you gain some nice insight into what your cluster is really doing!

It feels more "native" than adding some jar to the C* lib in order to change
logging behavior, and it will be easier for you to post some logs on this ML
if you keep the original format :)

On 10 January 2018 at 11:07, shalom sagges  wrote:

> Hi All,
>
> I want to push the Cassandra logs (version 3.x) to Kibana.
> Is there a way to configure the Cassandra logs to be in json format?
>
> If modifying the logs to json is not an option, I came across this blog
> post from about a year ago regarding that matter:
> https://medium.com/@alain.rastoul/pushing-cassandra-
> logs-into-elasticsearch-9be3b52af754
>
> Is that a good way of accomplishing that?
>
> Thanks!
>


sstabledump tries to delete a file

2018-01-10 Thread Python_Max
Hello all.

I have an error when trying to dump SSTable (Cassandra 3.11.1):

$ sstabledump mc-56801-big-Data.db
Exception in thread "main" FSWriteError in /var/lib/cassandra/data///mc-56801-big-Summary.db
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:142)
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:159)
    at org.apache.cassandra.io.sstable.format.SSTableReader.saveSummary(SSTableReader.java:935)
    at org.apache.cassandra.io.sstable.format.SSTableReader.saveSummary(SSTableReader.java:920)
    at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:788)
    at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:731)
    at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:516)
    at org.apache.cassandra.io.sstable.format.SSTableReader.openNoValidation(SSTableReader.java:396)
    at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:191)
Caused by: java.nio.file.AccessDeniedException: /var/lib/cassandra/data///mc-56801-big-Summary.db
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
    at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
    at java.nio.file.Files.delete(Files.java:1126)
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:136)
    ... 8 more

Seems that sstabledump tries to delete and recreate the summary file, which I
think is risky because external modification of files that should be
modified only by Cassandra itself can lead to unpredictable behavior.
When I copy all the related files, change their owner to myself and run
sstabledump in that directory, the Summary.db file is recreated, but its
md5 is exactly the same as the original Summary.db file's.
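
For reference, the copy-and-dump workaround looked roughly like this (a sketch;
<keyspace>/<table> are placeholders and the data path depends on the
installation):

$ mkdir /tmp/sstabledump && cd /tmp/sstabledump
$ sudo cp /var/lib/cassandra/data/<keyspace>/<table>/mc-56801-big-* .
$ sudo chown $(whoami): *
$ sstabledump mc-56801-big-Data.db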

I did change bloom_filter_fp_chance a couple of months ago, so I believe
that's the reason why SSTableReader wants to recreate the summary file.

The error still happens after nodetool scrub.

I have not found any issues like this in bug tracker.
Shouldn't sstabledump be read only?

-- 
Best regards,
Python_Max.


C* Logs to Kibana

2018-01-10 Thread shalom sagges
Hi All,

I want to push the Cassandra logs (version 3.x) to Kibana.
Is there a way to configure the Cassandra logs to be in json format?

If modifying the logs to json is not an option, I came across this blog
post from about a year ago regarding that matter:
https://medium.com/@alain.rastoul/pushing-cassandra-logs-into-elasticsearch-9be3b52af754

Is that a good way of accomplishing that?

Thanks!


RE: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-10 Thread Steinmaurer, Thomas
m4.xlarge instances do have PCID to my knowledge, but we possibly need a rather 
new kernel, 4.14. But I fail to see how this could help anyway, because this 
looks highly related to the Amazon Hypervisor patch and we do not have the 
production instances patched at OS/VM level (yet).
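
(For what it is worth, a quick way to check whether the guest kernel even sees
PCID on an instance - just a sketch:)

$ grep -m1 -ow pcid /proc/cpuinfo
pcid
$ grep -m1 -ow invpcid /proc/cpuinfo
invpcid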

Thomas

From: Dor Laor [mailto:d...@scylladb.com]
Sent: Dienstag, 09. Jänner 2018 19:30
To: user@cassandra.apache.org
Subject: Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

Make sure you pick instances with the PCID CPU capability; their TLB flush 
overhead is much smaller.

On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas 
> 
wrote:
Quick follow up.

Others in AWS reporting/seeing something similar, e.g.: 
https://twitter.com/BenBromhead/status/950245250504601600

So, while we have seen a relative CPU increase of ~50% since Jan 4, 2018, we 
have now also applied a kernel update at OS/VM level on a single node (loadtest 
and not production though), thus more or less double patched now. The additional 
CPU impact of the OS/VM level kernel patching is more or less negligible, so this 
looks highly Hypervisor related.

Regards,
Thomas

From: Steinmaurer, Thomas 
[mailto:thomas.steinmau...@dynatrace.com]
Sent: Freitag, 05. Jänner 2018 12:09
To: user@cassandra.apache.org
Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

Hello,

Does anybody already have some experience/results on whether a patched Linux 
kernel (Meltdown/Spectre) is affecting the performance of Cassandra negatively?

In production, all nodes running in AWS with m4.xlarge, we see up to a 50% 
relative (e.g. AVG CPU from 40% => 60%) CPU increase since Jan 4, 2018, most 
likely correlating with Amazon finished patching the underlying Hypervisor 
infrastructure …

Anybody else seeing a similar CPU increase?

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313