JBOD and system keyspaces

2015-09-23 Thread Stuart Bishop
Hi.

I'm setting up a new cluster (DSE 4.7) and am interested in the failure
modes of JBOD setups. My main concern is the likelihood of needing to
rebuild an entire node whenever a single drive goes wobbly. After I
have replaced a failed or failing disk, how likely is it that it
contained critical information and that the node will be unable to
restart and rejoin the cluster?

My current thinking is that the commitlog_directory and
saved_caches_directory should be on my OS RAID0 partition (just a pair
of small disks), with my data_file_directories pointing to my large
drives and not using RAID. Is this a good idea or a terrible one?

My alternative is to pay the capacity and write bandwidth penalties
and go RAID5, with the advantage that drives can be swapped out by
data center engineers without needing to shut the node down and
without lowered redundancy for several hours while the repair
completes.
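
To put rough numbers on the capacity side of that trade-off, here is a
minimal sketch. The disk count and disk size are hypothetical (the message
does not give them), and the model only covers raw capacity: it says
nothing about write bandwidth or rebuild time.

```python
def usable_capacity_tb(num_disks: int, disk_tb: float, layout: str) -> float:
    """Raw usable capacity for a set of identical data disks."""
    if layout == "jbod":
        # Every disk is exposed as its own data directory.
        return num_disks * disk_tb
    if layout == "raid5":
        if num_disks < 3:
            raise ValueError("RAID5 needs at least three disks")
        # One disk's worth of space is consumed by parity.
        return (num_disks - 1) * disk_tb
    raise ValueError(f"unknown layout: {layout}")


def data_lost_on_one_disk_failure_tb(disk_tb: float, layout: str) -> float:
    """Upper bound on node-local data that must be restored from replicas."""
    # JBOD: everything on the failed disk is gone locally.
    # RAID5: the array survives a single disk failure.
    return disk_tb if layout == "jbod" else 0.0


# Hypothetical node: six 4 TB data disks.
jbod_tb = usable_capacity_tb(6, 4.0, "jbod")
raid5_tb = usable_capacity_tb(6, 4.0, "raid5")
print(jbod_tb, raid5_tb)
```

With these assumed figures, RAID5 gives up one disk of capacity in exchange
for not having to re-stream up to one disk's worth of data after a failure.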

-- 
Stuart Bishop 
http://www.stuartbishop.net/


RE: High read latency

2015-09-23 Thread Leleu Eric
For read-heavy workloads, JVM GC can cause latency issues (see
http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads).
If you have frequent minor GCs taking 400ms, they may increase your read latency.

Eric
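
As a back-of-the-envelope illustration of the point above, the sketch below
estimates how much stop-the-world pauses add to mean latency. It assumes
requests arrive uniformly in time and that a request arriving during a pause
simply waits for the rest of it; queueing after the pause is ignored. The
10-second interval between pauses is a made-up figure, since the thread only
gives the 400ms pause length.

```python
def fraction_of_requests_paused(pause_ms: float, interval_ms: float) -> float:
    """Share of requests that arrive while a pause is in progress."""
    return pause_ms / interval_ms


def mean_added_latency_ms(pause_ms: float, interval_ms: float) -> float:
    """Average extra latency per request caused by the pauses.

    A request landing in a pause waits pause_ms / 2 on average, and the
    probability of landing in one is pause_ms / interval_ms.
    """
    return fraction_of_requests_paused(pause_ms, interval_ms) * pause_ms / 2


# 400 ms pauses (from the thread), one every 10 s (hypothetical).
print(fraction_of_requests_paused(400, 10_000))
print(mean_added_latency_ms(400, 10_000))
```

The mean hides the real damage: under these assumptions a few percent of
requests see up to 400ms of extra delay, which shows up in the tail
percentiles rather than in the average.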

From: Jaydeep Chovatia [mailto:chovatia.jayd...@gmail.com]
Sent: Tuesday, 22 September 2015 19:50
To: user@cassandra.apache.org
Subject: Re: High read latency

select * from test where a = ? and b = ?

On Tue, Sep 22, 2015 at 10:27 AM, sai krishnam raju potturi wrote:
thanks for the information. Posting the query too would be of help.

On Tue, Sep 22, 2015 at 11:56 AM, Jaydeep Chovatia wrote:

Please find required details here:

-  Number of req/s

2k reads/s

-  Schema details

create table test (
  a timeuuid,
  b bigint,
  c int,
  d int static,
  e int static,
  f int static,
  g int static,
  h int,
  i text,
  j text,
  k text,
  l text,
  m set,
  n bigint,
  o bigint,
  p bigint,
  q bigint,
  r int,
  s text,
  t bigint,
  u text,
  v text,
  w text,
  x bigint,
  y bigint,
  z bigint,
  primary key ((a, b), c)
);

-  JVM settings about the heap

Default settings

-  Execution time of the GC

Avg. 400ms. I do not see long pauses of GC anywhere in the log file.

On Tue, Sep 22, 2015 at 5:34 AM, Leleu Eric wrote:
Hi,


Before speaking about tuning, can you provide some additional information?


-  Number of req/s

-  Schema details

-  JVM settings about the heap

-  Execution time of the GC

A read latency of 43ms may be acceptable, depending on the number of requests 
per second.


Eric

From: Jaydeep Chovatia [mailto:chovatia.jayd...@gmail.com]
Sent: Tuesday, 22 September 2015 00:07
To: user@cassandra.apache.org
Subject: High read latency

Hi,

My application issues more read requests than writes. Under load, cfstats 
shows a quite high read latency for one of the tables, around 43ms:

Local read count: 114479357
Local read latency: 43.442 ms
Local write count: 22288868
Local write latency: 0.609 ms


Here is my node configuration:
RF=3, reads/writes at QUORUM, 64GB RAM, 48 CPU cores. I have only 5 GB of data 
on each node (and for experimental purposes I stored the data in tmpfs).

I've tried increasing concurrent_reads up to 512, but it did not help read 
latency. CPU/memory/IO look fine on the system.
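
One way to see why a larger read pool might not help is Little's law
(in-flight requests = arrival rate x time in system). The sketch below
applies it to the figures quoted in this thread, under the assumption that
the 2k reads/s all land on a single node and each takes the reported local
read latency; the real per-node rate may be lower or higher.

```python
def reads_in_flight(reads_per_second: float, latency_ms: float) -> float:
    """Little's law: average number of requests being served at once."""
    return reads_per_second * (latency_ms / 1000.0)


# 2k reads/s and 43.442 ms local read latency, as reported in the thread.
in_flight = reads_in_flight(2000, 43.442)
print(round(in_flight, 1))
```

Under that assumption roughly 87 reads are in progress at any moment, far
below 512, so the read stage would not be the limiting factor and raising
its size further would not be expected to change latency.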

Any idea what should I tune?

Jaydeep




Re: ScyllaDB, a new open source, Cassandra-compatible NoSQL

2015-09-23 Thread Marcelo Valle (BLOOMBERG/ LONDON)
I think there is a very important point about ScyllaDB: latency. 
Performance can be an important requirement, but the fact that ScyllaDB is written 
in C++ and uses lock-free algorithms internally means it should have lower latency 
than Cassandra, which enables its use for a wider range of applications. 
It seems like a huge milestone achieved by the Cassandra community, congratulations!

From: user@cassandra.apache.org 
Subject: Re: ScyllaDB, a new open source, Cassandra-compatible NoSQL

Looking at the architecture and what scylladb does, I'm not surprised they got 
10x improvement. SeaStar skips a lot of the overhead of copying stuff and it 
gives them CPU core affinity. Anyone that's listened to Cliff Click talk about 
cache misses, locks and other low level stuff would recognize the huge boost in 
performance when many of those bottlenecks are removed. Using an actor model to 
avoid locks doesn't hurt either.

On Tue, Sep 22, 2015 at 5:20 PM, Minh Do  wrote:

First glance at their github, it looks like they re-implemented Cassandra in 
C++.  90% components in Cassandra are
in scylladb, i.e. compaction, repair, CQL, gossip, SStable.


With C++, I believe this helps performance to some extent up to a point when 
compaction has not run yet.  
Then, it will be disk IO to be the dominant factor in the performance 
measurement as the more traffics to a node the more degrading
the performance is across the cluster.

Also, they only support Thrift protocol so it won't work with Java Driver with 
the new asynchronous protocol.  I doubt their tests 
are truly a fair one.

On Tue, Sep 22, 2015 at 2:13 PM, Venkatesh Arivazhagan  
wrote:

I came across this article: 
zdnet.com/article/kvm-creators-open-source-fast-cassandra-drop-in-replacement-scylla/

Tzach, I would love to know/understand more about ScyllaDB too. Also the 
benchmark seems to have only 1 DB Server. Do you have benchmark numbers where 
more than 1 DB servers were involved? :)


On Tue, Sep 22, 2015 at 1:40 PM, Sachin Nikam  wrote:

Tzach,
Can you point to any documentation on scylladb site which talks about how/why 
scylla db performs better than Cassandra while using the same architecture?
Regards
Sachin

On Tue, Sep 22, 2015 at 9:18 AM, Tzach Livyatan  
wrote:

Hello Cassandra users,

We are pleased to announce a new member of the Cassandra Ecosystem - ScyllaDB
ScyllaDB is a new, open source, Cassandra-compatible NoSQL data store, written 
with the goal of delivering superior performance and consistent low latency.  
Today, ScyllaDB runs 1M tps per server with sub 1ms latency.

ScyllaDB  supports CQL, is compatible with Cassandra drivers, and works out of 
the box with Cassandra tools like cqlsh, Spark connector, nodetool and 
cassandra-stress. ScyllaDB is a drop-in replacement solution for the Cassandra 
server side packages.

Scylla is implemented using the new shared-nothing Seastar framework for 
extreme performance on modern multicore hardware, and the Data Plane 
Development Kit (DPDK) for high-speed low-latency networking.

Try Scylla Now - http://www.scylladb.com

We will be at Cassandra summit 2015, you are welcome to visit our booth to hear 
more and see a demo.
Avi Kivity, our CTO, will host a session on Scylla on Thursday, 1:50 PM - 2:30 
PM in rooms M1 - M3.

Regards
Tzach
scylladb


<< ideas dont deserve respect >>

Re: Cassandra Summit 2015 Roll Call!

2015-09-23 Thread Sebastian Estevez
Hey guys, find me at the Startups booth! Looking forward to meeting some of
you in person :)
On Sep 22, 2015 8:44 PM, "Steve Robenalt"  wrote:

> I am here. Wearing my assorted Cassandra shirts from meetups and
> conferences. Would be happy to meet anyone from this mailing list because
> the conversations here have been very valuable as I have ramped up with
> Cassandra. And I passed my developer certification today. :-)  I am
> identifiable from my LinkedIn picture.
>
> Steve
>
>
>
> On Tue, Sep 22, 2015 at 8:19 PM, Mohammed Guller 
> wrote:
>
>> Hey everyone,
>>
>> I will be at the summit too on Wed and Thu.  I am giving a talk on
>> Thursday at 2.40pm.
>>
>>
>>
>> Would love to meet everyone on this list in person.  Here is an old
>> picture of mine:
>>
>>
>> https://events.mfactormeetings.com/accounts/register123/mfactor/datastax/events/dstaxsummit2015/guller.jpg
>>
>>
>>
>> Mohammed
>>
>>
>>
>> *From:* Carlos Alonso [mailto:i...@mrcalonso.com]
>> *Sent:* Tuesday, September 22, 2015 5:23 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Cassandra Summit 2015 Roll Call!
>>
>>
>>
>> Hi guys.
>>
>>
>>
>> I'm already here and I'll be here for the whole Summit. I'll be doing a live
>> demo on Thursday on troubleshooting Cassandra production issues as a developer.
>>
>>
>>
>> This is me!! https://twitter.com/calonso/status/646352711454097408
>>
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>>
>>
>> On 22 September 2015 at 15:27, Jeff Jirsa 
>> wrote:
>>
>> I’m here. Will be speaking Wednesday on DTCS for time series workloads:
>> http://cassandrasummit-datastax.com/agenda/real-world-dtcs-for-operators/
>>
>>
>>
>> Picture if you recognize me, say hi:
>> https://events.mfactormeetings.com/accounts/register123/mfactor/datastax/events/dstaxsummit2015/jirsa.jpg
>>  (probably
>> wearing glasses and carrying a black Crowdstrike backpack)
>>
>>
>>
>> - Jeff
>>
>>
>>
>>
>>
>> *From: *Robert Coli
>> *Reply-To: *"user@cassandra.apache.org"
>> *Date: *Tuesday, September 22, 2015 at 11:27 AM
>> *To: *"user@cassandra.apache.org"
>> *Subject: *Cassandra Summit 2015 Roll Call!
>>
>>
>>
>> Cassandra Summit 2015 is upon us!
>>
>>
>>
>> Every year, the conference gets bigger and bigger, and the chance of IRL
>> meeting people you've "met" online gets smaller and smaller.
>>
>>
>>
>> To improve everyone's chances, if you are attending the summit :
>>
>>
>>
>> 1) respond on-thread with a brief introduction (and physical description
>> of yourself if you want others to be able to spot you!)
>>
>> 2) join #cassandra on freenode IRC (irc.freenode.org) to chat and
>> connect with other attendees!
>>
>>
>>
>> MY CONTRIBUTION :
>>
>> --
>>
>> I will be at the summit on Wednesday and Thursday. I am 5'8" or so, and
>> will be wearing glasses and either a red or blue "Eventbrite Engineering"
>> t-shirt with a graphic logo of gears on it. Come say hello! :D
>>
>>
>>
>> =Rob
>>
>>
>>
>>
>>
>
>
>
> --
> Steve Robenalt
> Software Architect
> sroben...@highwire.org 
> (office/cell): 916-505-1785
>
> HighWire Press, Inc.
> 425 Broadway St, Redwood City, CA 94063
> www.highwire.org
>
> Technology for Scholarly Communication
>


Re: Do vnodes need more memory?

2015-09-23 Thread Sebastian Estevez
This is interesting, where are you seeing that you're collecting 50% of the
time? Is your env.sh the default? How much ram?

Also, can you run this tool and send a minute worth of thread info:

wget https://bintray.com/artifact/download/aragozin/generic/sjk-plus-0.3.6.jar
java -jar sjk-plus-0.3.6.jar ttop -s localhost:7199 -n 30 -o CPU
On Sep 23, 2015 7:09 AM, "Tom van den Berge" 
wrote:

> I have two data centers, each with the same number of nodes, same hardware
> (CPUs, memory), Cassandra version (2.1.6), replication factor, etc. The
> only difference is that one data center uses vnodes, and the other doesn't.
>
> The non-vnode DC works fine (and has been for a long time) under
> production load: I'm seeing normal CPU and IO load and garbage collection
> figures. But the vnode DC is struggling very hard under the same load. It
> has been set up recently. The CPU load is very high, due to excessive
> garbage collection (>50% of the time is spent collecting).
>
> So it seems that Cassandra simply doesn't have enough memory. I'm trying
> to understand if this can be caused by the use of vnodes? Is there a
> sensible reason why vnodes would consume more memory than regular nodes? Or
> do any of you have the same experience? If not, I might be barking up the
> wrong tree here, and I would love to know it before upgrading my servers
> with more memory.
>
> Thanks,
> Tom
>


Re: Do vnodes need more memory?

2015-09-23 Thread Tom van den Berge
nodetool gcstats tells me this (the Total GC Elapsed is half or more of the
Interval).
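
The ratio described above can be written down as a one-line calculation.
This is only a sketch: the two inputs are meant to be the "Interval" and
"Total GC Elapsed" values read off the tool's output by hand, and the
example numbers are invented for illustration, not taken from this cluster.

```python
def gc_time_fraction(interval_ms: float, total_gc_elapsed_ms: float) -> float:
    """Share of wall-clock time spent in garbage collection."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return total_gc_elapsed_ms / interval_ms


# Hypothetical sample: 31 s of GC during a 60 s interval.
fraction = gc_time_fraction(60_000, 31_000)
print(f"{fraction:.0%} of the interval spent collecting")
```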

We had to take the production load off the new vnode DC, since it was
messing things up badly. It means I'm not able to run any tools against it
at the moment.
The env.sh is default, and the servers have 8G ram.
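
For context on what "default" means with 8G of RAM: as far as I recall, the
stock cassandra-env.sh of the 2.1 era sizes the heap as
max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB)). The sketch below reproduces
that formula as an assumption about the script, not as a copy of it; please
check your own cassandra-env.sh before relying on the numbers.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Assumed default: max(min(ram/2, 1024MB), min(ram/4, 8192MB))."""
    half_ram_capped = min(system_memory_mb // 2, 1024)
    quarter_ram_capped = min(system_memory_mb // 4, 8192)
    return max(half_ram_capped, quarter_ram_capped)


# An 8 GB server, as in this thread.
print(default_max_heap_mb(8192))
```

If that formula holds, these nodes run with a heap of about 2 GB, which
leaves little headroom for any extra per-range bookkeeping.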

It would be great if you could respond to my initial question though.
Thanks,
Tom

On Wed, Sep 23, 2015 at 4:14 PM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> This is interesting, where are you seeing that you're collecting 50% of
> the time? Is your env.sh the default? How much ram?
>
> Also, can you run this tool and send a minute worth of thread info:
>
> wget https://bintray.com/artifact/download/aragozin/generic/sjk-plus-0.3.6.jar
> java -jar sjk-plus-0.3.6.jar ttop -s localhost:7199 -n 30 -o CPU
> On Sep 23, 2015 7:09 AM, "Tom van den Berge" 
> wrote:
>
>> I have two data centers, each with the same number of nodes, same
>> hardware (CPUs, memory), Cassandra version (2.1.6), replication factor,
>> etc. The only difference is that one data center uses vnodes, and the other
>> doesn't.
>>
>> The non-vnode DC works fine (and has been for a long time) under
>> production load: I'm seeing normal CPU and IO load and garbage collection
>> figures. But the vnode DC is struggling very hard under the same load. It
>> has been set up recently. The CPU load is very high, due to excessive
>> garbage collection (>50% of the time is spent collecting).
>>
>> So it seems that Cassandra simply doesn't have enough memory. I'm trying
>> to understand if this can be caused by the use of vnodes? Is there a
>> sensible reason why vnodes would consume more memory than regular nodes? Or
>> do any of you have the same experience? If not, I might be barking up the
>> wrong tree here, and I would love to know it before upgrading my servers
>> with more memory.
>>
>> Thanks,
>> Tom
>>
>


Huge amounts of hinted handoffs for counter table

2015-09-23 Thread Björn Hachmann
Today I realized that one of the nodes in our Cassandra cluster (2.1.7) is
storing a lot of hints (>80GB) and I fail to see a convincing way to deal
with them.

From the system.log:
INFO  [ScheduledTasks:1] 2015-09-23 14:27:06,692 StatusLogger.java:115 - system.hints 276,1010945
INFO  [ScheduledTasks:1] 2015-09-23 14:38:06,722 StatusLogger.java:115 - system.hints 968,2968163
INFO  [ScheduledTasks:1] 2015-09-23 14:38:41,742 StatusLogger.java:115 - system.hints 1317,3799471
INFO  [ScheduledTasks:1] 2015-09-23 14:49:16,775 StatusLogger.java:115 - system.hints 1519,4399905
INFO  [ScheduledTasks:1] 2015-09-23 14:49:36,793 StatusLogger.java:115 - system.hints 2247,6514649
INFO  [ScheduledTasks:1] 2015-09-23 14:49:41,811 StatusLogger.java:115 - system.hints 2247,6514649
INFO  [ScheduledTasks:1] 2015-09-23 14:49:51,830 StatusLogger.java:115 - system.hints 2368,6733293
INFO  [ScheduledTasks:1] 2015-09-23 15:00:41,885 StatusLogger.java:115 - system.hints 283,450166810
INFO  [ScheduledTasks:1] 2015-09-23 15:12:16,919 StatusLogger.java:115 - system.hints 232,970964
INFO  [ScheduledTasks:1] 2015-09-23 15:12:31,934 StatusLogger.java:115 - system.hints 581,2034388
INFO  [ScheduledTasks:1] 2015-09-23 15:23:46,973 StatusLogger.java:115 - system.hints 234,321566
INFO  [ScheduledTasks:1] 2015-09-23 15:24:01,988 StatusLogger.java:115 - system.hints 368,935634
INFO  [ScheduledTasks:1] 2015-09-23 15:35:12,039 StatusLogger.java:115 - system.hints 264,636164
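
To follow how these figures move over time, the StatusLogger entries can be
pulled out of system.log with a small script. The sketch below is based only
on the line shape shown in this message; my understanding is that the two
numbers are the memtable's operation count and data size, but treat that
reading as an assumption. The function name and the sample text are
illustrative.

```python
import re
from datetime import datetime

# Matches a StatusLogger entry for system.hints, whether or not the entry
# is wrapped onto two lines or the name is fused with the numbers.
HINTS_ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+StatusLogger\.java:\d+\s+-\s+"
    r"system\.hints\s*(\d+),(\d+)"
)


def hints_samples(log_text):
    """Return (timestamp, first_number, second_number) per logged entry."""
    samples = []
    for stamp, first, second in HINTS_ENTRY.findall(log_text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        samples.append((when, int(first), int(second)))
    return samples


SAMPLE = """\
INFO  [ScheduledTasks:1] 2015-09-23 14:27:06,692 StatusLogger.java:115 -
system.hints  276,1010945
INFO  [ScheduledTasks:1] 2015-09-23 15:00:41,885 StatusLogger.java:115 -
system.hints283,450166810
"""

samples = hints_samples(SAMPLE)
peak = max(samples, key=lambda sample: sample[2])
print(peak[0].isoformat(), peak[2])
```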

The state of the cluster seems stable, at least we do not have any
downtimes (sometimes the load on one of the nodes is quite high).

We had a look into the table system.hints and from there we learnt that
most hints
are for one of the nodes in our 2nd datacenter and most of the mutations
are
increments to one of our counter tables which are very frequent.

There seem to be no other suspicious log messages in the log apart from a
few dropped events.

We have several questions:
- What could be the reason that only one of the nodes has hints for only
one target node, although every other node should also sometimes be
coordinator for these queries?
- Is there a way to turn off hinted handoff at the table level or at the
data center level?
- What could we do to investigate the cause of this issue deeper?

Thank you!
Kind regards
Björn Hachmann


Re: ScyllaDB, a new open source, Cassandra-compatible NoSQL

2015-09-23 Thread Peter Lin
Looking at the architecture and what scylladb does, I'm not surprised they
got 10x improvement. SeaStar skips a lot of the overhead of copying stuff
and it gives them CPU core affinity. Anyone that's listened to Cliff Click
talk about cache misses, locks and other low level stuff would recognize
the huge boost in performance when many of those bottlenecks are removed.
Using an actor model to avoid locks doesn't hurt either.
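
To make the "no locks" idea concrete, here is a toy, single-threaded
simulation of a shared-nothing design: every key is routed to exactly one
shard, each shard owns its data outright and processes its mailbox in
order, so no two workers ever touch the same structure. All class and
method names are invented for this illustration; this is not Seastar's or
ScyllaDB's API, and a real implementation would pin one shard per CPU core.

```python
import zlib
from collections import deque


class Shard:
    """Owns a private key/value store and a mailbox of pending messages."""

    def __init__(self):
        self.store = {}
        self.mailbox = deque()

    def send(self, message):
        self.mailbox.append(message)

    def run_until_idle(self):
        """Process queued messages in order; return results of the gets."""
        results = []
        while self.mailbox:
            op, key, value = self.mailbox.popleft()
            if op == "put":
                self.store[key] = value
            elif op == "get":
                results.append(self.store.get(key))
        return results


class ShardedStore:
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def shard_for(self, key):
        # Stable hash so a key always maps to the same shard.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self.shard_for(key).send(("put", key, value))

    def get(self, key):
        shard = self.shard_for(key)
        shard.send(("get", key, None))
        return shard.run_until_idle()[-1]


store = ShardedStore(num_shards=4)
store.put("a", 1)
store.put("b", 2)
print(store.get("a"), store.get("b"), store.get("missing"))
```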

On Tue, Sep 22, 2015 at 5:20 PM, Minh Do  wrote:

> First glance at their github, it looks like they re-implemented Cassandra
> in C++.  90% components in Cassandra are
> in scylladb, i.e. compaction, repair, CQL, gossip, SStable.
>
>
> With C++, I believe this helps performance to some extent up to a point
> when compaction has not run yet.
> Then, it will be disk IO to be the dominant factor in the performance
> measurement as the more traffics to a node the more degrading
> the performance is across the cluster.
>
> Also, they only support Thrift protocol so it won't work with Java Driver
> with the new asynchronous protocol.  I doubt their tests
> are truly a fair one.
>
> On Tue, Sep 22, 2015 at 2:13 PM, Venkatesh Arivazhagan <
> venkey.a...@gmail.com> wrote:
>
>> I came across this article:
>> zdnet.com/article/kvm-creators-open-source-fast-cassandra-drop-in-replacement-scylla/
>>
>> Tzach, I would love to know/understand more about ScyllaDB too. Also the
>> benchmark seems to have only 1 DB Server. Do you have benchmark numbers
>> where more than 1 DB servers were involved? :)
>>
>>
>> On Tue, Sep 22, 2015 at 1:40 PM, Sachin Nikam  wrote:
>>
>>> Tzach,
>>> Can you point to any documentation on scylladb site which talks about
>>> how/why scylla db performs better than Cassandra while using the same
>>> architecture?
>>> Regards
>>> Sachin
>>>
>>> On Tue, Sep 22, 2015 at 9:18 AM, Tzach Livyatan <
>>> tz...@cloudius-systems.com> wrote:
>>>
>>>> Hello Cassandra users,
>>>>
>>>> We are pleased to announce a new member of the Cassandra Ecosystem -
>>>> ScyllaDB
>>>> ScyllaDB is a new, open source, Cassandra-compatible NoSQL data store,
>>>> written with the goal of delivering superior performance and consistent low
>>>> latency.  Today, ScyllaDB runs 1M tps per server with sub 1ms latency.
>>>>
>>>> ScyllaDB  supports CQL, is compatible with Cassandra drivers, and works
>>>> out of the box with Cassandra tools like cqlsh, Spark connector, nodetool
>>>> and cassandra-stress. ScyllaDB is a drop-in replacement solution for the
>>>> Cassandra server side packages.
>>>>
>>>> Scylla is implemented using the new shared-nothing Seastar framework for
>>>> extreme performance on modern multicore hardware, and the Data Plane
>>>> Development Kit (DPDK) for high-speed low-latency networking.
>>>>
>>>> Try Scylla Now - http://www.scylladb.com
>>>>
>>>> We will be at Cassandra summit 2015, you are welcome to visit our booth
>>>> to hear more and see a demo.
>>>> Avi Kivity, our CTO, will host a session on Scylla on Thursday, 1:50 PM
>>>> - 2:30 PM in rooms M1 - M3.
>>>>
>>>> Regards
>>>> Tzach
>>>> scylladb


>>>
>>
>


Do vnodes need more memory?

2015-09-23 Thread Tom van den Berge
I have two data centers, each with the same number of nodes, same hardware
(CPUs, memory), Cassandra version (2.1.6), replication factor, etc. The
only difference is that one data center uses vnodes, and the other doesn't.

The non-vnode DC works fine (and has been for a long time) under production
load: I'm seeing normal CPU and IO load and garbage collection figures. But
the vnode DC is struggling very hard under the same load. It has been set
up recently. The CPU load is very high, due to excessive garbage collection
(>50% of the time is spent collecting).

So it seems that Cassandra simply doesn't have enough memory. I'm trying to
understand if this can be caused by the use of vnodes? Is there a sensible
reason why vnodes would consume more memory than regular nodes? Or do any
of you have the same experience? If not, I might be barking up the wrong
tree here, and I would love to know it before upgrading my servers with
more memory.

Thanks,
Tom


Re: Huge amounts of hinted handoffs for counter table

2015-09-23 Thread Venkatesh Arivazhagan
What is your replication factor and write consistency? :)
On Sep 23, 2015 7:28 AM, "Björn Hachmann" 
wrote:

> Today I realized that one of the nodes in our Cassandra cluster (2.1.7) is
> storing a lot of hints (>80GB) and I fail to see a convincing way to deal
> with them.
>
> From the system.log:
> INFO  [ScheduledTasks:1] 2015-09-23 14:27:06,692 StatusLogger.java:115 -
> system.hints  276,1010945
> INFO  [ScheduledTasks:1] 2015-09-23 14:38:06,722 StatusLogger.java:115 -
> system.hints  968,2968163
> INFO  [ScheduledTasks:1] 2015-09-23 14:38:41,742 StatusLogger.java:115 -
> system.hints 1317,3799471
> INFO  [ScheduledTasks:1] 2015-09-23 14:49:16,775 StatusLogger.java:115 -
> system.hints 1519,4399905
> INFO  [ScheduledTasks:1] 2015-09-23 14:49:36,793 StatusLogger.java:115 -
> system.hints 2247,6514649
> INFO  [ScheduledTasks:1] 2015-09-23 14:49:41,811 StatusLogger.java:115 -
> system.hints 2247,6514649
> INFO  [ScheduledTasks:1] 2015-09-23 14:49:51,830 StatusLogger.java:115 -
> system.hints 2368,6733293
> INFO  [ScheduledTasks:1] 2015-09-23 15:00:41,885 StatusLogger.java:115 -
> system.hints 283,450166810
> INFO  [ScheduledTasks:1] 2015-09-23 15:12:16,919 StatusLogger.java:115 -
> system.hints   232,970964
> INFO  [ScheduledTasks:1] 2015-09-23 15:12:31,934 StatusLogger.java:115 -
> system.hints  581,2034388
> INFO  [ScheduledTasks:1] 2015-09-23 15:23:46,973 StatusLogger.java:115 -
> system.hints   234,321566
> INFO  [ScheduledTasks:1] 2015-09-23 15:24:01,988 StatusLogger.java:115 -
> system.hints   368,935634
> INFO  [ScheduledTasks:1] 2015-09-23 15:35:12,039 StatusLogger.java:115 -
> system.hints   264,636164
>
> The state of the cluster seems stable, at least we do not have any
> downtimes (sometimes the load on one of the nodes is quite high).
>
> We had a look into the table system.hints and from there we learnt that
> most hints
> are for one of the nodes in our 2nd datacenter and most of the mutations
> are
> increments to one of our counter tables which are very frequent.
>
> There seem to be no other suspicious log messages in the log apart from a
> few dropped events.
>
> We have several questions:
> - What could be the reason that only one of the nodes has hints for only
> one target node, although every other node should also sometimes be
> coordinator for these queries?
> - Is there a way to turn off hinted handoff at the table level or at the
> data center level?
> - What could we do to investigate the cause of this issue deeper?
>
> Thank you!
> Kind regards
> Björn Hachmann
>


Re: Unable to remove dead node from cluster.

2015-09-23 Thread Jeff Jirsa
When you run unsafeAssassinateEndpoint, to which host are you connected, and 
what argument are you passing?

Are there other nodes in the ring that you’re not including in the ‘nodetool 
status’ output?


From:  Dikang Gu
Reply-To:  "user@cassandra.apache.org"
Date:  Tuesday, September 22, 2015 at 10:09 PM
To:  cassandra
Cc:  "d...@cassandra.apache.org"
Subject:  Re: Unable to remove dead node from cluster.

ping.

On Mon, Sep 21, 2015 at 11:51 AM, Dikang Gu  wrote:
I have tried all of them; none of them worked. 
1. decommission: the host had a hardware issue, and I cannot connect to it.
2. removenode: there is no Host ID, so removenode did not work.
3. unsafeAssassinateEndpoint: it throws an NPE, as I pasted before. Can we fix 
it?

Thanks
Dikang.

On Mon, Sep 21, 2015 at 11:11 AM, Sebastian Estevez 
 wrote:

Order is decommission, remove, assassinate.

Which have you tried?
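
The escalation order above can be sketched as a small decision function.
This is only an illustration of the ordering: decommission is run on the
node being removed (so it must be reachable), removenode is run from another
node and needs the dead node's Host ID, and in 2.1 assassination is the
unsafeAssassinateEndpoint JMX operation. The function and its return labels
are made up for the example.

```python
from typing import Optional


def removal_method(node_is_reachable: bool, host_id: Optional[str]) -> str:
    """Pick the gentlest removal method that can work for this node."""
    if node_is_reachable:
        # Preferred: the node streams its own data away before leaving.
        return "decommission"
    if host_id:
        # Node is dead, but the ring still knows its Host ID.
        return "removenode"
    # Last resort: force the endpoint out of gossip.
    return "assassinate"


# The situation described in this thread: host unreachable, Host ID is null.
print(removal_method(node_is_reachable=False, host_id=None))
```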

On Sep 21, 2015 10:47 AM, "Dikang Gu"  wrote:
Hi there, 

I have a dead node in our cluster, which is in a weird state right now and 
cannot be removed from the cluster.

nodetool status shows:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load  Tokens  Owns  Host ID  Rack
DN  10.210.165.55  ?     256     ?     null     r1

I tried the unsafeAssassinateEndpoint, but got exception like:
2015-09-18_23:21:40.79760 INFO  23:21:40 InetAddress /10.210.165.55 is now DOWN
2015-09-18_23:21:40.80667 ERROR 23:21:40 Exception in thread Thread[GossipStage:1,5,main]
2015-09-18_23:21:40.80668 java.lang.NullPointerException: null
2015-09-18_23:21:40.80669   at org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1584) ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80669   at org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1592) ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80670   at org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1822) ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80671   at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1495) ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80671   at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2121) ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80672   at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009) ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80673   at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1113) ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80673   at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80673   at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80674   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_45]
2015-09-18_23:21:40.80674   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_45]
2015-09-18_23:21:40.80674   at java.lang.Thread.run(Thread.java:744) ~[na:1.7.0_45]
2015-09-18_23:21:40.85812 WARN  23:21:40 Not marking nodes down due to local pause of 10852378435 > 50

Any suggestions about how to remove it?
Thanks.

-- 
Dikang






Re: Huge amounts of hinted handoffs for counter table

2015-09-23 Thread Robert Coli
On Wed, Sep 23, 2015 at 7:28 AM, Björn Hachmann 
wrote:

> Today I realized that one of the nodes in our Cassandra cluster (2.1.7) is
> storing a lot of hints (>80GB) and I fail to see a convincing way to deal
> with them.
> ...
> We had a look into the table system.hints and from there we learnt that
> most hints
> are for one of the nodes in our 2nd datacenter and most of the mutations
> are
> increments to one of our counter tables which are very frequent.
>

This is probably timeouts on the increment creating your hints.


> We have several questions:
> - What could be the reason that only one of the nodes has hints for only
> one target node, although every other node should also sometimes be
> coordinator for these queries?
>

That sounds unexpected, I don't have a good answer.


> - Is there a way to turn off hinted handoff at the table level or at the
> data center level?
>

No.


> - What could we do to investigate the cause of this issue deeper?
>

Are the hints being successfully delivered? It sounds like not..

=Rob


Re: Do vnodes need more memory?

2015-09-23 Thread Robert Coli
On Wed, Sep 23, 2015 at 7:09 AM, Tom van den Berge <
tom.vandenbe...@gmail.com> wrote:

> So it seems that Cassandra simply doesn't have enough memory. I'm trying
> to understand if this can be caused by the use of vnodes? Is there a
> sensible reason why vnodes would consume more memory than regular nodes? Or
> do any of you have the same experience? If not, I might be barking up the
> wrong tree here, and I would love to know it before upgrading my servers
> with more memory.
>

Yes, range ownership has a RAM/heap cost per-range-owned. This cost is paid
during many, but not all, operations. Owning 256 ranges > Owning 1 range.

I have not had the same experience but am not at all surprised to hear that
vnodes increase heap consumption for otherwise identical configurations. I
am surprised to hear that it makes a significant difference in GC time, but
you might have been close enough to heap saturation that vnodes tip you
over.

As an aside, one is likely to get very little net win from vnodes if one's
cluster is not now and will never be more than approximately 15 nodes.

=Rob


Re: Upgrade Limitations Question

2015-09-23 Thread Robert Coli
On Wed, Sep 16, 2015 at 7:02 AM, Vasileios Vlachos <
vasileiosvlac...@gmail.com> wrote:

>
> At the end we had to wait for the upgradesstables to finish on every node.
> Just to eliminate the possibility of this being the reason of any weird
> behaviour after the upgrade. However, this process might take a long time
> in a cluster with a large number of nodes which means no new work can be
> done for that period.
>

Yes, this is the worst case scenario and it's pretty bad for large clusters
/ large data-size per node.

>> 1) TRUNCATE requires all known nodes to be available to succeed, if you are
>> restarting one, it won't be available.
>>
>
> I suppose all means all, not all replicas here, is that right? Not
> directly related to the original question, but that might explain why we
> end up with peculiar behaviour some times when we run TRUNCATE. We've now
> taken the approach DROP it and do it again when possible (even though this
> is still problematic when using the same CF name)
>

Pretty sure that TRUNCATE and DROP have the same behavior wrt node
availability. Yes, I mean all nodes which are supposed to replicate that
table.


>> Is there a way to find out if the upgradesstables has been run against a
>> particular node or not?
>>
>
If you run it and it immediately completes [1], it has probably been run
before.

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-5366 - 1.2.4 - "NOOP on
upgradesstables for already upgraded node"
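
A complementary check, not suggested in the thread itself, is to look at the
format version embedded in the sstable file names in a table's data
directory. The sketch below assumes the pre-2.2 naming layout
keyspace-table-version-generation-Data.db (for example "jb" for 2.0-format
and "ka" for 2.1-format files, as far as I remember); the file names used
here are hypothetical, and other releases name their files differently.

```python
import re
from collections import Counter

# keyspace-table-<two-letter format version>-<generation>-Data.db
DATA_FILE = re.compile(r"^[^-]+-[^-]+-(?P<version>[a-z]{2})-\d+-Data\.db$")


def sstable_versions(filenames):
    """Count Data.db files per sstable format version."""
    counts = Counter()
    for name in filenames:
        match = DATA_FILE.match(name)
        if match:
            counts[match.group("version")] += 1
    return counts


# Hypothetical directory listing for one table.
listing = [
    "ks-events-jb-12-Data.db",
    "ks-events-ka-13-Data.db",
    "ks-events-ka-13-Index.db",
    "ks-events-ka-14-Data.db",
]
versions = sstable_versions(listing)
print(dict(versions))
```

If more than one version shows up for a table, some sstables on that node
have presumably not been rewritten yet.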