Re: Cluster Maintenance Mishap

2016-10-20 Thread Yabin Meng
I believe you're using vnodes (token range changes don't make sense for a
single-token setup unless you change the tokens explicitly). When you
bootstrap a new node with vnodes, I think the token ranges assigned to the
node are picked randomly (I'm not 100% sure here, but that should be the
case logically). If so, the ownership of the data each node is responsible
for changes. The part of the data that no longer belongs to a node under
the new ownership, however, is still kept on that node; Cassandra won't
remove it automatically unless you run "nodetool cleanup". So to answer
your question, I don't think the data has been moved away. More likely you
have extra duplicate data here.
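
For example, once the ring is back in its normal state, running something
like the following on each node (one node at a time) would drop the data
that node no longer owns; the keyspace argument is optional and the name
below is just a placeholder:

    nodetool cleanup                 # clean up all keyspaces
    nodetool cleanup my_keyspace     # or limit the cleanup to one keyspace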

Yabin

On Thu, Oct 20, 2016 at 6:41 PM, Branton Davis <branton.da...@spanning.com>
wrote:

> Thanks for the response, Yabin.  However, if there's an answer to my
> question here, I'm apparently too dense to see it ;)
>
> I understand that, since the system keyspace data was not there, it
> started bootstrapping.  What's not clear is if they took over the token
> ranges of the previous nodes or got new token ranges.  I'm mainly
> concerned about the latter.  We've got the nodes back in place with the
> original data, but the fear is that some data may have been moved off of
> other nodes.  I think that this is very unlikely, but I'm just looking for
> confirmation.
>
>
> On Thursday, October 20, 2016, Yabin Meng <yabinm...@gmail.com> wrote:
>
>> Most likely the issue is caused by the fact that when you moved the data,
>> you moved the system keyspace data away as well. Because the data was
>> copied into a different location than what C* is expecting, when C* starts
>> it cannot find the system metadata and therefore tries to start as a fresh
>> new node. If you keep the system keyspace data in the right place, you
>> should see all the old info as expected.
>>
>> I've seen a few such occurrences from customers. As a best practice, I
>> would always suggest keeping the Cassandra application data directory
>> completely separate from the system keyspace directory (e.g. so they don't
>> share a common parent folder).
>>
>> Regards,
>>
>> Yabin
>>
>> On Thu, Oct 20, 2016 at 4:58 PM, Branton Davis <
>> branton.da...@spanning.com> wrote:
>>
>>> Howdy folks.  I asked some about this in IRC yesterday, but we're
>>> looking to hopefully confirm a couple of things for our sanity.
>>>
>>> Yesterday, I was performing an operation on a 21-node cluster (vnodes,
>>> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced
>>> across 3 AZs on AWS EC2).  The plan was to swap each node's existing
>>> 1TB volume (where all cassandra data, including the commitlog, is stored)
>>> with a 2TB volume.  The plan for each node (one at a time) was
>>> basically:
>>>
>>>  - rsync while the node is live (repeated until there were only minor
>>>    differences from new data)
>>>  - stop cassandra on the node
>>>  - rsync again
>>>  - replace the old volume with the new
>>>  - start cassandra
>>>
>>> However, there was a bug in the rsync command.  Instead of copying the
>>> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to
>>> /var/data/cassandra_new/cassandra.  So, when cassandra was started
>>> after the volume swap, there was some behavior that was similar to
>>> bootstrapping a new node (data started streaming in from other nodes).  But
>>> there was also some behavior that was similar to a node replacement
>>> (nodetool status showed the same IP address, but a different host ID).  This
>>> happened with 3 nodes (one from each AZ).  The nodes had received
>>> 1.4GB, 1.2GB, and 0.6GB of data (whereas the normal load for a node is
>>> around 500-600GB).
>>>
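(For reference, this is the classic rsync trailing-slash behavior; the exact
command isn't quoted here, but the difference would have been roughly:

    rsync -a /var/data/cassandra  /var/data/cassandra_new/    # no trailing slash: creates cassandra_new/cassandra
    rsync -a /var/data/cassandra/ /var/data/cassandra_new/    # trailing slash: copies only the contents

the first form being the accidental one described above.)
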
>>> The cluster was in this state for about 2 hours, at which
>>> point cassandra was stopped on them.  Later, I moved the data from the
>>> original volumes back into place (so, should be the original state before
>>> the operation) and started cassandra back up.
>>>
>>> Finally, the questions.  We've accepted the potential loss of new data
>>> within the two hours, but our primary concern now is what was happening
>>> with the bootstrapping nodes.  Would they have taken on the token
>>> ranges of the original nodes or acted like new nodes and got new token
>>> ranges?  If the latter, is it possible that any data moved from the
>>> healthy nodes to the "new" nodes or would restarting them with the original
>>> data (and repairing) put the cluster's token ranges back into a normal
>>> state?
>>>
>>> Hopefully that was all clear.  Thanks in advance for any info!
>>>
>>
>>


Re: Rebuild failing while adding new datacenter

2016-10-20 Thread Yabin Meng
Sorry, I'm not aware of one.

On Thu, Oct 20, 2016 at 6:00 PM, Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Thank you Yabin, is there an existing JIRA that I can refer to?
>
> On Thu, Oct 20, 2016 at 2:05 PM, Yabin Meng <yabinm...@gmail.com> wrote:
>
>> I have seen this on other releases too, e.g. on 2.2.x. The workaround is
>> exactly like yours; some other system keyspaces also need similar changes.
>>
>> I would say this is a benign bug.
>>
>> Yabin
>>
>> On Thu, Oct 20, 2016 at 4:41 PM, Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>>> thanks,
>>>
>>> This always works on versions 2.1.13 and 2.1.16 but not on 3.0.8. It's
>>> definitely not a firewall issue.
>>>
>>> On Thu, Oct 20, 2016 at 1:16 PM, sai krishnam raju potturi <
>>> pskraj...@gmail.com> wrote:
>>>
>>>> we faced a similar issue earlier, but that was more related to firewall
>>>> rules. The newly added datacenter was not able to communicate with the
>>>> existing datacenters on port 7000 (inter-node communication). Yours
>>>> might be a different issue, but just saying.
>>>>
>>>>
>>>> On Thu, Oct 20, 2016 at 4:12 PM, Jai Bheemsen Rao Dhanwada <
>>>> jaibheem...@gmail.com> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I have a single datacenter with 3 C* nodes and we are trying to expand
>>>>> the cluster to another region/DC. I am seeing the below error while doing
>>>>> a "nodetool rebuild -- name_of_existing_data_center".
>>>>>
>>>>> [user@machine ~]$ nodetool rebuild DC1
>>>>> nodetool: Unable to find sufficient sources for streaming range
>>>>> (-402178150752044282,-396707578307430827] in keyspace
>>>>> system_distributed
>>>>> See 'nodetool help' or 'nodetool help <command>'.
>>>>> [user@machine ~]$
>>>>>
>>>>> user@cqlsh> SELECT * from system_schema.keyspaces where
>>>>> keyspace_name='system_distributed';
>>>>>
>>>>>  keyspace_name      | durable_writes | replication
>>>>> --------------------+----------------+---------------------------------------------------------------------------------------
>>>>>  system_distributed |           True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
>>>>>
>>>>> (1 rows)
>>>>>
>>>>> To overcome this I have updated the system_distributed keyspace to DC1:3
>>>>> and DC2:3 with NetworkTopologyStrategy.
>>>>>
>>>>> C* Version - 3.0.8
>>>>>
>>>>> Is this a bug introduced in the 3.0.8 version of Cassandra? I haven't
>>>>> seen this issue with the older versions.
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Cluster Maintenance Mishap

2016-10-20 Thread Yabin Meng
Most likely the issue is caused by the fact that when you moved the data,
you moved the system keyspace data away as well. Because the data was
copied into a different location than what C* is expecting, when C* starts
it cannot find the system metadata and therefore tries to start as a fresh
new node. If you keep the system keyspace data in the right place, you
should see all the old info as expected.

I've seen a few such occurrences from customers. As a best practice, I
would always suggest keeping the Cassandra application data directory
completely separate from the system keyspace directory (e.g. so they don't
share a common parent folder).

Regards,

Yabin

On Thu, Oct 20, 2016 at 4:58 PM, Branton Davis 
wrote:

> Howdy folks.  I asked some about this in IRC yesterday, but we're looking
> to hopefully confirm a couple of things for our sanity.
>
> Yesterday, I was performing an operation on a 21-node cluster (vnodes,
> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced
> across 3 AZs on AWS EC2).  The plan was to swap each node's existing 1TB
> volume (where all cassandra data, including the commitlog, is stored) with
> a 2TB volume.  The plan for each node (one at a time) was basically:
>
>  - rsync while the node is live (repeated until there were only minor
>    differences from new data)
>  - stop cassandra on the node
>  - rsync again
>  - replace the old volume with the new
>  - start cassandra
>
> However, there was a bug in the rsync command.  Instead of copying the
> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to
> /var/data/cassandra_new/cassandra.  So, when cassandra was started after
> the volume swap, there was some behavior that was similar to bootstrapping
> a new node (data started streaming in from other nodes).  But there
> was also some behavior that was similar to a node replacement (nodetool
> status showed the same IP address, but a different host ID).  This
> happened with 3 nodes (one from each AZ).  The nodes had received 1.4GB,
> 1.2GB, and 0.6GB of data (whereas the normal load for a node is around
> 500-600GB).
>
> The cluster was in this state for about 2 hours, at which point cassandra
> was stopped on them.  Later, I moved the data from the original volumes
> back into place (so, should be the original state before the operation) and
> started cassandra back up.
>
> Finally, the questions.  We've accepted the potential loss of new data
> within the two hours, but our primary concern now is what was happening
> with the bootstrapping nodes.  Would they have taken on the token ranges
> of the original nodes or acted like new nodes and got new token ranges?  If
> the latter, is it possible that any data moved from the healthy nodes to
> the "new" nodes or would restarting them with the original data (and
> repairing) put the cluster's token ranges back into a normal state?
>
> Hopefully that was all clear.  Thanks in advance for any info!
>


Re: Rebuild failing while adding new datacenter

2016-10-20 Thread Yabin Meng
I have seen this on other releases too, e.g. on 2.2.x. The workaround is
exactly like yours; some other system keyspaces also need similar changes.

I would say this is a benign bug.
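
For this cluster that would be something along these lines (the DC names and
replication factors below just mirror what you described, so adjust them to
your actual topology); the same pattern applies to the other SimpleStrategy
system keyspaces such as system_auth and system_traces:

    ALTER KEYSPACE system_distributed WITH replication =
        {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'};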

Yabin

On Thu, Oct 20, 2016 at 4:41 PM, Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> thanks,
>
> This always works on versions 2.1.13 and 2.1.16 but not on 3.0.8. It's
> definitely not a firewall issue.
>
> On Thu, Oct 20, 2016 at 1:16 PM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> we faced a similar issue earlier, but that was more related to firewall
>> rules. The newly added datacenter was not able to communicate with the
>> existing datacenters on port 7000 (inter-node communication). Yours
>> might be a different issue, but just saying.
>>
>>
>> On Thu, Oct 20, 2016 at 4:12 PM, Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> I have a single datacenter with 3 C* nodes and we are trying to expand the
>>> cluster to another region/DC. I am seeing the below error while doing a
>>> "nodetool rebuild -- name_of_existing_data_center".
>>>
>>> [user@machine ~]$ nodetool rebuild DC1
>>> nodetool: Unable to find sufficient sources for streaming range
>>> (-402178150752044282,-396707578307430827] in keyspace system_distributed
>>> See 'nodetool help' or 'nodetool help <command>'.
>>> [user@machine ~]$
>>>
>>> user@cqlsh> SELECT * from system_schema.keyspaces where
>>> keyspace_name='system_distributed';
>>>
>>>  keyspace_name      | durable_writes | replication
>>> --------------------+----------------+---------------------------------------------------------------------------------------
>>>  system_distributed |           True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
>>>
>>> (1 rows)
>>>
>>> To overcome this I have updated the system_distributed keyspace to DC1:3
>>> and DC2:3 with NetworkTopologyStrategy.
>>>
>>> C* Version - 3.0.8
>>>
>>> Is this a bug introduced in the 3.0.8 version of Cassandra? I haven't
>>> seen this issue with the older versions.
>>>
>>
>>
>


Re: wide rows

2016-10-18 Thread Yabin Meng
With CQL data modeling, everything is called a "row", but really in CQL a
row is just a logical concept. It helps the understanding a bit if you think
of "wide partition" instead of "wide row" (a partition is determined by the
hash of the partition key): one wide partition may contain multiple logical
CQL rows, with each CQL row mapping to actual storage columns within that
partition.

Time-series data is usually a good fit for "wide-partition" data modeling,
but please remember not to go too crazy with it.
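
As a small illustration using the sensor_data table from DuyHai's example
below (the uuid and values here are made up), every row that shares the same
sensor_id lands in the same wide partition and comes back ordered by the
clustering column:

    INSERT INTO sensor_data (sensor_id, date, value)
        VALUES (123e4567-e89b-12d3-a456-426655440000, '2016-10-18 10:00:00', 21.5);
    INSERT INTO sensor_data (sensor_id, date, value)
        VALUES (123e4567-e89b-12d3-a456-426655440000, '2016-10-18 10:01:00', 21.7);

    -- one partition, many logical CQL rows
    SELECT * FROM sensor_data WHERE sensor_id = 123e4567-e89b-12d3-a456-426655440000;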

Cheers,

Yabin

On Tue, Oct 18, 2016 at 11:23 AM, DuyHai Doan  wrote:

> // user table: skinny partition
> CREATE TABLE user (
> user_id uuid,
> firstname text,
> lastname text,
> 
> PRIMARY KEY ((user_id))
> );
>
> // sensor_data table: wide partition
> CREATE TABLE sensor_data (
>  sensor_id uuid,
>  date timestamp,
>  value double,
>  PRIMARY KEY ((sensor_id),  date)
> );
>
> On Tue, Oct 18, 2016 at 5:07 PM, S Ahmed  wrote:
>
>> Hi,
>>
>> Can someone clarify how you would model a "wide" row cassandra table?
>> From what I understand, a wide row table is where you keep appending
>> columns to a given row.
>>
>> The other way to model a table would be the "regular" style where each
>> row contains the data, so during a SELECT you would get multiple rows back,
>> as opposed to a wide row where you would get a single row but a subset of
>> columns.
>>
>> Can someone show a simple data model that compares both styles?
>>
>> Thanks.
>>
>
>


Re: failure node rejoin

2016-10-17 Thread Yabin Meng
The exception you run into is expected behavior. This is because, as Ben
pointed out, when you delete everything (including the system schemas), the
C* cluster thinks you're bootstrapping a new node. However, node2's IP is
still in gossip and this is why you see the exception.

I'm not clear on why you need to delete the C* data directory. That is a
dangerous action, especially considering that you delete the system
schemas. If the failed node has been gone for a while, what you need to do
is remove the node first before doing the "rejoin".
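
For example (the host ID must be taken from your own "nodetool status"
output; the one below is just the node2 ID from your listing, used as a
placeholder):

    nodetool status                                           # note the Host ID of the node that is gone
    nodetool removenode 05bdb1d4-c39b-48f1-8248-911d61935925  # remove it from the ring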

Cheers,

Yabin

On Mon, Oct 17, 2016 at 1:48 AM, Ben Slater 
wrote:

> To cassandra, the node where you deleted the files looks like a brand new
> machine. It doesn’t automatically rebuild such machines, to prevent
> accidental replacement. You need to tell it to build the “new” machine as a
> replacement for the “old” machine with that IP by setting
> -Dcassandra.replace_address_first_boot=. See
> http://cassandra.apache.org/doc/latest/operating/topo_changes.html.
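
In practice that flag goes into the JVM options of the replacement node,
e.g. via cassandra-env.sh; the address value below is only a placeholder:

    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<address of the node being replaced>"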
>
> Cheers
> Ben
>
> On Mon, 17 Oct 2016 at 16:41 Yuji Ito  wrote:
>
>> Hi all,
>>
>> A failed node can rejoin the cluster.
>> On that node, all data in /var/lib/cassandra had been deleted.
>> Is that normal?
>>
>> I can reproduce it as below.
>>
>> cluster:
>> - C* 2.2.7
>> - a cluster has node1, 2, 3
>> - node1 is a seed
>> - replication_factor: 3
>>
>> how to:
>> 1) stop C* process and delete all data in /var/lib/cassandra on node2
>> ($sudo rm -rf /var/lib/cassandra/*)
>> 2) stop C* process on node1 and node3
>> 3) restart C* on node1
>> 4) restart C* on node2
>>
>> nodetool status after 4):
>> Datacenter: datacenter1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
>> DN  [node3 IP]  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
>> UN  [node2 IP]  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
>> UN  [node1 IP]  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
>>
>> If I restart C* on node2 while C* on node1 and node3 are running (i.e.
>> without steps 2) and 3)), a runtime exception happens:
>> RuntimeException: "A node with address [node2 IP] already exists,
>> cancelling join..."
>>
>> I'm not sure this causes data loss. All data can be read properly just
>> after this rejoin.
>> But some rows are lost when I kill C* for destructive tests after
>> this rejoin.
>>
>> Thanks.
>>
>> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>


Re: Adding disk capacity to a running node

2016-10-17 Thread Yabin Meng
I assume you're talking about a Cassandra JBOD (just a bunch of disks)
setup, because you mention adding the disk to the list of data directories.
If this is the case, you may run into issues, depending on your C* version.
Check this out: http://www.datastax.com/dev/blog/improving-jbod.
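
If you do go the JBOD route, the new mount point simply gets appended to the
data directory list in cassandra.yaml (the paths below are only examples):

    data_file_directories:
        - /var/lib/cassandra/data
        - /mnt/data2/cassandra/data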

Another approach is to use LVM to manage the multiple devices as a single
mount point. If you do so, all Cassandra sees is simply increased disk
storage space, and there should be no problem.
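
A rough sketch of that approach, assuming the data directory already sits on
an LVM logical volume (the device, volume group and mount point names below
are made up):

    pvcreate /dev/xvdf                            # the newly attached disk
    vgextend cassandra_vg /dev/xvdf               # add it to the existing volume group
    lvextend -l +100%FREE /dev/cassandra_vg/data  # grow the logical volume
    xfs_growfs /var/lib/cassandra                 # grow the filesystem (resize2fs for ext4)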

Hope this helps,

Yabin

On Mon, Oct 17, 2016 at 11:54 AM, Vladimir Yudovin 
wrote:

> Yes, Cassandra should keep the percentage of disk usage equal across all
> disks. The compaction process and SSTable flushes will use the new disk to
> distribute both new and existing data.
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer. Launch your
> cluster in minutes.*
>
>
> On Mon, 17 Oct 2016 11:43:27 -0400, Seth Edwards wrote:
>
> We have a few nodes that are running out of disk capacity at the moment
> and instead of adding more nodes to the cluster, we would like to add
> another disk to the server and add it to the list of data directories. My
> question is, will Cassandra use the new disk for compactions on sstables
> that already exist in the primary directory?
>
>
>
> Thanks!
>
>
>


Re: Replacing a dead node in a live Cassandra Cluster

2016-10-03 Thread Yabin Meng
Are you sure the cassandra.yaml file of the new node is correctly configured?
What are the seeds and listen_address settings on your new node and on the
existing nodes?
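
Roughly, what you'd want to double-check on the replacement node (the
addresses below are placeholders) is that the seed list points at live nodes
of the existing cluster and that listen_address is the node's own IP:

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.0.0.1,10.0.0.2"   # live nodes in the existing cluster
    listen_address: 10.0.0.9                 # this node's own address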

Yabin

On Fri, Sep 30, 2016 at 7:56 PM, Rajath Subramanyam 
wrote:

> Hello Cassandra-users,
>
> I was running some tests today. My end goal was to learn more about
> replacing a dead node in a live Cassandra cluster with minimal disruption
> to the existing cluster and figure out a better and faster way of doing the
> same.
>
> I am running a package installation of the following version of Cassandra.
>
> [centos@rj-cassandra-1 testcf-97896450869d11e6a84c4381bf5c5035]$ nodetool
> version
> ReleaseVersion: 2.1.12
>
> I setup a 4 node Cassandra in the lab. I got one non-seed node (lets say
> node1) down by issuing a 'sudo service cassandra stop'. Then following
> following instructions from this link
> ,
> I tried to replace node1 with the JMX option 
> -Dcassandra.replace_address=.
> However, when I do this the bootstrap fails with the following error in the
> log:
>
> ERROR [main] 2016-09-30 23:54:17,104 CassandraDaemon.java:579 - Exception
> encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
> at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1337)
> ~[apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:512)
> ~[apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:783)
> ~[apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:721)
> ~[apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
> ~[apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387)
> [apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
> [apache-cassandra-2.1.12.jar:2.1.12]
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651)
> [apache-cassandra-2.1.12.jar:2.1.12]
> WARN  [StorageServiceShutdownHook] 2016-09-30 23:54:17,109
> Gossiper.java:1454 - No local state or state is in silent shutdown, not
> announcing shutdown
> INFO  [StorageServiceShutdownHook] 2016-09-30 23:54:17,109
> MessagingService.java:734 - Waiting for messaging service to quiesce
> INFO  [ACCEPT-/10.7.0.232] 2016-09-30 23:54:17,110
> MessagingService.java:1018 - MessagingService has terminated the accept()
> thread
>
> How do I recover from this error message ?
>
> 
> Rajath Subramanyam
>
>


Re: Cassandra 3 node cluster with intermittent network issues on one node

2016-10-03 Thread Yabin Meng
Most likely node A has some gossip-related problem. You can try purging
the gossip state on node A, as per this procedure:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_gossip_purge.html
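
From memory, that procedure boils down to roughly the following (please
check the linked page for the exact steps for your version):

    # on node A: stop cassandra, then temporarily add this to cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
    # start cassandra, verify the ring with "nodetool status", then remove the option again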

Yabin

On Mon, Oct 3, 2016 at 2:38 AM, Girish Kamarthi <
girish.kamar...@stellapps.com> wrote:

> Hi All,
>
> I want to test out a scenario where there are intermittent network issues
> on one of the nodes.
>
> I've got Cassandra 3.7 cluster of 3 nodes with the keyspace replication
> factor of 3.
>
> All 3 nodes (node A, node B, node C) are started and are in sync. When
> one of the cassandra nodes went down (node A), I restarted cassandra and
> node A got back in sync with the other nodes B & C.
>
> Now my question is about when one of the nodes has issues like intermittent
> network issues (cassandra is still up and running). Say node A is having
> network issues; the nodetool status on the other 2 nodes B & C shows that
> node A is down.
>
> *Debug.log of Node B & C:*
>
> DEBUG [GossipTasks:1] 2016-10-03 11:46:18,922 Gossiper.java:337 -
> Convicting /10.1.1.4 with status NORMAL - alive false
>
> When the network is back on the node A the nodetool status shows that the
> other nodes are down.
>
> *Debug.log of Node A:*
>
> DEBUG [GossipTasks:1] 2016-10-03 11:47:23,613 Gossiper.java:337 -
> Convicting /10.1.1.5 with status NORMAL - alive false
>
> DEBUG [GossipTasks:1] 2016-10-03 11:47:23,614 Gossiper.java:337 -
> Convicting /10.1.1.6 with status NORMAL - alive false
>
>
> Below are the configuration changes I made in the cassandra.yaml files.
>
> Node 01
>
> cluster_name: 'Test Cluster'
>
> num_tokens: 256
>
> seed_provider:
>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>       parameters:
>           - seeds: "10.1.1.4,10.1.1.5,10.1.1.6"
>
> listen_address: 10.1.1.4
>
> broadcast_address: 10.1.1.4
>
> rpc_address: 0.0.0.0
>
> broadcast_rpc_address: 10.1.1.4
>
>
> Node02
>
> cluster_name: 'Test Cluster'
>
> num_tokens: 256
>
> seed_provider:
>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>       parameters:
>           - seeds: "10.1.1.4,10.1.1.5,10.1.1.6"
>
> listen_address: 10.1.1.5
>
> broadcast_address: 10.1.1.5
>
> rpc_address: 0.0.0.0
>
> broadcast_rpc_address: 10.1.1.5
>
>
> Node03
>
> cluster_name: 'Test Cluster'
>
> num_tokens: 256
>
> seed_provider:
>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>       parameters:
>           - seeds: "10.1.1.4,10.1.1.5,10.1.1.6"
>
> listen_address: 10.1.1.6
>
> broadcast_address: 10.1.1.6
>
> rpc_address: 0.0.0.0
>
> broadcast_rpc_address: 10.1.1.6
>
>
> Nodetool status on node A when the network is up shows that the other
> nodes are down (DN).
>
> Nodetool status on the other nodes B & C shows that the node 1 is down (DN)
>
> How does the handshaking work in this scenario?
>
> Why is node A not in sync with the other nodes when the network is back up?
>
> Please give me some inputs on resolving this issue.
>
> Thanks & Regards,
> Girish Kumar Kamarthi
> +91-9986427891
>


Re: cassandra dump file path

2016-10-03 Thread Yabin Meng
Have you restarted Cassandra after making changes in cassandra-env.sh?

Yabin

On Mon, Oct 3, 2016 at 7:44 AM, Jean Carlo 
wrote:

> OK I got the response to one of my questions. In the script
> /etc/init.d/cassandra we set the path for the heap dump by default in the
> cassandra_home.
>
> Now the thing I don't understand is: why are the dumps located at the path
> set by /etc/init.d/cassandra and not at the one set by the conf file
> cassandra-env.sh?
>
> Anyone any idea?
>
>
> Saludos
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
> On Mon, Oct 3, 2016 at 12:00 PM, Jean Carlo 
> wrote:
>
>>
>> Hi
>>
>> I see in the log of my cassandra node that the parameter -XX:HeapDumpPath
>> is set twice.
>>
>> INFO  [main] 2016-10-03 04:21:29,941 CassandraDaemon.java:205 - JVM
>> Arguments: [-ea, -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar,
>> -XX:+CMSClassUnloadingEnabled, -XX:+UseThreadPriorities,
>> -XX:ThreadPriorityPolicy=42, -Xms6G, -Xmx6G, -Xmn600M, 
>> *-XX:+HeapDumpOnOutOfMemoryError,
>> -XX:HeapDumpPath=/cassandra/dumps/cassandra-1475461287-pid34435.hprof*,
>> -Xss256k, -XX:StringTableSize=103, -XX:+UseParNewGC,
>> -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled,
>> -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1,
>> -XX:CMSInitiatingOccupancyFraction=30, -XX:+UseCMSInitiatingOccupancyOnly,
>> -XX:+UseTLAB, -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler,
>> -XX:CMSWaitDuration=1, -XX:+CMSParallelInitialMarkEnabled,
>> -XX:+CMSEdenChunksRecordAlways, -XX:CMSWaitDuration=1,
>> -XX:+UseCondCardMark, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps,
>> -XX:+PrintGCApplicationStoppedTime, 
>> -Xloggc:/var/opt/hosting/log/cassandra/gc.log,
>> -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=20,
>> -XX:GCLogFileSize=20M, -Djava.net.preferIPv4Stack=true,
>> -Dcom.sun.management.jmxremote.port=7199, 
>> -Dcom.sun.management.jmxremote.rmi.port=7199,
>> -Dcom.sun.management.jmxremote.ssl=false, 
>> -Dcom.sun.management.jmxremote.authenticate=false,
>> -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password,
>> -Djava.io.tmpdir=/var/opt/hosting/db/cassandra/tmp,
>> -javaagent:/usr/share/cassandra/lib/jolokia-jvm-1.0.6-agent.jar=port=8778,host=0.0.0.0,
>> -Dcassandra.auth_bcrypt_gensalt_log2_rounds=4,
>> -Dlogback.configurationFile=logback.xml, 
>> -Dcassandra.logdir=/var/log/cassandra,
>> -Dcassandra.storagedir=, 
>> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid,
>> *-XX:HeapDumpPath=/var/lib/cassandra/java_1475461286.hprof*,
>> -XX:ErrorFile=/var/lib/cassandra/hs_err_1475461286.log]
>>
>> This option is defined in cassandra-env.sh
>>
>> if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
>>     JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$$.hprof"
>> fi
>> and we defined the value of CASSANDRA_HEAPDUMP_DIR beforehand to
>>
>> */cassandra/dumps/*
>>
>> It seems that cassandra does not care about the conf in cassandra-env.sh
>> and only takes into account the last value set for HeapDumpPath:
>>
>> */var/lib/cassandra/java_1475461286.hprof*
>>
>> This causes problems when we have to dump the heap, because cassandra uses
>> a disk that is not suitable for it.
>>
>> Is *-XX:HeapDumpPath* set in another place/file that I don't know about?
>>
>> Thxs
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>
>


Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread Yabin Meng
Dorian, I don't think Cassandra is able to achieve what you want natively.
In short, what you want is conditional data replication.

Yabin



On Mon, Oct 3, 2016 at 1:37 PM, Dorian Hoxha  wrote:

> @INDRANIL
> Please go find your own thread and don't hijack mine.
>
> On Mon, Oct 3, 2016 at 6:19 PM, INDRANIL BASU 
> wrote:
>
>> Hello All,
>>
>> I am getting the below error repeatedly in the system log of C* 2.1.0
>>
>> WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835
>> SliceQueryFilter.java:236 - Read 0 live and 1923 tombstoned cells in
>> test_schema.test_cf.test_cf_col1_idx (see tombstone_warn_threshold).
>> 5000 columns was requested, slices=[-], 
>> delInfo={deletedAt=-9223372036854775808,
>> localDeletion=2147483647}
>>
>> After that NullPointer Exception and finally OOM
>>
>> ERROR [CompactionExecutor:6287] 2016-09-29 22:09:13,546
>> CassandraDaemon.java:166 - Exception in thread
>> Thread[CompactionExecutor:6287,1,main]
>> java.lang.NullPointerException: null
>> at org.apache.cassandra.service.CacheService$KeyCacheSerializer
>> .serialize(CacheService.java:475) ~[apache-cassandra-2.1.0.jar:2.1.0]
>> at org.apache.cassandra.service.CacheService$KeyCacheSerializer
>> .serialize(CacheService.java:463) ~[apache-cassandra-2.1.0.jar:2.1.0]
>> at 
>> org.apache.cassandra.cache.AutoSavingCache$Writer.saveCache(AutoSavingCache.java:225)
>> ~[apache-cassandra-2.1.0.jar:2.1.0]
>> at 
>> org.apache.cassandra.db.compaction.CompactionManager$11.run(CompactionManager.java:1061)
>> ~[apache-cassandra-2.1.0.jar:2.1.0]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
>> Source) ~[na:1.7.0_80]
>> at java.util.concurrent.FutureTask.run(Unknown Source)
>> ~[na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
>> Source) [na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
>> Source) [na:1.7.0_80]
>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_80]
>> ERROR [CompactionExecutor:9712] 2016-10-01 10:09:13,871
>> CassandraDaemon.java:166 - Exception in thread
>> Thread[CompactionExecutor:9712,1,main]
>> java.lang.NullPointerException: null
>> ERROR [CompactionExecutor:10070] 2016-10-01 14:09:14,154
>> CassandraDaemon.java:166 - Exception in thread
>> Thread[CompactionExecutor:10070,1,main]
>> java.lang.NullPointerException: null
>> ERROR [CompactionExecutor:10413] 2016-10-01 18:09:14,265
>> CassandraDaemon.java:166 - Exception in thread
>> Thread[CompactionExecutor:10413,1,main]
>> java.lang.NullPointerException: null
>> ERROR [MemtableFlushWriter:2396] 2016-10-01 20:28:27,425
>> CassandraDaemon.java:166 - Exception in thread
>> Thread[MemtableFlushWriter:2396,5,main]
>> java.lang.OutOfMemoryError: unable to create new native thread
>> at java.lang.Thread.start0(Native Method) ~[na:1.7.0_80]
>> at java.lang.Thread.start(Unknown Source) ~[na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown
>> Source) ~[na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(Unknown
>> Source) ~[na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
>> Source) ~[na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
>> Source) ~[na:1.7.0_80]
>> at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_80]
>>
>> -- IB
>>
>>
>>
>


Re: Nodetool rebuild exception on c*-2.0.17

2016-09-23 Thread Yabin Meng
Hi,

From the "nodetool status" output, it looks like the cluster is running OK.
The exception itself simply says that data streaming failed during the
nodetool rebuild. This could be due to a network hiccup; it is hard to say.

You need to do further investigation. For example, you can run "nodetool
netstats" and check the log file on the target node to get more information
about the failed streaming sessions, such as their source nodes, and then
check the log files on those source nodes.
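
For example (the log location below is just the common default; it depends
on your install):

    nodetool netstats                                # failed/active streaming sessions and their peers
    grep -i stream /var/log/cassandra/system.log     # find the failed session and the node it came from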

Yabin

On Thu, Sep 22, 2016 at 4:57 PM, laxmikanth sadula 
wrote:

> Hi,
>
> We have a c* 2.0.17 cluster with 2 DCs - DC1 and DC2.  We tried to add a new
> datacenter DC3 and ran "nodetool rebuild 'DC1'", and we faced the below
> exception on a few nodes after some data got streamed to the new nodes in DC3.
>
>
> *Exception in thread "main" java.lang.RuntimeException: Error while
> rebuilding node: Stream failed*
> *at
> org.apache.cassandra.service.StorageService.rebuild(StorageService.java:936)*
> *at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)*
> *at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)*
> *at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
> *at java.lang.reflect.Method.invoke(Method.java:606)*
> *at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)*
> *at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)*
> *at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
> *at java.lang.reflect.Method.invoke(Method.java:606)*
> *at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)*
> *at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)*
> *at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)*
> *at
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)*
> *at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)*
> *at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)*
> *at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)*
> *at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)*
> *at
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)*
> *at
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)*
> *at
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)*
> *at
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)*
> *at
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)*
> *at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)*
> *at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
> *at java.lang.reflect.Method.invoke(Method.java:606)*
> *at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)*
> *at sun.rmi.transport.Transport$2.run(Transport.java:202)*
> *at sun.rmi.transport.Transport$2.run(Transport.java:199)*
> *at java.security.AccessController.doPrivileged(Native Method)*
> *at sun.rmi.transport.Transport.serviceCall(Transport.java:198)*
> *at
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:567)*
> *at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:828)*
> *at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.access$400(TCPTransport.java:619)*
> *at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:684)*
> *at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:681)*
> *at java.security.AccessController.doPrivileged(Native Method)*
> *at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:681)*
> *at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)*
> *at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)*
> *at java.lang.Thread.run(Thread.java:745)*
>
>
> We  have 4  user keyspaces , so we altered all keyspaces as below before
> issuing rebuild.
>
> *ALTER KEYSPACE keyspace_name WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DC1': '3' , 'DC2' : '3' , 'DC3' : '3'};*
>
>
> *Output of describecluster*
>
> ./nodetool describecluster
> Cluster Information:
> Name: Ss Cluster
> Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Schema versions:
> 3b688e54-47be-39e8-ae45-e71ba98d69e2: [xxx.xxx.198.75, xxx.xxx.198.132,
> xxx.xxx.198.133, xxx.xxx.12.115, xxx.xxx.198.78, xxx.xxx.12.123,
> xxx.xxx.98.205, xxx.xxx.98.219, xxx.xxx.98.220, xxx.xxx.198.167,
> xxx.xxx.98.172, xxx.xxx.98.173, xxx.xxx.98.170, xxx.xxx.198.168,
> xxx.xxx.98.171, xxx.xxx.198.169, xxx.xxx.12.146, 

Re: Rebuild failing when adding new datacenter (3.0.8)

2016-09-22 Thread Yabin Meng
It is a Cassandra bug. The workaround is to change the system_distributed
keyspace replication strategy to something like the below:

  alter keyspace  system_distributed with replication = {'class':
'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3', 'DC3': '3'};

You may see a similar problem for other system keyspaces; do the same thing
for those.
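
For example, system_auth and system_traces also use SimpleStrategy by
default and can be altered the same way:

  alter keyspace system_auth with replication = {'class':
'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3', 'DC3': '3'};

  alter keyspace system_traces with replication = {'class':
'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3', 'DC3': '3'};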

Cheers,

Yabin

On Thu, Sep 22, 2016 at 1:44 PM, Timo Ahokas  wrote:

> Hi Alain,
>
> Our normal user keyspaces have RF3 in all DCs, e.g:
>
> create keyspace reporting with replication = {'class':
> 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3', 'DC3': '3'};
>
> Any idea whether it would be safe to change the system_distributed keyspace
> to match this?
>
> -Timo
>
> On 22 September 2016 at 19:23, Timo Ahokas  wrote:
>
>> Hi Alain,
>>
>> Thanks a lot for a helping out!
>>
>> Some of the basic keyspace / cluster info you requested:
>>
>> # echo "DESCRIBE KEYSPACE system_distributed;" | cqlsh
>>
>> CREATE KEYSPACE system_distributed WITH replication = {'class':
>> 'SimpleStrategy', 'replication_factor': '3'}  AND durable_writes = true;
>>
>> CREATE TABLE system_distributed.repair_history (
>>
>>keyspace_name text,
>>
>>columnfamily_name text,
>>
>>id timeuuid,
>>
>>coordinator inet,
>>
>>exception_message text,
>>
>>exception_stacktrace text,
>>
>>finished_at timestamp,
>>
>>parent_id timeuuid,
>>
>>participants set,
>>
>>range_begin text,
>>
>>range_end text,
>>
>>started_at timestamp,
>>
>>status text,
>>
>>PRIMARY KEY ((keyspace_name, columnfamily_name), id)
>>
>> ) WITH CLUSTERING ORDER BY (id ASC)
>>
>>AND bloom_filter_fp_chance = 0.01
>>
>>AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>
>>AND comment = 'Repair history'
>>
>>AND compaction = {'class': 'org.apache.cassandra.db.compa
>> ction.SizeTieredCompactionStrategy', 'max_threshold': '32',
>> 'min_threshold': '4'}
>>
>>AND compression = {'chunk_length_in_kb': '64', 'class': '
>> org.apache.cassandra.io.compress.LZ4Compressor'}
>>
>>AND crc_check_chance = 1.0
>>
>>AND dclocal_read_repair_chance = 0.0
>>
>>AND default_time_to_live = 0
>>
>>AND gc_grace_seconds = 0
>>
>>AND max_index_interval = 2048
>>
>>AND memtable_flush_period_in_ms = 360
>>
>>AND min_index_interval = 128
>>
>>AND read_repair_chance = 0.0
>>
>>AND speculative_retry = '99PERCENTILE';
>>
>> CREATE TABLE system_distributed.parent_repair_history (
>>
>>parent_id timeuuid PRIMARY KEY,
>>
>>columnfamily_names set,
>>
>>exception_message text,
>>
>>exception_stacktrace text,
>>
>>finished_at timestamp,
>>
>>keyspace_name text,
>>
>>requested_ranges set,
>>
>>started_at timestamp,
>>
>>successful_ranges set
>>
>> ) WITH bloom_filter_fp_chance = 0.01
>>
>>AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>
>>AND comment = 'Repair history'
>>
>>AND compaction = {'class': 'org.apache.cassandra.db.compa
>> ction.SizeTieredCompactionStrategy', 'max_threshold': '32',
>> 'min_threshold': '4'}
>>
>>AND compression = {'chunk_length_in_kb': '64', 'class': '
>> org.apache.cassandra.io.compress.LZ4Compressor'}
>>
>>AND crc_check_chance = 1.0
>>
>>AND dclocal_read_repair_chance = 0.0
>>
>>AND default_time_to_live = 0
>>
>>AND gc_grace_seconds = 0
>>
>>AND max_index_interval = 2048
>>
>>AND memtable_flush_period_in_ms = 360
>>
>>AND min_index_interval = 128
>>
>>AND read_repair_chance = 0.0
>>
>>AND speculative_retry = '99PERCENTILE';
>>
>>
>> CREATE TABLE system_distributed.repair_history (
>>
>>keyspace_name text,
>>
>>columnfamily_name text,
>>
>>id timeuuid,
>>
>>coordinator inet,
>>
>>exception_message text,
>>
>>exception_stacktrace text,
>>
>>finished_at timestamp,
>>
>>parent_id timeuuid,
>>
>>participants set,
>>
>>range_begin text,
>>
>>range_end text,
>>
>>started_at timestamp,
>>
>>status text,
>>
>>PRIMARY KEY ((keyspace_name, columnfamily_name), id)
>>
>> ) WITH CLUSTERING ORDER BY (id ASC)
>>
>>AND bloom_filter_fp_chance = 0.01
>>
>>AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>
>>AND comment = 'Repair history'
>>
>>AND compaction = {'class': 'org.apache.cassandra.db.compa
>> ction.SizeTieredCompactionStrategy', 'max_threshold': '32',
>> 'min_threshold': '4'}
>>
>>AND compression = {'chunk_length_in_kb': '64', 'class': '
>> org.apache.cassandra.io.compress.LZ4Compressor'}
>>
>>AND crc_check_chance = 1.0
>>
>>AND dclocal_read_repair_chance = 0.0
>>
>>AND default_time_to_live = 0
>>
>>AND gc_grace_seconds = 0
>>
>>AND max_index_interval = 2048
>>
>>AND memtable_flush_period_in_ms = 360
>>
>>AND min_index_interval = 128
>>
>>AND read_repair_chance = 0.0
>>
>>AND speculative_retry = '99PERCENTILE';
>>
>> CREATE TABLE system_distributed.parent_repair_history (
>>
>>