Re: OOM under high write throughputs on 2.2.5

2016-05-24 Thread Bryan Cheng
Hi Zhiyan,

Silly question, but are you sure your heap settings are actually being
applied? "697,236,904 (51.91%)" would represent a sub-2GB heap (697 MB at
51.91% of the heap implies roughly 1.3 GB in total). What's the real memory
usage for Java when this crash happens?

Another thing to look into might be memtable_heap_space_in_mb, as it looks
like you're using on-heap memtables. This defaults to 1/4 of your heap.
Assuming your heap settings are actually being applied, if you run
through this space you may not have enough flushing resources.
memtable_flush_writers defaults to a somewhat low number, which may not be
enough for this use case.
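
For reference, both of those knobs live in cassandra.yaml. A minimal sketch with
purely illustrative values (placeholders, not a recommendation for your hardware):

# cassandra.yaml (2.2.x) -- placeholder values only
memtable_heap_space_in_mb: 8192     # left unset, this defaults to 1/4 of the heap
memtable_flush_writers: 4           # default is low (derived from data directories/cores)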

On Fri, May 20, 2016 at 10:02 PM, Zhiyan Shao  wrote:

> Hi, we see the following OOM crash while doing heavy write loading
> testing. Has anybody seen this kind of crash? We are using G1GC with 32GB
> heap size out of 128GB system memory. Eclipse Memory Analyzer shows the
> following:
>
> One instance of *"org.apache.cassandra.db.ColumnFamilyStore"* loaded by
> *"sun.misc.Launcher$AppClassLoader @ 0x8d800898"* occupies *697,236,904
> (51.91%)* bytes. The memory is accumulated in one instance of
> *"java.util.concurrent.ConcurrentSkipListMap$HeadIndex"* loaded by *"<system
> class loader>"*.
>
> *Keywords*
>
> java.util.concurrent.ConcurrentSkipListMap$HeadIndex
>
> sun.misc.Launcher$AppClassLoader @ 0x8d800898
>
> org.apache.cassandra.db.ColumnFamilyStore
>
> Cassandra log:
>
>
> ERROR 00:23:24 JVM state determined to be unstable.  Exiting forcefully
> due to:
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_74]
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_74]
> at org.apache.cassandra.utils.memory.SlabAllocator.getRegion(SlabAllocator.java:137) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.utils.memory.SlabAllocator.allocate(SlabAllocator.java:97) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.utils.memory.ContextAllocator.allocate(ContextAllocator.java:57) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.utils.memory.ContextAllocator.clone(ContextAllocator.java:47) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.utils.memory.MemtableBufferAllocator.clone(MemtableBufferAllocator.java:61) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.db.Memtable.put(Memtable.java:212) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1249) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:406) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:366) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:50) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_74]
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-2.2.5.jar:2.2.5]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.2.5.jar:2.2.5]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
>
> Thanks,
> Zhiyan
>


Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Bryan Cheng
Hi Luke,

I've never found nodetool status' load to be useful beyond a general
indicator.

You should expect some small skew, as this will depend on your current
compaction status, tombstones, etc. IIRC repair will not provide
consistency of intermediate states nor will it remove tombstones, it only
guarantees consistency in the final state. This means, in the case of
dropped hints or mutations, you will see differences in intermediate
states, and therefore storage footprint, even in fully repaired nodes. This
includes intermediate UPDATE operations as well.

Your one node with sub 1GB sticks out like a sore thumb, though. Where did
you originate the nodetool repair from? Remember that repair will only
ensure consistency for ranges held by the node you're running it on. While
I am not sure if missing ranges are included in this, if you ran nodetool
repair only on a machine with partial ownership, you will need to complete
repairs across the ring before data will return to full consistency.

I would query some older data using consistency = ONE on the affected
machine to determine if you are actually missing data.  There are a few
outstanding bugs in the 2.1.x  and older release families that may result
in tombstone creation even without deletes, for example CASSANDRA-10547,
which impacts updates on collections in pre-2.1.13 Cassandra.
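
A minimal cqlsh sketch of that kind of spot check (keyspace, table, and key below
are placeholders, not from your schema):

-- on the affected node
CONSISTENCY ONE;
SELECT * FROM my_ks.my_table WHERE id = 'some-old-key';

-- then compare against a read that must touch every replica
CONSISTENCY ALL;
SELECT * FROM my_ks.my_table WHERE id = 'some-old-key';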

You can also try examining the output of nodetool ring, which will give you
a breakdown of tokens and their associations within your cluster.

--Bryan

On Tue, May 24, 2016 at 3:49 PM, kurt Greaves  wrote:

> Not necessarily considering RF is 2 so both nodes should have all
> partitions. Luke, are you sure the repair is succeeding? You don't have
> other keyspaces/duplicate data/extra data in your cassandra data directory?
> Also, you could try querying on the node with less data to confirm if it
> has the same dataset.
>
> On 24 May 2016 at 22:03, Bhuvan Rawal  wrote:
>
>> For the other DC, it can be acceptable because partition reside on one
>> node, so say  if you have a large partition, it may skew things a bit.
>> On May 25, 2016 2:41 AM, "Luke Jolly"  wrote:
>>
>>> So I guess the problem may have been with the initial addition of the
>>> 10.128.0.20 node because when I added it in it never synced data I
>>> guess?  It was at around 50 MB when it first came up and transitioned to
>>> "UN". After it was in I did the 1->2 replication change and tried repair
>>> but it didn't fix it.  From what I can tell all the data on it is stuff
>>> that has been written since it came up.  We never delete data ever so we
>>> should have zero tombstones.
>>>
>>> If I am not mistaken, only two of my nodes actually have all the data,
>>> 10.128.0.3 and 10.142.0.14 since they agree on the data amount. 10.142.0.13
>>> is almost a GB lower and then of course 10.128.0.20 which is missing
>>> over 5 GB of data.  I tried running nodetool -local on both DCs and it
>>> didn't fix either one.
>>>
>>> Am I running into a bug of some kind?
>>>
>>> On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal 
>>> wrote:
>>>
 Hi Luke,

 You mentioned that replication factor was increased from 1 to 2. In
 that case was the node bearing ip 10.128.0.20 carried around 3GB data
 earlier?

 You can run nodetool repair with option -local to initiate repair local
 datacenter for gce-us-central1.

 Also you may suspect that if a lot of data was deleted while the node
 was down it may be having a lot of tombstones which is not needed to be
 replicated to the other node. In order to verify the same, you can issue a
 select count(*) query on column families (With the amount of data you have
 it should not be an issue) with tracing on and with consistency local_all
 by connecting to either 10.128.0.3  or 10.128.0.20 and store it in a
 file. It will give you a fair amount of idea about how many deleted cells
 the nodes have. I tried searching for reference if tombstones are moved
 around during repair, but I didnt find evidence of it. However I see no
 reason to because if the node didnt have data then streaming tombstones
 does not make a lot of sense.

 Regards,
 Bhuvan

 On Tue, May 24, 2016 at 11:06 PM, Luke Jolly 
 wrote:

> Here's my setup:
>
> Datacenter: gce-us-central1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens   Owns (effective)  Host ID
>   Rack
> UN  10.128.0.3   6.4 GB 256  100.0%
>  3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
> UN  10.128.0.20  943.08 MB  256  100.0%
>  958348cb-8205-4630-8b96-0951bf33f3d3  default
> Datacenter: gce-us-east1
> 
> Status=Up/Down
> |/ 

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread kurt Greaves
Not necessarily considering RF is 2 so both nodes should have all
partitions. Luke, are you sure the repair is succeeding? You don't have
other keyspaces/duplicate data/extra data in your cassandra data directory?
Also, you could try querying on the node with less data to confirm if it
has the same dataset.

On 24 May 2016 at 22:03, Bhuvan Rawal  wrote:

> For the other DC, it can be acceptable because partition reside on one
> node, so say  if you have a large partition, it may skew things a bit.
> On May 25, 2016 2:41 AM, "Luke Jolly"  wrote:
>
>> So I guess the problem may have been with the initial addition of the
>> 10.128.0.20 node because when I added it in it never synced data I
>> guess?  It was at around 50 MB when it first came up and transitioned to
>> "UN". After it was in I did the 1->2 replication change and tried repair
>> but it didn't fix it.  From what I can tell all the data on it is stuff
>> that has been written since it came up.  We never delete data ever so we
>> should have zero tombstones.
>>
>> If I am not mistaken, only two of my nodes actually have all the data,
>> 10.128.0.3 and 10.142.0.14 since they agree on the data amount. 10.142.0.13
>> is almost a GB lower and then of course 10.128.0.20 which is missing
>> over 5 GB of data.  I tried running nodetool -local on both DCs and it
>> didn't fix either one.
>>
>> Am I running into a bug of some kind?
>>
>> On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal  wrote:
>>
>>> Hi Luke,
>>>
>>> You mentioned that replication factor was increased from 1 to 2. In that
>>> case was the node bearing ip 10.128.0.20 carried around 3GB data earlier?
>>>
>>> You can run nodetool repair with option -local to initiate repair local
>>> datacenter for gce-us-central1.
>>>
>>> Also you may suspect that if a lot of data was deleted while the node
>>> was down it may be having a lot of tombstones which is not needed to be
>>> replicated to the other node. In order to verify the same, you can issue a
>>> select count(*) query on column families (With the amount of data you have
>>> it should not be an issue) with tracing on and with consistency local_all
>>> by connecting to either 10.128.0.3  or 10.128.0.20 and store it in a
>>> file. It will give you a fair amount of idea about how many deleted cells
>>> the nodes have. I tried searching for reference if tombstones are moved
>>> around during repair, but I didnt find evidence of it. However I see no
>>> reason to because if the node didnt have data then streaming tombstones
>>> does not make a lot of sense.
>>>
>>> Regards,
>>> Bhuvan
>>>
>>> On Tue, May 24, 2016 at 11:06 PM, Luke Jolly 
>>> wrote:
>>>
 Here's my setup:

 Datacenter: gce-us-central1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address  Load   Tokens   Owns (effective)  Host ID
   Rack
 UN  10.128.0.3   6.4 GB 256  100.0%
  3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
 UN  10.128.0.20  943.08 MB  256  100.0%
  958348cb-8205-4630-8b96-0951bf33f3d3  default
 Datacenter: gce-us-east1
 
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address  Load   Tokens   Owns (effective)  Host ID
   Rack
 UN  10.142.0.14  6.4 GB 256  100.0%
  c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
 UN  10.142.0.13  5.55 GB256  100.0%
  d0d9c30e-1506-4b95-be64-3dd4d78f0583  default

 And my replication settings are:

 {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2',
 'gce-us-central1': '2', 'gce-us-east1': '2'}

 As you can see 10.128.0.20 in the gce-us-central1 DC only has a load
 of 943 MB even though it's supposed to own 100% and should have 6.4 GB.
 Also 10.142.0.13 seems also not to have everything as it only has a
 load of 5.55 GB.

 On Mon, May 23, 2016 at 7:28 PM, kurt Greaves 
 wrote:

> Do you have 1 node in each DC or 2? If you're saying you have 1 node
> in each DC then a RF of 2 doesn't make sense. Can you clarify on what your
> set up is?
>
> On 23 May 2016 at 19:31, Luke Jolly  wrote:
>
>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
>> gce-us-east1.  I increased the replication factor of gce-us-central1 
>> from 1
>> to 2.  Then I ran 'nodetool repair -dc gce-us-central1'.  The "Owns"
>> for the node switched to 100% as it should but the Load showed that it
>> didn't actually sync the data.  I then ran a full 'nodetool repair' and 
>> it
>> didn't fix it still.  This scares me as I thought 'nodetool repair' was a
>> way to assure consistency and that all the nodes were synced but it 
>> doesn't

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Bhuvan Rawal
For the other DC, it can be acceptable because a partition resides entirely on
one node, so, say, if you have a large partition, it may skew things a bit.
On May 25, 2016 2:41 AM, "Luke Jolly"  wrote:

> So I guess the problem may have been with the initial addition of the
> 10.128.0.20 node because when I added it in it never synced data I
> guess?  It was at around 50 MB when it first came up and transitioned to
> "UN". After it was in I did the 1->2 replication change and tried repair
> but it didn't fix it.  From what I can tell all the data on it is stuff
> that has been written since it came up.  We never delete data ever so we
> should have zero tombstones.
>
> If I am not mistaken, only two of my nodes actually have all the data,
> 10.128.0.3 and 10.142.0.14 since they agree on the data amount. 10.142.0.13
> is almost a GB lower and then of course 10.128.0.20 which is missing over
> 5 GB of data.  I tried running nodetool -local on both DCs and it didn't
> fix either one.
>
> Am I running into a bug of some kind?
>
> On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal  wrote:
>
>> Hi Luke,
>>
>> You mentioned that replication factor was increased from 1 to 2. In that
>> case was the node bearing ip 10.128.0.20 carried around 3GB data earlier?
>>
>> You can run nodetool repair with option -local to initiate repair local
>> datacenter for gce-us-central1.
>>
>> Also you may suspect that if a lot of data was deleted while the node was
>> down it may be having a lot of tombstones which is not needed to be
>> replicated to the other node. In order to verify the same, you can issue a
>> select count(*) query on column families (With the amount of data you have
>> it should not be an issue) with tracing on and with consistency local_all
>> by connecting to either 10.128.0.3  or 10.128.0.20 and store it in a
>> file. It will give you a fair amount of idea about how many deleted cells
>> the nodes have. I tried searching for reference if tombstones are moved
>> around during repair, but I didnt find evidence of it. However I see no
>> reason to because if the node didnt have data then streaming tombstones
>> does not make a lot of sense.
>>
>> Regards,
>> Bhuvan
>>
>> On Tue, May 24, 2016 at 11:06 PM, Luke Jolly  wrote:
>>
>>> Here's my setup:
>>>
>>> Datacenter: gce-us-central1
>>> ===
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address  Load   Tokens   Owns (effective)  Host ID
>>> Rack
>>> UN  10.128.0.3   6.4 GB 256  100.0%
>>>  3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
>>> UN  10.128.0.20  943.08 MB  256  100.0%
>>>  958348cb-8205-4630-8b96-0951bf33f3d3  default
>>> Datacenter: gce-us-east1
>>> 
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address  Load   Tokens   Owns (effective)  Host ID
>>> Rack
>>> UN  10.142.0.14  6.4 GB 256  100.0%
>>>  c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
>>> UN  10.142.0.13  5.55 GB256  100.0%
>>>  d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>>>
>>> And my replication settings are:
>>>
>>> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2',
>>> 'gce-us-central1': '2', 'gce-us-east1': '2'}
>>>
>>> As you can see 10.128.0.20 in the gce-us-central1 DC only has a load of
>>> 943 MB even though it's supposed to own 100% and should have 6.4 GB.  Also 
>>> 10.142.0.13
>>> seems also not to have everything as it only has a load of 5.55 GB.
>>>
>>> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves 
>>> wrote:
>>>
 Do you have 1 node in each DC or 2? If you're saying you have 1 node in
 each DC then a RF of 2 doesn't make sense. Can you clarify on what your set
 up is?

 On 23 May 2016 at 19:31, Luke Jolly  wrote:

> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
> gce-us-east1.  I increased the replication factor of gce-us-central1 from 
> 1
> to 2.  Then I ran 'nodetool repair -dc gce-us-central1'.  The "Owns"
> for the node switched to 100% as it should but the Load showed that it
> didn't actually sync the data.  I then ran a full 'nodetool repair' and it
> didn't fix it still.  This scares me as I thought 'nodetool repair' was a
> way to assure consistency and that all the nodes were synced but it 
> doesn't
> seem to be.  Outside of that command, I have no idea how I would assure 
> all
> the data was synced or how to get the data correctly synced without
> decommissioning the node and re-adding it.
>



 --
 Kurt Greaves
 k...@instaclustr.com
 www.instaclustr.com

>>>
>>>
>>


Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Luke Jolly
So I guess the problem may have been with the initial addition of the
10.128.0.20 node: when I added it, it never synced data.
It was at around 50 MB when it first came up and transitioned to "UN".
After it was in, I did the 1->2 replication change and tried repair, but it
didn't fix it.  From what I can tell, all the data on it is stuff that has
been written since it came up.  We never delete data, ever, so we should have
zero tombstones.

If I am not mistaken, only two of my nodes actually have all the data,
10.128.0.3 and 10.142.0.14, since they agree on the data amount. 10.142.0.13
is almost a GB lower, and then of course 10.128.0.20 is missing over 5
GB of data.  I tried running nodetool repair -local on both DCs and it didn't
fix either one.

Am I running into a bug of some kind?

On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal  wrote:

> Hi Luke,
>
> You mentioned that replication factor was increased from 1 to 2. In that
> case was the node bearing ip 10.128.0.20 carried around 3GB data earlier?
>
> You can run nodetool repair with option -local to initiate repair local
> datacenter for gce-us-central1.
>
> Also you may suspect that if a lot of data was deleted while the node was
> down it may be having a lot of tombstones which is not needed to be
> replicated to the other node. In order to verify the same, you can issue a
> select count(*) query on column families (With the amount of data you have
> it should not be an issue) with tracing on and with consistency local_all
> by connecting to either 10.128.0.3  or 10.128.0.20 and store it in a
> file. It will give you a fair amount of idea about how many deleted cells
> the nodes have. I tried searching for reference if tombstones are moved
> around during repair, but I didnt find evidence of it. However I see no
> reason to because if the node didnt have data then streaming tombstones
> does not make a lot of sense.
>
> Regards,
> Bhuvan
>
> On Tue, May 24, 2016 at 11:06 PM, Luke Jolly  wrote:
>
>> Here's my setup:
>>
>> Datacenter: gce-us-central1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address  Load   Tokens   Owns (effective)  Host ID
>> Rack
>> UN  10.128.0.3   6.4 GB 256  100.0%
>>  3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
>> UN  10.128.0.20  943.08 MB  256  100.0%
>>  958348cb-8205-4630-8b96-0951bf33f3d3  default
>> Datacenter: gce-us-east1
>> 
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address  Load   Tokens   Owns (effective)  Host ID
>> Rack
>> UN  10.142.0.14  6.4 GB 256  100.0%
>>  c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
>> UN  10.142.0.13  5.55 GB256  100.0%
>>  d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>>
>> And my replication settings are:
>>
>> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2',
>> 'gce-us-central1': '2', 'gce-us-east1': '2'}
>>
>> As you can see 10.128.0.20 in the gce-us-central1 DC only has a load of
>> 943 MB even though it's supposed to own 100% and should have 6.4 GB.  Also 
>> 10.142.0.13
>> seems also not to have everything as it only has a load of 5.55 GB.
>>
>> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves 
>> wrote:
>>
>>> Do you have 1 node in each DC or 2? If you're saying you have 1 node in
>>> each DC then a RF of 2 doesn't make sense. Can you clarify on what your set
>>> up is?
>>>
>>> On 23 May 2016 at 19:31, Luke Jolly  wrote:
>>>
 I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
 gce-us-east1.  I increased the replication factor of gce-us-central1 from 1
 to 2.  Then I ran 'nodetool repair -dc gce-us-central1'.  The "Owns"
 for the node switched to 100% as it should but the Load showed that it
 didn't actually sync the data.  I then ran a full 'nodetool repair' and it
 didn't fix it still.  This scares me as I thought 'nodetool repair' was a
 way to assure consistency and that all the nodes were synced but it doesn't
 seem to be.  Outside of that command, I have no idea how I would assure all
 the data was synced or how to get the data correctly synced without
 decommissioning the node and re-adding it.

>>>
>>>
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com
>>>
>>
>>
>


Re: Cassandra and Kubernetes and scaling

2016-05-24 Thread Aiman Parvaiz
Looking forward to hearing from the community about this.

Sent from my iPhone

> On May 24, 2016, at 10:19 AM, Mike Wojcikiewicz  wrote:
> 
> I saw a thread from April 2016 talking about Cassandra and Kubernetes, and 
> have a few follow up questions.  It seems that especially after v1.2 of 
> Kubernetes, and the upcoming 1.3 features, this would be a very viable option 
> of running Cassandra on.
> 
> My questions pertain to HostIds and Scaling Up/Down, and are related:
> 
> 1.  If a container's host dies and is then brought up on another host, can 
> you start up with the same PersistentVolume as the original container had?  
> Which begs the question would the new container get a new HostId, implying it 
> would need to bootstrap into the environment?   If it's a bootstrap, does the 
> old one get deco'd/assassinated?
> 
> 2. Scaling up/down.  Scaling up would be relatively easy, as it should just 
> kick off Bootstrapping the node into the cluster, but what if you need to 
> scale down?  Would the Container get deco'd by the scaling down process? or 
> just terminated, leaving you with potential missing replicas
> 
> 3. Scaling up and increasing the RF of a particular keyspace, would there be 
> a clean way to do this with the kubernetes tooling? 
> 
> In the end I'm wondering how much of the Kubernetes + Cassandra involves 
> nodetool, and how much is just a Docker image where you need to manage that 
> all yourself (painfully)
> 
> -- 
> --mike


Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Bhuvan Rawal
Hi Luke,

You mentioned that the replication factor was increased from 1 to 2. In that
case, was the node bearing IP 10.128.0.20 carrying around 3 GB of data earlier?

You can run nodetool repair with the -local option to initiate a repair of the
local datacenter, gce-us-central1.

Also, you may suspect that if a lot of data was deleted while the node was
down, it may be holding a lot of tombstones which do not need to be
replicated to the other node. To verify this, you can issue a
SELECT count(*) query on the column families (with the amount of data you have
it should not be an issue) with tracing on and with consistency local_all,
connecting to either 10.128.0.3 or 10.128.0.20, and store the output in a
file. That will give you a fair idea of how many deleted cells the
nodes have. I tried searching for a reference on whether tombstones are moved
around during repair, but I didn't find evidence of it. However, I see no
reason why they would be, because if the node didn't have the data then
streaming tombstones does not make a lot of sense.
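
As a concrete illustration of that check, a minimal cqlsh sketch; the table name
is a placeholder, ALL is used because cqlsh has no literal LOCAL_ALL level, and
you would redirect each node's output to a file:

-- run against 10.128.0.3, then against 10.128.0.20, and compare the traces
TRACING ON;
CONSISTENCY ALL;
SELECT count(*) FROM my_ks.my_table;  -- the trace reports live vs tombstone cells read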

Regards,
Bhuvan

On Tue, May 24, 2016 at 11:06 PM, Luke Jolly  wrote:

> Here's my setup:
>
> Datacenter: gce-us-central1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens   Owns (effective)  Host ID
>   Rack
> UN  10.128.0.3   6.4 GB 256  100.0%
>  3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
> UN  10.128.0.20  943.08 MB  256  100.0%
>  958348cb-8205-4630-8b96-0951bf33f3d3  default
> Datacenter: gce-us-east1
> 
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens   Owns (effective)  Host ID
>   Rack
> UN  10.142.0.14  6.4 GB 256  100.0%
>  c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
> UN  10.142.0.13  5.55 GB256  100.0%
>  d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>
> And my replication settings are:
>
> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2',
> 'gce-us-central1': '2', 'gce-us-east1': '2'}
>
> As you can see 10.128.0.20 in the gce-us-central1 DC only has a load of
> 943 MB even though it's supposed to own 100% and should have 6.4 GB.  Also 
> 10.142.0.13
> seems also not to have everything as it only has a load of 5.55 GB.
>
> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves 
> wrote:
>
>> Do you have 1 node in each DC or 2? If you're saying you have 1 node in
>> each DC then a RF of 2 doesn't make sense. Can you clarify on what your set
>> up is?
>>
>> On 23 May 2016 at 19:31, Luke Jolly  wrote:
>>
>>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
>>> gce-us-east1.  I increased the replication factor of gce-us-central1 from 1
>>> to 2.  Then I ran 'nodetool repair -dc gce-us-central1'.  The "Owns"
>>> for the node switched to 100% as it should but the Load showed that it
>>> didn't actually sync the data.  I then ran a full 'nodetool repair' and it
>>> didn't fix it still.  This scares me as I thought 'nodetool repair' was a
>>> way to assure consistency and that all the nodes were synced but it doesn't
>>> seem to be.  Outside of that command, I have no idea how I would assure all
>>> the data was synced or how to get the data correctly synced without
>>> decommissioning the node and re-adding it.
>>>
>>
>>
>>
>> --
>> Kurt Greaves
>> k...@instaclustr.com
>> www.instaclustr.com
>>
>
>


Re: Cassandra event notification on INSERT/DELETE of records

2016-05-24 Thread Mark Reddy
+1 to what Eric said, a queue is a classic C* anti-pattern. Something like
Kafka or RabbitMQ might fit your use case better.


Mark

On 24 May 2016 at 18:03, Eric Stevens  wrote:

> It sounds like you're trying to build a queue in Cassandra, which is one
> of the classic anti-pattern use cases for Cassandra.
>
> You may be able to do something clever with triggers, but I highly
> recommend you look at purpose-built queuing software such as Kafka to solve
> this instead.
>
> On Tue, May 24, 2016 at 9:49 AM Aaditya Vadnere  wrote:
>
>> Hi experts,
>>
>> We are evaluating Cassandra as messaging infrastructure for a project.
>>
>> In our workflow Cassandra database will be synchronized across two nodes,
>> a component will INSERT/UPDATE records on one node and another component
>> (who has registered for the specific table) on second node will get
>> notified of record change.
>>
>> The second component will then try to read the database to find out the
>> specific message.
>>
>> Is it possible for Cassandra to support such workflow? Basically, is
>> there a way for Cassandra to generate a notification anytime schema changes
>> (so we can set processes to listen for schema changes). As I understand,
>> polling the database periodically or database triggers might work but they
>> are costly operations.
>>
>>
>> --
>> Aaditya Vadnere
>>
>


Re: Too many keyspaces causes cql connection to time out ?

2016-05-24 Thread Justin Lin
So I guess I have to either 1) increase the heap size or 2) reduce the number
of keyspaces/column families.

Thanks for your confirmation.

On Tue, May 24, 2016 at 10:08 AM, Eric Stevens <migh...@gmail.com> wrote:

> Large numbers of tables is generally recommended against.  Each table has
> a fixed on-heap memory overhead, and by your description it sounds like you
> might have as many as 12,000 total tables when you start running into
> trouble.
>
> With such a small heap to begin with, you've probably used up most of the
> available heap just with managing the tables.  This is supported by the big
> STW pauses you're observing in Cassandra's logs.
>
> On Tue, May 24, 2016 at 11:04 AM Justin Lin <linjianfeng...@gmail.com>
> wrote:
>
>> We are exploring cassandra's limit by creating a lot of keyspaces with
>> moderate number of column families (roughly 40 - 50) per keyspace and we
>> have a problem after we reach certain amount of keyspaces, that cqlsh
>> starts to time out when connecting to cassandra.
>>
>> This is our cassandra setup. We have one cassandra running locally and we
>> assign 4GB memory to jvm, set the memtable_allocation_type to be onheap and
>> use default memtable_heap_space_in_mb, which i believe is 2GB. The
>> cassandra version is 2.1.9.
>>
>> So after we create more than 250 keyspaces, cqlsh starts to times out
>> when connecting to cassandra in most case. (Sometimes it still can connect
>> to cassandra). And from cassandra log, we can see it takes roughly 3
>> seconds to do gc when there is an incoming connection. And the gc is the
>> only difference between the timeout connection and the successful
>> connection. So we suspect this Stop-The-World GC might block the connection
>> until it times out. This is the log that i think is relevant.
>>
>> INFO 20160524-060930.028882 ::  Initializing
>> sandbox_20160524_t06_09_18.table1
>>
>> INFO 20160524-060933.908008 ::  G1 Young Generation GC in 551ms.  G1 Eden
>> Space: 98112 -> 0; G1 Old Gen: 2811821584 -> 3034119696;
>>
>> INFO 20160524-060933.908043 ::  G1 Old Generation GC in 2631ms.  G1 Old
>> Gen: 3034119696 -> 2290099032;
>>
>> We suspect the issue might relate to the reported bug as well:
>> https://issues.apache.org/jira/browse/CASSANDRA-9291
>> but not really sure about it.
>>
>> Sorry for the setup, so my question is
>> 1) is the connection timeout related to gc or the tomestone in
>> system.schame_keyspaces table?
>> 2) how can we fix this problem?
>> 3) I did some tests by dropping a bunch of keyspaces after timing out and
>> it seems to fix the problem as i never got another time out. Is this the
>> only way to fix it?
>>
>> Thanks a lot for your help.
>>
>> --
>> come on
>>
>


-- 
come on


Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Luke Jolly
Here's my setup:

Datacenter: gce-us-central1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load   Tokens   Owns (effective)  Host ID
  Rack
UN  10.128.0.3   6.4 GB 256  100.0%
 3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
UN  10.128.0.20  943.08 MB  256  100.0%
 958348cb-8205-4630-8b96-0951bf33f3d3  default
Datacenter: gce-us-east1

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load   Tokens   Owns (effective)  Host ID
  Rack
UN  10.142.0.14  6.4 GB 256  100.0%
 c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
UN  10.142.0.13  5.55 GB256  100.0%
 d0d9c30e-1506-4b95-be64-3dd4d78f0583  default

And my replication settings are:

{'class': 'NetworkTopologyStrategy', 'aws-us-west': '2', 'gce-us-central1':
'2', 'gce-us-east1': '2'}

As you can see, 10.128.0.20 in the gce-us-central1 DC only has a load of 943
MB even though it's supposed to own 100% and should have 6.4 GB.  10.142.0.13
also seems not to have everything, as it only has a load of 5.55 GB.

On Mon, May 23, 2016 at 7:28 PM, kurt Greaves  wrote:

> Do you have 1 node in each DC or 2? If you're saying you have 1 node in
> each DC then a RF of 2 doesn't make sense. Can you clarify on what your set
> up is?
>
> On 23 May 2016 at 19:31, Luke Jolly  wrote:
>
>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
>> gce-us-east1.  I increased the replication factor of gce-us-central1 from 1
>> to 2.  Then I ran 'nodetool repair -dc gce-us-central1'.  The "Owns" for
>> the node switched to 100% as it should but the Load showed that it didn't
>> actually sync the data.  I then ran a full 'nodetool repair' and it didn't
>> fix it still.  This scares me as I thought 'nodetool repair' was a way to
>> assure consistency and that all the nodes were synced but it doesn't seem
>> to be.  Outside of that command, I have no idea how I would assure all the
>> data was synced or how to get the data correctly synced without
>> decommissioning the node and re-adding it.
>>
>
>
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>


Cassandra and Kubernetes and scaling

2016-05-24 Thread Mike Wojcikiewicz
I saw a thread from April 2016 talking about Cassandra and Kubernetes, and
have a few follow up questions.  It seems that especially after v1.2 of
Kubernetes, and the upcoming 1.3 features, this would be a very viable
option of running Cassandra on.

My questions pertain to HostIds and Scaling Up/Down, and are related:

1.  If a container's host dies and is then brought up on another host, can
you start up with the same PersistentVolume as the original container had?
Which begs the question would the new container get a new HostId, implying
it would need to bootstrap into the environment?   If it's a bootstrap,
does the old one get deco'd/assassinated?

2. Scaling up/down.  Scaling up would be relatively easy, as it should just
kick off Bootstrapping the node into the cluster, but what if you need to
scale down?  Would the Container get deco'd by the scaling down process? or
just terminated, leaving you with potential missing replicas

3. Scaling up and increasing the RF of a particular keyspace, would there
be a clean way to do this with the kubernetes tooling?

In the end I'm wondering how much of the Kubernetes + Cassandra involves
nodetool, and how much is just a Docker image where you need to manage that
all yourself (painfully)

-- 
--mike


Re: Thrift client creates massive amounts of network packets

2016-05-24 Thread Eric Stevens
I'm not familiar with Titan's usage patterns for Cassandra, but I wonder if
this is because of the consistency level it's querying Cassandra at - i.e.
if CL isn't LOCAL_[something], then this might just be lots of little
checksums required to satisfy consistency requirements.

On Mon, May 23, 2016 at 7:22 AM Ralf Steppacher 
wrote:

> I remembered that Titan treats edges (and vertices?) as immutable and
> deletes the entity and re-creates it on every change.
> So I set the gc_grace_seconds to 0 for every table in the Titan keyspace
> and ran a major compaction. However, this made the situation worse. Instead
> of roughly 2’700 tcp packets per user request before the compaction, the
> same request now results in 5’400 packets, which is suspiciously close to a
> factor of 2. But I have no idea what to make of it.
>
> Ralf
>
>
> > On 20.05.2016, at 15:11, Ralf Steppacher 
> wrote:
> >
> > Hi all,
> >
> > tl:dr
> > The Titan 0.5.4 cassandrathrift client + C* 2.0.8/2.2.6 create massive
> amounts of network packets for multiget_slice queries. Is there a way to
> avoid the “packet storm”?
> >
> >
> > Details...
> >
> > We are using Titan 0.5.4 with its cassandrathrift storage engine to
> connect to a single node cluster running C* 2.2.6 (we also tried 2.0.8,
> which is the version in Titans dependencies). When moving to a
> multi-datacenter setup with the client in one DC and the C* server in the
> other, we ran into the problem that response times from Cassandra/the graph
> became unacceptable (>30s vs. 0.2s within datacenter). Looking at the
> network traffic we saw that the client and server exchange a massive number
> of very small packets.
> > The user action we were tracing yields three packets of type “REPLY
> multiget_slice”. Per such a reply we see about 1’000 of packet pairs like
> this going back and forth between client and server:
> >
> > 968   09:45:55.354613   x.x.x.30 x.x.x.98 TCP   181   54406 → 9160 [PSH,
> ACK] Seq=53709 Ack=39558 Win=1002 Len=115 TSval=4169130400 TSecr=4169119527
> >    00 50 56 a7 d6 0d 00 0c 29 d1 a4 5e 08 00 45 00  .PV.)..^..E.
> > 0010   00 a7 e3 6d 40 00 40 06 fe 3c ac 13 00 1e ac 13  ...m@.@..<..
> > 0020   00 62 d4 86 23 c8 2c 30 4e 45 1b 4b 0b 55 80 18  .b..#.,0NE.K.U..
> > 0030   03 ea 59 40 00 00 01 01 08 0a f8 7f e1 a0 f8 7f  ..Y@
> > 0040   b7 27 00 00 00 6f 80 01 00 01 00 00 00 0e 6d 75  .'...omu
> > 0050   6c 74 69 67 65 74 5f 73 6c 69 63 65 00 00 3a 38  ltiget_slice..:8
> > 0060   0f 00 01 0b 00 00 00 01 00 00 00 08 00 00 00 00  
> > 0070   00 00 ab 00 0c 00 02 0b 00 03 00 00 00 09 65 64  ..ed
> > 0080   67 65 73 74 6f 72 65 00 0c 00 03 0c 00 02 0b 00  gestore.
> > 0090   01 00 00 00 02 72 c0 0b 00 02 00 00 00 02 72 c1  .rr.
> > 00a0   02 00 03 00 08 00 04 7f ff ff ff 00 00 08 00 04  
> > 00b0   00 00 00 01 00   .
> >
> > 969   09:45:55.354825   x.x.x.98 x.x.x.30 TCP   123   9160 → 54406 [PSH,
> ACK] Seq=39558 Ack=53824 Win=1540 Len=57 TSval=4169119546 TSecr=4169130400
> >    00 0c 29 d1 a4 5e 00 50 56 a7 d6 0d 08 00 45 00  ..)..^.PV.E.
> > 0010   00 6d 19 dd 40 00 40 06 c8 07 ac 13 00 62 ac 13  .m..@.@..b..
> > 0020   00 1e 23 c8 d4 86 1b 4b 0b 55 2c 30 4e b8 80 18  ..#K.U,0N...
> > 0030   06 04 3b d6 00 00 01 01 08 0a f8 7f b7 3a f8 7f  ..;..:..
> > 0040   e1 a0 00 00 00 35 80 01 00 02 00 00 00 0e 6d 75  .5mu
> > 0050   6c 74 69 67 65 74 5f 73 6c 69 63 65 00 00 3a 38  ltiget_slice..:8
> > 0060   0d 00 00 0b 0f 00 00 00 01 00 00 00 08 00 00 00  
> > 0070   00 00 00 ab 00 0c 00 00 00 00 00 ………..
> >
> > With very few exceptions all packets have the exact same length of 181
> and 123 bytes respectively. The overall response time of the graph query
> grows approx. linearly with the network latency.
> > As even “normal” internet network latencies render the setup useless I
> assume we are doing something wrong. Is there a way to avoid that storm of
> small packets by configuration? Or is Titan’s cassandrathrift storage
> backend to blame for this?
> >
> >
> > Thanks in advance!
> > Ralf
>
>


Re: Too many keyspaces causes cql connection to time out ?

2016-05-24 Thread Eric Stevens
Large numbers of tables is generally recommended against.  Each table has a
fixed on-heap memory overhead, and by your description it sounds like you
might have as many as 12,000 total tables when you start running into
trouble.

With such a small heap to begin with, you've probably used up most of the
available heap just with managing the tables.  This is supported by the big
STW pauses you're observing in Cassandra's logs.
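
As a rough back-of-the-envelope check (the per-table figure is the commonly
quoted ~1 MB ballpark, not something measured here): 250 keyspaces x ~45 tables
is roughly 11,000 tables, and at about 1 MB of fixed heap overhead each that is
on the order of 10 GB of heap for table management alone, far beyond the 4 GB
heap in use.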

On Tue, May 24, 2016 at 11:04 AM Justin Lin <linjianfeng...@gmail.com>
wrote:

> We are exploring cassandra's limit by creating a lot of keyspaces with
> moderate number of column families (roughly 40 - 50) per keyspace and we
> have a problem after we reach certain amount of keyspaces, that cqlsh
> starts to time out when connecting to cassandra.
>
> This is our cassandra setup. We have one cassandra running locally and we
> assign 4GB memory to jvm, set the memtable_allocation_type to be onheap and
> use default memtable_heap_space_in_mb, which i believe is 2GB. The
> cassandra version is 2.1.9.
>
> So after we create more than 250 keyspaces, cqlsh starts to times out when
> connecting to cassandra in most case. (Sometimes it still can connect to
> cassandra). And from cassandra log, we can see it takes roughly 3 seconds
> to do gc when there is an incoming connection. And the gc is the only
> difference between the timeout connection and the successful connection. So
> we suspect this Stop-The-World GC might block the connection until it times
> out. This is the log that i think is relevant.
>
> INFO 20160524-060930.028882 ::  Initializing
> sandbox_20160524_t06_09_18.table1
>
> INFO 20160524-060933.908008 ::  G1 Young Generation GC in 551ms.  G1 Eden
> Space: 98112 -> 0; G1 Old Gen: 2811821584 -> 3034119696;
>
> INFO 20160524-060933.908043 ::  G1 Old Generation GC in 2631ms.  G1 Old
> Gen: 3034119696 -> 2290099032;
>
> We suspect the issue might relate to the reported bug as well:
> https://issues.apache.org/jira/browse/CASSANDRA-9291
> but not really sure about it.
>
> Sorry for the setup, so my question is
> 1) is the connection timeout related to gc or the tomestone in
> system.schame_keyspaces table?
> 2) how can we fix this problem?
> 3) I did some tests by dropping a bunch of keyspaces after timing out and
> it seems to fix the problem as i never got another time out. Is this the
> only way to fix it?
>
> Thanks a lot for your help.
>
> --
> come on
>


Too many keyspaces causes cql connection to time out ?

2016-05-24 Thread Justin Lin
We are exploring cassandra's limit by creating a lot of keyspaces with
moderate number of column families (roughly 40 - 50) per keyspace and we
have a problem after we reach certain amount of keyspaces, that cqlsh
starts to time out when connecting to cassandra.

This is our cassandra setup. We have one cassandra running locally and we
assign 4GB memory to jvm, set the memtable_allocation_type to be onheap and
use default memtable_heap_space_in_mb, which i believe is 2GB. The
cassandra version is 2.1.9.

So after we create more than 250 keyspaces, cqlsh starts to time out when
connecting to cassandra in most cases (sometimes it can still connect to
cassandra). And from the cassandra log, we can see it takes roughly 3 seconds
to do GC when there is an incoming connection, and the GC is the only
difference between the timed-out connections and the successful ones. So we
suspect this stop-the-world GC might block the connection until it times
out. This is the log that I think is relevant.

INFO 20160524-060930.028882 ::  Initializing
sandbox_20160524_t06_09_18.table1

INFO 20160524-060933.908008 ::  G1 Young Generation GC in 551ms.  G1 Eden
Space: 98112 -> 0; G1 Old Gen: 2811821584 -> 3034119696;

INFO 20160524-060933.908043 ::  G1 Old Generation GC in 2631ms.  G1 Old
Gen: 3034119696 -> 2290099032;

We suspect the issue might relate to the reported bug as well:
https://issues.apache.org/jira/browse/CASSANDRA-9291
but not really sure about it.

Sorry for the long setup; my questions are:
1) Is the connection timeout related to GC, or to tombstones in the
system.schema_keyspaces table?
2) How can we fix this problem?
3) I did some tests dropping a bunch of keyspaces after a timeout, and that
seems to fix the problem, as I never got another timeout. Is this the
only way to fix it?

Thanks a lot for your help.

-- 
come on


Re: Cassandra event notification on INSERT/DELETE of records

2016-05-24 Thread Eric Stevens
It sounds like you're trying to build a queue in Cassandra, which is one of
the classic anti-pattern use cases for Cassandra.

You may be able to do something clever with triggers, but I highly
recommend you look at purpose-built queuing software such as Kafka to solve
this instead.

On Tue, May 24, 2016 at 9:49 AM Aaditya Vadnere  wrote:

> Hi experts,
>
> We are evaluating Cassandra as messaging infrastructure for a project.
>
> In our workflow Cassandra database will be synchronized across two nodes,
> a component will INSERT/UPDATE records on one node and another component
> (who has registered for the specific table) on second node will get
> notified of record change.
>
> The second component will then try to read the database to find out the
> specific message.
>
> Is it possible for Cassandra to support such workflow? Basically, is there
> a way for Cassandra to generate a notification anytime schema changes (so
> we can set processes to listen for schema changes). As I understand,
> polling the database periodically or database triggers might work but they
> are costly operations.
>
>
> --
> Aaditya Vadnere
>


Re: Removing a datacenter

2016-05-24 Thread Jeff Jirsa
The fundamental difference between a removenode and a decommission is which 
node(s) stream data.

In decom, the leaving node streams.
In removenode, other owners of the data stream.

If you set replication factor for that DC to 0, there’s nothing to stream, so 
it’s irrelevant – do whichever you like.
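
For reference, a minimal CQL sketch of dropping the DC from the keyspace's
replication before decommissioning, as described above; the keyspace name and
the surviving DC/RF are placeholders:

-- a DC omitted from the replication map effectively has RF 0 there,
-- so its nodes no longer own any data for this keyspace
ALTER KEYSPACE my_ks WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc_to_keep': '3'
};
-- afterwards, run nodetool decommission (or removenode) on each node in the dropped DC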




From:  Anubhav Kale
Reply-To:  "user@cassandra.apache.org"
Date:  Tuesday, May 24, 2016 at 9:03 AM
To:  "user@cassandra.apache.org"
Subject:  RE: Removing a datacenter

Sorry I should have more clear. What I meant was doing exactly what you wrote, 
but do a “removenode” instead of “decommission” to make it even faster. Will 
that have any side-effect (I think it shouldn’t) ?

 

From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com] 
Sent: Monday, May 23, 2016 4:43 PM
To: user@cassandra.apache.org
Subject: Re: Removing a datacenter

 

If you remove a node at a time, you’ll eventually end up with a single node in 
the DC you’re decommissioning which will own all of the data, and you’ll likely 
overwhelm that node.

 

It’s typically recommended that you ALTER the keyspace, remove the replication 
settings for that DC, and then you can decommission (and they won’t need to 
stream nearly as much, since they no longer own that data – decom will go much 
faster).

 

 

 

From: Anubhav Kale
Reply-To: "user@cassandra.apache.org"
Date: Monday, May 23, 2016 at 4:41 PM
To: "user@cassandra.apache.org"
Subject: Removing a datacenter

 

Hello,

 

Suppose we have 2 DCs and we know that the data is correctly replicated in 
both. In such situation, is it safe to “remove” one of the DCs by simply doing 
a “nodetool remove node” followed by “nodetool removenode force” for each node 
in that DC (instead of doing a “nodetool decommission” and waiting for it to 
finish) ?

 

Can someone confirm this won’t have any odd side-effects ?

 

Thanks !





RE: Removing a datacenter

2016-05-24 Thread Anubhav Kale
Sorry, I should have been more clear. What I meant was doing exactly what you
wrote, but with a “removenode” instead of a “decommission” to make it even
faster. Will that have any side-effect (I think it shouldn’t)?

From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: Monday, May 23, 2016 4:43 PM
To: user@cassandra.apache.org
Subject: Re: Removing a datacenter

If you remove a node at a time, you’ll eventually end up with a single node in 
the DC you’re decommissioning which will own all of the data, and you’ll likely 
overwhelm that node.

It’s typically recommended that you ALTER the keyspace, remove the replication 
settings for that DC, and then you can decommission (and they won’t need to 
stream nearly as much, since they no longer own that data – decom will go much 
faster).



From: Anubhav Kale
Reply-To: "user@cassandra.apache.org"
Date: Monday, May 23, 2016 at 4:41 PM
To: "user@cassandra.apache.org"
Subject: Removing a datacenter

Hello,

Suppose we have 2 DCs and we know that the data is correctly replicated in 
both. In such situation, is it safe to “remove” one of the DCs by simply doing 
a “nodetool remove node” followed by “nodetool removenode force” for each node 
in that DC (instead of doing a “nodetool decommission” and waiting for it to 
finish) ?

Can someone confirm this won’t have any odd side-effects ?

Thanks !


Cassandra event notification on INSERT/DELETE of records

2016-05-24 Thread Aaditya Vadnere
Hi experts,

We are evaluating Cassandra as messaging infrastructure for a project.

In our workflow, the Cassandra database will be synchronized across two nodes:
a component will INSERT/UPDATE records on one node, and another component
(which has registered for the specific table) on the second node will get
notified of the record change.

The second component will then try to read the database to find out the
specific message.

Is it possible for Cassandra to support such workflow? Basically, is there
a way for Cassandra to generate a notification anytime schema changes (so
we can set processes to listen for schema changes). As I understand,
polling the database periodically or database triggers might work but they
are costly operations.

-- 
Aaditya Vadnere


Re: UUID coming as int while using SPARK SQL

2016-05-24 Thread Laing, Michael
Yes - a UUID is just a 128 bit value. You can view it using any base or
format.

If you are looking at the same row, you should see the same 128 bit value,
otherwise my theory is incorrect :)
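
A minimal Python sketch of that round trip, using the integer printed by Spark
earlier in the thread (the UUID shown in cqlsh there looks anonymized, since it
contains non-hex characters, so only the mechanics are illustrated):

import uuid

# the 128-bit integer as printed by the Spark/Python code in the thread
raw = 293946894141093607334963674332192894528

u = uuid.UUID(int=raw)      # canonical dashed 8-4-4-4-12 hex form
print(u)

# and back again: the dashed string and the int are the same 128-bit value
assert uuid.UUID(str(u)).int == raw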

Cheers,
ml

On Tue, May 24, 2016 at 6:57 AM, Rajesh Radhakrishnan <
rajesh.radhakrish...@phe.gov.uk> wrote:

> Hi Michael,
>
> Thank you for the quick reply.
> So you are suggesting to convert this int value(UUID comes back as int via
> Spark SQL) to hex?
>
>
> And selection is just a example to highlight the UUID convertion issue.
> So in Cassandra it should be
> SELECT id, workflow FROM sam WHERE dept='blah';
>
> And in Spark with Python:
> SELECT distinct id, dept, workflow FROM samd WHERE dept='blah';
>
>
> Best,
> Rajesh R
>
>
> --
> *From:* Laing, Michael [michael.la...@nytimes.com]
> *Sent:* 24 May 2016 11:40
> *To:* user@cassandra.apache.org
> *Subject:* Re: UUID coming as int while using SPARK SQL
>
> Try converting that int from decimal to hex and inserting dashes in the
> appropriate spots - or go the other way.
>
> Also, you are looking at different rows, based upon your selection
> criteria...
>
> ml
>
> On Tue, May 24, 2016 at 6:23 AM, Rajesh Radhakrishnan <
> rajesh.radhakrish...@phe.gov.uk
> 
> > wrote:
>
>> Hi,
>>
>>
>> I got a Cassandra keyspace, but while reading the data(especially UUID)
>> via Spark SQL using Python is not returning the correct value.
>>
>> Cassandra:
>> --
>> My table 'SAM'' is described below:
>>
>> CREATE table ks.sam (id uuid, dept text, workflow text, type double
>> primary  key (id, dept))
>>
>> SELECT id, workflow FROM sam WHERE dept='blah';
>>
>> The above example  CQL gives me the following
>> id   | workflow
>> --+
>>  9547v26c-f528-12e5-da8b-001a4q3dac10 |   testWK
>>
>>
>> Spark/Python:
>> --
>> from pyspark import SparkConf
>> from pyspark.sql import SQLContext
>> import pyspark_cassandra
>> from pyspark_cassandra import CassandraSparkContext
>>
>> 
>> conf =
>> SparkConf().set("spark.cassandra.connection.host",IP_ADDRESS).set("spark.cassandra.connection.native.port",PORT_NUMBER)
>> sparkContext = CassandraSparkContext(conf = conf)
>> sqlContext = SQLContext(sparkContext)
>>
>> samTable =sparkContext.cassandraTable("ks", "sam").select('id', 'dept','
>> workflow')
>> samTable.cache()
>>
>> samdf.registerTempTable("samd")
>>
>>  sparkSQLl ="SELECT distinct id, dept, workflow FROM samd WHERE workflow
>> ='testWK'
>>  new_df = sqlContext.sql(sparkSQLl)
>>  results  =  new_df.collect()
>>  for row in results:
>> print "dept=",row.dept
>> print "wk=",row.workflow
>> print "id=",row.id
>> 
>> ...
>> The Python code above prints the following:
>> dept=Biology
>> wk=testWK
>> id=293946894141093607334963674332192894528
>>
>>
>> You can see here that the id (uuid) whose correct value at Cassandra is '
>> 9547v26c-f528-12e5-da8b-001a4q3dac10'  but via Spark I am getting an int
>> '29394689414109360733496367433219289452'.
>> What I am doing wrong here? How to get the correct UUID value from
>> Cassandra via Spark/Python ? Please help me.
>>
>> Thank you
>> Rajesh R
>>
>>
>
>

RE: UUID coming as int while using SPARK SQL

2016-05-24 Thread Rajesh Radhakrishnan
Hi Michael,

Thank you for the quick reply.
So you are suggesting to convert this int value(UUID comes back as int via 
Spark SQL) to hex?


And selection is just a example to highlight the UUID convertion issue.
So in Cassandra it should be
SELECT id, workflow FROM sam WHERE dept='blah';

And in Spark with Python:
SELECT distinct id, dept, workflow FROM samd WHERE dept='blah';


Best,
Rajesh R



From: Laing, Michael [michael.la...@nytimes.com]
Sent: 24 May 2016 11:40
To: user@cassandra.apache.org
Subject: Re: UUID coming as int while using SPARK SQL

Try converting that int from decimal to hex and inserting dashes in the 
appropriate spots - or go the other way.

Also, you are looking at different rows, based upon your selection criteria...

ml

On Tue, May 24, 2016 at 6:23 AM, Rajesh Radhakrishnan 
>
 wrote:
Hi,


I got a Cassandra keyspace, but while reading the data(especially UUID) via 
Spark SQL using Python is not returning the correct value.

Cassandra:
--
My table 'SAM'' is described below:

CREATE table ks.sam (id uuid, dept text, workflow text, type double primary  
key (id, dept))

SELECT id, workflow FROM sam WHERE dept='blah';

The above example  CQL gives me the following
id   | workflow
--+
 9547v26c-f528-12e5-da8b-001a4q3dac10 |   testWK


Spark/Python:
--
from pyspark import SparkConf
from pyspark.sql import SQLContext
import pyspark_cassandra
from pyspark_cassandra import CassandraSparkContext


conf = 
SparkConf().set("spark.cassandra.connection.host",IP_ADDRESS).set("spark.cassandra.connection.native.port",PORT_NUMBER)
sparkContext = CassandraSparkContext(conf = conf)
sqlContext = SQLContext(sparkContext)

samTable =sparkContext.cassandraTable("ks", "sam").select('id', 
'dept','workflow')
samTable.cache()

samdf.registerTempTable("samd")

 sparkSQLl ="SELECT distinct id, dept, workflow FROM samd WHERE 
workflow='testWK'
 new_df = sqlContext.sql(sparkSQLl)
 results  =  new_df.collect()
 for row in results:
print "dept=",row.dept
print "wk=",row.workflow
print 
"id=",row.id
...
The Python code above prints the following:
dept=Biology
wk=testWK
id=293946894141093607334963674332192894528


You can see here that the id (uuid) whose correct value at Cassandra is ' 
9547v26c-f528-12e5-da8b-001a4q3dac10'  but via Spark I am getting an int 
'29394689414109360733496367433219289452'.
What I am doing wrong here? How to get the correct UUID value from Cassandra 
via Spark/Python ? Please help me.

Thank you
Rajesh R


Re: UUID coming as int while using SPARK SQL

2016-05-24 Thread Laing, Michael
Try converting that int from decimal to hex and inserting dashes in the
appropriate spots - or go the other way.

Also, you are looking at different rows, based upon your selection
criteria...
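
Going the other way is just as quick in Python (a sketch with a placeholder UUID, since the real id above is redacted):

import uuid

cql_value = uuid.UUID('12345678-1234-1234-1234-123456789abc')  # placeholder, not the real id
print(cql_value.int)  # its decimal integer form, comparable with what Spark printed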

ml

On Tue, May 24, 2016 at 6:23 AM, Rajesh Radhakrishnan <
rajesh.radhakrish...@phe.gov.uk> wrote:

> Hi,
>
>
> I have a Cassandra keyspace, but reading the data (especially UUID columns)
> via Spark SQL using Python does not return the correct value.
>
> Cassandra:
> --
> My table 'sam' is described below:
>
> CREATE TABLE ks.sam (id uuid, dept text, workflow text, type double,
> PRIMARY KEY (id, dept));
>
> SELECT id, workflow FROM sam WHERE dept='blah';
>
> The above example CQL gives me the following:
>  id                                   | workflow
> --------------------------------------+----------
>  9547v26c-f528-12e5-da8b-001a4q3dac10 |   testWK
>
>
> Spark/Python:
> --
> from pyspark import SparkConf
> from pyspark.sql import SQLContext
> import pyspark_cassandra
> from pyspark_cassandra import CassandraSparkContext
>
> 
> conf =
> SparkConf().set("spark.cassandra.connection.host",IP_ADDRESS).set("spark.cassandra.connection.native.port",PORT_NUMBER)
> sparkContext = CassandraSparkContext(conf = conf)
> sqlContext = SQLContext(sparkContext)
>
> samTable = sparkContext.cassandraTable("ks", "sam").select('id', 'dept', 'workflow')
> samTable.cache()
>
> samdf.registerTempTable("samd")
>
> sparkSQLl = "SELECT distinct id, dept, workflow FROM samd WHERE workflow='testWK'"
> new_df = sqlContext.sql(sparkSQLl)
> results = new_df.collect()
> for row in results:
>     print "dept=", row.dept
>     print "wk=", row.workflow
>     print "id=", row.id
> ...
> The Python code above prints the following:
> dept=Biology
> wk=testWK
> id=293946894141093607334963674332192894528
>
>
> You can see here that the id (uuid), whose correct value in Cassandra is
> '9547v26c-f528-12e5-da8b-001a4q3dac10', comes back via Spark as the int
> '293946894141093607334963674332192894528'.
> What am I doing wrong here? How do I get the correct UUID value from
> Cassandra via Spark/Python? Please help me.
>
> Thank you
> Rajesh R
>
>


UUID coming as int while using SPARK SQL

2016-05-24 Thread Rajesh Radhakrishnan
Hi,


I have a Cassandra keyspace, but reading the data (especially UUID columns) via 
Spark SQL using Python does not return the correct value.

Cassandra:
--
My table 'sam' is described below:

CREATE TABLE ks.sam (id uuid, dept text, workflow text, type double, PRIMARY KEY (id, dept));

SELECT id, workflow FROM sam WHERE dept='blah';

The above example CQL gives me the following:
 id                                   | workflow
--------------------------------------+----------
 9547v26c-f528-12e5-da8b-001a4q3dac10 |   testWK


Spark/Python:
--
from pyspark import SparkConf
from pyspark.sql import SQLContext
import pyspark_cassandra
from pyspark_cassandra import CassandraSparkContext


conf = 
SparkConf().set("spark.cassandra.connection.host",IP_ADDRESS).set("spark.cassandra.connection.native.port",PORT_NUMBER)
sparkContext = CassandraSparkContext(conf = conf)
sqlContext = SQLContext(sparkContext)

samTable = sparkContext.cassandraTable("ks", "sam").select('id', 'dept', 'workflow')
samTable.cache()

# (note: the step that builds the 'samdf' DataFrame from samTable appears to be missing from the original mail)
samdf.registerTempTable("samd")

sparkSQLl = "SELECT distinct id, dept, workflow FROM samd WHERE workflow='testWK'"
new_df = sqlContext.sql(sparkSQLl)
results = new_df.collect()
for row in results:
    print "dept=", row.dept
    print "wk=", row.workflow
    print "id=", row.id
...
The Python code above prints the following:
dept=Biology
wk=testWK
id=293946894141093607334963674332192894528


You can see here that the id (uuid), whose correct value in Cassandra is 
'9547v26c-f528-12e5-da8b-001a4q3dac10', comes back via Spark as the int 
'293946894141093607334963674332192894528'.
What am I doing wrong here? How do I get the correct UUID value from Cassandra 
via Spark/Python? Please help me.

Thank you
Rajesh R


Re: cqlsh problem

2016-05-24 Thread joseph gao
I used to think it was a firewall/network issue too, so I made ufw inactive. I
really don't know what the reason is.
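
One way to take netstat out of the picture is to open a plain IPv4 TCP connection to the native port from the client host. A minimal sketch in Python (the node address below is a placeholder):

import socket

# Force IPv4 and try the native transport port directly.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(5)
try:
    sock.connect(("10.0.0.1", 9042))  # replace with the node's address
    print("TCP connect to 9042 over IPv4 succeeded")
except (socket.timeout, socket.error) as exc:
    print("TCP connect failed: %s" % exc)
finally:
    sock.close()

If this succeeds but cqlsh still times out, the problem is more likely cqlsh/driver configuration than the firewall.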

2016-05-09 19:01 GMT+08:00 kurt Greaves :

> Don't be fooled, despite saying tcp6 and :::*, it still listens on IPv4.
> As far as I'm aware this happens on all 2.1 Cassandra nodes, and may just
> be an oddity of netstat. It would be unrelated to your connection timeout
> issues, that's most likely related to firewall/network issues.
>
> On 9 May 2016 at 09:59, joseph gao  wrote:
>
>> It doesn't work, still using IPv6. [inline image 1]
>>
>> And I already set [inline image 2]
>>
>> Now I'm using cqlsh 4.1.1 with port 9160 instead of 5.x.x.
>>
>> Hopefully this can be resolved. Thanks!
>>
>> 2016-03-30 22:13 GMT+08:00 Alain RODRIGUEZ :
>>
>>> Hi Joseph,
>>>
 why is cassandra using tcp6 for port 9042, like:
 tcp6       0      0 0.0.0.0:9042            :::*                    LISTEN

>>>
>>> if I remember correctly, in 2.1 and higher, cqlsh uses native transport,
>>> port 9042  (instead of thrift port 9160) and your clients (if any) are also
>>> probably using native transport (port 9042). So yes, this could be an issue
>>> indeed.
>>>
>>> You should have something like:
>>>
>>> tcp        0      0 1.2.3.4:9042            0.0.0.0:*               LISTEN
>>>
>>> You are using IPv6 and no rpc address. Try setting it to the listen
>>> address and using IPv4.
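
A minimal cassandra.yaml sketch of that suggestion (the address below is a placeholder for the node's own IPv4 address):

# cassandra.yaml (per node) - placeholder values
listen_address: 10.0.0.1        # the node's IPv4 address
rpc_address: 10.0.0.1           # cqlsh / native transport binds here
native_transport_port: 9042

After a restart, netstat should then show the 9042 listener bound to that IPv4 address rather than a tcp6 wildcard.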
>>>
>>> C*heers,
>>>
>>> ---
>>>
>>> Alain Rodriguez - al...@thelastpickle.com
>>>
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>>
>>> http://www.thelastpickle.com
>>>
>>> 2016-03-30 6:09 GMT+02:00 joseph gao :
>>>
 why is cassandra using tcp6 for port 9042, like:
 tcp6       0      0 0.0.0.0:9042            :::*                    LISTEN
 would this be the problem?

 2016-03-30 11:34 GMT+08:00 joseph gao :

> Still have not fixed it. cqlsh: error: no such option: --connect-timeout
> cqlsh version 5.0.1
>
>
>
> 2016-03-25 16:46 GMT+08:00 Alain RODRIGUEZ :
>
>> Hi Joseph.
>>
>> As I can't reproduce here, I believe you are having network issue of
>> some kind.
>>
>> MacBook-Pro:~ alain$ cqlsh --version
>> cqlsh 5.0.1
>> MacBook-Pro:~ alain$ echo 'DESCRIBE KEYSPACES;' | cqlsh
>> --connect-timeout=5 --request-timeout=10
>> system_traces  system
>> MacBook-Pro:~ alain$
>>
>> It's been a few days, did you manage to fix it ?
>>
>> C*heers,
>> ---
>> Alain Rodriguez - al...@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2016-03-21 9:59 GMT+01:00 joseph gao :
>>
>>> cqlsh version 5.0.1. nodetool tpstats looks good, the log looks good.
>>> And I used the specified port 9042, and it immediately returns a failure (in
>>> less than 3 seconds). By the way, where should I use '--connect-timeout'?
>>> cqlsh doesn't seem to have such a parameter.
>>>
>>> 2016-03-18 17:29 GMT+08:00 Alain RODRIGUEZ :
>>>
 Is the node fully healthy or rejecting some requests ?

 What are the outputs for "grep -i "ERROR"
 /var/log/cassandra/system.log" and "nodetool tpstats"?

 Any error? Any pending / blocked or dropped messages?

 Also did you try using distinct ports (9160 for thrift, 9042 for
 native) - out of curiosity, not sure this will help.

 What is your version of cqlsh "cqlsh --version" ?

 doesn't work most times, but sometimes it just works fine
>

 Do you feel like this is due to a timeout (query being too big,
 cluster being too busy)? Try setting this higher:

 --connect-timeout=CONNECT_TIMEOUT

 Specify the connection timeout in seconds
 (default: 5 seconds).

   --request-timeout=REQUEST_TIMEOUT

 Specify the default request timeout in
 seconds (default: 10 seconds).

 C*heers,
 ---
 Alain Rodriguez - al...@thelastpickle.com
 France

 The Last Pickle - Apache Cassandra Consulting
 http://www.thelastpickle.com

 2016-03-18 4:49 GMT+01:00 joseph gao :

> Of course yes.
>
> 2016-03-17 22:35 GMT+08:00 Vishwas Gupta <
> vishwas.gu...@snapdeal.com>:
>
>> Have you started the Cassandra service?
>>
>> sh cassandra
>> On 17-Mar-2016 7:59 pm, "Alain RODRIGUEZ" 
>> wrote:
>>
>>> Hi, did you try with the address of the node 

Re: sstableloader: Stream failed

2016-05-24 Thread Ralf Steppacher
Thanks for the hint! Indeed I could not telnet to the host. It was the 
listen_address that was not properly configured.
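
For the archive, a rough sketch of the checks discussed in this thread (the path is a placeholder):

# on the destination node, cassandra.yaml must bind an address reachable from the source:
#   listen_address: 10.211.55.8
#   storage_port: 7000
# from the source node, verify the streaming port is reachable:
telnet 10.211.55.8 7000
# then point sstableloader at that address:
sstableloader -d 10.211.55.8 /path/to/keyspace/table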

Thanks again!
Ralf


> On 23.05.2016, at 21:01, Paulo Motta  wrote:
> 
> Can you telnet 10.211.55.8 7000? This is the port used for streaming 
> communication with the destination node.
> 
> If not, you should check what the configured storage_port is on the 
> destination node and set that in the cassandra.yaml of the source node so 
> it's picked up by sstableloader.
>