Re: Internal Handling of Map Updates

2016-05-25 Thread kurt Greaves
Literally just encountered this exact same thing. I couldn't find anything
in the official docs related to this, but there is at least this blog post
that explains it:
http://www.jsravn.com/2015/05/13/cassandra-tombstones-collections.html
and this entry in ScyllaDB's documentation:
http://www.scylladb.com/kb/sstable-interpretation/
I can confirm what Tyler mentioned: updating a single element does not
cause a tombstone.
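
For anyone wanting to check this themselves, here is a minimal CQL sketch
(keyspace, table, and column names are made up for illustration):

```cql
-- Hypothetical table with a map column
CREATE TABLE demo.events (
    id    text PRIMARY KEY,
    attrs map<text, text>
);

-- Replacing the whole collection (which is also what an INSERT of a map
-- literal does) first writes a range tombstone over the old map, then
-- the new entries:
UPDATE demo.events SET attrs = {'a': '1', 'b': '2'} WHERE id = 'k1';

-- Updating a single element only appends a new cell; no tombstone:
UPDATE demo.events SET attrs['a'] = '3' WHERE id = 'k1';
```

So if the application rewrites the full map (or re-INSERTs the row) on
every change, each write carries a tombstone, which matches the
sstable2json output quoted below.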

On 25 May 2016 at 15:37, Tyler Hobbs <ty...@datastax.com> wrote:

> If you replace an entire collection, whether it's a map, set, or list, a
> range tombstone will be inserted followed by the new collection.  If you
> only update a single element, no tombstones are generated.
>
> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
> matthias.nieh...@codecentric.de> wrote:
>
>> Hi,
>>
>> we have a table with a map field. We do not delete anything in this
>> table, but we do update the values, including the map field (most of the
>> time a new value for an existing key, rarely adding new keys). We now
>> encounter a huge number of tombstones for this table.
>>
>> We used sstable2json to take a look into the sstables:
>>
>>
>> {"key": "Betty_StoreCatalogLines:7",
>>
>>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>
>>["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 
>> 08:40Z",1463820040628001],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>
>> . . .
>>
>>   
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","0154d265c6b0",1463820040628001],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article 
>> Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article 
>> #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country 
>> Code\":\"276\"}}",1463820040628001]
>>
>>
>>
>> Looking at the SSTables it seems like every update of a value in a map
>> breaks down to a delete and insert in the corresponding SSTable (see all
>> the tombstone flags "t" in the extract of sstable2json above).
>>
>> We are using Cassandra 2.2.5.
>>
>> Can you confirm this behavior?
>>
>> Thanks!
>> --
>> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
>> 172.1702676
>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>> www.more4fi.de
>>
>> Sitz der Gesellschaft: Solingen | HRB 25917| Amtsgericht Wuppertal
>> Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>> Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen
>> Schütz
>>
>> This e-mail, including any attached files, contains confidential and/or
>> legally protected information. If you are not the intended recipient or
>> have received this e-mail in error, please notify the sender immediately
>> and delete this e-mail and any attached files. Unauthorized copying, use,
>> or opening of attached files, as well as unauthorized forwarding of this
>> e-mail, is not permitted.
>>
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>



-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Interesting use case

2016-06-10 Thread kurt Greaves
Sorry, I did mean larger number of rows per partition.
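
To make the read-path point concrete, a sketch against the
eventvalue_widerow schema quoted further down (values are placeholders,
untested):

```cql
-- Wide-row schema: the last N values live in one partition and can be
-- read with a single query, already in reverse time order:
SELECT event_time, event_value
FROM eventvalue_widerow
WHERE system_name = 'sys-1' AND event_name = 'temp'
LIMIT 10;
```

With the overwrite-style schema, reading N events instead means hitting N
partitions, which is the trade-off being discussed.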

On 9 June 2016 at 10:12, John Thomas <jthom...@gmail.com> wrote:

> The example I gave was for when N=1, if we need to save more values I
> planned to just add more columns.
>
> On Thu, Jun 9, 2016 at 12:51 AM, kurt Greaves <k...@instaclustr.com>
> wrote:
>
>> I would say it's probably due to a significantly larger number of
>> partitions when using the overwrite method - but really you should be
>> seeing similar performance unless one of the schemas ends up generating a
>> lot more disk IO.
>> If you're planning to read the last N values for an event at the same
>> time the widerow schema would be better, otherwise reading N events using
>> the overwrite schema will result in you hitting N partitions. You really
>> need to take into account how you're going to read the data when you design
>> a schema, not only how many writes you can push through.
>>
>> On 8 June 2016 at 19:02, John Thomas <jthom...@gmail.com> wrote:
>>
>>> We have a use case where we are storing event data for a given system
>>> and only want to retain the last N values.  Storing extra values for some
>>> time, as long as it isn’t too long, is fine but never less than N.  We
>>> can't use TTLs to delete the data because we can't be sure how frequently
>>> events will arrive and could end up losing everything.  Is there any built
>>> in mechanism to accomplish this or a known pattern that we can follow?  The
>>> events will be read and written at a pretty high frequency so the solution
>>> would have to be performant and not fragile under stress.
>>>
>>>
>>>
>>> We’ve played with a schema that just has N distinct columns with one
>>> value in each but have found overwrites seem to perform much poorer than
>>> wide rows.  The use case we tested only required we store the most recent
>>> value:
>>>
>>>
>>>
>>> CREATE TABLE eventvalue_overwrite (
>>>
>>> system_name text,
>>>
>>> event_name text,
>>>
>>> event_time timestamp,
>>>
>>> event_value blob,
>>>
>>> PRIMARY KEY (system_name,event_name))
>>>
>>>
>>>
>>> CREATE TABLE eventvalue_widerow (
>>>
>>> system_name text,
>>>
>>> event_name text,
>>>
>>> event_time timestamp,
>>>
>>> event_value blob,
>>>
>>> PRIMARY KEY ((system_name, event_name), event_time))
>>>
>>> WITH CLUSTERING ORDER BY (event_time DESC)
>>>
>>>
>>>
>>> We tested it against the DataStax AMI on EC2 with 6 nodes, replication
>>> 3, write consistency 2, and default settings with a write only workload and
>>> got 190K/s for wide row and 150K/s for overwrite.  Thinking through the
>>> write path it seems the performance should be pretty similar, with probably
>>> smaller sstables for the overwrite schema, can anyone explain the big
>>> difference?
>>>
>>>
>>>
>>> The wide row solution is more complex in that it requires a separate
>>> clean up thread that will handle deleting the extra values.  If that’s the
>>> path we have to follow we’re thinking we’d add a bucket of some sort so
>>> that we can delete an entire partition at a time after copying some values
>>> forward, on the assumption that deleting the whole partition is much better
>>> than deleting some slice of the partition.  Is that true?  Also, is there
>>> any difference between setting a really short ttl and doing a delete?
>>>
>>>
>>>
>>> I know there are a lot of questions in there but we’ve been going back
>>> and forth on this for a while and I’d really appreciate any help you could
>>> give.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> John
>>>
>>
>>
>>
>> --
>> Kurt Greaves
>> k...@instaclustr.com
>> www.instaclustr.com
>>
>
>


-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Interesting use case

2016-06-10 Thread kurt Greaves
Whoops, I was obviously tired; what I said clearly doesn't make sense.

On 10 June 2016 at 14:52, kurt Greaves <k...@instaclustr.com> wrote:

> Sorry, I did mean larger number of rows per partition.
>
> On 9 June 2016 at 10:12, John Thomas <jthom...@gmail.com> wrote:
>
>> The example I gave was for when N=1, if we need to save more values I
>> planned to just add more columns.
>>
>> On Thu, Jun 9, 2016 at 12:51 AM, kurt Greaves <k...@instaclustr.com>
>> wrote:
>>
>>> I would say it's probably due to a significantly larger number of
>>> partitions when using the overwrite method - but really you should be
>>> seeing similar performance unless one of the schemas ends up generating a
>>> lot more disk IO.
>>> If you're planning to read the last N values for an event at the same
>>> time the widerow schema would be better, otherwise reading N events using
>>> the overwrite schema will result in you hitting N partitions. You really
>>> need to take into account how you're going to read the data when you design
>>> a schema, not only how many writes you can push through.
>>>
>>> On 8 June 2016 at 19:02, John Thomas <jthom...@gmail.com> wrote:
>>>
>>>> We have a use case where we are storing event data for a given system
>>>> and only want to retain the last N values.  Storing extra values for some
>>>> time, as long as it isn’t too long, is fine but never less than N.  We
>>>> can't use TTLs to delete the data because we can't be sure how frequently
>>>> events will arrive and could end up losing everything.  Is there any built
>>>> in mechanism to accomplish this or a known pattern that we can follow?  The
>>>> events will be read and written at a pretty high frequency so the solution
>>>> would have to be performant and not fragile under stress.
>>>>
>>>>
>>>>
>>>> We’ve played with a schema that just has N distinct columns with one
>>>> value in each but have found overwrites seem to perform much poorer than
>>>> wide rows.  The use case we tested only required we store the most recent
>>>> value:
>>>>
>>>>
>>>>
>>>> CREATE TABLE eventvalue_overwrite (
>>>>
>>>> system_name text,
>>>>
>>>> event_name text,
>>>>
>>>> event_time timestamp,
>>>>
>>>> event_value blob,
>>>>
>>>> PRIMARY KEY (system_name,event_name))
>>>>
>>>>
>>>>
>>>> CREATE TABLE eventvalue_widerow (
>>>>
>>>> system_name text,
>>>>
>>>> event_name text,
>>>>
>>>> event_time timestamp,
>>>>
>>>> event_value blob,
>>>>
>>>> PRIMARY KEY ((system_name, event_name), event_time))
>>>>
>>>> WITH CLUSTERING ORDER BY (event_time DESC)
>>>>
>>>>
>>>>
>>>> We tested it against the DataStax AMI on EC2 with 6 nodes, replication
>>>> 3, write consistency 2, and default settings with a write only workload and
>>>> got 190K/s for wide row and 150K/s for overwrite.  Thinking through the
>>>> write path it seems the performance should be pretty similar, with probably
>>>> smaller sstables for the overwrite schema, can anyone explain the big
>>>> difference?
>>>>
>>>>
>>>>
>>>> The wide row solution is more complex in that it requires a separate
>>>> clean up thread that will handle deleting the extra values.  If that’s the
>>>> path we have to follow we’re thinking we’d add a bucket of some sort so
>>>> that we can delete an entire partition at a time after copying some values
>>>> forward, on the assumption that deleting the whole partition is much better
>>>> than deleting some slice of the partition.  Is that true?  Also, is there
>>>> any difference between setting a really short ttl and doing a delete?
>>>>
>>>>
>>>>
>>>> I know there are a lot of questions in there but we’ve been going back
>>>> and forth on this for a while and I’d really appreciate any help you could
>>>> give.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> John
>>>>
>>>
>>>
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com
>>>
>>
>>
>
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>



-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Streaming from 1 node only when adding a new DC

2016-06-14 Thread kurt Greaves
What version of Cassandra are you using? Also what command are you using to
run the rebuilds? Are you using vnodes?

On 13 June 2016 at 09:01, Fabien Rousseau <fabifab...@gmail.com> wrote:

> Hello,
>
> We've tested adding a new DC from an existing DC having 3 nodes and RF=3
> (ie all nodes have all data).
> During the rebuild process, only one node of the first DC streamed data to
> the 3 nodes of the second DC.
>
> Our goal is to minimise the time it takes to rebuild a DC and would like
> to be able to stream from all nodes.
>
> Starting C* with debug logs, it appears that all nodes, when computing
> their "streaming plan", return the same node for all ranges.
> This is probably because all nodes in DC2 have the same view of the ring.
>
> I understand that when bootstrapping a new node, it's preferable to stream
> from the node being replaced, but when rebuilding a new DC, it should
> probably select sources "randomly" (rather than always selecting the same
> source for a specific range).
> What do you think ?
>
> Best Regards,
> Fabien
>



-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Internal Handling of Map Updates

2016-06-01 Thread kurt Greaves
01:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","0154d265c6b0",1463820040628001],
>>>>
>>>>
>>>> ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article 
>>>> Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article 
>>>> #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country 
>>>> Code\":\"276\"}}",1463820040628001]
>>>>
>>>>
>>>>
>>>> Looking at the SSTables it seems like every update of a value in a map
>>>> breaks down to a delete and insert in the corresponding SSTable (see all
>>>> the tombstone flags "t" in the extract of sstable2json above).
>>>>
>>>> We are using Cassandra 2.2.5.
>>>>
>>>> Can you confirm this behavior?
>>>>
>>>> Thanks!
>>>> --
>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
>>>> 172.1702676
>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>>>> www.more4fi.de
>>>>
>>>> Sitz der Gesellschaft: Solingen | HRB 25917| Amtsgericht Wuppertal
>>>> Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>> Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen
>>>> Schütz
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax <http://datastax.com/>
>>>
>>
>>
>>
>> --
>> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
>> 172.1702676
>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>> www.more4fi.de
>>
>> Sitz der Gesellschaft: Solingen | HRB 25917| Amtsgericht Wuppertal
>> Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>> Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen
>> Schütz
>>
>>
>


-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Efficiently filtering results directly in CS

2016-04-08 Thread kurt Greaves
If you're using C* 3.0 you can probably achieve this with UDFs.
http://www.planetcassandra.org/blog/user-defined-functions-in-cassandra-3-0/
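
As a rough sketch of the 3.0 UDF syntax (the function name and logic are
hypothetical; UDFs must first be enabled with
enable_user_defined_functions: true in cassandra.yaml):

```cql
CREATE OR REPLACE FUNCTION mykeyspace.lang_matches (lang text, wanted text)
    RETURNS NULL ON NULL INPUT
    RETURNS boolean
    LANGUAGE java
    AS 'return lang.equalsIgnoreCase(wanted);';
```

One caveat: UDFs are applied per row on the coordinator during the read,
and as of 3.0 they can't be used as a WHERE predicate, so this reduces
network IO to the client but is not full predicate pushdown to the
replicas.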

On 9 April 2016 at 00:22, Kevin Burton <bur...@spinn3r.com> wrote:

> Ha..  Yes... C*...  I guess I need something like coprocessors in
> bigtable.
>
> On Fri, Apr 8, 2016 at 1:49 AM, vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
>> c* I suppose
>>
>> 2016-04-07 19:30 GMT+02:00 Jonathan Haddad <j...@jonhaddad.com>:
>>
>>> What is CS?
>>>
>>> On Thu, Apr 7, 2016 at 10:03 AM Kevin Burton <bur...@spinn3r.com> wrote:
>>>
>>>> I have a paging model whereby we stream data from CS by fetching
>>>> 'pages' thereby reading (sequentially) entire datasets.
>>>>
>>>> We're using the bucket approach where we write data for 5 minutes, then
>>>> we can just fetch the bucket for that range.
>>>>
>>>> Our app now has TONS of data and we have a piece of middleware that
>>>> filters it based on the client requests.
>>>>
>>>> So if they only want english they just get english and filter away
>>>> about 60% of our data.
>>>>
>>>> but it doesn't support condition pushdown.  So ALL this data has to be
>>>> sent from our CS boxes to our middleware and filtered there (wasting a lot
>>>> of network IO).
>>>>
>>>> Is there away (including refactoring the code) that I could push this
>>>> this into CS?  Maybe some way I could discovery the CS topology and put
>>>> daemons on each of our CS boxes and fetch from CS directly (doing the
>>>> filtering there).
>>>>
>>>> Thoughts?
>>>>
>>>> --
>>>>
>>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>>> Engineers!
>>>>
>>>> Founder/CEO Spinn3r.com
>>>> Location: *San Francisco, CA*
>>>> blog: http://burtonator.wordpress.com
>>>> … or check out my Google+ profile
>>>> <https://plus.google.com/102718274791889610666/posts>
>>>>
>>>>
>>
>
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
>
>


-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread kurt Greaves
Not necessarily, considering RF is 2, so both nodes should have all
partitions. Luke, are you sure the repair is succeeding? You don't have
other keyspaces/duplicate data/extra data in your cassandra data directory?
Also, you could try querying on the node with less data to confirm if it
has the same dataset.
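
If it helps, a hedged way to dig further (keyspace name is a placeholder;
note that on 2.2+/3.x a plain repair may run incrementally, so forcing a
full repair is worth trying):

```shell
# Per-table size/cell estimates on each node (approximate, but useful
# to compare between the two nodes):
nodetool cfstats mykeyspace

# Force a full, non-incremental repair of the keyspace:
nodetool repair -full mykeyspace
```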

On 24 May 2016 at 22:03, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> For the other DC, it can be acceptable because partitions reside on one
> node, so, say, if you have a large partition, it may skew things a bit.
> On May 25, 2016 2:41 AM, "Luke Jolly" <l...@getadmiral.com> wrote:
>
>> So I guess the problem may have been with the initial addition of the
>> 10.128.0.20 node because when I added it in it never synced data I
>> guess?  It was at around 50 MB when it first came up and transitioned to
>> "UN". After it was in I did the 1->2 replication change and tried repair
>> but it didn't fix it.  From what I can tell all the data on it is stuff
>> that has been written since it came up.  We never delete data ever so we
>> should have zero tombstones.
>>
>> If I am not mistaken, only two of my nodes actually have all the data,
>> 10.128.0.3 and 10.142.0.14 since they agree on the data amount. 10.142.0.13
>> is almost a GB lower and then of course 10.128.0.20 which is missing
>> over 5 GB of data.  I tried running nodetool -local on both DCs and it
>> didn't fix either one.
>>
>> Am I running into a bug of some kind?
>>
>> On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>
>>> Hi Luke,
>>>
>>> You mentioned that replication factor was increased from 1 to 2. In that
>>> case was the node bearing ip 10.128.0.20 carried around 3GB data earlier?
>>>
>>> You can run nodetool repair with option -local to initiate repair local
>>> datacenter for gce-us-central1.
>>>
>>> Also, you may suspect that if a lot of data was deleted while the node
>>> was down, it may have a lot of tombstones, which do not need to be
>>> replicated to the other node. To verify this, you can issue a
>>> select count(*) query on the column families (with the amount of data you
>>> have it should not be an issue) with tracing on and consistency local_all,
>>> connecting to either 10.128.0.3 or 10.128.0.20, and store the output in a
>>> file. It will give you a fair idea of how many deleted cells the nodes
>>> have. I tried searching for references on whether tombstones are moved
>>> around during repair, but I didn't find evidence of it. However, I see no
>>> reason why they would be: if the node didn't have the data, streaming
>>> tombstones does not make a lot of sense.
>>>
>>> Regards,
>>> Bhuvan
>>>
>>> On Tue, May 24, 2016 at 11:06 PM, Luke Jolly <l...@getadmiral.com>
>>> wrote:
>>>
>>>> Here's my setup:
>>>>
>>>> Datacenter: gce-us-central1
>>>> ===========================
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> --  Address  Load   Tokens   Owns (effective)  Host ID
>>>>   Rack
>>>> UN  10.128.0.3   6.4 GB 256  100.0%
>>>>  3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
>>>> UN  10.128.0.20  943.08 MB  256  100.0%
>>>>  958348cb-8205-4630-8b96-0951bf33f3d3  default
>>>> Datacenter: gce-us-east1
>>>> ========================
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> --  Address  Load   Tokens   Owns (effective)  Host ID
>>>>   Rack
>>>> UN  10.142.0.14  6.4 GB 256      100.0%
>>>>  c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
>>>> UN  10.142.0.13  5.55 GB256  100.0%
>>>>  d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>>>>
>>>> And my replication settings are:
>>>>
>>>> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2',
>>>> 'gce-us-central1': '2', 'gce-us-east1': '2'}
>>>>
>>>> As you can see 10.128.0.20 in the gce-us-central1 DC only has a load
>>>> of 943 MB even though it's supposed to own 100% and should have 6.4 GB.
>>>> Also 10.142.0.13 seems also not to have everything as it only has a
>>>> load of 5.55 GB.
>>>>
>>>> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves <k...@instaclustr.com>
>>>> wrote:
>>>

Re: Increasing replication factor and repair doesn't seem to work

2016-05-23 Thread kurt Greaves
Do you have 1 node in each DC or 2? If you're saying you have 1 node in
each DC, then an RF of 2 doesn't make sense. Can you clarify what your
setup is?

On 23 May 2016 at 19:31, Luke Jolly <l...@getadmiral.com> wrote:

> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
> gce-us-east1.  I increased the replication factor of gce-us-central1 from 1
> to 2.  Then I ran 'nodetool repair -dc gce-us-central1'.  The "Owns" for
> the node switched to 100% as it should but the Load showed that it didn't
> actually sync the data.  I then ran a full 'nodetool repair' and it didn't
> fix it still.  This scares me as I thought 'nodetool repair' was a way to
> assure consistency and that all the nodes were synced but it doesn't seem
> to be.  Outside of that command, I have no idea how I would assure all the
> data was synced or how to get the data correctly synced without
> decommissioning the node and re-adding it.
>



-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: cqlsh problem

2016-05-09 Thread kurt Greaves
Try setting this higher:
>>>>>>>
>>>>>>> --connect-timeout=CONNECT_TIMEOUT
>>>>>>>
>>>>>>> Specify the connection timeout in seconds
>>>>>>> (default: 5 seconds).
>>>>>>>
>>>>>>>   --request-timeout=REQUEST_TIMEOUT
>>>>>>>
>>>>>>> Specify the default request timeout in
>>>>>>> seconds (default: 10 seconds).
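
For example (the host address is a placeholder):

```shell
cqlsh --connect-timeout=30 --request-timeout=60 10.0.0.1 9042
```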
>>>>>>>
>>>>>>> C*heers,
>>>>>>> ---
>>>>>>> Alain Rodriguez - al...@thelastpickle.com
>>>>>>> France
>>>>>>>
>>>>>>> The Last Pickle - Apache Cassandra Consulting
>>>>>>> http://www.thelastpickle.com
>>>>>>>
>>>>>>> 2016-03-18 4:49 GMT+01:00 joseph gao <gaojf.bok...@gmail.com>:
>>>>>>>
>>>>>>>> Of course yes.
>>>>>>>>
>>>>>>>> 2016-03-17 22:35 GMT+08:00 Vishwas Gupta <
>>>>>>>> vishwas.gu...@snapdeal.com>:
>>>>>>>>
>>>>>>>>> Have you started the Cassandra service?
>>>>>>>>>
>>>>>>>>> sh cassandra
>>>>>>>>> On 17-Mar-2016 7:59 pm, "Alain RODRIGUEZ" <arodr...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi, did you try with the address of the node rather than 127.0.0.1
>>>>>>>>>>
>>>>>>>>>> Is the transport protocol used by cqlsh (not sure if it is thrift
>>>>>>>>>> or binary - native in 2.1)  active ? What is the "nodetool info" 
>>>>>>>>>> output ?
>>>>>>>>>>
>>>>>>>>>> C*heers,
>>>>>>>>>> ---
>>>>>>>>>> Alain Rodriguez - al...@thelastpickle.com
>>>>>>>>>> France
>>>>>>>>>>
>>>>>>>>>> The Last Pickle - Apache Cassandra Consulting
>>>>>>>>>> http://www.thelastpickle.com
>>>>>>>>>>
>>>>>>>>>> 2016-03-17 14:26 GMT+01:00 joseph gao <gaojf.bok...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> hi, all
>>>>>>>>>>> cassandra version 2.1.7
>>>>>>>>>>> When I use cqlsh to connect cassandra, something is wrong
>>>>>>>>>>>
>>>>>>>>>>> Connection error: ( Unable to connect to any servers',
>>>>>>>>>>> {'127.0.0.1': OperationTimedOut('errors=None, last_host=None,)})
>>>>>>>>>>>
>>>>>>>>>>> This happens lots of times, but sometime it works just fine.
>>>>>>>>>>> Anybody knows why?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> Joseph Gao
>>>>>>>>>>> PhoneNum:15210513582
>>>>>>>>>>> QQ: 409343351
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> --
>>>>>>>> Joseph Gao
>>>>>>>> PhoneNum:15210513582
>>>>>>>> QQ: 409343351
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Joseph Gao
>>>>>> PhoneNum:15210513582
>>>>>> QQ: 409343351
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Joseph Gao
>>>> PhoneNum:15210513582
>>>> QQ: 409343351
>>>>
>>>
>>>
>>>
>>> --
>>> --
>>> Joseph Gao
>>> PhoneNum:15210513582
>>> QQ: 409343351
>>>
>>
>>
>
>
> --
> --
> Joseph Gao
> PhoneNum:15210513582
> QQ: 409343351
>



-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: nodetool repair with -pr and -dc

2016-08-11 Thread kurt Greaves
-D does not do what you think it does. I've quoted the relevant
documentation from the README:

>
> <https://github.com/BrianGallew/cassandra_range_repair#multiple-datacenters>Multiple
> Datacenters
>
> If you have multiple datacenters in your ring, then you MUST specify the
> name of the datacenter containing the node you are repairing as part of the
> command-line options (--datacenter=DCNAME). Failure to do so will result in
> only a subset of your data being repaired (approximately
> data/number-of-datacenters). This is because nodetool has no way to
> determine the relevant DC on its own, which in turn means it will use the
> tokens from every ring member in every datacenter.
>
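
As a sketch of the two approaches supported on 2.0 that Paulo describes
below (keyspace and host names are hypothetical):

```shell
# Either: primary-range repair, run on EVERY node in EVERY DC:
for host in dc1-node1 dc1-node2 dc2-node1 dc2-node2; do
    nodetool -h "$host" repair -pr mykeyspace
done

# Or: restrict repair to the local DC, accepting duplicate checks:
nodetool repair -local mykeyspace
```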


On 11 August 2016 at 12:24, Paulo Motta <pauloricard...@gmail.com> wrote:

> > if we want to use -pr option ( which i suppose we should to prevent
> duplicate checks) in 2.0 then if we run the repair on all nodes in a single
> DC then it should be sufficient and we should not need to run it on all
> nodes across DC's?
>
> No, because the primary ranges of the nodes in other DCs will be missing
> repair, so you should either run with -pr in all nodes in all DCs, or
> restrict repair to a specific DC with -local (and have duplicate checks).
> Combined -pr and -local are only supported on 2.1
>
>
> 2016-08-11 1:29 GMT-03:00 Anishek Agarwal <anis...@gmail.com>:
>
>> ok thanks, so if we want to use -pr option ( which i suppose we should to
>> prevent duplicate checks) in 2.0 then if we run the repair on all nodes in
>> a single DC then it should be sufficient and we should not need to run it
>> on all nodes across DC's ?
>>
>>
>>
>> On Wed, Aug 10, 2016 at 5:01 PM, Paulo Motta <pauloricard...@gmail.com>
>> wrote:
>>
>>> On 2.0 repair -pr option is not supported together with -local, -hosts
>>> or -dc, since it assumes you need to repair all nodes in all DCs and it
>>> will throw and error if you try to run with nodetool, so perhaps there's
>>> something wrong with range_repair options parsing.
>>>
>>> On 2.1 it was added support to simultaneous -pr and -local options on
>>> CASSANDRA-7450, so if you need that you can either upgade to 2.1 or
>>> backport that to 2.0.
>>>
>>>
>>> 2016-08-10 5:20 GMT-03:00 Anishek Agarwal <anis...@gmail.com>:
>>>
>>>> Hello,
>>>>
>>>> We have 2.0.17 cassandra cluster(*DC1*) with a cross dc setup with a
>>>> smaller cluster(*DC2*).  After reading various blogs about
>>>> scheduling/running repairs looks like its good to run it with the following
>>>>
>>>>
>>>> -pr for primary range only
>>>> -st -et for sub ranges
>>>> -par for parallel
>>>> -dc to make sure we can schedule repairs independently on each Data
>>>> centre we have.
>>>>
>>>> i have configured the above using the repair utility @
>>>> https://github.com/BrianGallew/cassandra_range_repair.git
>>>>
>>>> which leads to the following command :
>>>>
>>>> ./src/range_repair.py -k [keyspace] -c [columnfamily name] -v -H
>>>> localhost -p -D* DC1*
>>>>
>>>> but looks like the merkle tree is being calculated on nodes which are
>>>> part of other *DC2.*
>>>>
>>>> why does this happen? i thought it should only look at the nodes in
>>>> local cluster. however on nodetool the* -pr* option cannot be used
>>>> with *-local* according to docs @https://docs.datastax.com/en/
>>>> cassandra/2.0/cassandra/tools/toolsRepair.html
>>>>
>>>> so i am may be missing something, can someone help explain this please.
>>>>
>>>> thanks
>>>> anishek
>>>>
>>>
>>>
>>
>


-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-30 Thread kurt greaves
On 30 January 2017 at 04:43, Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in> wrote:

> But how I will tell rebuild command source DC if I have more than 2 Dc?



You will need to rebuild the new DC from at least one DC for every keyspace
present on both the new DC and the old DCs.
For example, if you have two DCs, A and B, and add a new DC "C", with
keyspace "X" replicated to A and C and keyspace "Y" replicated to B and C,
you will need to rebuild the nodes in "C" from both DCs A and B; otherwise
they will not stream a full set of data for both keyspaces.

If all your keyspaces are replicated to all DC's, you only need to rebuild
from one other DC (which one doesn't *really* matter).

Note that if you rebuild multiple times on a node you will end up with
duplicate data. This isn't an issue, compactions will clean it up over
time. Usually if a rebuild fails for any reason you should wipe the data
directory to ensure you don't end up with 2 copies of a lot of the data.
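
A sketch of the example above, using the same hypothetical DC and keyspace
names:

```shell
# Run on each node in the new DC "C":
nodetool rebuild A    # streams keyspace "X" (replicated to A and C)
nodetool rebuild B    # streams keyspace "Y" (replicated to B and C)
```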


Re: Why does CockroachDB github website say Cassandra has no Availability on datacenter failure?

2017-02-07 Thread kurt greaves
Marketing never lies. Ever


Re: UnknownColumnFamilyException after removing all Cassandra data

2017-02-07 Thread kurt greaves
The node is trying to communicate with another node, potentially streaming
data, and is receiving files/data for an "unknown column family". That is,
it doesn't know about the CF with the id
e36415b6-95a7-368c-9ac0-ae0ac774863d.
If you deleted some columnfamilies but not all the system keyspace and
restarted the node I'd expect this error to occur. Or I suppose if you
didn't decommission the node properly before blowing the data away and
restarting.

You'll have to give us more information on what your exact steps were on
this 2nd node:

When you say you deleted all Cassandra data, did this include the system
tables? Were your steps to delete all the data and then just restart the
node? Did you remove the node from the cluster prior to deleting the data
and restarting it (nodetool decommission/removenode)? Did the node rejoin
the cluster or did it have to bootstrap?


Re: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-27 Thread kurt greaves
What Dikang said: in your original email you are passing -dc to rebuild.
This is incorrect. Simply run nodetool rebuild dc_india from each of the
nodes in the new DC.

On 28 Jan 2017 07:50, "Dikang Gu"  wrote:

> Have you run 'nodetool rebuild dc_india' on the new nodes?
>
> On Tue, Jan 24, 2017 at 7:51 AM, Benjamin Roth 
> wrote:
>
>> Have you also altered RF of system_distributed as stated in the tutorial?
>>
>> 2017-01-24 16:45 GMT+01:00 Abhishek Kumar Maheshwari <
>> abhishek.maheshw...@timesinternet.in>:
>>
>>> My Mistake,
>>>
>>>
>>>
>>> Both clusters are up and running.
>>>
>>>
>>>
>>> Datacenter: DRPOCcluster
>>>
>>> 
>>>
>>> Status=Up/Down
>>>
>>> |/ State=Normal/Leaving/Joining/Moving
>>>
>>> --  AddressLoad   Tokens   OwnsHost
>>> ID   Rack
>>>
>>> UN  172.29.XX.XX  1.65 GB   256  ?
>>> badf985b-37da-4735-b468-8d3a058d4b60  01
>>>
>>> UN  172.29.XX.XX  1.64 GB   256  ?
>>> 317061b2-c19f-44ba-a776-bcd91c70bbdd  03
>>>
>>> UN  172.29.XX.XX  1.64 GB   256  ?
>>> 9bf0d1dc-6826-4f3b-9c56-cec0c9ce3b6c  02
>>>
>>> Datacenter: dc_india
>>>
>>> 
>>>
>>> Status=Up/Down
>>>
>>> |/ State=Normal/Leaving/Joining/Moving
>>>
>>> --  AddressLoad   Tokens   OwnsHost
>>> ID   Rack
>>>
>>> UN  172.26.XX.XX   79.90 GB   256  ?
>>> 3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
>>>
>>> UN  172.26.XX.XX   80.21 GB   256  ?
>>> 7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
>>>
>>>
>>>
>>> *Thanks & Regards,*
>>> *Abhishek Kumar Maheshwari*
>>> *+91- 805591 (Mobile)*
>>>
>>> Times Internet Ltd. | A Times of India Group Company
>>>
>>> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>>>
>>> *P** Please do not print this email unless it is absolutely necessary.
>>> Spread environmental awareness.*
>>>
>>>
>>>
>>> *From:* Benjamin Roth [mailto:benjamin.r...@jaumo.com]
>>> *Sent:* Tuesday, January 24, 2017 9:11 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: [Multi DC] Old Data Not syncing from Existing cluster to
>>> new Cluster
>>>
>>>
>>>
>>> I am not an expert in bootstrapping new DCs but shouldn't the OLD nodes
>>> appear as UP to be used as a streaming source in rebuild?
>>>
>>>
>>>
>>> 2017-01-24 16:32 GMT+01:00 Abhishek Kumar Maheshwari <
>>> abhishek.maheshw...@timesinternet.in>:
>>>
>>> Yes, I take all steps. While I am inserting new data is replicating on
>>> both DC. But only old data is not replication in new cluster.
>>>
>>>
>>>
>>> *Thanks & Regards,*
>>> *Abhishek Kumar Maheshwari*
>>> *+91- 805591 (Mobile)*
>>>
>>> Times Internet Ltd. | A Times of India Group Company
>>>
>>> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>>>
>>> *P** Please do not print this email unless it is absolutely necessary.
>>> Spread environmental awareness.*
>>>
>>>
>>>
>>> *From:* Benjamin Roth [mailto:benjamin.r...@jaumo.com]
>>> *Sent:* Tuesday, January 24, 2017 8:55 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: [Multi DC] Old Data Not syncing from Existing cluster to
>>> new Cluster
>>>
>>>
>>>
>>> There is much more to it than just changing the RF in the keyspace!
>>>
>>>
>>>
>>> See here: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html
>>>
>>>
>>>
>>> 2017-01-24 16:18 GMT+01:00 Abhishek Kumar Maheshwari <
>>> abhishek.maheshw...@timesinternet.in>:
>>>
>>> Hi All,
>>>
>>>
>>>
>>> I have Cassandra stack with 2 Dc
>>>
>>>
>>>
>>> Datacenter: DRPOCcluster
>>>
>>> 
>>>
>>> Status=Up/Down
>>>
>>> |/ State=Normal/Leaving/Joining/Moving
>>>
>>> --  AddressLoad   Tokens   OwnsHost
>>> ID   Rack
>>>
>>> UN  172.29.xx.xxx  256  MB   256  ?
>>> b6b8cbb9-1fed-471f-aea9-6a657e7ac80a  01
>>>
>>> UN  172.29.xx.xxx  240 MB   256  ?
>>> 604abbf5-8639-4104-8f60-fd6573fb2e17  03
>>>
>>> UN  172.29. xx.xxx  240 MB   256  ?
>>> 32fa79ee-93c6-4e5b-a910-f27a1e9d66c1  02
>>>
>>> Datacenter: dc_india
>>>
>>> 
>>>
>>> Status=Up/Down
>>>
>>> |/ State=Normal/Leaving/Joining/Moving
>>>
>>> --  AddressLoad   Tokens   OwnsHost
>>> ID   Rack
>>>
>>> DN  172.26. .xx.xxx  78.97 GB   256  ?
>>> 3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
>>>
>>> DN  172.26. .xx.xxx  79.18 GB   256  ?
>>> 7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
>>>
>>>
>>>
>>> dc_india is old Dc which contains all data.
>>>
>>> I update keyspace as per below:
>>>
>>>
>>>
>>> alter KEYSPACE wls WITH replication = {'class':
>>> 'NetworkTopologyStrategy', 'DRPOCcluster': '2','dc_india':'2'}  AND
>>> durable_writes = true;
>>>
>>>
>>>
>>> but old data is not updating in DRPOCcluster(which is new). Also, while
>>> running nodetool rebuild getting below exception:
>>>

Re: Re : Decommissioned nodes show as DOWN in Cassandra versions 2.1.12 - 2.1.16

2017-01-27 Thread kurt greaves
we've seen this issue on a few clusters, including on 2.1.7 and 2.1.8.
pretty sure it is a known issue in gossip. in later versions it seems to be
fixed.

On 24 Jan 2017 06:09, "sai krishnam raju potturi" 
wrote:

> In the Cassandra versions 2.1.11 - 2.1.16, after we decommission a node or
> datacenter, we observe the decommissioned nodes marked as DOWN in the
> cluster when you do a "nodetool describecluster". The nodes however do not
> show up in the "nodetool status" command.
> The decommissioned node also does not show up in the "system_peers" table
> on the nodes.
>
> The workaround we follow is rolling restart of the cluster, which removes
> the decommissioned nodes from the "UNREACHABLE STATE", and shows the actual
> state of the cluster. The workaround is tedious for huge clusters.
>
> We also verified the decommission process in CCM tool, and observed the
> same issue for clusters with versions from 2.1.12 to 2.1.16. The issue was
> not observed in versions prior to or later than the ones mentioned above.
>
>
> Has anybody in the community observed similar issue? We've also raised a
> JIRA issue regarding this: https://issues.apache.org/jira/browse/CASSANDRA-13144
>
>
> Below are the observed logs from the versions without the bug, and with
> the bug.  The ones highlighted in yellow show the expected logs. The ones
> highlighted in red are those where the node is recognized as down, and
> shows as UNREACHABLE.
>
>
>
> Cassandra 2.1.1 Logs showing the decommissioned node :  (Without the bug)
>
> 2017-01-19 20:18:56,415 [GossipStage:1] DEBUG ArrivalWindow Ignoring
> interval time of 2049943233 for /X.X.X.X
> 2017-01-19 20:18:56,416 [GossipStage:1] DEBUG StorageService Node
> /X.X.X.X state left, tokens [ 59353109817657926242901533144729725259,
> 60254520910109313597677907197875221475, 
> 75698727618038614819889933974570742305,
> 84508739091270910297310401957975430578]
> 2017-01-19 20:18:56,416 [GossipStage:1] DEBUG Gossiper adding expire time
> for endpoint : /X.X.X.X (1485116334088)
> 2017-01-19 20:18:56,417 [GossipStage:1] INFO StorageService Removing
> tokens [100434964734820719895982857900842892337,
> 114144647582686041354301802358217767299, 
> 13209060517964702932350041942412177,
> 138409460913927199437556572481804704749] for /X.X.X.X
> 2017-01-19 20:18:56,418 [HintedHandoff:3] INFO HintedHandOffManager
> Deleting any stored hints for /X.X.X.X
> 2017-01-19 20:18:56,424 [GossipStage:1] DEBUG MessagingService Resetting
> version for /X.X.X.X
> 2017-01-19 20:18:56,424 [GossipStage:1] DEBUG Gossiper removing endpoint
> /X.X.X.X
> 2017-01-19 20:18:56,437 [GossipStage:1] DEBUG StorageService Ignoring
> state change for dead or unknown endpoint: /X.X.X.X
> 2017-01-19 20:19:02,022 [WRITE-/X.X.X.X] DEBUG OutboundTcpConnection
> attempting to connect to /X.X.X.X
> 2017-01-19 20:19:02,023 [HANDSHAKE-/X.X.X.X] INFO OutboundTcpConnection
> Handshaking version with /X.X.X.X
> 2017-01-19 20:19:02,023 [WRITE-/X.X.X.X] DEBUG MessagingService Setting
> version 7 for /X.X.X.X
> 2017-01-19 20:19:08,096 [GossipStage:1] DEBUG ArrivalWindow Ignoring
> interval time of 2074454222 for /X.X.X.X
> 2017-01-19 20:19:54,407 [GossipStage:1] DEBUG ArrivalWindow Ignoring
> interval time of 4302985797 for /X.X.X.X
> 2017-01-19 20:19:57,405 [GossipTasks:1] DEBUG Gossiper 6 elapsed,
> /X.X.X.X gossip quarantine over
> 2017-01-19 20:19:57,455 [GossipStage:1] DEBUG ArrivalWindow Ignoring
> interval time of 3047826501 for /X.X.X.X
> 2017-01-19 20:19:57,455 [GossipStage:1] DEBUG StorageService Ignoring
> state change for dead or unknown endpoint: /X.X.X.X
>
>
> Cassandra 2.1.16 Logs showing the decommissioned node :   (The logs in
> 2.1.16 show the same as 2.1.1 upto "DEBUG Gossiper 6 elapsed, /X.X.X.X
> gossip quarantine over", and then is followed by "NODE is now DOWN"
>
> 017-01-19 19:52:23,687 [GossipStage:1] DEBUG StorageService.java:1883 -
> Node /X.X.X.X state left, tokens [-1112888759032625467,
> -228773855963737699, -311455042375
> 4381391, -4848625944949064281, -6920961603460018610, -8566729719076824066,
> 1611098831406674636, 7278843689020594771, 7565410054791352413,
> 9166885764, 8654747784805453046]
> 2017-01-19 19:52:23,688 [GossipStage:1] DEBUG Gossiper.java:1520 - adding
> expire time for endpoint : /X.X.X.X (1485114743567)
> 2017-01-19 19:52:23,688 [GossipStage:1] INFO StorageService.java:1965 -
> Removing tokens [-1112888759032625467, -228773855963737699,
> -3114550423754381391, -48486259449
> 49064281, -6920961603460018610, 5690722015779071557, 6202373691525063547,
> 7191120402564284381, 7278843689020594771, 7565410054791352413,
> 8524200089166885764, 865474778
> 4805453046] for /X.X.X.X
> 2017-01-19 19:52:23,689 [HintedHandoffManager:1] INFO
> HintedHandOffManager.java:230 - Deleting any stored hints for /X.X.X.X
> 2017-01-19 19:52:23,689 

Re: Time series data model and tombstones

2017-01-29 Thread kurt greaves
Your partitioning key is text. If you have multiple entries per id you are
likely hitting older cells that have expired. Descending only affects how
the data is stored on disk; if you have to read the whole partition to find
whichever time you are querying for, you could potentially hit tombstones in
other SSTables that contain the same "id". As mentioned previously, you
need to add a time bucket to your partitioning key, and definitely use
DTCS/TWCS.
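A minimal sketch of the time-bucket idea in Python (the day granularity and the (id, bucket) key layout are illustrative assumptions to adapt to your write rate, not something prescribed by Cassandra):

```python
from datetime import datetime, timezone

def day_bucket(ts: datetime) -> str:
    """Day-granularity bucket to fold into the partition key, e.g. a schema
    shaped like PRIMARY KEY ((id, bucket), time). The granularity is a
    design choice: pick it so one (id, bucket) partition stays small, and
    so queries for a time range touch only a few buckets."""
    return ts.strftime("%Y%m%d")

t = datetime(2017, 1, 29, 13, 45, tzinfo=timezone.utc)
print(day_bucket(t))  # → 20170129
```

Queries then target (id, bucket) for the bucket(s) covering the requested time range, so reads stay out of old, tombstone-heavy SSTables.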


Re: lots of connection timeouts around same time every day

2017-02-17 Thread kurt greaves
typically when I've seen that gossip issue it requires more than just
restarting the affected node to fix. if you're not getting query related
errors in the server log you should start looking at what is being queried.
are the queries that time out each day the same?


Re: Count(*) is not working

2017-02-17 Thread kurt greaves
really... well that's good to know. it still almost never works though. i
guess every time I've seen it it must have timed out due to tombstones.

On 17 Feb. 2017 22:06, "Sylvain Lebresne" <sylv...@datastax.com> wrote:

On Fri, Feb 17, 2017 at 11:54 AM, kurt greaves <k...@instaclustr.com> wrote:

> if you want a reliable count, you should use spark. performing a count(*)
> will inevitably fail unless you make your server's read timeout and
> tombstone failure thresholds ridiculous
>

That's just not true. count(*) is paged internally, so while it is not
particularly fast, it shouldn't require bumping either the read timeout or
the tombstone fail threshold in any way to work.
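A rough sketch of what "paged internally" means, with a simulated table standing in for a real cluster (the paging callback is hypothetical, not the driver's actual API):

```python
def paged_count(fetch_page, page_size=100):
    """Count rows page by page, the way count(*) is executed internally:
    each page is a bounded read, so no single read has to scan the whole
    table within the server's read timeout. fetch_page is a stand-in for
    the server's paging machinery (hypothetical)."""
    total, state = 0, None
    while True:
        rows, state = fetch_page(state, page_size)
        total += len(rows)
        if state is None:   # no more pages
            return total

# Simulated 250-row table served in pages; state is just the next offset.
data = list(range(250))

def fetch_page(state, n):
    start = state or 0
    nxt = start + n if start + n < len(data) else None
    return data[start:start + n], nxt

print(paged_count(fetch_page))  # → 250
```

Each page can still hit the tombstone warn/fail thresholds on its own, which is why a tombstone-heavy partition makes the count slow or triggers the threshold warnings.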

In that case, it seems the partition does have many tombstones (more than
live rows) and so the tombstone threshold is doing its job of warning about
it.


>
> On 17 Feb. 2017 04:34, "Jan" <j...@dafuer.de> wrote:
>
>> Hi,
>>
>> could you post the output of nodetool cfstats for the table?
>>
>> Cheers,
>>
>> Jan
>>
>> Am 16.02.2017 um 17:00 schrieb Selvam Raman:
>>
> I am not getting a count as the result. Instead I keep getting a number of
> warnings like the ones below.
>>
>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
>> LIMIT 100 (see tombstone_warn_threshold)
>>
>> On Thu, Feb 16, 2017 at 12:37 PM, Jan Kesten <j...@dafuer.de> wrote:
>>
>>> Hi,
>>>
>>> did you finally get a result?
>>>
>>> Those messages are simply warnings telling you that c* had to read many
>>> tombstones while processing your query - rows that are deleted but not yet
>>> garbage collected/compacted. This warning gives you some explanation of why
>>> things might be much slower than expected: for every 100 live rows counted,
>>> c* had to read about 15 times as many rows that were already deleted.
>>>
>>> Apart from that, count(*) is almost always slow - and there is a default
>>> limit of 10,000 rows in a result.
>>>
>>> Do you really need the actual live count? To get an idea you can always
>>> look at nodetool cfstats (but those numbers also include deleted rows).
>>>
>>>
>>> Am 16.02.2017 um 13:18 schrieb Selvam Raman:
>>>
>>> Hi,
>>>
>>> I want to know the total records count in table.
>>>
>>> I fired the below query:
>>>select count(*) from tablename;
>>>
>>> and i have got the below output
>>>
>>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>>> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
>>> LIMIT 100 (see tombstone_warn_threshold)
>>>
>>> Read 100 live rows and 1435 tombstone cells for query SELECT * FROM
>>> keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT 100 (see
>>> tombstone_warn_threshold)
>>>
>>> Read 96 live rows and 1385 tombstone cells for query SELECT * FROM
>>> keysace.table WHERE token(id) > token(test:-2220-UV033/04) LIMIT 100 (see
>>> tombstone_warn_threshold).
>>>
>>>
>>>
>>>
>>> Can you please help me to get the total count of the table.
>>>
>>> --
>>> Selvam Raman
>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>>
>>>
>>
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
>>
>>


Re: High disk io read load

2017-02-17 Thread kurt greaves
what's the Owns % for the relevant keyspace from nodetool status?


Re: Which compaction strategy when modeling a dumb set

2017-02-24 Thread kurt greaves
Probably LCS although what you're implying (read before write) is an
anti-pattern in Cassandra. Something like this is a good indicator that you
should review your model.


Re: Read exceptions after upgrading to 3.0.10

2017-02-24 Thread kurt greaves
That stacktrace generally implies your clients are resetting connections.
The reconnection policy probably handles the issue automatically, but it's
worth investigating. I don't think it normally causes statuslogger output,
however; what were the log messages prior to the stacktrace?

On 24 February 2017 at 11:57, Carlos Rolo  wrote:

> By any chances are you using the PHP/C++ driver?
>
> --
>
>
>
>


Re: High disk io read load

2017-02-24 Thread kurt greaves
How many CFs are we talking about here? Also, did the script also kick off
the scrubs or was this purely from changing the schemas?


Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-02-13 Thread kurt greaves
are people actually trying to imply that Google is less evil than oracle?
what is this shill fest

On 12 Feb. 2017 8:24 am, "Kant Kodali"  wrote:

Saw this one today...

https://news.ycombinator.com/item?id=13624062

On Tue, Jan 3, 2017 at 6:27 AM, Eric Evans 
wrote:

> On Mon, Jan 2, 2017 at 2:26 PM, Edward Capriolo 
> wrote:
> > Lets be clear:
> > What I am saying is avoiding being loose with the word "free"
> >
> > https://en.wikipedia.org/wiki/Free_software_license
> >
> > Many things with the JVM are free too. Most importantly it is free to
> use.
> >
> > https://www.java.com/en/download/faq/distribution.xml
> >
> > As it relates to this conversation: I am not aware of anyone running
> > Cassandra that has modified upstream JVM to make Cassandra run
> > better/differently *. Thus the license around the Oracle JVM is roughly
> > meaningless to the user/developer of cassandra.
> >
> > * The only group I know that took an action to modify upstream was Acunu.
> > They had released a modified Linux Kernel with a modified Apache
> Cassandra.
> > http://cloudtweaks.com/2011/02/data-storage-startup-acunu-raises-3-6-million-to-launch-its-first-product/.
> > That product no longer exists.
> >
> > "I don't how to read any of this.  It sounds like you're saying that a
> > JVM is something that cannot be produced as a Free Software project,"
> >
> > What I am saying is something like the JVM "could" be produced as a "free
> > software project". However, the argument that I was making is that the
> > popular viable languages/(including vms or runtime to use them) today
> > including Java, C#, Go, Swift are developed by the largest tech
> companies in
> > the world, and as such I do believe a platform would be viable.
> Specifically
> > I believe without Oracle driving Java OpenJDK would not be viable.
> >
> > There are two specific reasons.
> > 1) I do not see large costly multi-year initiatives like G1 happening
> > 2) Without guidance/leadership that sun/oracle I do not see new features
> > that change the language like lambda's and try multi-catch happening in a
> > sane way.
> >
> > I expanded upon #2 be discussing my experience with standards like c++
> 11,
> > 14,17 and attempting to take compiling working lambda code on linux GCC
> to
> > microsoft visual studio and having it not compile. In my opinion, Java
> only
> > wins because as a platform it is very portable as both source and binary
> > code. Without leadership on that front I believe that over time the
> language
> > would suffer.
>
> I realize that you're trying to be pragmatic about all of this, but
> what I don't think you realize, is that so am I.
>
> Java could change hands at any time (it has once already), or Oracle
> leadership could decide to go in a different direction.  Imagine for
> example that they relicensed it to exclude use by orientation or
> religion, Cassandra would implicitly carry these restrictions as well.
> Imagine that they decided to provide a back-door to the NSA, Cassandra
> would then also contain such a back-door.  These might sound
> hypothetical, but there is plenty of precedent here.
>
> OpenJDK benefits from the same resources and leadership from Oracle
> that you value, but is licensed and distributed in a way that
> safeguards us from a day when Oracle becomes less benevolent, (if that
> were to happen, some other giant company could assume the mantle of
> leadership).
>
> All I'm really suggesting is that we at least soften our requirement
> on the Oracle JVM, and perhaps perform some test runs in CI against
> OpenJDK.  Actively discouraging people from using the Free Software
> alternative here, one that is working well for many, isn't the
> behavior I'd normally expect from a Free Software project.
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>


Re: Count(*) is not working

2017-02-17 Thread kurt greaves
if you want a reliable count, you should use spark. performing a count(*)
will inevitably fail unless you make your server's read timeout and
tombstone failure thresholds ridiculous

On 17 Feb. 2017 04:34, "Jan"  wrote:

> Hi,
>
> could you post the output of nodetool cfstats for the table?
>
> Cheers,
>
> Jan
>
> Am 16.02.2017 um 17:00 schrieb Selvam Raman:
>
> I am not getting a count as the result. Instead I keep getting a number of
> warnings like the ones below.
>
> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
> LIMIT 100 (see tombstone_warn_threshold)
>
> On Thu, Feb 16, 2017 at 12:37 PM, Jan Kesten  wrote:
>
>> Hi,
>>
>> did you finally get a result?
>>
>> Those messages are simply warnings telling you that c* had to read many
>> tombstones while processing your query - rows that are deleted but not yet
>> garbage collected/compacted. This warning gives you some explanation of why
>> things might be much slower than expected: for every 100 live rows counted,
>> c* had to read about 15 times as many rows that were already deleted.
>>
>> Apart from that, count(*) is almost always slow - and there is a default
>> limit of 10,000 rows in a result.
>>
>> Do you really need the actual live count? To get an idea you can always
>> look at nodetool cfstats (but those numbers also include deleted rows).
>>
>>
>> Am 16.02.2017 um 13:18 schrieb Selvam Raman:
>>
>> Hi,
>>
>> I want to know the total records count in table.
>>
>> I fired the below query:
>>select count(*) from tablename;
>>
>> and i have got the below output
>>
>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
>> LIMIT 100 (see tombstone_warn_threshold)
>>
>> Read 100 live rows and 1435 tombstone cells for query SELECT * FROM
>> keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT 100 (see
>> tombstone_warn_threshold)
>>
>> Read 96 live rows and 1385 tombstone cells for query SELECT * FROM
>> keysace.table WHERE token(id) > token(test:-2220-UV033/04) LIMIT 100 (see
>> tombstone_warn_threshold).
>>
>>
>>
>>
>> Can you please help me to get the total count of the table.
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
>>
>
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>
>
>


Re: lots of connection timeouts around same time every day

2017-02-17 Thread kurt greaves
have you tried a rolling restart of the entire DC?


Re: Unreliable JMX metrics

2017-01-19 Thread kurt Greaves
Yes. You likely will still be able to see the nodes in nodetool gossipinfo


Re: Tombstoned error and then OOM

2016-10-04 Thread kurt Greaves
This sounds like you're running a query that consumes a lot of memory. Are
you by chance querying a very large partition or not bounding your query?

I'd also recommend upgrading to 2.1.15, 2.1.0 is very old and has quite a
few bugs.

On 3 October 2016 at 17:08, INDRANIL BASU <indranil...@yahoo.com> wrote:

> Hello All,
>
>
>
> I am getting the below error repeatedly in the system log of C* 2.1.0
>
> WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835
> SliceQueryFilter.java:236 - Read 0 live and 1923 tombstoned cells in
> test_schema.test_cf.test_cf_col1_idx (see tombstone_warn_threshold). 5000
> columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
> localDeletion=2147483647}
>
> After that NullPointer Exception and finally OOM
>
> ERROR [CompactionExecutor:6287] 2016-09-29 22:09:13,546
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 6287,1,main]
> java.lang.NullPointerException: null
> at org.apache.cassandra.service.CacheService$
> KeyCacheSerializer.serialize(CacheService.java:475)
> ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.service.CacheService$
> KeyCacheSerializer.serialize(CacheService.java:463)
> ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.cache.AutoSavingCache$Writer.
> saveCache(AutoSavingCache.java:225) ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.db.compaction.CompactionManager$
> 11.run(CompactionManager.java:1061) ~[apache-cassandra-2.1.0.jar:2.1.0]
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.FutureTask.run(Unknown Source)
> ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) [na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) [na:1.7.0_80]
> at java.lang.Thread.run(Unknown Source) [na:1.7.0_80]
> ERROR [CompactionExecutor:9712] 2016-10-01 10:09:13,871
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 9712,1,main]
> java.lang.NullPointerException: null
> ERROR [CompactionExecutor:10070] 2016-10-01 14:09:14,154
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 10070,1,main]
> java.lang.NullPointerException: null
> ERROR [CompactionExecutor:10413] 2016-10-01 18:09:14,265
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 10413,1,main]
> java.lang.NullPointerException: null
> ERROR [MemtableFlushWriter:2396] 2016-10-01 20:28:27,425
> CassandraDaemon.java:166 - Exception in thread Thread[MemtableFlushWriter:
> 2396,5,main]
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method) ~[na:1.7.0_80]
> at java.lang.Thread.start(Unknown Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) ~[na:1.7.0_80]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) ~[na:1.7.0_80]
> at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_80]
>
> -- IB
>
>
>
>
>
>


-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Tombstoned error and then OOM

2016-10-06 Thread kurt Greaves
you'll still need to query all the data even if it's secondary indexed.

On 4 October 2016 at 17:13, INDRANIL BASU <indranil...@yahoo.com> wrote:

> The query has a where clause on a column which is a secondary index in the
> column family.
> E.g
> select * from test_schema.test_cf where status = 0;
> Here the status is integer column which is indexed.
>
> -- IB
>
> ------
> *From:* kurt Greaves <k...@instaclustr.com>
> *To:* user@cassandra.apache.org; INDRANIL BASU <indranil...@yahoo.com>
> *Sent:* Tuesday, 4 October 2016 10:38 PM
> *Subject:* Re: Tombstoned error and then OOM
>
> This sounds like you're running a query that consumes a lot of memory. Are
> you by chance querying a very large partition or not bounding your query?
>
> I'd also recommend upgrading to 2.1.15, 2.1.0 is very old and has quite a
> few bugs.
>
> On 3 October 2016 at 17:08, INDRANIL BASU <indranil...@yahoo.com> wrote:
>
> Hello All,
>
>
>
> I am getting the below error repeatedly in the system log of C* 2.1.0
>
> WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835
> SliceQueryFilter.java:236 - Read 0 live and 1923 tombstoned cells in
> test_schema.test_cf.test_cf_col1_idx (see tombstone_warn_threshold). 5000
> columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
> localDeletion=2147483647}
>
> After that NullPointer Exception and finally OOM
>
> ERROR [CompactionExecutor:6287] 2016-09-29 22:09:13,546
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 6287,1,main]
> java.lang.NullPointerException: null
> at org.apache.cassandra.service.CacheService$
> KeyCacheSerializer.serialize(CacheService.java:475)
> ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.service.CacheService$
> KeyCacheSerializer.serialize(CacheService.java:463)
> ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.cache.AutoSavingCache$Writer.
> saveCache(AutoSavingCache.java:225) ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.db.compaction.CompactionManager$
> 11.run(CompactionManager.java:1061) ~[apache-cassandra-2.1.0.jar:2.1.0]
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.FutureTask.run(Unknown Source)
> ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) [na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) [na:1.7.0_80]
> at java.lang.Thread.run(Unknown Source) [na:1.7.0_80]
> ERROR [CompactionExecutor:9712] 2016-10-01 10:09:13,871
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 9712,1,main]
> java.lang.NullPointerException: null
> ERROR [CompactionExecutor:10070] 2016-10-01 14:09:14,154
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 10070,1,main]
> java.lang.NullPointerException: null
> ERROR [CompactionExecutor:10413] 2016-10-01 18:09:14,265
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 10413,1,main]
> java.lang.NullPointerException: null
> ERROR [MemtableFlushWriter:2396] 2016-10-01 20:28:27,425
> CassandraDaemon.java:166 - Exception in thread Thread[MemtableFlushWriter:
> 2396,5,main]
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method) ~[na:1.7.0_80]
> at java.lang.Thread.start(Unknown Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) ~[na:1.7.0_80]
> at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_80]
>
> -- IB
>
>
>
>
>
>
>
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>
>
>


-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Read Repairs and CL

2016-08-27 Thread kurt Greaves
Looking at the wiki for the read path (
http://wiki.apache.org/cassandra/ReadPathForUsers), in the bottom diagram
for reading with a read repair, it states the following when "reading from
all replica nodes" after there is a hash mismatch:

If hashes do not match, do conflict resolution. First step is to read all
> data from all replica nodes excluding the fastest replica (since CL=ALL)
>

 In the bottom left of the diagram it also states:

> In this example:
>
RF>=2
>
CL=ALL
>

The (since CL=ALL) implies that the CL for the read during the read repair
is based off the CL of the query. However I don't think that makes sense at
other CLs. Anyway, I just want to clarify what CL the read for the read
repair occurs at for cases where the overall query CL is not ALL.

Thanks,
Kurt.

-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: How to confirm TWCS is fully in-place

2016-11-09 Thread kurt Greaves
What compaction strategy are you migrating from? If you're migrating from
STCS it's likely that when switching to TWCS no extra compactions are
necessary, as the SSTables will be put into their respective windows but
there won't be enough candidates for compaction within a window.
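The window grouping TWCS does can be sketched as follows (a deliberate simplification: the real strategy takes its window size from the table's compaction_window_unit/compaction_window_size options, and the 24h default here is just an assumption for the example):

```python
def twcs_window(max_ts_seconds, window_hours=24):
    """TWCS buckets SSTables by the time window containing their maximum
    timestamp; only SSTables landing in the same window are compaction
    candidates together."""
    w = window_hours * 3600
    return (max_ts_seconds // w) * w   # floor to the start of the window

# Two SSTables from the same day share a window; one from the next day
# gets its own window, so there may be nothing for it to compact with.
a, b, c = 1_478_600_000, 1_478_640_000, 1_478_700_000
print(twcs_window(a) == twcs_window(b), twcs_window(a) == twcs_window(c))  # → True False
```

This is why migrating from STCS may trigger no compactions at all: each existing SSTable can land alone in its window, leaving no pairs to merge.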

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 8 November 2016 at 21:11, Oskar Kjellin <oskar.kjel...@gmail.com> wrote:

> Hi,
>
> You could manually trigger it with nodetool compact.
>
> /Oskar
>
> > On 8 nov. 2016, at 21:47, Lahiru Gamathige <lah...@highfive.com> wrote:
> >
> > Hi Users,
> >
> > I am thinking of migrating our timeseries tables to use TWCS. I am using
> JMX to set the new compaction strategy one node at a time, and I am not sure
> how to confirm that after the flush all the compaction is done on each node. I
> tried this in a small cluster but after setting the compaction I didn't see
> any compaction triggering  and ran nodetool flush and still didn't see a
> compaction triggering.
> >
> > Now I am about to do the same thing in our staging cluster, so curious
> how do I confirm compaction ran in each node before I change the table
> schema because I am worried it will start the compaction in all the nodes
> at the same time.
> >
> > Lahiru
>


Re: Introducing Cassandra 3.7 LTS

2016-10-19 Thread kurt Greaves
On 19 October 2016 at 21:07, sfesc...@gmail.com <sfesc...@gmail.com> wrote:

> Wow, thank you for doing this. This sentiment regarding stability seems to
> be widespread. Is the team reconsidering the whole tick-tock cadence? If
> not, I would add my voice to those asking that it is revisited.


There has certainly been discussion regarding the tick-tock cadence, and it
seems safe to say it will change. There hasn't been any official
announcement yet, however.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: non incremental repairs with cassandra 2.2+

2016-10-19 Thread kurt Greaves
On 19 October 2016 at 17:13, Alexander Dejanovski <a...@thelastpickle.com>
wrote:

> There aren't that many tools I know to orchestrate repairs and we maintain
> a fork of Reaper, that was made by Spotify, and handles incremental repair
> : https://github.com/thelastpickle/cassandra-reaper


Looks like you're using subranges with incremental repairs. This will
generate a lot of anticompactions as you'll only repair a portion of the
SSTables. You should use forceRepairAsync for incremental repairs so that
it's possible for the repair to act on the whole SSTable, minimising
anticompactions.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: non incremental repairs with cassandra 2.2+

2016-10-20 Thread kurt Greaves
Welp, that's good but wasn't apparent in the codebase :S.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 20 October 2016 at 05:02, Alexander Dejanovski <a...@thelastpickle.com>
wrote:

> Hi Kurt,
>
> we're not actually.
> Reaper performs full repair by subrange but does incremental repair on all
> ranges at once, node by node.
> Subrange is incompatible with incremental repair anyway.
>
> Cheers,
>
> On Thu, Oct 20, 2016 at 5:24 AM kurt Greaves <k...@instaclustr.com> wrote:
>
>>
>> On 19 October 2016 at 17:13, Alexander Dejanovski <a...@thelastpickle.com
>> > wrote:
>>
>> There aren't that many tools I know to orchestrate repairs and we
>> maintain a fork of Reaper, that was made by Spotify, and handles
>> incremental repair : https://github.com/thelastpickle/cassandra-reaper
>>
>>
>> Looks like you're using subranges with incremental repairs. This will
>> generate a lot of anticompactions as you'll only repair a portion of the
>> SSTables. You should use forceRepairAsync for incremental repairs so that
>> it's possible for the repair to act on the whole SSTable, minimising
>> anticompactions.
>>
>> Kurt Greaves
>> k...@instaclustr.com
>> www.instaclustr.com
>>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: time series data model

2016-10-20 Thread kurt Greaves
If event_time is a timestamp since the unix epoch you 1. may want to use the
built-in timestamp type, and 2. order by event_time DESC. 2 applies if you
want to do queries such as "select * from eventdata where ... and
event_time > x" (i.e. get the latest events).

Other than that your model seems workable, I assume you're using DTCS/TWCS,
and aligning the time windows to your day bucket. (If not you should do
that)
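As a rough illustration of the day-bucket alignment (a sketch only; the UTC+8
offset is an assumption inferred from the sample rows, where event_time
1474992002000 lands in the 20160928 bucket):

```python
from datetime import datetime, timedelta, timezone

def day_bucket(event_time_ms, utc_offset_hours=8):
    """Derive the yyyymmdd 'date' partition bucket from an epoch-millis event_time."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    dt = datetime.fromtimestamp(event_time_ms / 1000, tz)
    return dt.year * 10000 + dt.month * 100 + dt.day

# First sample row: event_time 1474992002000 falls in the 20160928 bucket (UTC+8)
print(day_bucket(1474992002000))  # → 20160928
```

Whatever offset you use, keep it consistent between writers and readers so a
given event always lands in the bucket you will query.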

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 20 October 2016 at 07:29, wxn...@zjqunshuo.com <wxn...@zjqunshuo.com>
wrote:

> Hi All,
> I'm trying to migrate my time series data which is GPS trace from mysql to
> C*. I want a wide row to hold one day data. I designed the data model as
> below. Please help to see if there is any problem. Any suggestion is
> appreciated.
>
> Table Model:
> CREATE TABLE cargts.eventdata (
> deviceid int,
> date int,
> event_time bigint,
> position text,
> PRIMARY KEY ((deviceid, date), event_time)
> )
>
> A slice of data:
> cqlsh:cargts> SELECT * FROM eventdata WHERE deviceid =
> 186628 and date = 20160928 LIMIT 10;
>
>  deviceid | date | event_time| position
> --+--+---+--
> ---
>186628 | 20160928 | 1474992002000 |  {"latitude":
> 30.343443936386247,"longitude":120.08751351828943,"speed":41,"heading":48}
>186628 | 20160928 | 1474992012000 |   {"latitude":
> 30.34409508979662,"longitude":120.08840022183352,"speed":45,"heading":53}
>186628 | 20160928 | 1474992022000 |   {"latitude":
> 30.34461639856887,"longitude":120.08946100336443,"speed":28,"heading":65}
>186628 | 20160928 | 1474992032000 |   {"latitude":
> 30.34469478717028,"longitude":120.08973154015409,"speed":11,"heading":67}
>186628 | 20160928 | 1474992042000 |   {"latitude":
> 30.34494998929474,"longitude":120.09027263811151,"speed":19,"heading":47}
>186628 | 20160928 | 1474992052000 | {"latitude":
> 30.346057349126617,"longitude":120.08967091817931,"speed":
> 41,"heading":323}
>186628 | 20160928 | 1474992062000 |{"latitude"
> :30.346997145708,"longitude":120.08883508853253,"speed":52,"heading":323}
>186628 | 20160928 | 1474992072000 | {"latitude":
> 30.348131044340988,"longitude":120.08774702315581,"speed":
> 65,"heading":321}
>186628 | 20160928 | 1474992082000 | {"latitude":
> 30.349438164412838,"longitude":120.08652612959328,"speed":
> 68,"heading":322}
>
> -Simon Wu
>


Re: non incremental repairs with cassandra 2.2+

2016-10-20 Thread kurt Greaves
Probably because I was looking at the wrong version of the codebase :p


Re: time series data model

2016-10-20 Thread kurt Greaves
Ah didn't pick up on that but looks like he's storing JSON within position.
Is there any strong reason for this or as Vladimir mentioned can you store
the fields under "position" in separate columns?
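For illustration, unpacking one of the sample position values client-side (the
separate-columns suggestion would store each of these keys as its own column,
e.g. latitude/longitude as double and speed/heading as int, rather than
re-parsing JSON on every read):

```python
import json

position = ('{"latitude":30.343443936386247,"longitude":120.08751351828943,'
            '"speed":41,"heading":48}')
fields = json.loads(position)
# Each key could become a dedicated typed column in the table instead
print(fields["speed"], fields["heading"])  # → 41 48
```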

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 20 October 2016 at 08:17, Vladimir Yudovin <vla...@winguzone.com> wrote:

> Hi Simon,
>
> Why *position *is text and not float? Text takes much more place.
> Also speed and headings can be calculated basing on latest positions, so
> you can also save them. If you really need it in data base you can save
> them as floats, or compose single float value like speed.heading: 41.173
> (or opposite, heading.speed) and save column storage overhead.
>
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
> CassandraLaunch your cluster in minutes.*
>
>
>  On Thu, 20 Oct 2016 03:29:16 -0400*<wxn...@zjqunshuo.com
> <wxn...@zjqunshuo.com>>* wrote 
>
> Hi All,
> I'm trying to migrate my time series data which is GPS trace from mysql to
> C*. I want a wide row to hold one day data. I designed the data model as
> below. Please help to see if there is any problem. Any suggestion is
> appreciated.
>
> Table Model:
> CREATE TABLE cargts.eventdata (
> deviceid int,
> date int,
> event_time bigint,
> position text,
> PRIMARY KEY ((deviceid, date), event_time)
> )
>
> A slice of data:
> cqlsh:cargts> SELECT * FROM eventdata WHERE deviceid =
> 186628 and date = 20160928 LIMIT 10;
>
>  deviceid | date | event_time| position
> --+--+---+--
> ---
>186628 | 20160928 | 1474992002000 |  {"latitude":
> 30.343443936386247,"longitude":120.08751351828943,"speed":41,"heading":48}
>186628 | 20160928 | 1474992012000 |   {"latitude":
> 30.34409508979662,"longitude":120.08840022183352,"speed":45,"heading":53}
>186628 | 20160928 | 1474992022000 |   {"latitude":
> 30.34461639856887,"longitude":120.08946100336443,"speed":28,"heading":65}
>186628 | 20160928 | 1474992032000 |   {"latitude":
> 30.34469478717028,"longitude":120.08973154015409,"speed":11,"heading":67}
>186628 | 20160928 | 1474992042000 |   {"latitude":
> 30.34494998929474,"longitude":120.09027263811151,"speed":19,"heading":47}
>186628 | 20160928 | 1474992052000 | {"latitude":
> 30.346057349126617,"longitude":120.08967091817931,"speed":
> 41,"heading":323}
>186628 | 20160928 | 1474992062000 |{"latitude"
> :30.346997145708,"longitude":120.08883508853253,"speed":52,"heading":323}
>186628 | 20160928 | 1474992072000 | {"latitude":
> 30.348131044340988,"longitude":120.08774702315581,"speed":
> 65,"heading":321}
>186628 | 20160928 | 1474992082000 | {"latitude":
> 30.349438164412838,"longitude":120.08652612959328,"speed":
> 68,"heading":322}
>
> -Simon Wu
>
>
>


Re: Question about compaction strategy changes

2016-10-23 Thread kurt Greaves
On 22 October 2016 at 03:37, Seth Edwards <s...@pubnub.com> wrote:

> We're using TWCS and we notice that if we make changes to the options to
> the window unit or size, it seems to implicitly start recompacting all
> sstables.


If you increase the window unit or size you potentially increase the number
of SSTable candidates for compaction inside each window, which is why you
would see more compactions. If you decrease the window you shouldn't see
any new compactions kicked off, however be aware that you will have
SSTables covering multiple windows, so until a full cycle of your TTL
passes your read queries won't benefit from the smaller window size.
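A sketch of the window grouping (TWCS buckets an SSTable by its maximum
timestamp, floored to the window boundary; this is a simplification for
illustration, not the project's actual implementation):

```python
def window_lower_bound(max_ts_ms, window_hours):
    """Floor an SSTable's max timestamp to its TWCS window boundary (sketch)."""
    window_ms = window_hours * 3600 * 1000
    return (max_ts_ms // window_ms) * window_ms

# Two SSTables 2 hours apart: separate 1-hour windows, but one 4-hour window,
# so widening the window turns them into a new compaction candidate pair.
a, b = 1474992000000, 1474999200000
print(window_lower_bound(a, 1) == window_lower_bound(b, 1))  # → False
print(window_lower_bound(a, 4) == window_lower_bound(b, 4))  # → True
```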

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Question about compaction strategy changes

2016-10-23 Thread kurt Greaves
More compactions meaning "actual number of compaction tasks". A compaction
task generally operates on many SSTables (how many depends on the chosen
compaction strategy). The number of pending tasks does not line up with the
number of SSTables that will be compacted. 1 task may compact many SSTables.
If your pending tasks are jumping "into the thousands" you're quite
possibly flushing data from memtables faster than you can compact them.
Ideally your pending compactions shouldn't really go above 10 (or 5 even),
and if they are you're possibly overloading the cluster.


Re: Cassandra installation best practices

2016-10-18 Thread kurt Greaves
Mehdi,

Nothing as detailed as Oracle's OFA currently exists. You can probably also
find some useful information here:
https://docs.datastax.com/en/landing_page/doc/landing_page/planning/planningAbout.html



Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 18 October 2016 at 07:38, Mehdi Bada <mehdi.b...@dbi-services.com> wrote:

> Hi Brooke,
>
> Thank you for your advices. Finally, no technical standards (provided by
> DataStax or Apache) exists for deploying Cassandra in a production
> environment?
>
> In comparison with some RDBMS (Oracle, MySQL), some standards (OFA for
> instance) exists and are provided by Oracle.
>
> Best regards
> Mehdi
>
> ---
>
> *Mehdi Bada* | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 499 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
>
>
>
> --
> *From: *"Brooke Jensen" <bro...@instaclustr.com>
> *To: *"user" <user@cassandra.apache.org>
> *Sent: *Tuesday, October 18, 2016 8:59:14 AM
> *Subject: *Re: Cassandra installation best practices
>
> Hi Mehdi,
> In addition, give some thought to your cluster topology. For maximum fault
> tolerance and availability I would recommend using at least three nodes
> with a replication factor of three. Ideally, you should also use Cassandra
> logical racks. This will reduce the risk of outage and make ongoing
> management of the cluster a lot easier.
>
>
> *Brooke Jensen*
> VP Technical Operations & Customer Services
> www.instaclustr.com | support.instaclustr.com
> <https://support.instaclustr.com/hc/en-us>
>
> This email has been sent on behalf of Instaclustr Limited (Australia) and
> Instaclustr Inc (USA). This email and any attachments may contain
> confidential and legally privileged information.  If you are not the
> intended recipient, do not copy or disclose its content, but please reply
> to this email immediately and highlight the error to the sender and then
> immediately delete the message.
>
> On 18 October 2016 at 04:02, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>
>> Hi Mehdi,
>>
>> You can refer https://docs.datastax.com/en/landing_page/doc/landing_page/
>> recommendedSettings.html .
>>
>> Thanks
>> Anuj
>>
>> On Mon, 17 Oct, 2016 at 10:20 PM, Mehdi Bada
>>
>> <mehdi.b...@dbi-services.com> wrote:
>> Hi all,
>>
> >> Do any best practices exist for installing Cassandra in a production
> >> environment? Any standards to follow? For instance, the file system type,
> >> etc.?
>>
>>
>


Re: time series data model

2016-10-24 Thread kurt Greaves
On 20 October 2016 at 09:29, wxn...@zjqunshuo.com <wxn...@zjqunshuo.com>
wrote:

> I do need to align the time windows to day bucket to prevent one row
> become too big, and event_time is timestamp since unix epoch. If I use
> bigint as type of event_time, can I do queries as you mentioned?


Yes.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Cluster Maintenance Mishap

2016-10-24 Thread kurt Greaves
On 21 October 2016 at 15:15, Branton Davis <branton.da...@spanning.com>
wrote:

> For example, I forgot to mention until I read your comment, that the
> instances showed as UN (up, normal) instead of UJ (up, joining) while they
> were apparently bootstrapping.


It's likely these nodes were configured as seed nodes, which means they
wouldn't have bootstrapped. In this case it shouldn't have been an issue
after you fixed up the data directories.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-24 Thread kurt Greaves
On 25 October 2016 at 01:34, Ali Akhtar <ali.rac...@gmail.com> wrote:

> I want some of the newer UDT features, like not needing to have frozen UDTs


You can try Instaclustr's 3.7 LTS release which is just 3.7 with some
backported fixes from later versions. If you absolutely need those new
features it's probably your best bet (until 4.0), however note that it's
still 3.7 and likely less stable than the latest 3.0.x releases.

https://github.com/instaclustr/cassandra

Read the README at the repo for more info.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Doing an upsert into a collection?

2016-10-24 Thread kurt Greaves
On 24 October 2016 at 22:16, Ali Akhtar <ali.rac...@gmail.com> wrote:

> *UPDATE movie set ratings.rating = 5 WHERE ratings.user = 'bob'*


You won't be able to do this because you're trying to update a row without
specifying the primary key. Also, even if you did add the PK to the where,
you've specified a list of (frozen) ratings, so ratings.rating and
ratings.user doesn't make sense.

Collection types can't be part of the primary key, so updating as you've
mentioned above won't really be possible.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Thousands of SSTables generated in only one node

2016-10-25 Thread kurt Greaves
+1 definitely upgrade to 2.1.16. You shouldn't see any compatibility issues
client side when upgrading from 2.1.0. If scrub removed 500 SSTables that's
quite worrying. If the mass SSTables are causing issues you can disconnect
the node from the cluster using:
nodetool disablegossip && nodetool disablebinary && nodetool disablethrift
This will give it a chance to compact SSTables while not impacting read
performance. To be on the safe side you shouldn't have it disconnected for
longer than 3 hours (default hinted handoff window). You can re-enable it
with:
nodetool enablegossip && nodetool enablebinary && nodetool enablethrift
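For reference, the 3-hour figure comes from the max_hint_window_in_ms default;
a quick sanity check (assuming the stock cassandra.yaml default of 10800000 ms):

```python
max_hint_window_in_ms = 10_800_000  # stock cassandra.yaml default

# Longest safe disconnection before hints stop being stored for the node,
# after which a repair would be needed to recover missed writes
safe_hours = max_hint_window_in_ms / (3600 * 1000)
print(safe_hours)  # → 3.0
```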


Re: Question about compaction strategy changes

2016-10-24 Thread kurt Greaves
On 24 October 2016 at 18:11, Seth Edwards <s...@pubnub.com> wrote:

> The other thought is that we currently have data mixed in that does not
> have a TTL and we are strongly considering putting this data in it's own
> table.


You should definitely do that. Having non-TTL'd data mixed in will result
in SSTables that don't expire because some small portion may be live data.
Plus mixed with the small number of compaction candidates, it could take a
long time for these types of SSTables to be compacted (possibly never).

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: How to throttle up/down compactions without a restart

2016-10-20 Thread kurt Greaves
You can throttle compactions using nodetool setcompactionthroughput <x>,
where x is in MB/s. If you're using 2.2 or later this applies immediately
to all running compactions, otherwise it only applies to "new" compactions.
You will want to be careful of allowing compactions to utilise too much
disk bandwidth. If you're needing to alter this in peak periods you may be
starting to overload your nodes with writes, or potentially something else
is not ideal like memtables flushing too frequently.
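One way to automate peak/off-peak throttling (a sketch; the hours and MB/s
values are arbitrary placeholders to adapt to your own load profile, and the
nodetool invocation is shown as a comment rather than executed):

```python
def compaction_throughput_mbps(hour, peak_start=8, peak_end=20,
                               peak_mbps=16, offpeak_mbps=64):
    """Pick a compaction throughput for the current hour (placeholder values)."""
    return peak_mbps if peak_start <= hour < peak_end else offpeak_mbps

# A cron job could then apply the chosen value with, e.g.:
#   nodetool setcompactionthroughput <value>
print(compaction_throughput_mbps(12))  # → 16 (throttled down during peak)
print(compaction_throughput_mbps(2))   # → 64 (throttled up off-peak)
```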

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 21 October 2016 at 04:41, Thomas Julian <thomasjul...@zoho.com> wrote:

> Hello,
>
>
> I was going through this
> <http://www.slideshare.net/IvanBurmistrov1/digging-cassandra-cluster-53234249>
> presentation and the Slide-55 caught my attention.
>
> i.e) "Throttled down compactions during high load period, throttled up
> during low load period"
>
> Can we throttle down compactions without a restart?
>
> If this can be done, what are all the parameters(JMX?) to work with? How
> to implement this for below Compaction Strategies.
>
>1. Size Tiered Compaction Strategy.
>2. Leveled Compaction Strategy
>
> Any help is much appreciated.
>
> Best Regards,
> Julian.
>
>
>
>
>


Re: Cluster Maintenance Mishap

2016-10-20 Thread kurt Greaves
On 20 October 2016 at 20:58, Branton Davis <branton.da...@spanning.com>
wrote:

> Would they have taken on the token ranges of the original nodes or acted
> like new nodes and got new token ranges?  If the latter, is it possible
> that any data moved from the healthy nodes to the "new" nodes or
> would restarting them with the original data (and repairing) put
> the cluster's token ranges back into a normal state?


It sounds like you stopped them before they completed joining. So you
should have nothing to worry about. If not, you will see them marked as DN
from other nodes in the cluster. If you did, they wouldn't have assumed the
token ranges and you shouldn't have any issues.

You can just copy the original data back (including system tables) and they
should assume their own ranges again, and then you can repair to fix any
missing replicas.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: lots of DigestMismatchException in cassandra3

2016-11-22 Thread kurt Greaves
dclocal_read_repair_chance and read_repair_chance are only really relevant
when using a consistency level below ALL.

<adeline@thomsonreuters.com> wrote:

> Hi Kurt,
>
> Thank you for the suggestion. I ran repair on all the 4 nodes, and after
> the repair, the error “Corrupt empty row found in unfiltered partition”
> disappeared, but the “Mismatch” stopped for a little while and came up
> again.
>
> When we changed both the “dclocal_read_repair_chance” and the
> “read_repair_chance” to 0.0, the “Mismatch” stopped. Is it OK to do that?
> Does it mean when the inconsistence found in reading data, Cassandra
> wouldn’t do the repair and we will just get the inconsistent data? And you
> said the cause is not all replicas receiving all the writes, I think it is
> reasonable but the strange thing is I didn’t notice any failed writing ,
> another cause I can think of is there are insert, update, delete on the
> same record at the same time , is it a possibility?
>
>
>
> --
>
> Regards, Adeline
>
>
>
>
>
>
>
> *From:* kurt Greaves [mailto:k...@instaclustr.com]
> *Sent:* Wednesday, November 23, 2016 6:51 AM
> *To:* Pan, Adeline (TR Technology & Ops)
> *Cc:* user@cassandra.apache.org
> *Subject:* Re: lots of DigestMismatchException in cassandra3
>
>
>
> Yes it could potentially impact performance if there are lots of them. The
> mismatch would occur on a read, the error occurs on a write which is why
> the times wouldn't line up. The cause for the messages as I mentioned is
> when there is a digest mismatch between replicas. The cause is inconsistent
> deta/not all replicas receiving all writes. You should run a repair and see
> if the number of mismatches is reduced.
>
>
> Kurt Greaves
>
> k...@instaclustr.com
>
> www.instaclustr.com
>
>
>
> On 22 November 2016 at 06:30, <adeline@thomsonreuters.com> wrote:
>
> Hi Kurt,
>
> Thank you for the information, but the error “Corrupt empty row found in
> unfiltered partition” seems not related to the “Mismatch”; the time they
> occurred didn’t match. We use “QUORUM” consistency level for both read and
> write and I didn’t notice any failed writing in the log. Any other cause
> you can think of?  Would it cause performance issue when lots of this
> “Mismatch” happened?
>
>
>
> --
>
> Regards, Adeline
>
>
>
>
>
>
>
> *From:* kurt Greaves [mailto:k...@instaclustr.com]
> *Sent:* Monday, November 21, 2016 5:02 PM
> *To:* user@cassandra.apache.org
> *Cc:* tommy.stend...@ericsson.com
> *Subject:* Re: lots of DigestMismatchException in cassandra3
>
>
>
> Actually, just saw the error message in those logs and what you're looking
> at is probably https://issues.apache.org/jira/browse/CASSANDRA-12694
>
>
> Kurt Greaves
>
> k...@instaclustr.com
>
> www.instaclustr.com
>
>
>
> On 21 November 2016 at 08:59, kurt Greaves <k...@instaclustr.com> wrote:
>
> That's a debug message. From the sound of it, it's triggered on read where
> there is a digest mismatch between replicas. As to whether it's normal,
> well that depends on your cluster. Are the nodes reporting lots of dropped
> mutations and are you writing at <QUORUM?
>
>
>
>
>


Re: Is it *safe* to issue multiple replace-node at the same time?

2016-11-21 Thread kurt Greaves
On 21 November 2016 at 18:58, Ben Bromhead <b...@instaclustr.com> wrote:

> Same rack and no range movements, my first instinct is to say yes it is
> safe (I like to treat racks as one giant meta node). However I would want
> to have a read through the replace code.


This is assuming RF<=# of racks as well (and NTS).

Kurt Greaves
www.instaclustr.com


Re: lots of DigestMismatchException in cassandra3

2016-11-22 Thread kurt Greaves
Yes it could potentially impact performance if there are lots of them. The
mismatch would occur on a read, the error occurs on a write which is why
the times wouldn't line up. The cause for the messages as I mentioned is
when there is a digest mismatch between replicas. The cause is inconsistent
deta/not all replicas receiving all writes. You should run a repair and see
if the number of mismatches is reduced.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 22 November 2016 at 06:30, <adeline@thomsonreuters.com> wrote:

> Hi Kurt,
>
> Thank you for the information, but the error “Corrupt empty row found in
> unfiltered partition” seems not related to the “Mismatch”; the time they
> occurred didn’t match. We use “QUORUM” consistency level for both read and
> write and I didn’t notice any failed writing in the log. Any other cause
> you can think of?  Would it cause performance issue when lots of this
> “Mismatch” happened?
>
>
>
> --
>
> Regards, Adeline
>
>
>
>
>
>
>
> *From:* kurt Greaves [mailto:k...@instaclustr.com]
> *Sent:* Monday, November 21, 2016 5:02 PM
> *To:* user@cassandra.apache.org
> *Cc:* tommy.stend...@ericsson.com
> *Subject:* Re: lots of DigestMismatchException in cassandra3
>
>
>
> Actually, just saw the error message in those logs and what you're looking
> at is probably https://issues.apache.org/jira/browse/CASSANDRA-12694
>
>
> Kurt Greaves
>
> k...@instaclustr.com
>
> www.instaclustr.com
>
>
>
> On 21 November 2016 at 08:59, kurt Greaves <k...@instaclustr.com> wrote:
>
> That's a debug message. From the sound of it, it's triggered on read where
> there is a digest mismatch between replicas. As to whether it's normal,
> well that depends on your cluster. Are the nodes reporting lots of dropped
> mutations and are you writing at <QUORUM?
>
>
>


RE: lots of DigestMismatchException in cassandra3

2016-11-21 Thread kurt Greaves
That's a debug message. From the sound of it, it's triggered on read where
there is a digest mismatch between replicas. As to whether it's normal,
well that depends on your cluster. Are the nodes reporting lots of dropped
mutations and are you writing at <QUORUM?

Re: lots of DigestMismatchException in cassandra3

2016-11-21 Thread kurt Greaves
Actually, just saw the error message in those logs and what you're looking
at is probably https://issues.apache.org/jira/browse/CASSANDRA-12694



Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 21 November 2016 at 08:59, kurt Greaves <k...@instaclustr.com> wrote:

> That's a debug message. From the sound of it, it's triggered on read where
> there is a digest mismatch between replicas. As to whether it's normal,
> well that depends on your cluster. Are the nodes reporting lots of dropped
> mutations and are you writing at <QUORUM?
>


Re: Incremental repairs leading to unrepaired data

2016-10-31 Thread kurt Greaves
Blowing out to 1k SSTables seems a bit full on. What args are you passing
to repair?

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 31 October 2016 at 09:49, Stefano Ortolani <ostef...@gmail.com> wrote:

> I've collected some more data-points, and I still see dropped
> mutations with compaction_throughput_mb_per_sec set to 8.
> The only notable thing regarding the current setup is that I have
> another keyspace (not being repaired though) with really wide rows
> (100MB per partition), but that shouldn't have any impact in theory.
> Nodes do not seem that overloaded either and don't see any GC spikes
> while those mutations are dropped :/
>
> Hitting a dead end here, any further idea where to look for further ideas?
>
> Regards,
> Stefano
>
> On Wed, Aug 10, 2016 at 12:41 PM, Stefano Ortolani <ostef...@gmail.com>
> wrote:
> > That's what I was thinking. Maybe GC pressure?
> > Some more details: during anticompaction I have some CFs exploding to 1K
> > SStables (to be back to ~200 upon completion).
> > HW specs should be quite good (12 cores/32 GB ram) but, I admit, still
> > relying on spinning disks, with ~150GB per node.
> > Current version is 3.0.8.
> >
> >
> > On Wed, Aug 10, 2016 at 12:36 PM, Paulo Motta <pauloricard...@gmail.com>
> > wrote:
> >>
> >> That's pretty low already, but perhaps you should lower to see if it
> will
> >> improve the dropped mutations during anti-compaction (even if it
> increases
> >> repair time), otherwise the problem might be somewhere else. Generally
> >> dropped mutations is a signal of cluster overload, so if there's nothing
> >> else wrong perhaps you need to increase your capacity. What version are
> you
> >> in?
> >>
> >> 2016-08-10 8:21 GMT-03:00 Stefano Ortolani <ostef...@gmail.com>:
> >>>
> >>> Not yet. Right now I have it set at 16.
> >>> Would halving it more or less double the repair time?
> >>>
> >>> On Tue, Aug 9, 2016 at 7:58 PM, Paulo Motta <pauloricard...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Anticompaction throttling can be done by setting the usual
> >>>> compaction_throughput_mb_per_sec knob on cassandra.yaml or via
> nodetool
> >>>> setcompactionthroughput. Did you try lowering that  and checking if
> that
> >>>> improves the dropped mutations?
> >>>>
> >>>> 2016-08-09 13:32 GMT-03:00 Stefano Ortolani <ostef...@gmail.com>:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I am running incremental repairs on a weekly basis (can't do it every
> >>>>> day as one single run takes 36 hours), and every time, I have at
> least one
> >>>>> node dropping mutations as part of the process (this almost always
> during
> >>>>> the anticompaction phase). Ironically this leads to a system where
> repairing
> >>>>> makes data consistent at the cost of making some other data not
> consistent.
> >>>>>
> >>>>> Does anybody know why this is happening?
> >>>>>
> >>>>> My feeling is that this might be caused by anticompacting column
> >>>>> families with really wide rows and with many SStables. If that is
> the case,
> >>>>> any way I can throttle that?
> >>>>>
> >>>>> Thanks!
> >>>>> Stefano
> >>>>
> >>>>
> >>>
> >>
> >
>


Re: cluster creating problem due to same cluster name

2016-10-26 Thread kurt Greaves
github, not JIRA...

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 26 October 2016 at 09:36, kurt Greaves <k...@instaclustr.com> wrote:

> you probably should raise this as an issue on their JIRA. (I assume you're
> using TLP's fork: https://github.com/thelastpickle/cassandra-reaper)
>
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>
> On 26 October 2016 at 06:51, Abhishek Aggarwal <
> abhishek.aggarwa...@snapdeal.com> wrote:
>
>>
>> Not able to create new cluster with existing name in reaper with diff
>> seed. As per code firstly using the jmx cluster name is fetched and looked
>> into DB if the cluster with same name exists or not.
>>
>> My point is if the seed ip is different then it should allow to create
>> the new cluster.
>>
>>
>>
>


Re: cluster creating problem due to same cluster name

2016-10-26 Thread kurt Greaves
you probably should raise this as an issue on their JIRA. (I assume you're
using TLP's fork: https://github.com/thelastpickle/cassandra-reaper)

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 26 October 2016 at 06:51, Abhishek Aggarwal <
abhishek.aggarwa...@snapdeal.com> wrote:

>
> Not able to create new cluster with existing name in reaper with diff
> seed. As per code firstly using the jmx cluster name is fetched and looked
> into DB if the cluster with same name exists or not.
>
> My point is if the seed ip is different then it should allow to create the
> new cluster.
>
>
>


Re: Rebuilding with vnodes

2016-11-02 Thread kurt Greaves
If the network and both DCs can handle the load it's fine. You'll want to
keep an eye on the logs for streaming failures, as
it's not always completely clear and you could end up with missing data.
You should definitely be aware that rebuilds affect the source DC, so if
it's under load you want to be careful of impacting it.

I'm not sure that memtable_cleanup_threshold affects streamed SSTables,
seems unlikely that the streamed SSTables would also be added to memtables,
however obviously your DC would be receiving writes simultaneously. 0.7
seems quite high, what are your heap settings and memtable_flush_writers?
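For context, memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers
+ 1) per the cassandra.yaml comments, so 0.7 is well above the default for any
writer count:

```python
def default_memtable_cleanup_threshold(memtable_flush_writers):
    # cassandra.yaml default: 1 / (memtable_flush_writers + 1)
    return 1 / (memtable_flush_writers + 1)

print(default_memtable_cleanup_threshold(1))  # → 0.5
print(default_memtable_cleanup_threshold(4))  # → 0.2
```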

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

On 2 November 2016 at 19:59, Anubhav Kale <anubhav.k...@microsoft.com>
wrote:

> Hello,
>
>
>
> I am trying to rebuild a new Data Center with 50 Nodes, and expect 1 TB /
> node. Nodes are backed by SSDs, and the rebuild is happening from another
> DC in same physical region. This is with 2.1.13.
>
>
>
> I am doing this with stream_throughput=200 MB, concurrent_compactors=256,
> compactionthroughput=0, and memtable_cleanup_threshold=0.7. (memtable
> setting was necessary to keep # SSTable files in check) and running rebuild
> 20 nodes at a time.
>
>
>
> Have people generally attempted to do such large rebuilds ? Any tips ?
>
>
>
> Thanks !
>
>
>


Re: Backup restore with a different name

2016-11-03 Thread kurt Greaves
On 2 November 2016 at 22:10, Jens Rantil <jens.ran...@tink.se> wrote:

> I mean "exposing that state for reference while keeping the (corrupt)
> current state in the live cluster".


The following should work:


   1. Create a new table with the same schema but different name (in the
   same or a different keyspace).
   2. Rename all the snapshotted SSTables to match the *new* table name.
   3. Copy SSTables into new table directory.
   4. nodetool refresh or restart Cassandra.
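
The rename in step 2 can be sketched as a shell loop. This is a minimal
sketch assuming 2.x-style SSTable filenames of the form
keyspace-table-ka-<generation>-<Component>.db; the keyspace ("ks1"), table
names ("old_table"/"new_table") and the demo directory are placeholders, not
from the thread:

```shell
# Stand-in snapshot directory with two hypothetical SSTable components.
SNAP=./snapshot_demo
mkdir -p "$SNAP"
touch "$SNAP/ks1-old_table-ka-1-Data.db" "$SNAP/ks1-old_table-ka-1-Index.db"

# Rename every component file from the old table name to the new one.
for f in "$SNAP"/ks1-old_table-*; do
    mv "$f" "$(printf '%s' "$f" | sed 's/-old_table-/-new_table-/')"
done
ls "$SNAP"
```

After copying the renamed files into the new table's data directory, step 4
(`nodetool refresh` or a restart) picks them up.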


Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Incremental repairs leading to unrepaired data

2016-11-01 Thread kurt Greaves
Can't say I have too many ideas. If load is low during the repair it
shouldn't be happening. Your disks aren't overutilised, correct? No other
processes writing loads of data to them?


Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-13 Thread kurt Greaves
Don't do pr repairs when using incremental repair, you'll just end up with
loads of anti-compactions.

On 12 October 2016 at 19:11, Harikrishnan Pillai <hpil...@walmartlabs.com>
wrote:

> In my experience, DC-local repair node by node with the -pr and -par
> options is best. Full repair increased SSTables a lot and it takes days to
> compact them back. Another easy option for repair is to use a Spark job:
> read all data with consistency ALL and increase read repair chance to
> 100%, or use the Netflix tickler.
>
> Sent from my iPhone
>
> On Oct 12, 2016, at 11:44 AM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>
> Hi Leena,
>
> First thing you should be concerned about is: why does the repair -pr
> operation not complete?
> Second comes the question : Which repair option is best?
>
>
> One probable cause of stuck repairs is : if the firewall between DCs is
> closing TCP connections and Cassandra is trying to use such connections,
> repairs will hang. Please refer to
> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html .
> We faced that.
>
> Also make sure you comply with basic bandwidth requirement between DCs.
> Recommended is 1000 Mb/s (1 gigabit) or greater.
>
> Answers for specific questions:
> 1. As per my understanding, not all replicas will participate in DC-local
> repairs, and thus the repair would be ineffective. You need to make sure
> that all replicas of the data in all DCs are in sync.
>
> 2. Every DC is not a ring. All DCs together form a token ring. So, I think
> yes you should run repair -pr on all nodes.
>
> 3. Yes. I don't have experience with incremental repairs, but you can run
> repair -pr on all nodes of all DCs.
>
> Regarding Best approach of repair, you should see some repair
> presentations of Cassandra Summit 2016. All are online now.
>
> I attended the summit, and people using large clusters generally use
> subrange repairs to repair their clusters. But such large deployments are
> on older Cassandra versions and these deployments generally don't use
> vnodes, so people can easily tell which nodes hold which token range.
>
>
>
> Thanks
> Anuj
>
> --
> *From: *Leena Ghatpande <lghatpa...@hotmail.com>;
> *To: *user@cassandra.apache.org <user@cassandra.apache.org>;
> *Subject: *Repair in Multi Datacenter - Should you use -dc Datacenter
> repair or repair with -pr
> *Sent: *Wed, Oct 12, 2016 2:15:51 PM
>
> Please advice. Cannot find any clear documentation on what is the best
> strategy for repairing nodes on a regular basis with multiple datacenters
> involved.
>
>
> We are running cassandra 3.7 in multi datacenter with 4 nodes in each data
> center. We are trying to run repairs every other night to keep the nodes in
> good state. We currently run repair with the -pr option, but the repair
> process gets hung and does not complete gracefully. We don't see any
> errors in the logs either.
>
>
> What is the best way to perform repairs on large tables across multiple
> data centers?
>
> 1. Can we run Datacenter repair using -dc option for each data center? Do
> we need to run repair on each node in that case or will it repair all nodes
> within the datacenter?
>
> 2. Is running repair with -pr across all nodes required , if we perform
> the step 1 every night?
>
> 3. Is cross data center repair required and if so whats the best option?
>
>
> Thanks
>
>
> Leena
>
>
>
>


-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: are there any free Cassandra -> ElasticSearch connector / plugin ?

2016-10-13 Thread kurt Greaves
New features don't necessarily restrict bugs only to those features. (only
in our dreams). Often features may touch on parts of the code that could
cause issues for other parts of the system.

To clarify, just because you don't use new features, doesn't mean you are
free from the risk of their bugs.

On 14 October 2016 at 00:23, Jonathan Haddad <j...@jonhaddad.com> wrote:

> If you're not using the features why use a release that nobody else (read:
> experienced users) is?
>
> What do you need in 3.x that's not available in 3.0?
>
> On Thu, Oct 13, 2016 at 5:23 PM Eric Ho <e...@analyticsmd.com> wrote:
>
>> But if I'm not doing anything fancy w/ C* (i.e. don't use new features in
>> 3.{2,4,6}) then I'll be fine, right ?
>>
>>
>> -eric ho
>>
>>
>> On Thu, Oct 13, 2016 at 5:09 PM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>> I listed my reasons, please check my previous email.
>>
>> On Thu, Oct 13, 2016 at 4:55 PM Eric Ho <e...@analyticsmd.com> wrote:
>>
>> Why 3.0.x ?  Why not use 3.2.x or 3.4.x ? or 3.6.x ?
>> Shouldn't 3.6.x be more stable than say 3.2.x ?
>>
>>
>> -eric ho
>>
>>
>> On Thu, Oct 13, 2016 at 3:48 PM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>> Here's your basic options:
>>
>> 1. Triggers (avoid like the plague)
>> 2. CDC (really new, tricky to avoid RF operations as is, probably avoid)
>> 3. Do it in your app
>> 4. Put Kafka in front of your data, write as many consumers as you want
>> to write the data in as many ways as you want
>>
>> Also, how long have you been using Cassandra?  Unless you're comfortable
>> rolling your own builds and merging in bugfixes from upstream, I really
>> suggest using a 3.0.x release instead of a 3.7.
>>
>> 3.7 falls under the Tick Tock release cycle, which is almost completely
>> untested in production by experienced operators.  In the cases where it
>> has
>> been tested, there have been numerous bugs found which I (and I think most
>> people on this list) consider to be show stoppers.  Additionally, the Tick
>> Tock release cycle puts the operator in the uncomfortable position of
>> having to decide between upgrading to a new version with new features
>> (probably new bugs) or back porting bug fixes from future versions
>> themselves. There will never be a 3.7.1 release which fixes bugs in 3.7
>> without adding new features.
>>
>> https://github.com/apache/cassandra/blob/trunk/NEWS.txt
>>
>> For new projects I recommend starting with the recently released 3.0.9.
>>
>> Assuming the project changes its policy on releases (all signs point to
>> yes), then by the time 4.0 rolls out a lot of the features which have been
>> released in the 3.x series will have matured a bit, so it's very possible
>> 4.0 will stabilize faster than the usual 6 months it takes for a major
>> release.
>>
>> All that said, there's nothing wrong with doing compatibility & smoke
>> tests
>> against the latest 3.x release as well as 3.0 and reporting bugs back to
>> the Apache Cassandra JIRA, I'm sure it would be greatly appreciated.
>>
>> https://issues.apache.org/jira/secure/Dashboard.jspa
>>
>> Jon
>>
>>
>>
>> On Thu, Oct 13, 2016 at 3:15 PM Eric Ho <e...@analyticsmd.com> wrote:
>>
>> Some suggested Elassandra.  But that is based on Cassandra 2.2.
>> I would like to use Cassandra 3.7 and up...
>>
>>
>>
>> -eric ho
>>
>>
>> On Thu, Oct 13, 2016 at 3:04 PM, vincent gromakowski <
>> vincent.gromakow...@gmail.com> wrote:
>>
>> Elassandra
>> https://github.com/vroyer/elassandra
>>
>> On 14 Oct 2016 at 12:02 AM, "Eric Ho" <e...@analyticsmd.com> wrote:
>>
>> I don't want to change my code to write into C* and then to ES.
>> So, I'm looking for some sort of a sync tool that will sync my C* table
>> into ES and it should be smart enough to avoid duplicates or gaps.
>> Is there such a tool / plugin ?
>> I'm using stock apache Cassandra 3.7.
>> I know that some premium Cassandra has ES builtin or integrated but I
>> can't afford premium right now...
>> Thanks.
>>
>> -eric ho
>>
>>
>>
>>
>>


-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Cassandra Upgrade

2016-11-29 Thread kurt Greaves
Why would you remove all the data? That doesn't sound like a good idea.
Just upgrade the OS and then go through the normal upgrade flow of starting
C* with the next version and upgrading sstables.

Also, *you will need to go from 2.0.14 -> 2.1.16 -> 2.2.8* and upgrade
sstables at each stage of the upgrade. You cannot transition from 2.0.14
straight to 2.2.8.
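
The hop sequence can be sketched as a dry-run script. It only prints the
plan; the package name and service commands are assumptions for your
platform, and in real use you'd run the steps for real, one node at a time:

```shell
# Prints the two-hop plan (2.0.14 -> 2.1.16 -> 2.2.8), with upgradesstables
# run between hops. Replace the echoed commands with real execution.
for version in 2.1.16 2.2.8; do
    echo "== hop to ${version} =="
    echo "nodetool drain"
    echo "sudo service cassandra stop"
    echo "sudo apt-get install -y cassandra=${version}"
    echo "sudo service cassandra start"
    echo "nodetool upgradesstables"
done > upgrade_plan.txt
cat upgrade_plan.txt
```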


Re: Which version is stable enough for production environment?

2016-11-30 Thread kurt Greaves
Yes Benjamin, no one said it wouldn't. We're actively backporting things as
we get time, if you find something you'd like backported raise an issue and
let us know. We're well aware of the issues affecting MVs, but they haven't
really been solved anywhere yet.

On 30 November 2016 at 07:54, Benjamin Roth  wrote:

> Hi Brooke,
>
> Just had a quick look on your code and I will promise that your LTS
> version will have the same issues with MVs as any other version.
> For details check CASSANDRA-12905 or CASSANDRA-12888.
>
> 2016-11-30 8:35 GMT+01:00 Brooke Jensen :
>
>> 2.1 will be end of life soon.
>>
>> We have a number of customers running 3.7 in production and it's quite
>> stable. However you should always test in a lower environment first with
>> your data model to be sure.
>>
>> If you're interested, we have made available a patched version of 3.7
>> which backports some key patches from 3.9.
>> https://github.com/instaclustr/cassandra
>>
>>
>> *Brooke Jensen*
>> VP Technical Operations & Customer Services
>> www.instaclustr.com | support.instaclustr.com
>>
>> This email has been sent on behalf of Instaclustr Limited (Australia) and
>> Instaclustr Inc (USA). This email and any attachments may contain
>> confidential and legally privileged information.  If you are not the
>> intended recipient, do not copy or disclose its content, but please reply
>> to this email immediately and highlight the error to the sender and then
>> immediately delete the message.
>>
>> On 30 November 2016 at 18:20, Benjamin Roth 
>> wrote:
>>
>>> What are the compaction issues / hint corruprions you encountered? Are
>>> there JIRA tickets for it?
>>> I am curios cause I use 3.10 (trunk) in production.
>>>
>>> For anyone who is planning to use MVs:
>>> They basically work. We use them in production since some months, BUT
>>> (it's a quite big one) maintainance is a pain. Bootstrapping and repairs
>>> may be - depending on the model, config, amount of data - really, really
>>> painful. I'm currently investigating intensively.
>>>
>>> 2016-11-30 3:11 GMT+01:00 Harikrishnan Pillai :
>>>
 3.0 has the off-heap memtable implementation removed, so if you have a
 requirement for it, it's not available. If you don't have that requirement,
 3.0.9 can be tried out. For the 3.9 version we did some testing and found a
 lot of issues in compaction, hint corruption, etc.

 Regards

 Hari


 --
 *From:* Discovery 
 *Sent:* Tuesday, November 29, 2016 5:59 PM
 *To:* user
 *Subject:* Re: Which version is stable enough for production
 environment?

 Why is version 3.x not recommended? Thanks.


 -- Original --
 *From: * "Harikrishnan Pillai";;
 *Date: * Wed, Nov 30, 2016 09:57 AM
 *To: * "user";
 *Subject: * Re: Which version is stable enough for production
 environment?

 Cassandra 2.1.16


 --
 *From:* Discovery 
 *Sent:* Tuesday, November 29, 2016 5:42 PM
 *To:* user
 *Subject:* Which version is stable enough for production environment?

 Hi Cassandra Experts,

   We are preparing to deploy Cassandra in a production environment, but
 we cannot confirm which version is stable and recommended. Could someone
 on this mailing list give a suggestion? Thanks in advance!


 Best Regards
 Discovery
 11/30/2016

>>>
>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>
>
>


Re: Cassandra 2.x Stability

2016-11-30 Thread kurt Greaves
Latest release in 2.2. 2.1 is borderline EOL and from my experience 2.2 is
quite stable and has some handy bugfixes that didn't actually make it into
2.1

On 30 November 2016 at 10:41, Shalom Sagges  wrote:

> Hi Everyone,
>
> I'm about to upgrade our 2.0.14 version to a newer 2.x version.
> At first I thought of upgrading to 2.2.8, but I'm not sure how stable it
> is, as I understand the 2.2 version was supposed to be a sort of beta
> version for 3.0 feature-wise, whereas 3.0 upgrade will mainly handle the
> storage modifications (please correct me if I'm wrong).
>
> So my question is, if I need a 2.x version (can't upgrade to 3 due to
> client considerations), which one should I choose, 2.1.x or 2.2.x? (I'm
> don't require any new features available in 2.2).
>
> Thanks!
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>
> We Create Meaningful Connections
>
>
>
> This message may contain confidential and/or privileged information.
> If you are not the addressee or authorized to receive this on behalf of
> the addressee you must not use, copy, disclose or take action based on this
> message or any information herein.
> If you have received this message in error, please advise the sender
> immediately by reply email and delete this message. Thank you.
>


Re: Cassandra cluster performance

2017-01-05 Thread kurt Greaves
You should try switching to async writes and then perform the test. Sync
writes won't make much difference on a single node, but with multiple nodes
there should be a massive difference.

On 4 Jan 2017 10:05, "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" <
bjano...@cisco.com> wrote:

> Hi,
>
>
>
> Our column family definition is
>
>
>
> "CREATE TABLE onem2m.cse(" +
> "name TEXT PRIMARY KEY," +
> "resourceId TEXT" +
> ")";
>
> "CREATE TABLE IF NOT EXISTS onem2m.AeIdToResourceIdMapping(" +
> "cseBaseCseId TEXT," +
> "aeId TEXT," +
> "resourceId TEXT," +
> "PRIMARY KEY ((cseBaseCseId), aeId)" +
> ")";
>
>
>
> "CREATE TABLE IF NOT EXISTS onem2m.Resources_" + i + "(" +
> "CONTENT_INSTANCE_OldestId TEXT," +
> "CONTENT_INSTANCE_LatestId TEXT," +
> "SUBSCRIPTION_OldestId TEXT," +
> "SUBSCRIPTION_LatestId TEXT," +
> "resourceId TEXT PRIMARY KEY," +
> "resourceType TEXT," +
> "resourceName TEXT," +
> "jsonContent TEXT," +
> "parentId TEXT" +
> ")";
>
> "CREATE TABLE IF NOT EXISTS onem2m.Children_" + i + "(" +
> "parentResourceId TEXT," +
> "childName TEXT," +
> "childResourceId TEXT," +
> "nextId TEXT," +
> "prevId TEXT," +
> "PRIMARY KEY ((parentResourceId), childName)" +
> ")";
>
>
>
>
>
>
>
> *From: *Abhishek Kumar Maheshwari 
> *Date: *Sunday, December 25, 2016 at 8:54 PM
> *To: *"Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" <
> bjano...@cisco.com>
> *Cc: *"user@cassandra.apache.org" 
> *Subject: *RE: Cassandra cluster performance
>
>
>
> Hi Branislav,
>
>
>
>
>
> What is your column family definition?
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> *From:* Branislav Janosik -T (bjanosik - AAP3 INC at Cisco) [mailto:
> bjano...@cisco.com]
> *Sent:* Thursday, December 22, 2016 6:18 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra cluster performance
>
>
>
> Hi,
>
>
>
> - Consistency level is set to ONE
>
> -  Keyspace definition:
>
> *"CREATE KEYSPACE  IF NOT EXISTS  onem2m " *+
> *"WITH replication = " *+
> *"{ 'class' : 'SimpleStrategy', 'replication_factor' : 1}"*;
>
>
>
> - yes, the client is on separate VM
>
> - In our project we use Cassandra API version 3.0.2 but the database 
> (cluster) is version 3.9
>
> - for 2node cluster:
>
>  first VM: 25 GB RAM, 16 CPUs
>
>  second VM: 16 GB RAM, 16 CPUs
>
>
>
>
>
>
>
> *From: *Ben Slater 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Wednesday, December 21, 2016 at 2:32 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: Cassandra cluster performance
>
>
>
> You would expect some drop when moving from a single node to multiple
> nodes, but on the face of it that feels extreme to me (although I’ve never
> personally tested the difference). Some questions that might help provide
> an answer:
>
> - what consistency level are you using for the test?
>
> - what is your keyspace definition (replication factor most importantly)?
>
> - where are you running your test client (is it a separate box to
> cassandra)?
>
> - what C* version?
>
> - what are specs (CPU, RAM) of the test servers?
>
>
>
> Cheers
>
> Ben
>
>
>
> On Thu, 22 Dec 2016 at 09:26 Branislav Janosik -T (bjanosik - AAP3 INC at
> Cisco)  wrote:
>
> Hi all,
>
>
>
> I’m working on a project and we have Java benchmark test for testing the
> performance when using Cassandra database. Create operation on a single
> node Cassandra cluster is about 15K operations per second. Problem we have
> is when I set up cluster with 2 or more nodes (each of them are on separate
> virtual machines and servers), the performance goes down to 1K ops/sec. I
> follow the official instructions on how to set up a multinode cluster – the
> only things I change in Cassandra.yaml file are: change seeds to IP address
> of one node, change listen and rpc address to IP address of the node and
> finally change endpoint snitch to GossipingPropertyFileSnitch. The
> replication factor is set to 1 when having 2-node cluster. I use only one
> datacenter. The cluster seems to be doing fine (I can see nodes
> communicating) and so is the CPU, RAM usage on the machines.
>
>
>
> Does anybody have any ideas? Any help would be very appreciated.
>
>
>
> Thanks!
>
>
>

Re: How to change Replication Strategy and RF

2016-12-29 Thread kurt Greaves
​If you're already using the cluster in production and require no downtime
you should perform a datacenter migration first to change the RF to 3.
Rough process would be as follows:

   1. Change keyspace to NetworkTopologyStrategy with RF=1. You shouldn't
   increase RF here as you will receive read failures as not all nodes have
   the data they own. You would have to wait for a repair to complete to stop
   any read failures.
   2. Configure your clients to use a LOCAL_* consistency and
   DCAwareRoundRobinPolicy for load balancing (with the current DC configured)
   3. Add a new datacenter, and configure its replication factor to be 3.
   4. Rebuild the new datacenter by running nodetool rebuild  on
   each node in the new DC.
   5. Migrate your clients to use the new datacenter, by switching the
   contact points to nodes in the new DC and the load balancing policy DC to
   the new DC
   6. At this point you could increase the replication factor on the old DC
   to 3, and then run a repair. Once the repair successfully completes you
   should have 2 DCs that you can use. If you need the DCs in separate
   locations you could change this step to adding another DC in the desired
   other location and running rebuilds as per steps 2-4.
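
The steps above can be sketched as the following CQL/nodetool plan. It is
written to a file here rather than executed, and `ks1`, `dc_old` and
`dc_new` are placeholder names, not from the thread:

```shell
# Plan mirroring steps 1, 3, 4 and 6; in real use, feed the CQL to cqlsh.
cat <<'EOF' > rf_migration_plan.cql
-- step 1: switch strategy, keeping the existing DC at RF=1
ALTER KEYSPACE ks1 WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_old': 1};
-- step 3: add the new datacenter at RF=3
ALTER KEYSPACE ks1 WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_old': 1, 'dc_new': 3};
-- step 4: on every node in dc_new run: nodetool rebuild -- dc_old
-- step 6: raise the old DC to 3, then repair node by node
ALTER KEYSPACE ks1 WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_old': 3, 'dc_new': 3};
EOF
cat rf_migration_plan.cql   # in real use: cqlsh -f rf_migration_plan.cql
```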

- Kurt


Re: Join_ring=false Use Cases

2016-12-20 Thread kurt Greaves
It seems that you're correct in saying that writes don't propagate to a
node that has join_ring set to false, so I'd say this is a flaw. In reality
I can't see many actual use cases in regards to node outages with the
current implementation. The main usage I'd think would be to have
additional coordinators for CPU heavy workloads.

It seems to make it actually useful for repairs/outages we'd need to have
another option to turn on writes so that it behaved similarly to write
survey mode (but on already bootstrapped nodes).

Is there a reason we don't have this already? Or does it exist somewhere
I'm not aware of?

On 20 December 2016 at 17:40, Anuj Wadehra  wrote:

> No responses yet :)
>
> Any C* expert who could help on join_ring use case and the concern raised?
>
> Thanks
> Anuj
>
> On Tue, 13 Dec, 2016 at 11:31 PM, Anuj Wadehra
>  wrote:
> Hi,
>
> I need to understand the use case of join_ring=false in case of node
> outages. As per https://issues.apache.org/jira/browse/CASSANDRA-6961, you
> would want join_ring=false when you have to repair a node before bringing a
> node back after some considerable outage. The problem I see with
> join_ring=false is that unlike autobootstrap, the node will NOT accept
> writes while you are running repair on it. If a node was down for 5 hours
> and you bring it back with join_ring=false, repair the node for 7 hours and
> then make it join the ring, it will STILL have missed writes because while
> the time repair was running (7 hrs), writes only went to other others.
> So, if you want to make sure that reads served by the restored node at CL
> ONE will return consistent data after the node has joined, you wont get
> that as writes have been missed while the node is being repaired. And if
> you work with Read/Write CL=QUORUM, even if you bring back the node without
> join_ring=false, you would anyways get the desired consistency. So, how
> join_ring would provide any additional consistency in this case ??
>
> I can see join_ring=false useful only when I am restoring from Snapshot or
> bootstrapping and there are dropped mutations in my cluster which are not
> fixed by hinted handoff.
>
> For Example: 3 nodes A,B,C working at Read/Write CL QUORUM. Hinted
> Handoff=3 hrs.
> 10 AM Snapshot taken on all 3 nodes
> 11 AM: Node B goes down for 4 hours
> 3 PM: Node B comes up but data is not repaired. So, 1 hr of dropped
> mutations (2-3 PM) not fixed via Hinted Handoff.
> 5 PM: Node A crashes.
> 6 PM: Node A restored from 10 AM Snapshot, Node A started with
> join_ring=false, repaired and then joined the cluster.
>
> In above restore snapshot example, updates from 2-3 PM were outside hinted
> handoff window of 3 hours. Thus, node B wont get those updates. Node A data
> for 2-3 PM is already lost. So, 2-3 PM updates are only on one replica i.e.
> node C and minimum consistency needed is QUORUM so join_ring=false would
> help. But this is very specific use case.
>
> Thanks
> Anuj
>
>


Re: Incremental repair for the first time

2016-12-20 Thread kurt Greaves
No workarounds, your best/only option is to upgrade (plus you get the
benefit of loads of other bug fixes).

On 16 December 2016 at 21:58, Kathiresan S 
wrote:

> Thank you!
>
> Is any work around available for this version?
>
> Thanks,
> Kathir
>
>
> On Friday, December 16, 2016, Jake Luciani  wrote:
>
>> This was fixed post-3.0.4; please upgrade to the latest 3.0 release
>>
>> On Fri, Dec 16, 2016 at 4:49 PM, Kathiresan S <
>> kathiresanselva...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have a brand new Cassandra cluster (version 3.0.4) and we set up
>>> nodetool repair scheduled for every day (without any options for repair).
>>> As per documentation, incremental repair is the default in this case.
>>> Should we do a full repair for the very first time on each node once and
>>> then leave it to do incremental repair afterwards?
>>>
>>> *Problem we are facing:*
>>>
>>> On a random node, the repair process throws validation failed error,
>>> pointing to some other node
>>>
>>> For Eg. Node A, where the repair is run (without any option), throws
>>> below error
>>>
>>> *Validation failed in /Node B*
>>>
>>> In Node B when we check the logs, below exception is seen at the same
>>> exact time...
>>>
>>> *java.lang.RuntimeException: Cannot start multiple repair sessions over
>>> the same sstables*
>>> *at
>>> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1087)
>>> ~[apache-cassandra-3.0.4.jar:3.0.4]*
>>> *at
>>> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80)
>>> ~[apache-cassandra-3.0.4.jar:3.0.4]*
>>> *at
>>> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:700)
>>> ~[apache-cassandra-3.0.4.jar:3.0.4]*
>>> *at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> ~[na:1.8.0_73]*
>>> *at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> ~[na:1.8.0_73]*
>>>
>>> Can you please help on how this can be fixed?
>>>
>>> Thanks,
>>> Kathir
>>>
>>
>>
>>
>> --
>> http://twitter.com/tjake
>>
>


Re: iostat -like tool to parse 'nodetool cfstats'

2016-12-20 Thread kurt Greaves
Anything in cfstats you should be able to retrieve through the metrics
MBeans. See https://cassandra.apache.org/doc/latest/operating/metrics.html
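
For a quick iostat-like view in the meantime, you can diff two cfstats
snapshots taken an interval apart. A rough shell/awk sketch: the embedded
samples stand in for real `nodetool cfstats` output captured 60 seconds
apart, and only two of the cumulative counters are parsed:

```shell
# In real use: nodetool cfstats > t0.txt; sleep 60; nodetool cfstats > t1.txt
cat > t0.txt <<'EOF'
    Table: users
    Read Count: 100
    Write Count: 40
EOF
cat > t1.txt <<'EOF'
    Table: users
    Read Count: 160
    Write Count: 90
EOF
# First pass (FNR == NR) records the baseline, second pass prints deltas.
awk -F': *' '
  /Table:/                 { tbl = $2 }
  /Read Count|Write Count/ {
      gsub(/^[ \t]+/, "", $1)
      key = tbl " " $1
      if (FNR == NR) base[key] = $2        # first file: baseline values
      else print key, $2 - base[key]       # second file: delta per interval
  }' t0.txt t1.txt > deltas.txt
cat deltas.txt
```

The same idea extends to any other cumulative counter cfstats reports.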

On 20 December 2016 at 23:04, Richard L. Burton III 
wrote:

> I haven't seen anything like that myself. It would be nice to have
> nodetool cfstats presented in a nicer format.
>
> If you plan to work on that, let me know. I would help contribute to it
> next month.
>
> On Tue, Dec 20, 2016 at 5:59 PM, Kevin Burton  wrote:
>
>> nodetool cfstats has some valuable data but what I would like is a 1
>> minute delta.
>>
>> Similar to iostat...
>>
>> It's easy to parse this but has anyone done it?
>>
>> I want to see IO throughput and load on C* for each table.
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>>
>>
>
>
> --
> -Richard L. Burton III
> @rburton
>


Re: Cassandra cluster performance

2016-12-23 Thread kurt Greaves
Branislav, are you doing async writes?


Re: Very odd & inconsistent results from SASI query

2017-03-20 Thread kurt greaves
As secondary indexes are stored individually on each node, what you're
suggesting sounds exactly like a consistency issue. The fact that you read
0 cells on one query implies the node that got the query did not have any
data for the row. The reason you would sometimes see different behaviours
is likely because of read repairs. The fact that running a repair fixes the
issue pretty much guarantees it's a consistency issue.

You should check for dropped mutations in tpstats/logs and if they are
occurring try and stop that from happening (probably load related). You
could also try performing reads and writes at LOCAL_QUORUM for stronger
consistency, however note this has a performance/latency impact.


Re: Internal Security - Authentication & Authorization

2017-03-15 Thread kurt greaves
Jacob, it seems you are on the right track; however, my understanding is
that only the user that was auth'd has their permissions/roles/creds cached.

Also. Cassandra will query at QUORUM for the "cassandra" user, and at
LOCAL_ONE for *all* other users. This is the same for creating users/roles.


Re: changing compaction strategy

2017-03-15 Thread kurt greaves
The rogue pending task is likely a non-issue. If your jmx command went
through without errors and you got the log message you can assume it
worked. It won't show in the schema unless you run the ALTER statement
which affects the whole cluster.

If you were switching from STCS then you wouldn't expect any recompaction,
as SSTables will just be calculated to be in their relevant window. On the
other hand, switching from LCS would generate many compactions.

The only way to really tell is to confirm SSTables are expiring as you
would expect them to, or verify that new SSTables are not being compacted
with older ones. You might need to wait for at least one time window to
pass to check this.
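
For reference, the cluster-wide route is the ALTER statement. A hedged
sketch follows; the keyspace/table name and the TWCS options are
placeholders, not from the thread:

```shell
# Cluster-wide compaction strategy change as a schema statement (contrast
# with the per-node JMX route, which only affects the local node).
cat <<'EOF' > alter_compaction.cql
ALTER TABLE ks1.events WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': 1
};
EOF
cat alter_compaction.cql   # in real use: cqlsh -f alter_compaction.cql
```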


Re: Change the IP of a live node

2017-03-15 Thread kurt greaves
Cassandra uses the IP address for more or less everything. It's possible to
change it through some hackery; however, it's probably not a great idea. The
node's system tables will still reference the old IP, which is likely your
problem here.

On 14 March 2017 at 18:58, George Sigletos  wrote:

> To give a complete picture, my node has actually two network interfaces:
> eth0 for 192.168.xx.xx and eth1 for 10.179.xx.xx
>
> On Tue, Mar 14, 2017 at 7:46 PM, George Sigletos 
> wrote:
>
>> Hello,
>>
>> I am trying to change the IP of a live node (I am not replacing a dead
>> one).
>>
>> So I stop the service on my node (not a seed node), I change the IP from
>> 192.168.xx.xx to 10.179.xx.xx, and modify "listen_address" and
>> "rpc_address" in the cassandra.yaml, while I also set auto_bootstrap:
>> false. Then I restart but it fails to see the rest of the cluster:
>>
>> Datacenter: DC1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  AddressLoad   Tokens  OwnsHost
>> ID   Rack
>> DN  192.168.xx.xx  ?  256 ?
>> 241f3002-8f89-4433-a521-4fa4b070b704  r1
>> UN  10.179.xx.xx  3.45 TB256 ?
>> 3b07df3b-683b-4e2d-b307-3c48190c8f1c  RAC1
>> DN  192.168.xx.xx  ?  256 ?
>> 19636f1e-9417-4354-8364-6617b8d3d20b  r1
>> DN  192.168.xx.xx?  256 ?
>> 9c65c71c-f5dd-4267-af9e-a20881cf3d48  r1
>> DN  192.168.xx.xx   ?  256 ?
>> ee75219f-0f2c-4be0-bd6d-038315212728  r1
>>
>> Am I doing anything wrong? Thanks in advance
>>
>> Kind regards,
>> George
>>
>
>


Re: Streaming errors during bootstrap

2017-04-20 Thread kurt greaves
Did this error persist? What was the expected outcome? Did you drop this CF
and now expect it to no longer exist?

On 12 April 2017 at 01:26, Jai Bheemsen Rao Dhanwada 
wrote:

> Hello,
>
> I am seeing streaming errors while adding new nodes(in the same DC) to the
> cluster.
>
> ERROR [STREAM-IN-/x.x.x.x] 2017-04-11 23:09:29,318 StreamSession.java:512
> - [Stream #a8d56c70-1f0b-11e7-921e-61bb8bdc19bb] Streaming error occurred
> java.io.IOException: CF *465ed8d0-086c-11e6-9744-2900b5a9ab11* was
> dropped during streaming
> at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(
> CompressedStreamReader.java:77) ~[apache-cassandra-2.1.16.jar:2.1.16]
> at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.
> deserialize(IncomingFileMessage.java:48) ~[apache-cassandra-2.1.16.jar:
> 2.1.16]
> at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.
> deserialize(IncomingFileMessage.java:38) ~[apache-cassandra-2.1.16.jar:
> 2.1.16]
> at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56)
> ~[apache-cassandra-2.1.16.jar:2.1.16]
> at org.apache.cassandra.streaming.ConnectionHandler$
> IncomingMessageHandler.run(ConnectionHandler.java:276)
> ~[apache-cassandra-2.1.16.jar:2.1.16]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
>
> The CF 465ed8d0-086c-11e6-9744-2900b5a9ab11 is actually present and all
> the nodes are in sync. I am sure there are no network connectivity issues.
> Not sure why this error is happening.
>
> I tried to run repair/scrub on the CF with metadata :
> 465ed8d0-086c-11e6-9744-2900b5a9ab11 but didn't help.
>
> Any idea what else to look for in this case?
>
> Thanks in advance.
>
>


Re: Cassandra isn't compacting old files

2017-08-01 Thread kurt greaves
Seeing as there aren't even 100 SSTables in L2, LCS should be gradually
trying to compact L3 with L2. You could search the logs for "Adding
high-level (L3)" to check if this is happening.


Re: UndeclaredThrowableException, C* 3.11

2017-08-02 Thread kurt greaves
If the repair command failed, repair also failed. Regarding % repaired: no,
it's unlikely you will see 100% repaired after a single repair. Maybe after
a few consecutive repairs with no data load you might get it to 100%.


Re: Bootstrapping a new Node with Consistency=ONE

2017-08-02 Thread kurt greaves
Can you not just add a new DC and then tell your clients to connect to the
new one (after migrating all the data to it, obviously)? If you can't
achieve that, you should probably use GossipingPropertyFileSnitch. Your
best plan is to have the desired RF/redundancy from the start. Changing RF
in production is not fun and can be costly.


Re: Bootstrapping a new Node with Consistency=ONE

2017-08-02 Thread kurt greaves
If you want to change RF on a live system your best bet is through DC
migration (add another DC with the desired # of nodes and RF), and migrate
your clients to use that DC. There is a way to boot a node and not join the
ring, however I don't think it will work for new nodes (have not
confirmed). Also, increasing RF in this way would only avoid being
completely catastrophic if you were increasing RF to N (the number of
nodes).


Re: Data Loss irreparabley so

2017-08-02 Thread kurt greaves
You should run repairs every GC_GRACE_SECONDS. If a node is overloaded/goes
down, you should run repairs. LOCAL_QUORUM will somewhat maintain
consistency within a DC, but certainly doesn't mean you can get away
without running repairs. You need to run repairs even if you are using
QUORUM or ONE.
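To make the consistency-level arithmetic behind this concrete, here is a small sketch (the helper names are mine, not a Cassandra API). It shows why writes and reads only overlap when W + R > RF, and why ONE writes plus ONE reads leave gaps that only repair closes:

```python
def quorum(rf: int) -> int:
    # Cassandra's QUORUM: a strict majority of the RF replicas.
    return rf // 2 + 1

def reads_see_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    # A read is guaranteed to include the most recent write only when the
    # write set and read set must intersect: W + R > RF.
    return write_replicas + read_replicas > rf

rf = 3
q = quorum(rf)
assert q == 2
# QUORUM writes + QUORUM reads always overlap on at least one replica...
assert reads_see_latest_write(rf, q, q)
# ...but ONE writes + ONE reads do not, so without repair (or lucky
# read-repair/hints) a read can miss data entirely.
assert not reads_see_latest_write(rf, 1, 1)
```

Even the QUORUM case only guarantees the *latest* write is visible; tombstone safety within gc_grace_seconds still requires actual repairs.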


Re: Bootstrapping a new Node with Consistency=ONE

2017-08-02 Thread kurt greaves
Only in this one case might that work (RF == N).


Re: Is it possible to delete system_auth keyspace.

2017-08-01 Thread kurt greaves
You should be able to create it yourself prior to enabling auth without
issues. Alternatively you could just add an extra node with auth on, or
switch one node to have auth on and then change the RF.


Re: Migrate from DSE (Datastax) to Apache Cassandra

2017-08-15 Thread kurt greaves
Haven't done it for 5.1 but went smoothly for earlier versions. If you're
not using any of the additional features of DSE, it should be OK. Just
change any custom replication strategies before migrating and also make
sure your yaml options are compatible.


Re: Attempted to write commit log entry for unrecognized table

2017-08-15 Thread kurt greaves
What does nodetool describecluster show?
A stab in the dark, but you could try nodetool resetlocalschema or a
rolling restart of the cluster if it's a schema issue.


Re: rebuild constantly fails, 3.11

2017-08-11 Thread kurt greaves
cc'ing user back in...

On 12 Aug. 2017 01:55, "kurt greaves" <k...@instaclustr.com> wrote:

> How much memory do these machines have? Typically we've found that G1
> isn't worth it until you get to around 24G heaps, and even then it's not
> really better than CMS. You could try CMS with an 8G heap and 2G new size.
>
> However as the oom is only happening on one node have you ensured there
> are no extra processes running on that node that could be consuming extra
> memory? Note that the oom killer will kill the process with the highest oom
> score, which generally corresponds to the process using the most memory,
> but not necessarily the problem.
>
> Also could you run nodetool info on the problem node and 1 other and dump
> the output in a gist? It would be interesting to see if there is a
> significant difference in off-heap.
>
> On 11 Aug. 2017 17:30, "Micha" <mich...@fantasymail.de> wrote:
>
>> It's an oom issue, the kernel kills the cassandra job.
>> The config was to use offheap buffers and 20G java heap, I changed this
>> to use heap buffers and 16G java heap. I added a  new node yesterday
>> which got streams from 4 other nodes. They all succeeded except on the
>> one node which failed before. This time again the db was killed by the
>> kernel. At the moment I don't know what is the reason here, since the
>> nodes are equal.
>>
>> For me it seems the G1GC is not able to free the memory fast enough.
>> The settings were MaxGCPauseMillis=600, ParallelGCThreads=10 and
>> ConcGCThreads=10, which may be too high since the node has only 8
>> cores. I changed this to ParallelGCThreads=8 and ConcGCThreads=2, as
>> mentioned in the comments of jvm.options.
>>
>> Since the bootstrap of the fifth node did not complete I will start it
>> again and check if the memory is still decreasing over time.
>>
>>
>>
>>  Michael
>>
>>
>>
>> On 11.08.2017 01:25, Jeff Jirsa wrote:
>> >
>> >
>> > On 2017-08-08 01:00 (-0700), Micha <mich...@fantasymail.de> wrote:
>> >> Hi,
>> >>
>> >> it seems I'm not able to add a 3 node DC to a 3 node DC. After
>> >> starting the rebuild on a new node, nodetool netstats shows it will
>> >> receive 1200 files from node-1 and 5000 from node-2. The stream from
>> >> node-1 completes but the stream from node-2 always fails, after
>> >> sending ca. 4000 files.
>> >>
>> >> After restarting the rebuild it again starts to send the 5000 files.
>> >> The whole cluster is connected via one switch only, no firewall
>> >> between, and the network shows no errors.
>> >> The machines have 8 cores, 32GB RAM and two 1TB discs as RAID0.
>> >> The logs show no errors. The size of the data is ca. 1TB.
>> >
>> > Is there anything in `dmesg`? System logs? Nothing? Is node2 running?
>> > Is node3 running?
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> > For additional commands, e-mail: dev-h...@cassandra.apache.org
>> >
>>
>>
>>


Re: Dropping down replication factor

2017-08-13 Thread kurt greaves
On 14 Aug. 2017 00:59, "Brian Spindler"  wrote:

Do you think with the setup I've described I'd be ok doing that now to
recover this node?

The node died trying to run the scrub; I've restarted it but I'm not sure
it's going to get past a scrub/repair, this is why I deleted the other
files as a brute force method.  I think I might have to do the same here
and then kick off a repair if I can't just replace it?

Is it just the opscenter keyspace that has corrupt SSTables? If so, I
wouldn't worry about repairing too much. If you can get that third node in
C to join, I'd say your best bet is to just do that until you have enough
nodes in C. Dropping and increasing RF is pretty risky on a live system.

It sounds to me like you stand a good chance of getting the new nodes in C
to join, so I'd pursue that before trying anything more complicated.


Doing the repair on the node that had the corrupt data deleted should be
ok?

Yes, as long as you also deleted corrupt SSTables on any other nodes that
had them.


Re: Unbalanced cluster

2017-07-10 Thread kurt greaves
The reason for the default of 256 vnodes is that at that many tokens, the
random distribution of tokens is enough to balance out each node's token
allocation almost evenly. Any fewer and some nodes will get far more
unbalanced, as Avi has shown. In 3.0 there is a new token allocation
algorithm, however it requires configuring prior to adding a node and also
only really works well if your RF = # of racks, or you only use 1 rack.
Have a look around for the allocate_tokens_for_keyspace option for more
details.
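As a rough illustration (a toy simulation, not Cassandra's actual token allocator), randomly placing vnode tokens on a unit ring shows how the per-node ownership imbalance shrinks as the token count per node grows:

```python
import random

def ownership(num_nodes, vnodes_per_node, seed=42):
    """Fraction of a [0, 1) token ring owned by each node under random token allocation."""
    rng = random.Random(seed)
    tokens = sorted((rng.random(), n)
                    for n in range(num_nodes)
                    for _ in range(vnodes_per_node))
    owned = [0.0] * num_nodes
    for i, (tok, node) in enumerate(tokens):
        # Each token owns the range from the previous token up to itself,
        # wrapping around the ring for the first token.
        prev = tokens[i - 1][0] if i else tokens[-1][0] - 1.0
        owned[node] += tok - prev
    return owned

few, many = ownership(6, 1), ownership(6, 256)
imbalance = lambda o: max(o) / min(o)
assert abs(sum(many) - 1.0) < 1e-9     # ownership fractions cover the whole ring
assert imbalance(many) < imbalance(few)  # 256 vnodes spreads load far more evenly
```

With a single token per node the max/min ownership ratio is typically several-fold; at 256 vnodes per node the law of large numbers pulls it close to 1, which is exactly the balancing effect the default relies on.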


Re: adding nodes to a cluster and changing rf

2017-07-14 Thread kurt greaves
Increasing RF will result in nodes that previously didn't have a replica of
the data now being responsible for it. This means that a repair is required
after increasing the RF.

Until the repair completes you will suffer from inconsistencies in data.
For example, in a 3 node cluster with RF 2, nodes A, B and C. A and B could
be responsible for the 2 replicas of row x. As soon as you increase the RF
to 3, C will also be responsible for x and thus can also answer queries
requesting x. But until row x is repaired and present on C, a request for x
against C will return no data.
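The replica-set change can be sketched with a toy single-token ring (a SimpleStrategy-style clockwise walk, ignoring vnodes and racks; the token values are made up):

```python
def replicas(ring, key_token, rf):
    """Walk the sorted ring clockwise from the key's token, taking the first rf nodes.
    Simplified single-token-per-node placement, no rack awareness."""
    tokens = sorted(ring)
    # First token >= the key's token, wrapping to the start of the ring if none.
    start = next((i for i, t in enumerate(tokens) if t >= key_token), 0)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(rf)]

ring = {10: "A", 50: "B", 90: "C"}  # token -> node
# Row x hashes to token 5: with RF 2 only A and B hold it...
assert replicas(ring, 5, 2) == ["A", "B"]
# ...after raising RF to 3, C is also responsible for x, but it holds no
# data for x until a repair (or new write) puts it there.
assert replicas(ring, 5, 3) == ["A", "B", "C"]
```

This is why C can immediately start answering queries for x after the RF change, yet return nothing until repair completes.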

If you are looking to increase RF on a live production system you should
perform a datacenter migration and increase the RF to 3 only on the new
datacenter, switching your clients across after rebuild+repair on the new
DC.

allocate_tokens_for_local_replication_factor is a DSE configuration
property; allocate_tokens_for_keyspace is the equivalent in Apache
Cassandra. This option is not related to RF changes so it shouldn't be
necessary, however it might not be a bad idea to research its usage if you
are planning on scaling to many nodes in the future.


Re: write time for nulls is not consistent

2017-07-18 Thread kurt greaves
Can you try: select a, writetime(b) from test.t
I heard of an issue recently where cqlsh reports null incorrectly if you
query a column twice; wondering if it extends to this case with writetime.


Re: Understanding gossip and seeds

2017-07-21 Thread kurt greaves
Haven't checked the code, but I'm pretty sure it's because a node will
always use the known state stored in the system tables. The seeds in the
yaml are mostly for initial setup, used to discover the rest of the nodes
in the ring.

Once that's done there is little reason to refer to them again, unless
forced.


Re: read/write request counts and write size of each write

2017-07-25 Thread kurt greaves
You will need to use JMX to collect write/read related metrics. I'm not
aware of anything that measures write size, but if there isn't anything it
should be easy to measure on your client.
There are quite a few existing solutions for monitoring Cassandra out
there; you should find some easily with a quick search. I believe the
Graphite/Grafana setup works well.


Re: performance penalty of add column in CQL3

2017-07-25 Thread kurt greaves
If by "offline" you mean with no reads going to the nodes, then yes, that
would be a *potentially* safe time to do it, but it's still not advised.
You should avoid doing any ALTERs on 3.x versions earlier than 3.0.14 or
3.11 if possible.

Adding/dropping a column does not require a re-write of the data and is
relatively efficient (it should take seconds, not hours). It's just a
schema change, so it only requires gossip to propagate the schema between
the nodes. Note that if you drop a column, the data in that column is not
actually removed from the SSTables until they are compacted. I believe the
dropped cells are effectively treated as tombstones if you hit the dropped
column (not sure if that's how the metrics record them though).


Re: 回复: tolerate how many nodes down in the cluster

2017-07-25 Thread kurt greaves
Keep in mind that you shouldn't just enable multiple racks on an existing
cluster (this will lead to massive inconsistencies). The best method is to
migrate to a new DC as Brooke mentioned.


Re: read/write request counts and write size of each write

2017-07-25 Thread kurt greaves
Regarding write request size: it looks like you can also collect
MutationSizeHistogram for each write from the coordinator. See the Write
request section under
https://cassandra.apache.org/doc/latest/operating/metrics.html#client-request-metrics


Re: Data Loss irreparabley so

2017-07-25 Thread kurt greaves
Cassandra doesn't do any automatic repairing. It can tell if your data is
inconsistent, however it's really up to you to manage consistency through
repairs and choice of consistency level for queries. If you lose a node,
you have to manually repair the cluster after replacing the node, but
really you should be doing this every gc_grace_seconds regardless.


Re: 1 node doing compaction all the time in 6-node cluster (C* 2.2.8)

2017-07-24 Thread kurt greaves
Just to rule out a simple problem, are you using a load balancing policy?


Re: 回复: tolerate how many nodes down in the cluster

2017-07-24 Thread kurt greaves
I've never really understood why Datastax recommends against racks. In
those docs they make it out to be much more difficult than it actually is
to configure and manage racks.

The important thing to keep in mind when using racks is that your # of
racks should be equal to your RF. If you have keyspaces with different RF,
then it's best to have the same # as the RF of your most important
keyspace, but in this scenario you lose some of the benefits of using racks.

As Anuj has described, if you use RF # of racks, you *can* lose up to an
entire rack without losing availability. Note that this entirely depends
on the situation. *When you take a node down, the other nodes in the
cluster require capacity to be able to handle the extra load that node is
no longer handling.* What this means is that your cluster will require the
other nodes to store hints for that node (equivalent to the amount of
writes made to that node) and also to handle its portion of reads. You can
only take out as many nodes from a rack as the capacity of your cluster
allows.

I also strongly disagree that using racks makes operations tougher. If
anything, it makes them considerably easier (especially when using
vnodes). The only difficulty is the initial setup of racks, but for all
the possible benefits it's certainly worth it. As well as the fact that
you can lose up to an entire rack (great for AWS AZs) without affecting
availability, using racks makes operations on large clusters much
smoother. For example, when upgrading a cluster, you can now do it a rack
at a time, or some portion of a rack at a time. The same goes for OS
upgrades or any other operation that could happen in your environment.
This is important if you have lots of nodes. It also makes coordinating
repairs easier, as you now only need to repair a single rack to ensure
you've repaired all the data. Basically, for any operation/problem where
you need to consider the distribution of data, racks are going to help
you.
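A toy sketch of rack-aware placement (loosely NetworkTopologyStrategy-style; simplified in that it simply skips racks that already hold a replica, and all node/rack names are made up) illustrates why RF = # racks gives exactly one replica per rack:

```python
def rack_aware_replicas(ring, key_token, rf):
    """Walk the ring clockwise from the key's token, taking a node only if
    its rack doesn't already hold a replica. ring maps token -> (node, rack)."""
    tokens = sorted(ring)
    start = next((i for i, t in enumerate(tokens) if t >= key_token), 0)
    chosen, racks_used = [], set()
    for i in range(len(tokens)):
        node, rack = ring[tokens[(start + i) % len(tokens)]]
        if rack not in racks_used:
            chosen.append((node, rack))
            racks_used.add(rack)
        if len(chosen) == rf:
            break
    return chosen  # may be shorter than rf if there are fewer racks than rf

# 6 nodes spread over 3 racks, RF 3: token -> (node, rack)
ring = {t: (f"n{i + 1}", f"r{i % 3 + 1}") for i, t in enumerate(range(0, 60, 10))}
placement = rack_aware_replicas(ring, 5, 3)
# Each replica lands in a distinct rack, so losing a whole rack still
# leaves 2 of 3 replicas, and QUORUM stays available.
assert len({rack for _, rack in placement}) == 3
```

With more racks than RF the same walk no longer guarantees one replica per rack, which is the operational caveat mentioned elsewhere in this thread.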


Re: 1 node doing compaction all the time in 6-node cluster (C* 2.2.8)

2017-07-24 Thread kurt greaves
Have you checked system logs/dmesg? I'd suspect it's an instance problem
too, maybe you'll see some relevant errors in those logs.



Re: 回复: 回复: tolerate how many nodes down in the cluster

2017-07-27 Thread kurt greaves
Note that if you use more racks than RF you lose some of the operational
benefit, e.g. you'll still only be able to take out one rack at a time
(especially if using vnodes), despite the fact that you have more racks
than RF. As Jeff said this may be desirable, but really it comes down to
what your physical failure domains are and how/if you plan to scale.

As Jeff said, as long as you don't start with # racks < RF you should be
fine.


Re: Restore Snapshot

2017-06-28 Thread kurt greaves
Hm, I did recall seeing a ticket for this particular use case, which is
certainly useful; I just didn't think it had been implemented yet. Turns
out it's been in since 2.0.7, so you should be receiving writes with
join_ring=false. If you confirm you aren't receiving writes then we have
an issue. https://issues.apache.org/jira/browse/CASSANDRA-6961


Re: ALL range query monitors failing frequently

2017-06-28 Thread kurt greaves
I'd say that no, a range query probably isn't the best for monitoring, but
it really depends on how important it is that the range you select is
consistent.

From those traces it does seem that the bulk of the time spent was waiting
for responses from the replicas, which may indicate a network issue, but
it's not conclusive evidence.

For SSTables you could check the SSTables-per-read of the query, but it's
unnecessary, as the traces indicate that's not the issue. It might be
worth trying to debug potential network issues, and worth looking into
metrics like CoordinatorReadLatency and CoordinatorScanLatency at the
table level:
https://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics
If you have any network traffic metrics between nodes, that would also be
a good place to look.

Other than that, I'd look in the logs on each node when you run the trace
and try to identify any errors that could be causing problems.

