Re: AWS ephemeral instances + backup

2019-12-05 Thread Ben Slater
We have some tooling that does that kind of thing using S3 rather than
attached EBS, but on a similar principle. There is a bit of an overview here:
https://www.instaclustr.com/advanced-node-replace/

It's become a pretty core part of our ops toolbox since we introduced it.

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Fri, 6 Dec 2019 at 08:32, Jeff Jirsa  wrote:

>
> No experience doing it that way personally, but I'm curious: Are you
> backing up in case of ephemeral instance dying, or backing up in case of
> data problems / errors / etc?
>
> On an instance dying, you're probably fine with just straight normal
> replacements rather than restoring from backup. For the rest, is it cheaper to
> use something like tablesnap and go straight to S3?
>
> On Thu, Dec 5, 2019 at 12:21 PM Carl Mueller
>  wrote:
>
>> Does anyone have experience with tooling written to support this strategy:
>>
>> Use case: run Cassandra on i3 instances on ephemeral storage, but synchronize
>> the sstables and commitlog files to the cheapest EBS volume type (those have
>> bad IOPS but decent enough throughput).
>>
>> On node replacement, the startup script for the node back-copies the
>> sstables and commitlog state from the EBS volume to the ephemeral storage.
>>
>> As can be seen:
>> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
>>
>> the (presumably) spinning rust tops out at 2375 MB/sec (presumably by
>> striping multiple EBS volumes), which would incur about a ten-minute delay in
>> node replacement for a 1TB node. But I imagine this would only be used on
>> higher-IOPS read/write nodes with smaller densities, so 100GB would be only
>> about a minute of delay, already within the timeframes of an AWS node
>> replacement/instance restart.
>>
>>
>>
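
For illustration, a minimal sketch of the startup back-copy described above (the EBS mount point, data paths and the use of rsync are all assumptions):

import subprocess

EBS_MOUNT = "/mnt/ebs-backup"        # hypothetical EBS mount point
EPHEMERAL = "/var/lib/cassandra"     # hypothetical Cassandra data root on the ephemeral

def sync(src, dst):
    # -a preserves permissions/timestamps; --delete keeps dst an exact mirror of src
    subprocess.run(["rsync", "-a", "--delete", src + "/", dst + "/"], check=True)

# restore sstables and commitlogs from EBS before the cassandra service starts
for d in ("data", "commitlog"):
    sync(f"{EBS_MOUNT}/{d}", f"{EPHEMERAL}/{d}")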


Re: Aws instance stop and start with ebs

2019-11-05 Thread Ben Slater
The logs between first start and handshaking should give you a clue but my
first guess would be replaying commit logs.
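
For example, a quick way to measure that gap is to diff the timestamps of the handshake and "is now UP" messages for the peer (the log path, timestamp format and peer address below are assumptions):

import re
from datetime import datetime

PEER = "10.72.100.156"                    # the restarted node's address (assumption)
LOG = "/var/log/cassandra/system.log"     # hypothetical log location
STAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")

handshake = up = None
with open(LOG) as f:
    for line in f:
        if PEER not in line:
            continue
        m = STAMP.search(line)
        if not m:
            continue
        t = datetime.strptime(m.group(0), "%Y-%m-%d %H:%M:%S,%f")
        if "Handshaking version" in line and handshake is None:
            handshake = t
        elif "is now UP" in line:
            up = t

if handshake and up:
    print("gossip marked the peer UP", (up - handshake).total_seconds(),
          "seconds after the first handshake")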

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Wed, 6 Nov 2019 at 04:36, Rahul Reddy  wrote:

> I can reproduce the issue.
>
> I did drain the Cassandra node, then stopped and started the Cassandra
> instance. The Cassandra instance comes up, but other nodes stay in DN state
> for around 10 minutes.
>
> I don't see errors in the system.log
>
> DN  xx.xx.xx.59   420.85 MiB  256  48.2% id  2
> UN  xx.xx.xx.30   432.14 MiB  256  50.0% id  0
> UN  xx.xx.xx.79   447.33 MiB  256  51.1% id  4
> DN  xx.xx.xx.144  452.59 MiB  256  51.6% id  1
> DN  xx.xx.xx.19   431.7 MiB  256  50.1% id  5
> UN  xx.xx.xx.6421.79 MiB  256  48.9%
>
> When I do nodetool status, 3 nodes are still showing down, and I don't see
> errors in system.log.
>
> After 10 minutes it shows the other node as up as well.
>
>
> INFO  [HANDSHAKE-/10.72.100.156] 2019-11-05 15:05:09,133
> OutboundTcpConnection.java:561 - Handshaking version with /stopandstarted
> node
> INFO  [RequestResponseStage-7] 2019-11-05 15:16:27,166 Gossiper.java:1019
> - InetAddress /nodewhichitwasshowing down is now UP
>
> What is causing the 10-minute delay before the node is reported as reachable?
>
> On Wed, Oct 30, 2019, 8:37 AM Rahul Reddy 
> wrote:
>
>> Also, an AWS EC2 stop and start brings up a new underlying instance with the
>> same IP, and all our file systems are on EBS and mounted fine. Does the new
>> instance coming up with the same IP cause any gossip issues?
>>
>> On Tue, Oct 29, 2019, 6:16 PM Rahul Reddy 
>> wrote:
>>
>>> Thanks Alex. We have 6 nodes in each DC with RF=3 and CL LOCAL_QUORUM, and
>>> we stopped and started only one instance at a time. Though nodetool status
>>> says all nodes are UN and system.log says Cassandra started and began
>>> listening, the JMX exporter shows the instance stayed down longer. How do we
>>> determine what made Cassandra unavailable when the log says it started and
>>> is listening?
>>>
>>> On Tue, Oct 29, 2019, 4:44 PM Oleksandr Shulgin <
>>> oleksandr.shul...@zalando.de> wrote:
>>>
>>>> On Tue, Oct 29, 2019 at 9:34 PM Rahul Reddy 
>>>> wrote:
>>>>
>>>>>
>>>>> We have our infrastructure on AWS and we use EBS storage, and AWS was
>>>>> retiring one of the nodes. Since our storage was persistent, we did nodetool
>>>>> drain and then stopped and started the instance. This caused 500 errors in
>>>>> the service. We have LOCAL_QUORUM and RF=3; why does stopping one instance
>>>>> cause the application to have issues?
>>>>>
>>>>
>>>> Can you still look up what was the underlying error from Cassandra
>>>> driver in the application logs?  Was it request timeout or not enough
>>>> replicas?
>>>>
>>>> For example, if you only had 3 Cassandra nodes, restarting one of them
>>>> reduces your cluster capacity by 33% temporarily.
>>>>
>>>> Cheers,
>>>> --
>>>> Alex
>>>>
>>>>


Re: Cassandra-stress testing

2019-08-21 Thread Ben Slater
Whether 600m rows per hour is good or bad depends on the hardware you are
using (do you have 1 node or 1000? 2 cores each or 16?) and the data you
are writing (is it 10 bytes per row or 100kb?).

In general, I think you will need to supply a lot more context about your
use case and set up to get any useful response from the community.
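
As a concrete starting point, the single-node scaling approach suggested further down this thread can be scripted; a rough sketch (the node address, duration and the error check are all assumptions):

import subprocess

NODE = "10.0.0.1"                    # hypothetical node address
best = None
for threads in (16, 32, 64, 128, 256):
    cmd = ["cassandra-stress", "write", "duration=5m",
           "-node", NODE, "-rate", f"threads={threads}"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # crude check: stop stepping up once the run fails or reports errors
    if result.returncode != 0 or "ERROR" in result.stdout:
        break
    best = threads

if best:
    print(f"highest clean thread count: {best}; target roughly {int(best * 0.8)}")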

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Wed, 21 Aug 2019 at 21:32,  wrote:

>
>
>
>
>
>
>
> Thanks for feedback.
>
>
>
> Just to elaborate more, I am currently writing 600m rows per hour and
> need to understand if this is about on target or if there are better ways
> to write or perhaps structure the keyspaces and table structures.
>
>
>
> And I can use the Cassandra Stress tool to get potential maximum
> throughput stats, or use the schema provided (keyspace/table definitions) for
> a stress test.
>
>
>
>
>
> Cassandra, being a scale-out database, can load any arbitrary number of
> records per hour.
>
>
>
> The best way to do this is for your given data model, find what your max
> throughput is on a single node by scaling the number of clients until you
> start seeing errors (or hit your latency SLA) then pull back by 15-20%.
> From there, it's a matter of linearly scaling clients and nodes until you
> hit your desired throughput.
>
>
>
> I recommend taking a look at TLP-Stress as it's a bit easier to use and
> understand:
> https://thelastpickle.com/blog/2018/10/31/tlp-stress-intro.html
>
>
>
> Best.
>
> *Marc Selwan | *DataStax *| *PM, Server Team *|* *(925) 413-7079* *|*
> Twitter <https://twitter.com/MarcSelwan>
>
>
>
> *  Quick links | *DataStax <http://www.datastax.com> *| *Training
> <http://www.academy.datastax.com> *| *Documentation
> <http://www.datastax.com/documentation/getting_started/doc/getting_started/gettingStartedIntro_r.html>
>  *| *Downloads <http://www.datastax.com/download>
>
>
>
>
>
>
>
> On Tue, Aug 20, 2019 at 7:16 AM Surbhi Gupta 
> wrote:
>
> Have you tried YCSB?
>
> It is a tool from Yahoo for stress testing NoSQL databases.
>
>
>
> On Tue, Aug 20, 2019 at 3:34 AM  wrote:
>
> Hi Everyone,
>
>
>
> Has anyone here used cassandra-stress before? I want to test whether it’s
> possible to load 600 million records per hour into Cassandra, or
> find a better way to optimize Cassandra for this case.
> Find a better way to optimize Cassandra for this case.
>
> Any help will be highly appreciated.
>
>
>
>
>
>


Re: Cassandra-stress testing

2019-08-20 Thread Ben Slater
If you’re after some benchmarks that someone else has already run to help
estimate sizing, we pretty regularly publish benchmarking results on various
cloud provider instances.

For example, see:
https://www.instaclustr.com/announcing-instaclustr-support-for-aws-i3en-instances/
and https://www.instaclustr.com/certified-apache-cassandra/

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Wed, 21 Aug 2019 at 02:42, Marc Selwan  wrote:

> Cassandra, being a scale-out database, can load any arbitrary number of
> records per hour.
>
> The best way to do this is for your given data model, find what your max
> throughput is on a single node by scaling the number of clients until you
> start seeing errors (or hit your latency SLA) then pull back by 15-20%.
> From there, it's a matter of linearly scaling clients and nodes until you
> hit your desired throughput.
>
> I recommend taking a look at TLP-Stress as it's a bit easier to use and
> understand:
> https://thelastpickle.com/blog/2018/10/31/tlp-stress-intro.html
>
> Best.
> *Marc Selwan | *DataStax *| *PM, Server Team *|* *(925) 413-7079* *|*
> Twitter <https://twitter.com/MarcSelwan>
>
> *  Quick links | *DataStax <http://www.datastax.com> *| *Training
> <http://www.academy.datastax.com> *| *Documentation
> <http://www.datastax.com/documentation/getting_started/doc/getting_started/gettingStartedIntro_r.html>
>  *| *Downloads <http://www.datastax.com/download>
>
>
>
> On Tue, Aug 20, 2019 at 7:16 AM Surbhi Gupta 
> wrote:
>
>> Have you tried YCSB?
>> It is a tool from Yahoo for stress testing NoSQL databases.
>>
>> On Tue, Aug 20, 2019 at 3:34 AM  wrote:
>>
>>> Hi Everyone,
>>>
>>>
>>>
>>> Has anyone here used cassandra-stress before? I want to test whether it’s
>>> possible to load 600 million records per hour into Cassandra, or
>>>
>>> find a better way to optimize Cassandra for this case.
>>>
>>> Any help will be highly appreciated.
>>>
>>>
>>>
>>>
>>


Re: high write latency on a single table

2019-07-22 Thread Ben Slater
Is the size of the data in your “state” column variable? The higher write
latencies at the 95th percentile and above could line up with large volumes of
data for particular rows in that column (the one column that is not in both
tables).
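
One quick way to check, for example, is to sample the size of that column from the client side; a rough sketch (the contact point, keyspace/table name and the assumption that the column is text or blob are guesses):

from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])                 # hypothetical contact point
session = cluster.connect("tims")
rows = session.execute('SELECT state FROM "MESSAGE_HISTORY_STATE" LIMIT 1000')
sizes = sorted(len(r.state) for r in rows if r.state is not None)
if sizes:
    print("sampled:", len(sizes),
          "p50:", sizes[len(sizes) // 2],
          "p95:", sizes[int(len(sizes) * 0.95)],
          "max:", sizes[-1])
cluster.shutdown()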

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Mon, 22 Jul 2019 at 16:46, CPC  wrote:

> Hi guys,
>
> Any idea? I thought it might be a bug but could not find anything related
> on jira.
>
> On Fri, Jul 19, 2019, 12:45 PM CPC  wrote:
>
>> Hi Rajsekhar,
>>
>> Here the details:
>>
>> 1)
>>
>> [cassadm@bipcas00 ~]$ nodetool tablestats tims.MESSAGE_HISTORY
>> Total number of tables: 259
>> 
>> Keyspace : tims
>> Read Count: 208256144
>> Read Latency: 7.655146714749506 ms
>> Write Count: 2218205275
>> Write Latency: 1.7826005103175133 ms
>> Pending Flushes: 0
>> Table: MESSAGE_HISTORY
>> SSTable count: 41
>> Space used (live): 976964101899
>> Space used (total): 976964101899
>> Space used by snapshots (total): 3070598526780
>> Off heap memory used (total): 185828820
>> SSTable Compression Ratio: 0.8219217809913125
>> Number of partitions (estimate): 8175715
>> Memtable cell count: 73124
>> Memtable data size: 26543733
>> Memtable off heap memory used: 27829672
>> Memtable switch count: 1607
>> Local read count: 7871917
>> Local read latency: 1.187 ms
>> Local write count: 172220954
>> Local write latency: 0.021 ms
>> Pending flushes: 0
>> Percent repaired: 0.0
>> Bloom filter false positives: 130
>> Bloom filter false ratio: 0.0
>> Bloom filter space used: 10898488
>> Bloom filter off heap memory used: 10898160
>> Index summary off heap memory used: 2480140
>> Compression metadata off heap memory used: 144620848
>> Compacted partition minimum bytes: 36
>> Compacted partition maximum bytes: 557074610
>> Compacted partition mean bytes: 155311
>> Average live cells per slice (last five minutes): 
>> 25.56639344262295
>> Maximum live cells per slice (last five minutes): 5722
>> Average tombstones per slice (last five minutes): 
>> 1.8681948424068768
>> Maximum tombstones per slice (last five minutes): 770
>> Dropped Mutations: 97812
>>
>> 
>> [cassadm@bipcas00 ~]$ nodetool tablestats tims.MESSAGE_HISTORY_STATE
>> Total number of tables: 259
>> 
>> Keyspace : tims
>> Read Count: 208257486
>> Read Latency: 7.655137315414438 ms
>> Write Count: 2218218966
>> Write Latency: 1.7825896304427324 ms
>> Pending Flushes: 0
>> Table: MESSAGE_HISTORY_STATE
>> SSTable count: 5
>> Space used (live): 6403033568
>> Space used (total): 6403033568
>> Space used by snapshots (total): 19086872706
>> Off heap memory used (total): 6727565
>> SSTable Compression Ratio: 0.271857664111622
>> Number of partitions (estimate): 1396462
>> Memtable cell count: 77450
>> Memtable data size: 620776
>> Memtable off heap memory used: 1338914
>> Memtable switch count: 1616
>> Local read count: 988278
>> Local read latency: 0.518 ms
>> Local write count: 109292691
>> Local write latency: 11.353 ms
>> 

Re: CL=LQ, RF=3: Can a Write be Lost If Two Nodes ACK'ing it Die

2019-05-04 Thread Ben Slater
In the normal, happy case the replica would be written to the third node at
the time of the write. However, if the third node happened to be down or
very overloaded at the time of the write (your step 3) the write would
still be reported to the client as successful. Even if the 3rd node is up
again before nodes 1 and 2 die, hints may have expired by that time or may
not finish replaying, either due to load (which is sort of the scenario you
outlined) or just not enough time. You’re only really guaranteed all three
replicas are there if a repair runs successfully between the initial write
and the two nodes dying (although it’s very likely there will be three
replicas from the start if the cluster is in a healthy state at the time of
the write).
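
For what it's worth, one way to firm up a critical write before a planned outage is to read it back at consistency ALL, which forces every replica to participate and will generally repair any contacted replica that is missing the row; a sketch (keyspace, table, key and contact point are placeholders):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])                 # hypothetical contact point
session = cluster.connect("my_keyspace")        # placeholder keyspace
stmt = SimpleStatement("SELECT * FROM my_table WHERE pk = %s",
                       consistency_level=ConsistencyLevel.ALL)
# succeeds only if every replica responds; a digest mismatch triggers a
# blocking read repair on the replicas involved in the read
print(session.execute(stmt, ["some-key"]).one())
cluster.shutdown()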

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Fri, 3 May 2019 at 23:19, Fred Habash  wrote:

> Thank you all.
>
> So, please, bear with me for a second. I'm trying to figure out how data
> can be totally lost under the above circumstances when nodes die in two
> out of three racks.
>
> You stated
> "the replica may or may not have made its way to the third node". Why
> "may not"?
>
> This is what I came up with ...
>
> 1. Write goes to coordinator in rack1
> 2. Local coordinator submits RF = 3 writes to all racks
> 3. Two nodes in rack 1 and 2 ack the write. Client is happy
> 4. Node massacre happens in racks 1 & 2 (infrastructure event)
> 5. Nodes in rack3 witness an increase in load as a result of the cluster
> shrinking
> 6. Coordinator in rack1 stores HH for the row for rack3 (either the
> coordinator slows down or the rack3 node is overloaded).
> 7. Eventually, the coordinator in rack1 dies and HHs are lost.
> 8. The row that was once ack'd to the app is now gone.
>
> Plausible?
>
>
> On Thu, May 2, 2019 at 8:23 PM Avinash Mandava 
> wrote:
>
>> Good catch, misread the detail.
>>
>> On Thu, May 2, 2019 at 4:56 PM Ben Slater 
>> wrote:
>>
>>> Reading more carefully, it could actually be either way: quorum requires
>>> that a majority of nodes complete and ack the write but still aims to write
>>> to RF nodes (with the last replica either written immediately or
>>> eventually via hints or repairs). So, in the scenario outlined the replica
>>> may or may not have made its way to the third node by the time the first
>>> two replicas are lost. If there is a replica on the third node it can be
>>> recovered to the other two nodes by either rebuild (actually replace) or
>>> repair.
>>>
>>> Cheers
>>> Ben
>>>
>>> ---
>>>
>>> *Ben Slater*
>>> *Chief Product Officer*, Instaclustr
>>>
>>>
>>> On Fri, 3 May 2019 at 09:33, Avinash Mandava 
>>> wrote:
>>>
>>>> In scenario 2 it's lost, if both nodes die and get replaced entirely
>>>> there's no history anywhere that the write ever happened, as it wouldn't be
>>>> in commitlog, memtable, or sstable in node 3. Surviving that failure
>>>> scenario of two nodes with same data simultaneously failing requires upping
>>>> CL or RF, or spreading across 3 racks, if the situation you're trying to
>>>> avoid is rack failure (which

Re: CL=LQ, RF=3: Can a Write be Lost If Two Nodes ACK'ing it Die

2019-05-02 Thread Ben Slater
Reading more carefully, it could actually be either way: quorum requires
that a majority of nodes complete and ack the write but still aims to write
to RF nodes (with the last replica either written immediately or
eventually via hints or repairs). So, in the scenario outlined the replica
may or may not have made its way to the third node by the time the first
two replicas are lost. If there is a replica on the third node it can be
recovered to the other two nodes by either rebuild (actually replace) or
repair.

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Fri, 3 May 2019 at 09:33, Avinash Mandava  wrote:

> In scenario 2 it's lost, if both nodes die and get replaced entirely
> there's no history anywhere that the write ever happened, as it wouldn't be
> in commitlog, memtable, or sstable in node 3. Surviving that failure
> scenario of two nodes with same data simultaneously failing requires upping
> CL or RF, or spreading across 3 racks, if the situation you're trying to
> avoid is rack failure (which im guessing it is from the question setup)
>
> On Thu, May 2, 2019 at 2:25 PM Ben Slater 
> wrote:
>
>> In scenario 2, if the row has been written to node 3 it will be replaced
>> on the other nodes via rebuild or repair.
>>
>> ---
>>
>> *Ben Slater*
>> *Chief Product Officer*, Instaclustr
>>
>>
>> On Fri, 3 May 2019 at 00:54, Fd Habash  wrote:
>>
>>> C*: 2.2.8
>>>
>>> Write CL = LQ
>>>
>>> Kspace RF = 3
>>>
>>> Three racks
>>>
>>>
>>>
>>> A write gets received by node 1 in rack 1 at above specs. Node 1 (rack1)
>>> & node 2 (rack2)  acknowledge it to the client.
>>>
>>>
>>>
>>> Within some unit of time, node 1 & 2 die. Either ….
>>>
>>>- Scenario 1: C* process death: Row did not make it to sstable (it
>>>is in commit log & was in memtable)
>>>- Scenario 2: Node death: row may be have made to sstable, but nodes
>>>are gone (will have to bootstrap to replace).
>>>
>>>
>>>
>>> Scenario 1: Row is not lost because once C* is restarted, commit log
>>> should replay the mutation.
>>>
>>>
>>>
>>> Scenario 2: row is gone forever? If these two nodes are replaced via
>>> bootstrapping, will they ever get the row back from node 3 (rack3) if the
>>> write ever made it there?
>>>
>>>
>>>
>>>
>>>
>>> 
>>> Thank you
>>>
>>>
>>>
>>
>
> --
> www.vorstella.com
> 408 691 8402
>


Re: CL=LQ, RF=3: Can a Write be Lost If Two Nodes ACK'ing it Die

2019-05-02 Thread Ben Slater
In scenario 2, if the row has been written to node 3 it will be restored to
the other nodes via rebuild or repair.

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Fri, 3 May 2019 at 00:54, Fd Habash  wrote:

> C*: 2.2.8
>
> Write CL = LQ
>
> Kspace RF = 3
>
> Three racks
>
>
>
> A write gets received by node 1 in rack 1 at above specs. Node 1 (rack1) &
> node 2 (rack2)  acknowledge it to the client.
>
>
>
> Within some unit of time, node 1 & 2 die. Either ….
>
>- Scenario 1: C* process death: Row did not make it to sstable (it is
>in commit log & was in memtable)
>- Scenario 2: Node death: row may be have made to sstable, but nodes
>are gone (will have to bootstrap to replace).
>
>
>
> Scenario 1: Row is not lost because once C* is restarted, commit log
> should replay the mutation.
>
>
>
> Scenario 2: row is gone forever? If these two nodes are replaced via
> bootstrapping, will they ever get the row back from node 3 (rack3) if the
> write ever made it there?
>
>
>
>
>
> 
> Thank you
>
>
>


Re: different query result after a rerun of the same query

2019-04-30 Thread Ben Slater
If you have successfully run a repair between the initial insert and running
the first select, then that should have ensured that all replicas are there.
Are you sure your repairs are completing successfully?

To check whether all replicas are being written during the periods of high
load, you can monitor the dropped mutations metrics.
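
For example, the dropped counts are visible in nodetool tpstats on each node; a trivial sketch that pulls out just the mutation lines:

import subprocess

out = subprocess.run(["nodetool", "tpstats"],
                     capture_output=True, text=True, check=True).stdout
for line in out.splitlines():
    # the dropped-message section lists counters per verb, e.g. MUTATION
    if line.strip().startswith(("MUTATION", "COUNTER_MUTATION")):
        print(line)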

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Tue, 30 Apr 2019 at 17:06, Marco Gasparini <
marco.gaspar...@competitoor.com> wrote:

> > My guess is the initial query was causing a read repair so, on
> subsequent queries, there were replicas of the data on every node and it
> still got returned at consistency one
> got it
>
> >There are a number of ways the data could have become inconsistent in the
> first place - eg  badly overloaded or down nodes, changes in topology
> without following proper procedure, etc
> I actually perform repair every day (because I have a lot of deletes).
> The topology has not changed in months.
> I usually don't have down nodes, but I have a high workload every night
> that lasts for about 2–3 hours. I'm monitoring Cassandra performance via
> Prometheus+Grafana and I noticed that reads are too slow, about 10–15
> seconds latency; writes are faster than reads, about 600–700 µs. I'm using
> non-SSD drives on the nodes.
>
>
>
>
>
>
>
>
>
> Il giorno lun 29 apr 2019 alle ore 22:36 Ben Slater <
> ben.sla...@instaclustr.com> ha scritto:
>
>> My guess is the initial query was causing a read repair so, on subsequent
>> queries, there were replicas of the data on every node and it still got
>> returned at consistency one.
>>
>> There are a number of ways the data could have become inconsistent in the
>> first place - eg  badly overloaded or down nodes, changes in topology
>> without following proper procedure, etc.
>>
>> Cheers
>> Ben
>>
>> ---
>>
>> *Ben Slater*
>> *Chief Product Officer*, Instaclustr
>>
>>
>> On Mon, 29 Apr 2019 at 19:50, Marco Gasparini <
>> marco.gaspar...@competitoor.com> wrote:
>>
>>> thank you Ben for the reply.
>>>
>>> > You haven’t said what consistency level you are using. CQLSH by
>>> default uses consistency level one which may be part of the issue - try
>>> using a higher level (eg CONSISTENCY QUORUM)
>>> yes, actually I used CQLSH so the consistency level was set to ONE.
>>> After I changed it I get the right results.
>>>
>>> >After results are returned correctly are they then returned correctly
>>> for all future runs?
>>> yes it seems that after they returned I can get access to them at each
>>> run of the same query on each node i run it.
>>>
>>> > When was the data inserted (relative to your attempt to query it)?
>>> about a day before the query
>>>
>>>
>>> Thanks
>>>
>>>
>>> Il giorno lun 29 apr 2019 alle ore 10:29 Ben Slater <
>>> ben.sla...@instaclustr.com> ha scritto:
>>>
>>>> You haven’t said what consistency level you are using. CQLSH by default
>>>> uses consistency level one which may be part of the issue - try using a
>>>> higher level (eg CONSISTENCY QUORUM).
>>>>
>&

Re: different query result after a rerun of the same query

2019-04-29 Thread Ben Slater
My guess is the initial query was causing a read repair so, on subsequent
queries, there were replicas of the data on every node and it still got
returned at consistency one.

There are a number of ways the data could have become inconsistent in the
first place - eg  badly overloaded or down nodes, changes in topology
without following proper procedure, etc.

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Mon, 29 Apr 2019 at 19:50, Marco Gasparini <
marco.gaspar...@competitoor.com> wrote:

> thank you Ben for the reply.
>
> > You haven’t said what consistency level you are using. CQLSH by default
> uses consistency level one which may be part of the issue - try using a
> higher level (eg CONSISTENCY QUORUM)
> Yes, actually I used cqlsh, so the consistency level was set to ONE. After
> I changed it I got the right results.
>
> >After results are returned correctly are they then returned correctly for
> all future runs?
> Yes, it seems that after they are returned I can access them on each run
> of the same query, on whichever node I run it.
>
> > When was the data inserted (relative to your attempt to query it)?
> about a day before the query
>
>
> Thanks
>
>
> Il giorno lun 29 apr 2019 alle ore 10:29 Ben Slater <
> ben.sla...@instaclustr.com> ha scritto:
>
>> You haven’t said what consistency level you are using. CQLSH by default
>> uses consistency level one which may be part of the issue - try using a
>> higher level (eg CONSISTENCY QUORUM).
>>
>> After results are returned correctly are they then returned correctly for
>> all future runs? When was the data inserted (relative to your attempt to
>> query it)?
>>
>> Cheers
>> Ben
>>
>> ---
>>
>> *Ben Slater*
>> *Chief Product Officer*, Instaclustr
>>
>>
>> On Mon, 29 Apr 2019 at 17:57, Marco Gasparini <
>> marco.gaspar...@competitoor.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm using Cassandra 3.11.3.5.
>>>
>>> I have just noticed that when I perform a query I get 0 result but if I
>>> launch that same query after few seconds I get the right result.
>>>
>>> I have traced the query:
>>>
>>> cqlsh> select event_datetime, id_url, uuid, num_pages from
>>> mkp_history.mkp_lookup where id_url= 1455425 and url_type='mytype' ;
>>>
>>>  event_datetime | id_url | uuid | num_pages
>>> ++--+---
>>>
>>> (0 rows)
>>>
>>> Tracing session: dda9d1a0-6a51-11e9-9e36-f54fe3235e69
>>>
>>>  activity
>>>
>>>  | timestamp  | source| source_elapsed
>>> | client
>>>
>>> --++---++---
>>>
>>>
>>>  Execute CQL3 query | 2019-04-29 09:39:05.53 | 10.8.0.10 |
>>> 0 | 10.8.0.10
>>>  Parsing select event_datetime, id_url, uuid, num_pages from
>>> mkp_history.mkp_lookup where id_url= 1455425 and url_type=' mytype'\n;
>>> [Native-Transport-Requests-2

Re: different query result after a rerun of the same query

2019-04-29 Thread Ben Slater
You haven’t said what consistency level you are using. CQLSH by default
uses consistency level one which may be part of the issue - try using a
higher level (eg CONSISTENCY QUORUM).

After results are returned correctly are they then returned correctly for
all future runs? When was the data inserted (relative to your attempt to
query it)?
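
For reference, the equivalent read at a higher consistency level from the Python driver would look something like the sketch below (the contact point and values are taken from the query traced further down; LOCAL_QUORUM is used as an example):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.8.0.10"])
session = cluster.connect("mkp_history")
stmt = SimpleStatement(
    "SELECT event_datetime, id_url, uuid, num_pages FROM mkp_lookup "
    "WHERE id_url = %s AND url_type = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)
for row in session.execute(stmt, [1455425, "mytype"]):
    print(row)
cluster.shutdown()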

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Mon, 29 Apr 2019 at 17:57, Marco Gasparini <
marco.gaspar...@competitoor.com> wrote:

> Hi all,
>
> I'm using Cassandra 3.11.3.5.
>
> I have just noticed that when I perform a query I get 0 results, but if I
> launch that same query again after a few seconds I get the right result.
>
> I have traced the query:
>
> cqlsh> select event_datetime, id_url, uuid, num_pages from
> mkp_history.mkp_lookup where id_url= 1455425 and url_type='mytype' ;
>
>  event_datetime | id_url | uuid | num_pages
> ++--+---
>
> (0 rows)
>
> Tracing session: dda9d1a0-6a51-11e9-9e36-f54fe3235e69
>
>  activity
>
>| timestamp  | source| source_elapsed |
> client
>
> --++---++---
>
>
>  Execute CQL3 query | 2019-04-29 09:39:05.53 | 10.8.0.10 |
> 0 | 10.8.0.10
>  Parsing select event_datetime, id_url, uuid, num_pages from
> mkp_history.mkp_lookup where id_url= 1455425 and url_type=' mytype'\n;
> [Native-Transport-Requests-2] | 2019-04-29 09:39:05.53 | 10.8.0.10 |
> 238 | 10.8.0.10
>
>   Preparing statement
> [Native-Transport-Requests-2] | 2019-04-29 09:39:05.53 | 10.8.0.10 |
> 361 | 10.8.0.10
>
>  reading data from /10.8.0.38
> [Native-Transport-Requests-2] | 2019-04-29 09:39:05.531000 | 10.8.0.10 |
> 527 | 10.8.0.10
>
> Sending READ message to /10.8.0.38
> [MessagingService-Outgoing-/10.8.0.38-Small] | 2019-04-29 09:39:05.531000 |
> 10.8.0.10 |620 | 10.8.0.10
>
>READ message received from /10.8.0.10
> [MessagingService-Incoming-/10.8.0.10] | 2019-04-29 09:39:05.535000 |
> 10.8.0.8 | 44 | 10.8.0.10
>
>   speculating read retry on /10.8.0.8
> [Native-Transport-Requests-2] | 2019-04-29 09:39:05.535000 | 10.8.0.10 |
>4913 | 10.8.0.10
>
>Executing single-partition query on
> mkp_lookup [ReadStage-2] | 2019-04-29 09:39:05.535000 |  10.8.0.8 |
> 304 | 10.8.0.10
>
>   Sending READ message to /10.8.0.8
> [MessagingService-Outgoing-/10.8.0.8-Small] | 2019-04-29 09:39:05.535000 |
> 10.8.0.10 |   4970 | 10.8.0.10
>
>  Acquiring sstable
> references [ReadStage-2] | 2019-04-29 09:39:05.536000 |  10.8.0.8 |
> 391 | 10.8.0.10
>
>Bloom filter allows skipping sstable
> 1 [ReadStage-2] | 2019-04-29 09:39:05.536000 |  10.8.0.8 |490 |
> 10.8.0.10
>
> Skipped 0/1 non-slice-intersecting sstables, included 0 due to
> tombstones [ReadStage-2] | 2019-04-29 09:39:05.536000 |  10.8.0.8 |
> 549 | 10.8.0.10
>
> Merged data from memtables and 0
> sstables [ReadStage-2] | 2019-04-29 09:39:05.536000 |  10.8.0.8 |
>   697 | 10.8.0.10
>
>Read 0 live rows and 0 tombstone
> cells [ReadStage-2] | 2019-04-29 09:39:05.536000 |  10.8.0.8 |
> 808 | 10.8.0.10
>
>  Enqueuing response to /
> 10.8.0.10 [ReadStage-2] | 2019-04-29 09:39:05.536000 |  10.8.0.8 |
> 896 | 10.8.0.10
>
> Sending REQUEST_RESPONSE message to /10.8.0.10
> [MessagingService-Outgoing-/10.8.0.10-Small] | 2019-04-29 09:39:05.536000
> |  10.8.0.8 |   1141 | 10.8.0.10
>
>  REQUEST_RESPONSE message received from /10.8.

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-25 Thread Ben Slater
In the absence of anyone else having any bright ideas - it still sounds to
me like the kind of scenario that can occur in a heavily overloaded
cluster. I would try again with a lower load.

What size machines are you using for the stress client and the nodes? Are they
all on separate machines?

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Thu, 25 Apr 2019 at 17:26, Hiroyuki Yamada  wrote:

> Hello,
>
> Sorry again.
> We found yet another weird thing in this.
> If we stop nodes with systemctl or just kill (TERM), it causes the problem,
> but if we kill -9, it doesn't cause the problem.
>
> Thanks,
> Hiro
>
> On Wed, Apr 24, 2019 at 11:31 PM Hiroyuki Yamada 
> wrote:
>
>> Sorry, I didn't write the version and the configurations.
>> I've tested with C* 3.11.4, and
>> the configurations are mostly set to default except for the replication
>> factor and listen_address for proper networking.
>>
>> Thanks,
>> Hiro
>>
>> On Wed, Apr 24, 2019 at 5:12 PM Hiroyuki Yamada 
>> wrote:
>>
>>> Hello Ben,
>>>
>>> Thank you for the quick reply.
>>> I haven't tried that case, but it doesn't recover even if I stop the
>>> stress.
>>>
>>> Thanks,
>>> Hiro
>>>
>>> On Wed, Apr 24, 2019 at 3:36 PM Ben Slater 
>>> wrote:
>>>
>>>> Is it possible that stress is overloading node 1 so it’s not recovering
>>>> state properly when node 2 comes up? Have you tried running with a lower
>>>> load (say 2 or 3 threads)?
>>>>
>>>> Cheers
>>>> Ben
>>>>
>>>> ---
>>>>
>>>> *Ben Slater*
>>>> *Chief Product Officer*, Instaclustr
>>>>
>>>>
>>>> On Wed, 24 Apr 2019 at 16:28, Hiroyuki Yamada 
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I faced a weird issue when recovering a cluster after two nodes are
>>>>> stopped.
>>>>> It is easily reproduce-able and looks like a bug or an issue to fix,
>>>>> so let me write down the steps to reproduce.
>>>>>
>>>>> === STEPS TO REPRODUCE ===
>>>>> * Create a 3-node cluster with RF=3
>>>>>- node1(seed), node2, node3
>>>>> * Start requests to the cluster with cassandra-stress (it continues
>>>>> until the end)
>>>>>- what we did: cassandra-stress mixed cl=QUORUM duration=10m
>>>>> -errors ignore -node node1,node2,node3 -rate threads\>=16
>>>>> threads\<=256
>>>>> * Stop node3 normally (with systemctl stop)
>>>>>- the system is still available because the quorum of nodes is
>>>>> still available
>>>>> * Stop node2 normally (with systemctl stop)
>>>>>- the system is NOT available after it's stopped.
>>>>>- the client gets `UnavailableException: Not enough replicas
>>>>> available for query at consistency QUORUM`
>>>>>- the client gets errors right away (so few ms)
>>>>>- so far it's all ex

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-24 Thread Ben Slater
Is it possible that stress is overloading node 1 so it’s not recovering
state properly when node 2 comes up? Have you tried running with a lower
load (say 2 or 3 threads)?
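
For example, the same invocation as in the reproduction steps below, but with the rate pinned low: cassandra-stress mixed cl=QUORUM duration=10m -errors ignore -node node1,node2,node3 -rate threads=4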

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Wed, 24 Apr 2019 at 16:28, Hiroyuki Yamada  wrote:

> Hello,
>
> I faced a weird issue when recovering a cluster after two nodes are
> stopped.
> It is easily reproducible and looks like a bug or an issue to fix,
> so let me write down the steps to reproduce.
>
> === STEPS TO REPRODUCE ===
> * Create a 3-node cluster with RF=3
>- node1(seed), node2, node3
> * Start requests to the cluster with cassandra-stress (it continues
> until the end)
>- what we did: cassandra-stress mixed cl=QUORUM duration=10m
> -errors ignore -node node1,node2,node3 -rate threads\>=16
> threads\<=256
> * Stop node3 normally (with systemctl stop)
>- the system is still available because the quorum of nodes is
> still available
> * Stop node2 normally (with systemctl stop)
>- the system is NOT available after it's stopped.
>- the client gets `UnavailableException: Not enough replicas
> available for query at consistency QUORUM`
>- the client gets errors right away (so few ms)
>- so far it's all expected
> * Wait for 1 mins
> * Bring up node2
>- The issue happens here.
>- the client gets ReadTimeoutException` or WriteTimeoutException
> depending on if the request is read or write even after the node2 is
> up
>- the client gets errors after about 5000ms or 2000ms, which are
> request timeout for write and read request
>- what node1 reports with `nodetool status` and what node2 reports
> are not consistent. (node2 thinks node1 is down)
>- It takes very long time to recover from its state
> === STEPS TO REPRODUCE ===
>
> Is it supposed to happen ?
> If we don't start cassandra-stress, it's all fine.
>
> Some workarounds we found to recover the state are the followings:
> * Restarting node1 and it recovers its state right after it's restarted
> * Setting lower value in dynamic_snitch_reset_interval_in_ms (to 6
> or something)
>
> I don't think either of them is a really good solution.
> Can anyone explain what is going on and what is the best way to make
> it not happen or recover ?
>
> Thanks,
> Hiro
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: ***UNCHECKED*** Query regarding cassandra column write time set by client Timestamp Generator

2019-04-09 Thread Ben Slater
Maybe sstabledump can help you?
https://cassandra.apache.org/doc/4.0/tools/sstable/sstabledump.html

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Tue, 9 Apr 2019 at 19:26, Mahesh Daksha  wrote:

> Thanks Ben for your response.
> WRITETIME gives information about the column value already residing in the
> table. We intend to know the timestamp of the record that is about to be
> applied/updated.
> This is needed to understand the timestamp difference between the data
> residing in the table and the data that is going to overwrite it.
>
> All this information is needed because our update statements are going silent
> (not reflecting any changes) in the database, without returning any error or
> exception.
>
> Thanks,
> Mahesh Daksha
>
> On Tue, Apr 9, 2019 at 2:46 PM Ben Slater 
> wrote:
>
>> Not in the logs but I think you should be able to use the WRITETIME
>> function to view via CQL (see
>> https://cassandra.apache.org/doc/latest/cql/dml.html#select)
>>
>> Cheers
>> Ben
>>
>> ---
>>
>> *Ben Slater*
>> *Chief Product Officer*, Instaclustr
>>
>>
>> On Tue, 9 Apr 2019 at 16:51, Mahesh Daksha  wrote:
>>
>>> Hello,
>>>
>>> I have configured the timestamp generator at cassandra client as below:
>>>
>>> cluster.setTimestampGenerator(new AtomicMonotonicTimestampGenerator());
>>>
>>>
>>> My cassandra client inserting and updating few of the rows in a table.
>>> My query is where in the cassandra debug logs I can see the query write
>>> time associated by with updated columns in the update query (sent by
>>> cient). Or if there is any other way I can log the same at client
>>> itself.
>>>
>>> Basically I want to see the write time sent by client to cassandra
>>> cluster.
>>>
>>> Thanks,
>>> Mahesh Daksha
>>>
>>


Re: ***UNCHECKED*** Query regarding cassandra column write time set by client Timestamp Generator

2019-04-09 Thread Ben Slater
Not in the logs but I think you should be able to use the WRITETIME
function to view via CQL (see
https://cassandra.apache.org/doc/latest/cql/dml.html#select)
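
For example (keyspace, table and column names here are placeholders): SELECT pk, col, WRITETIME(col) FROM my_ks.my_table WHERE pk = ...; returns the microsecond timestamp stored with the current value of that column.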

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Tue, 9 Apr 2019 at 16:51, Mahesh Daksha  wrote:

> Hello,
>
> I have configured the timestamp generator at cassandra client as below:
>
> cluster.setTimestampGenerator(new AtomicMonotonicTimestampGenerator());
>
> My Cassandra client is inserting and updating a few of the rows in a table.
> My question is where in the Cassandra debug logs I can see the write
> timestamp associated with the updated columns in the update query (sent by
> the client), or whether there is any other way I can log the same at the client itself.
>
> Basically I want to see the write time sent by client to cassandra cluster.
>
> Thanks,
> Mahesh Daksha
>


Re: How to read the Index.db file

2019-02-07 Thread Ben Slater
They don’t do exactly what you want but depending on why you are trying to
get this info you might find our sstable-tools useful:
https://github.com/instaclustr/cassandra-sstable-tools
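
If it is just the list of partition keys you are after, sstabledump with its enumerate-keys option (sstabledump -e /path/to/*-Data.db) should also print them, if I remember the flag correctly.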

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Fri, 8 Feb 2019 at 08:14, Kenneth Brotman 
wrote:

> When you say you’re trying to get all the partitions of a particular
> SSTable, I’m not sure what you mean.  Do you want to make a copy of it?  I
> don’t understand.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Pranay akula [mailto:pranay.akula2...@gmail.com]
> *Sent:* Wednesday, February 06, 2019 7:51 PM
> *To:* user@cassandra.apache.org
> *Subject:* How to read the Index.db file
>
>
>
> I was trying to get all the partitions of a particular SSTable. I have
> tried reading the Index.db file; I can read some part of it but not all of it.
> Is there any way to convert it to a readable format?
>
>
>
>
>
> Thanks
>
> Pranay
>


Re: Authenticate cassandra-stress with cqlshrc

2019-01-08 Thread Ben Slater
Yep, cassandra-stress doesn’t attempt to use the cqlshrc file. It seems to me
it could be convenient, so it might make a nice contribution to the project.
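
(For reference, the command-line form is along the lines of: cassandra-stress write n=100 -node <host> -mode native cql3 user=<user> password=<password>.)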

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Wed, 9 Jan 2019 at 11:01, Arvinder Dhillon  wrote:

> Yes, my cluster is set up to authenticate using password authentication
> (host, user and password are stored in cqlshrc).
> When I try to run cassandra-stress without providing the user & password on
> the command line, it throws an authentication error. I expect cassandra-stress
> to read the cqlshrc file and authenticate.
> However, if I provide the user and password on the command line, it works perfectly.
> Thanks
>
> -Arvinder
>
> On Tue, Jan 8, 2019, 1:35 PM Ben Slater 
>> Is your cluster set up to require authentication? I’m a bit unclear about
>> whether you’re trying to connect without passing a user name and password
>> at all (which should just work as the default) or if you’re looking for
>> some mechanism other than the command line to pass the user name / password
>> (in which case I don’t think there is one but stress has a hell of a lot of
>> options so I could be wrong).
>>
>> Cheers
>> Ben
>>
>> ---
>>
>> *Ben Slater*
>> *Chief Product Officer*, Instaclustr
>>
>>
>> On Wed, 9 Jan 2019 at 06:01, Arvinder Dhillon 
>> wrote:
>>
>>> I'm trying to connect cassandra-stress 3.11.0 without providing user and
>>> password option on the comman line. It doesn't seems to be using cqlshrc.
>>> Any suggestions please?
>>>
>>> -Arvinder
>>>
>>


Re: Authenticate cassandra-stress with cqlshrc

2019-01-08 Thread Ben Slater
Is your cluster set up to require authentication? I’m a bit unclear about
whether you’re trying to connect without passing a user name and password
at all (which should just work as the default) or if you’re looking for
some mechanism other than the command line to pass the user name / password
(in which case I don’t think there is one but stress has a hell of a lot of
options so I could be wrong).

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Wed, 9 Jan 2019 at 06:01, Arvinder Dhillon  wrote:

> I'm trying to connect cassandra-stress 3.11.0 without providing the user and
> password options on the command line. It doesn't seem to be using cqlshrc.
> Any suggestions please?
>
> -Arvinder
>


Re: Cassandra single unreachable node causing total cluster outage

2018-11-27 Thread Ben Slater
In what way does the cluster become unstable (i.e., more specifically, what are
the symptoms)? My first thought would be the loss of the node causing the
other nodes to become overloaded, but that doesn’t seem to fit with your
point 2.

Cheers
Ben

---

*Ben Slater*
*Chief Product Officer*, Instaclustr


On Tue, 27 Nov 2018 at 16:32, Agrawal, Pratik 
wrote:

> Hello all,
>
>
>
> *Setup:*
>
>
>
> 18 Cassandra node cluster. Cassandra version 2.2.8
>
> Amazon C3.2x large machines.
>
> Replication factor of 3 (in 3 different AZs).
>
> Read and Write using Quorum.
>
>
>
> *Use case:*
>
>
>
>1. Short lived data with heavy updates (I know we are abusing
>Cassandra here) with gc grace period of 15 minutes (I know it sounds
>ridiculous). Level-tiered compaction strategy.
>2. Timeseries data, no updates (short lived) (1 hr). TTLed out using
>Date-tiered compaction strategy.
>3. Timeseries data, no updates (long lived) (7 days). TTLed out using
>Date-tiered compaction strategy.
>
>
>
> Overall high read and write throughput (10/second)
>
>
>
> *Problem:*
>
>1. The EC2 machine becomes unreachable (we reproduced the issue by
>taking down network card) and the entire cluster becomes unstable for the
>time until the down node is removed from the cluster. The node is shown as
>DN node while doing nodetool status. Our understanding was that a single
>node down in one AZ should not impact other nodes. We are unable to
>understand why a single node going down is causing entire cluster to become
>unstable. Is there any open bug around this?
>2. We tried another experiment by killing Cassandra process but in
>this case we only see a blip in latencies but all the other nodes are still
>healthy and responsive (as expected).
>
>
>
> Any thoughts/comments on what could be the issue here?
>
>
>
> Thanks,
> Pratik
>
>
>
>
>
>
>


Re: [EXTERNAL] Is Apache Cassandra supports Data at rest

2018-11-14 Thread Ben Slater
I wrote a blog post a while ago on the pros and cons of encrypting in your
application for use with Cassandra that you might find useful background on
this subject:
https://www.instaclustr.com/securing-apache-cassandra-with-application-level-encryption/

Cheers
Ben

On Wed, 14 Nov 2018 at 13:47 Durity, Sean R 
wrote:

> I think you are asking about **encryption** at rest. To my knowledge,
> open source Cassandra does not support this natively. There are options,
> like encrypting the data in the application before it gets to Cassandra.
> Some companies offer other solutions. IMO, if you need the increased
> security, it is worth using something like DataStax Enterprise.
>
>
>
>
>
> Sean Durity
>
> *From:* Goutham reddy 
> *Sent:* Tuesday, November 13, 2018 1:22 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Is Apache Cassandra supports Data at rest
>
>
>
> Hi,
>
> Does Apache Cassandra supports data at rest, because datastax Cassandra
> supports it. Can anybody help me.
>
>
>
> Thanks and Regards,
>
> Goutham.
>
> --
>
> Regards
>
> Goutham Reddy
>
> --
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>
-- 


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Multiple cluster for a single application

2018-11-07 Thread Ben Slater
I tend to recommend an approach similar to Eric’s functional sharding
although I describe it as quality of service sharding - group your small,
hot data into one cluster and your large, cooler data into another so you
can provision infrastructure and tune accordingly. I guess it depends on your
management environment but if your app functionality allows you to split
into multiple clusters (ie your data is not all in one giant table)
then I would generally look to split. Splitting also gives you the
advantage of making it harder to have an outage that brings everything down.

Cheers
Ben

On Thu, 8 Nov 2018 at 08:44 Jonathan Haddad  wrote:

> Interesting approach Eric, thanks for sharing that.
>
> Regarding this:
>
> > I've read documents recommended to use clusters with less than 50 or 100
> nodes (Netflix got hundreds of clusters with less 100 nodes on each).
>
> Not sure where you read that, but it's nonsense.  We work with quite a few
> clusters that are several hundred nodes each.  Your problems can get a bit
> amplified, for instance dynamic snitch can make a cluster perform
> significantly worse than if you just flat out disable it, which is what I
> usually recommend.
>
> I'm curious how you arrived at the estimate of needing > 100 nodes.  Is
> that due to space constraints or performance ones?
>
>
>
> On Wed, Nov 7, 2018 at 12:52 PM Eric Stevens  wrote:
>
>> We are engaging in both strategies at the same time:
>>
>> 1) We call it functional sharding - we write to clusters targeted
>> according to the type of data being written.  Because different data types
>> often have different workloads this has the nice side effect of being able
>> to tune each cluster according to its workload.  Your ability to grow in
>> this dimension is limited by the number of business object types you're
>> recording.
>>
>> 2) We write to clusters sharded by time.  Our objects are network
>> security events, so there's always an element of time.  We encode that time
>> into deterministic object IDs so that we are able to identify in the read
>> path which shard to direct the request to by extracting the time
>> component.  This basic idea should be able to work any time you're able to
>> use surrogate keys instead of natural keys.  If you are using natural keys,
>> you may be facing an unpleasant migration should you need to increase the
>> number of shards in this dimension.
>>
>> Our reason for engaging in the second strategy was not purely Cassandra's
>> fault, rather we were using DSE with a search workload, and the cost of
>> rebuilding Solr indexes on streaming operations (such as adding nodes to an
>> existing cluster) required enough resources that we found it prohibitive.
>> That's because the bootstrapping node was also taking a production write
>> workload, and we didn't want to run our cluster with enough overhead that a
>> node could bootstrap and take production workload at the same time.
>>
>> For vanilla Cassandra workloads we have run clusters with quite a bit
>> more nodes than 100 without any appreciable trouble.  Curious if you can
>> share documents about clusters over 100 nodes causing troubles for users.
>> I'm wondering if it's related to node failure rate combined with vnodes
>> meaning that several concurrent node failures cause a part of the ring to
>> go offline too reliably.
>>
>> On Mon, Nov 5, 2018 at 7:38 AM onmstester onmstester
>>  wrote:
>>
>>> Hi,
>>>
>>> One of my applications requires to create a cluster with more than 100
>>> nodes, I've read documents recommended to use clusters with less than 50 or
>>> 100 nodes (Netflix got hundreds of clusters with less 100 nodes on each).
>>> Is it a good idea to use multiple clusters for a single application,
>>> just to decrease maintenance problems and system complexity/performance?
>>> If So, which one of below policies is more suitable to distribute data
>>> among clusters and Why?
>>> 1. each cluster' would be responsible for a specific partial set of
>>> tables only (table sizes are almost equal so easy calculations here) for
>>> example inserts to table X would go to cluster Y
>>> 2. shard data at loader level by some business logic grouping of data,
>>> for example all rows with some column starting with X would go to cluster Y
>>>
>>> I would appreciate sharing your experiences working with big clusters,
>>> problem encountered and solutions.
>>>
>>> Thanks in Advance
>>>
>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>
>>>
>>>

Re: [ANNOUNCE] StratIO's Lucene plugin fork

2018-10-30 Thread Ben Slater
For anyone who is interested, we’ve published a blog with some more
background on this and some more detail of our ongoing plans:
https://www.instaclustr.com/instaclustr-support-cassandra-lucene-index/

Cheers
Ben

On Fri, 19 Oct 2018 at 09:42 kurt greaves  wrote:

> Hi all,
>
> We've had confirmation from Stratio that they are no longer maintaining
> their Lucene plugin for Apache Cassandra. We've thus decided to fork the
> plugin to continue maintaining it. At this stage we won't be making any
> additions to the plugin in the short term unless absolutely necessary, and
> as 4.0 nears we'll begin making it compatible with the new major release.
> We plan on taking the existing PR's and issues from the Stratio repository
> and getting them merged/resolved, however this likely won't happen until
> early next year. Having said that, we welcome all contributions and will
> dedicate time to reviewing bugs in the current versions if people lodge
> them and can help.
>
> I'll note that this is new ground for us, we don't have much existing
> knowledge of the plugin but are determined to learn. If anyone out there
> has established knowledge about the plugin we'd be grateful for any
> assistance!
>
> You can find our fork here:
> https://github.com/instaclustr/cassandra-lucene-index
> At the moment, the only difference is that there is a 3.11.3 branch which
> just has some minor changes to dependencies to better support 3.11.3.
>
> Cheers,
> Kurt
>
-- 


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: cold vs hot data

2018-09-13 Thread Ben Slater
Not quite a solution but you will probably be interested in the discussion
on this ticket: https://issues.apache.org/jira/browse/CASSANDRA-8460

On Fri, 14 Sep 2018 at 10:46 Alaa Zubaidi (PDF) 
wrote:

> Hi,
>
> We are using Apache Cassandra 3.11.2 on RedHat 7
> The data can grow to +100TB however the hot data will in most cases be
> less than 10TB, but we still need to keep the rest of the data accessible.
> Does anyone have this problem?
> What is the best way to make the cluster more efficient?
> Is there a way to somehow automatically move the old data to different
> storage (rack, dc, etc)?
> Any ideas?
>
> Regards,
>
> --
>
> Alaa
>
>
> *This message may contain confidential and privileged information. If it
> has been sent to you in error, please reply to advise the sender of the
> error and then immediately permanently delete it and all attachments to it
> from your systems. If you are not the intended recipient, do not read,
> copy, disclose or otherwise use this message or any attachments to it. The
> sender disclaims any liability for such unauthorized use. PLEASE NOTE that
> all incoming e-mails sent to PDF e-mail accounts will be archived and may
> be scanned by us and/or by external service providers to detect and prevent
> threats to our systems, investigate illegal or inappropriate behavior,
> and/or eliminate unsolicited promotional e-mails (“spam”). If you have any
> concerns about this process, please contact us at *
> *legal.departm...@pdf.com* *.*

-- 


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Secure data

2018-08-01 Thread Ben Slater
My recommendation is generally to look at encrypting in your application as
it’s likely to be overall more secure than DB-level encryption anyway
(generally the closer to the user you encrypt the better). I wrote a blog
on this last year:
https://www.instaclustr.com/securing-apache-cassandra-with-application-level-encryption/

We also use encrypted GP2 EBS pretty widely without issue.

Cheers
Ben

On Thu, 2 Aug 2018 at 05:38 Jonathan Haddad  wrote:

> You can also get full disk encryption with LUKS, which I've used before.
>
> On Wed, Aug 1, 2018 at 12:36 PM Jeff Jirsa  wrote:
>
>> EBS encryption worked well on gp2 volumes (never tried it on any others)
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Aug 1, 2018, at 7:57 AM, Rahul Reddy  wrote:
>>
>> Hello,
>>
>> Any one tried aws ec2 volume encryption for Cassandra instances?
>>
>> On Tue, Jul 31, 2018, 12:25 PM Rahul Reddy 
>> wrote:
>>
>>> Hello,
>>>
>>> I'm trying to find a good document on to enable encryption for Apache
>>> Cassandra  (not on dse) tables and commilogs and store the keystore in kms
>>> or vault. If any of you already configured please direct me to
>>> documentation for it.
>>>
>>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
-- 


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Re: [EXTERNAL] full text search on some text columns

2018-07-31 Thread Ben Slater
We (Instaclustr) will be submitting a PR for 3.11.3 support for
cassandra-lucene-index once 3.11.3 is officially released as we offer it as
part of our service and have customers using it.

Cheers
Ben

On Wed, 1 Aug 2018 at 14:06 onmstester onmstester 
wrote:

> It seems to be an interesting project but sort of abandoned. No update in
> the last 8 months and it does not support Cassandra 3.11.2 (the version I currently
> use).
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>  Forwarded message 
> From : Andrzej Śliwiński 
> To : 
> Date : Wed, 01 Aug 2018 08:16:06 +0430
> Subject : Re: [EXTERNAL] full text search on some text columns
>  Forwarded message 
>
> Maybe this plugin could do the job:
> https://github.com/Stratio/cassandra-lucene-index
>
> On Tue, 31 Jul 2018 at 22:37, onmstester onmstester 
> wrote:
>
>
> --


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Timeout for only one keyspace in cluster

2018-07-21 Thread Ben Slater
Note that that writetimeout exception can be C*s way of telling you when
there is contention on a LWT (rather than actually timing out). See
https://issues.apache.org/jira/browse/CASSANDRA-9328

Cheers
Ben

On Sun, 22 Jul 2018 at 11:20 Goutham reddy 
wrote:

> Hi,
> As it is a single partition key, try updating using only the partition
> key instead of passing the other columns. And try setting the consistency level to ONE.
>
> Cheers,
> Goutham.
>
> On Fri, Jul 20, 2018 at 6:57 AM learner dba 
> wrote:
>
>> Does anybody have any ideas about this? This is happening in production and we
>> really need to fix it.
>>
>> On Thursday, July 19, 2018, 10:41:59 AM CDT, learner dba
>>  wrote:
>>
>>
>> Our foreignid is a unique identifier and we did check for wide partitions;
>> cfhistograms shows all partitions are evenly sized:
>>
>> Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
>>                       (micros)       (micros)      (bytes)
>> 50%         0.00      29.52          0.00          1916            12
>> 75%         0.00      42.51          0.00          2299            12
>> 95%         0.00      61.21          0.00          2759            14
>> 98%         0.00      73.46          0.00          2759            17
>> 99%         0.00      88.15          0.00          2759            17
>> Min         0.00      9.89           0.00          150             2
>> Max         0.00      88.15          0.00          7007506         42510
>>
>> Anything else that we can check?
>>
>> On Wednesday, July 18, 2018, 10:44:29 PM CDT, wxn...@zjqunshuo.com <
>> wxn...@zjqunshuo.com> wrote:
>>
>>
>> Your partition key is foreignid. You may have a large partition. Why not
>> use foreignid+timebucket as partition key?
>>
>>
>> *From:* learner dba 
>> *Date:* 2018-07-19 01:48
>> *To:* User cassandra.apache.org 
>> *Subject:* Timeout for only one keyspace in cluster
>> Hi,
>>
>> We have a cluster with multiple keyspaces. All queries are performing
>> well but write operations on a few tables in one specific keyspace get write
>> timeouts. The table has a counter column and the counter update query always
>> times out. Any idea?
>>
>> CREATE TABLE x.y (
>>
>> foreignid uuid,
>>
>> timebucket text,
>>
>> key text,
>>
>> timevalue int,
>>
>> value counter,
>>
>> PRIMARY KEY (foreignid, timebucket, key, timevalue)
>>
>> ) WITH CLUSTERING ORDER BY (timebucket ASC, key ASC, timevalue ASC)
>>
>> AND bloom_filter_fp_chance = 0.01
>>
>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>
>> AND comment = ''
>>
>> AND compaction = {'class':
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>> 'max_threshold': '32', 'min_threshold': '4'}
>>
>> AND compression = {'chunk_length_in_kb': '64', 'class':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>
>> AND crc_check_chance = 1.0
>>
>> AND dclocal_read_repair_chance = 0.1
>>
>> AND default_time_to_live = 0
>>
>> AND gc_grace_seconds = 864000
>>
>>     AND max_index_interval = 2048
>>
>> AND memtable_flush_period_in_ms = 0
>>
>> AND min_index_interval = 128
>>
>> AND read_repair_chance = 0.0
>>
>> AND speculative_retry = '99PERCENTILE';
>>
>> Query and Error:
>>
>> UPDATE x.y SET value = value + 1 where foreignid = ? AND timebucket = ? AND 
>> key = ? AND timevalue = ?, err = {s:\"gocql: no response 
>> received from cassandra within timeout period
>>
>>
>> I verified CL=local_serial
>>
>> We had been working on this issue for many days; any help will be much 
>> appreciated.
>>
>>
>>
>> --
> Regards
> Goutham Reddy
>
-- 


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Certified Cassandra for Enterprise use

2018-05-29 Thread Ben Slater
Hi Pranay

We (Instaclustr) provide enterprise support for Cassandra (
https://www.instaclustr.com/services/cassandra-support/) which may cover
what you are looking for.

Please get in touch direct if you would like to discuss.

Cheers
Ben

On Tue, 29 May 2018 at 10:11 Pranay akula 
wrote:

> Is there any third party who provides security patches/releases for Apache
> Cassandra?
>
> For enterprise use, is there any third party who provides certified Apache
> Cassandra packages?
>
> Thanks
> Pranay
>
-- 


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Interesting Results - Cassandra Benchmarks over Time Series Data for IoT Use Case I

2018-05-17 Thread Ben Slater
Approach (4) or (5) are what I would go for - they are (as your results
show) basically identical as the composite partition key gets converted
into a single hash.

Looking at your doc, I think the issue is you are using < operators on the
day field. As Cassandra doesn’t natively do range queries on a hashed
partition key, SparkSQL is being clever and iterating through all the data to
find the partitions that match your selection criteria.

The approach that we have found is necessary to get good performance is to
provide the actual list of days you are interested in which should allow
the conditions to be fully pushed down from Spark to Cassandra (although
this can be a little hard to control with SparkSQL). I gave a talk at
Cassandra summit a couple of years ago on our approach to a very similar
problem. You can find the slides, including some code snippets, here:
https://www.slideshare.net/Instaclustr/instaclustr-webinar-5-transactions-per-second-with-apache-spark-on-apache-cassandra
and
I think the video is still on Youtube. There is also some update
description and code in this blog post:
https://www.instaclustr.com/upgrading-instametrics-to-cassandra-3/. This
one is a bit high level but you might also find relevant:
https://www.instaclustr.com/cassandra-connector-for-spark-5-tips-for-success/
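
To make that concrete, a rough CQL-level sketch (table and values are illustrative
only): with a schema along the lines of

CREATE TABLE sensor_data (
    dev_id   text,
    day      text,
    rec_time timestamp,
    value    double,
    PRIMARY KEY ((dev_id, day), rec_time)
);

the read path should enumerate the days explicitly rather than using < / > on day, e.g.

SELECT rec_time, value FROM sensor_data
WHERE dev_id = 'dev-42'
  AND day IN ('2018-05-01', '2018-05-02', '2018-05-03')
  AND rec_time >= '2018-05-01 00:00:00';

so only the named partitions are touched and the work scales with the query range
rather than with the total data size.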

Cheers
Ben

On Thu, 17 May 2018 at 18:06 Arbab Khalil <akha...@an10.io> wrote:

> We have been exploring IoT specific C* schema design over the past few
> months. We wanted to share the benchmarking results with the wider
> community for a) bringing rigor to the discussion, and b) starting a
> discussion for better design.
>
> First the use-case: We have time-series of data from devices on several
> sites, where each device (with a unique dev_id) can have several sensors
> attached to it. Most queries however are both time limited as well as over
> a range of dev_ids, even for a single sensor (Multi-sensor joins are a
> whole different beast for another day!). We want to have a schema where the
> query can complete in time linear to the query ranges for both devices and
> time range, immaterial (largely) to the total data size.
>
>
> So we explored several different primary key definitions, learning from
> the best-practices communicated on this mailing list and over the
> interwebs. While details about the setup (Spark over C*) and schema are in
> a companion blog/site here [1], we just mention the primary keys and the
> key points here.
>
>
>1.
>
>PRIMARY KEY (dev_id, day, rec_time)
>2.
>
>PRIMARY KEY ((dev_id, rec_time)
>3.
>
>PRIMARY KEY (day, dev_id, rec_time)
>4.
>
>PRIMARY KEY ((day, dev_id), rec_time)
>5.
>
>PRIMARY KEY ((dev_id, day), rec_time)
>6.
>
>Combination of above by adding a year field in the schema.
>
>
> The main takeaway (again, please read through the details at [1]) is that
> we really don't have a single schema to answer the use case above without
> some drawback. Thus while the ((day, dev_id), rec_time) gives a constant
> response, it is dependent entirely on the total data size (full scan). On
> the other hand, (dev_id, day, rec_time) and its counterpart (day, dev_id, 
> rec_time)
> provide acceptable results, we have the issue of very large partition space
> in the first, and hotspot while writing for the latter case.
>
> We also observed that having a multi-field partition key allows for fast
> querying only if the "=" is used going left to right. If an IN() (for
> specifying e.g. a range of time or a list of devices) is used once in that order,
> then any further usage of IN() removes any benefit (i.e. a near full table
> scan).
> Another useful learning was that using the IN() to query for days is less
> useful than putting in a range query.
>
> Currently, it seems we are in a bind --- should we use a different data
> store for our usecase (which seems quite typical for IoT)? Something like
> HDFS or Parquet? We would love to get feedback on the benchmarking results
> and how we can possibly improve this and share widely.
> [1] Cassandra Benchmarks over Time Series Data for IoT Use Case
> <https://sites.google.com/an10.io/timeseries-results>
>https://sites.google.com/an10.io/timeseries-results
>
>
> --
> Regards,
> Arbab Khalil
> Software Design Engineer
>
-- 


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Basic Copy vs Snapshot for backup

2018-05-10 Thread Ben Slater
The snapshot gives you a complete set of your sstables at a point in time.
If you were copying sstables directly from a live node you would have to
deal with files coming and going due to compactions.

Cheers
Ben

On Thu, 10 May 2018 at 16:45 <vishal1.sha...@ril.com> wrote:

> Dear Community,
>
>
>
> Is there any benefit of taking a backup of a node via ‘nodetool snapshot’ vs
> simply copying the data directory, other than the fact that a snapshot will
> first flush the memtable and then take the backup?
>
>
>
> Thanks and regards,
>
> Vishal Sharma
>
>
> "*Confidentiality Warning*: This message and any attachments are intended
> only for the use of the intended recipient(s), are confidential and may be
> privileged. If you are not the intended recipient, you are hereby notified
> that any review, re-transmission, conversion to hard copy, copying,
> circulation or other use of this message and any attachments is strictly
> prohibited. If you are not the intended recipient, please notify the sender
> immediately by return email and delete this message and any attachments
> from your system.
>
> *Virus Warning:* Although the company has taken reasonable precautions to
> ensure no viruses are present in this email. The company cannot accept
> responsibility for any loss or damage arising from the use of this email or
> attachment."
>
-- 


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Does Cassandra supports ACID txn

2018-04-25 Thread Ben Slater
Would be interested to hear if anyone else has any different approaches but
my approaches would be:
1) Ask if it’s really needed - in the example you gave would it really
matter that, for a small period of time, the hotel appeared in one kind of
search but not another? (Although clearly there are examples where it might
matter.)
2) Put the state that matters in a single table. In this example, have a
hotel_enabled table. Search would have to both find the hotel in one of
your hotel_by_* tables and then look up the hotel in hotel_enabled to
check it is really enabled. “Deleting” a hotel is then a single write to
hotel_enabled. hotel_enabled could also be something like hotel_details so
the other tables really are just indexes (a rough sketch follows below). You
need to do more reads but whatever you do consistency doesn’t come for free.
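
A very rough sketch of option 2 in CQL (all names here are illustrative only):

CREATE TABLE hotel_details (
    hotel_id uuid PRIMARY KEY,
    name     text,
    enabled  boolean
);

CREATE TABLE hotels_by_poi (
    poi_name text,
    hotel_id uuid,
    PRIMARY KEY (poi_name, hotel_id)
);

-- read path: look up the index table, then check the single source of truth
SELECT hotel_id FROM hotels_by_poi WHERE poi_name = 'Central Station';
SELECT name, enabled FROM hotel_details WHERE hotel_id = ?;  -- skip it if enabled = false

-- "deleting" a hotel is then a single write
UPDATE hotel_details SET enabled = false WHERE hotel_id = ?;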

Cheers
Ben


On Thu, 26 Apr 2018 at 12:44 Rajesh Kishore <rajesh10si...@gmail.com> wrote:

> Correction from previous query
>
>
> Thanks Ben and all experts.
>
> I am almost a newbie to NoSQL world and thus I have a very general
> question how does consumer application of Cassandra/other NoSQL
> technologies deal with atomicity & other factors when there is need to 
> *de-normalize
> *data. For example:
>
> Let us say I have requirement for queries
> - find all hotels by name
> - Find all hotels by Point of Interest (POI)
> - Find POI near by a hotel
>
> For these queries I would end up more or less in following tables
> hotels_by_name(hotel_name,hotel_id,city,) primary key - hotel_name
> hotels_by_poi(poi_name,poi_id,hotel_id,hotel_name,..) primary key -
> poi_name
> poi_by_hotel(hotel_id,poi_name,poi_id,poi_loc,hotel_name,..) primary
> key - hotel_id
>
> So, If I have to add/remove a hotel from/into hotels_by_name , I may need
> to add/remove into/from tables hotels_by_poi/poi_by_hotel. So, here my
> assumption is these operations would need to be atomic( and may be
> supporting other ACID properties) . How these kind of operations/usecases
> being handled in Cassandra/NoSQL world?
>
> Appreciate your response.
>
> Thanks,
> Rajesh
>
> On Thu, Apr 26, 2018 at 8:05 AM, Rajesh Kishore <rajesh10si...@gmail.com>
> wrote:
>
>> Thanks Ben and all experts.
>>
>> I am almost a newbie to NoSQL world and thus I have a very general
>> question how does consumer application of Cassandra/other NoSQL
>> technologies deal with atomicity & other factors when there is need to
>> normalize data. For example:
>>
>> Let us say I have requirement for queries
>> - find all hotels by name
>> - Find all hotels by Point of Interest (POI)
>> - Find POI near by a hotel
>>
>> For these queries I would end up more or less in following tables
>> hotels_by_name(hotel_name,hotel_id,city,) primary key - hotel_name
>> hotels_by_poi(poi_name,poi_id,hotel_id,hotel_name,..) primary key -
>> poi_name
>> poi_by_hotel(hotel_id,poi_name,poi_id,poi_loc,hotel_name,..) primary
>> key - hotel_id
>>
>> So, If I have to add/remove a hotel from/into hotels_by_name , I may need
>> to add/remove into/from tables hotels_by_poi/poi_by_hotel. So, here my
>> assumption is these operations would need to be atomic( and may be
>> supporting other ACID properties) . How these kind of operations/usecases
>> being handled in Cassandra/NoSQL world?
>>
>> Appreciate your response.
>>
>> Thanks,
>> Rajesh
>>
>>
>>
>> On Fri, Apr 20, 2018 at 11:07 AM, Ben Slater <ben.sla...@instaclustr.com>
>> wrote:
>>
>>> The second SO answer just says the partitions will be collocated (ie on
>>> the same server) not that the two tables will use the same partition. In
>>> any event, Cassandra does not have the kind of functionality you are
>>> looking for. The closest is logged batch but as Sylvain said, "all that
>>> guarantees is that if some operations of a batch are applied, then all
>>> of them will
>>> *eventually* get applied” and “batch have no rollback whatsoever”.
>>>
>>> As Cassandra won’t help you here, a potential (although admittedly more
>>> complex) option is to do implement compensating transactions at the
>>> application level (eg in the catch block delete the records that were
>>> inserted). That, however, does not provide you the isolation part of ACID.
>>>
>>> You also tend to find that if you have properly denormalised your data
>>> model for Cassandra there is less requirement for these type of batched
>>> updates.
>>>
>>> Cheers
>>> Ben
>>>
>>> On Fri, 20 Apr 2018 at 15:21 Rajesh Kishore

Re: read repair with consistency one

2018-04-21 Thread Ben Slater
I haven't checked the code to make sure this is still the case but last
time I checked:
- For any read, if an inconsistency between replicas is detected then this
inconsistency will be repaired. This obviously wouldn’t apply with CL=ONE
because you’re not reading multiple replicas to find inconsistencies.
- If read_repair_chance or dc_local_read_repair_chance are >0 then extra
replicas are checked as part of the query for the % of queries specified by
the chance setting. Again, if inconsistencies are found, they are repaired.
I expect this mechanism would still apply for CL=ONE.
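
For reference, those chances are per-table settings, e.g. (keyspace/table names are
placeholders):

ALTER TABLE my_ks.my_table
    WITH dclocal_read_repair_chance = 0.1
    AND read_repair_chance = 0.0;

With RF 1 per DC, dclocal_read_repair_chance has no other local replica to compare
against, so (as I understand it) it’s the global read_repair_chance that would pull
the other DCs into that background check.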


Cheers
Ben

On Sat, 21 Apr 2018 at 22:20 Grzegorz Pietrusza <gpietru...@gmail.com>
wrote:

> I haven't asked about "regular" repairs. I just wanted to know how read
> repair behaves in my configuration (or is it doing anything at all).
>
> 2018-04-21 14:04 GMT+02:00 Rahul Singh <rahul.xavier.si...@gmail.com>:
>
>> Read repair is one anti-entropy measure. Continuous repair is another.
>> If you do repairs via Reaper or your own method it will resolve your
>> discrepancies.
>>
>> On Apr 21, 2018, 3:16 AM -0400, Grzegorz Pietrusza <gpietru...@gmail.com>,
>> wrote:
>>
>> Hi all
>>
>> I'm a bit confused with how read repair works in my case, which is:
>> - multiple DCs with RF 1 (NetworkTopologyStrategy)
>> - reads with consistency ONE
>>
>>
>> The article #1 says that read repair in fact runs RF reads for some
>> percent of the requests. Let's say I have read_repair_chance = 0.1. Does
>> it mean that 10% of requests will be read in all DCs (digest) and processed
>> in a background?
>>
>> On the other hand article #2 says that for consistency ONE read repair is
>> not performed. Does it mean that in my case read repair does not work at
>> all? Is there any way to enable read repair across DCs and stay will
>> consistency ONE for reads?
>>
>>
>> #1 https://www.datastax.com/dev/blog/common-mistakes-and-misconceptions
>> #2
>> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
>>
>> Regards
>> Grzegorz
>>
>>
> --


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Does Cassandra supports ACID txn

2018-04-19 Thread Ben Slater
>>>
>>> Would the system rollback the operations done for TableA TableB ?
>>>
>>> -Rajesh
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Apr 19, 2018 at 1:25 PM, Jacques-Henri Berthemet <
>>> jacques-henri.berthe...@genesys.com> wrote:
>>>
>>> Cassandra support LWT (Lightweight transactions), you may find this doc
>>> interesting:
>>>
>>>
>>> https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlDataConsistencyTOC.html
>>>
>>>
>>>
>>> In any case, LWT or BATCH you won’t have external control on the tx, it’s
>>> either done or not done. In case of timeout you won’t have a way to
>>> know if it worked or not.
>>>
>>> There is no way to rollback a statement/batch, the only way is to send
>>> an update to modify the partition to its previous state.
>>>
>>>
>>>
>>> Regards,
>>>
>>> *--*
>>>
>>> *Jacques-Henri Berthemet*
>>>
>>>
>>>
>>> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
>>> *Sent:* Thursday, April 19, 2018 9:10 AM
>>> *To:* user <user@cassandra.apache.org>
>>> *Subject:* Re: Does Cassandra supports ACID txn
>>>
>>>
>>>
>>> No ACID transactions any time soon in Cassandra
>>>
>>>
>>>
>>> On Thu, Apr 19, 2018 at 7:35 AM, Rajesh Kishore <rajesh10si...@gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I am bit confused by reading different articles, does recent version of
>>> Cassandra supports ACID transaction ?
>>>
>>> I found BATCH command , but not sure if it supports rollback, consider
>>> that transaction I am going to perform would be on single partition.
>>>
>>> Also, what are the limitations if any?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Rajesh
>>>
>>>
>>>
>>>
>>>
>>
>>
> --


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Cassandra client tuning

2018-03-18 Thread Ben Slater
“* 1000 statements in each batch” sounds like you are doing batching in
both cases. I wouldn't expect things to get better with larger sizes than
that. We’ve generally found more like 100 is the sweet spot but I’m sure it’s
data specific.
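
For reference, the kind of micro-batch the presentation linked further down this
thread describes is an unlogged batch of statements that all target the same
partition, e.g. (table and values are illustrative only):

BEGIN UNLOGGED BATCH
    INSERT INTO my_ks.events (pk, ts, value) VALUES ('p1', '2018-03-18 10:00:00', 1);
    INSERT INTO my_ks.events (pk, ts, value) VALUES ('p1', '2018-03-18 10:00:01', 2);
    -- ... up to roughly 100 statements, all for the same partition key
APPLY BATCH;

Batches that span many partitions tend to hurt rather than help, which may be part of
why the bigger batches weren’t showing a throughput win over async in your tests.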

On Sun, 18 Mar 2018 at 21:17 onmstester onmstester <onmstes...@zoho.com>
wrote:

> I'm using a queue of 100 ExecuteAsyncs * 1000 statements in each batch
> = a 100K insert queue in the non-batch scenario.
> Using more than 1000 statements per batch throws a batch limit exception,
> and some documents recommend not to change batch_size_limit??!
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
> ---- On Sun, 18 Mar 2018 13:14:54 +0330 *Ben Slater
> <ben.sla...@instaclustr.com <ben.sla...@instaclustr.com>>* wrote 
>
> When you say batch was worse than async in terms of throughput, are you
> comparing throughput with the same number of threads or something? I would
> have thought if you have much less CPU usage on the client with batching
> and your Cassandra cluster doesn’t sound terribly stressed then there is
> room to increase threads on the client to up throughput (unless you’re
> bottlenecked on IO or something)?
>
> On Sun, 18 Mar 2018 at 20:27 onmstester onmstester <onmstes...@zoho.com>
> wrote:
>
> --
>
>
> *Ben Slater*
> *Chief Product Officer <https://www.instaclustr.com/>*
>
> <https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
><https://www.linkedin.com/company/instaclustr>
>
> Read our latest technical blog posts here
> <https://www.instaclustr.com/blog/>.
>
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
> and Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
>
> Input data does not preserve good locality and I've already tested batch
> insert; it was worse than executeAsync in terms of throughput but with much less
> CPU usage on the client side.
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>  On Sun, 18 Mar 2018 12:46:02 +0330 *Ben Slater
> <ben.sla...@instaclustr.com <ben.sla...@instaclustr.com>>* wrote 
>
>
> You will probably find grouping writes into small batches improves overall
> performance (if you are not doing it already). See the following
> presentation for some more info:
> https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes
>
> Cheers
> Ben
>
> On Sun, 18 Mar 2018 at 19:23 onmstester onmstester <onmstes...@zoho.com>
> wrote:
>
> --
>
>
> *Ben Slater**Chief Product Officer <https://www.instaclustr.com/>*
>
> <https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
><https://www.linkedin.com/company/instaclustr>
>
> Read our latest technical blog posts here
> <https://www.instaclustr.com/blog/>.
>
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
> and Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
>
> I need to insert some millions of records in seconds in Cassandra. Using one
> client with executeAsync with the following configs:
> maxConnectionsPerHost = 5
> maxRequestsPerHost = 32K
> maxAsyncQueue at client side = 100K
>
> I could achieve 25% of the throughput I needed, client CPU is more than 80%,
> and increasing the number of threads causes some executeAsync calls to fail, so the configs
> above are the best the client could handle. Cassandra nodes' CPU is less
> than 30% on average. The data has no locality in terms of partition keys and
> I can't use the createSSTable mechanism. Is there any tuning I'm missing
> on the client side, since the server side is already tuned with the DataStax
> recommendations?
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
> --


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Cassandra client tuning

2018-03-18 Thread Ben Slater
When you say batch was worse than async in terms of throughput, are you
comparing throughput with the same number of threads or something? I would
have thought if you have much less CPU usage on the client with batching
and your Cassandra cluster doesn’t sound terribly stressed then there is
room to increase threads on the client to up throughput (unless you’re
bottlenecked on IO or something)?

On Sun, 18 Mar 2018 at 20:27 onmstester onmstester <onmstes...@zoho.com>
wrote:

> Input data does not preserve good locality and I've already tested batch
> insert; it was worse than executeAsync in terms of throughput but with much less
> CPU usage on the client side.
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>  On Sun, 18 Mar 2018 12:46:02 +0330 *Ben Slater
> <ben.sla...@instaclustr.com <ben.sla...@instaclustr.com>>* wrote 
>
> You will probably find grouping writes into small batches improves overall
> performance (if you are not doing it already). See the following
> presentation for some more info:
> https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes
>
> Cheers
> Ben
>
> On Sun, 18 Mar 2018 at 19:23 onmstester onmstester <onmstes...@zoho.com>
> wrote:
>
> --
>
>
> *Ben Slater*
> *Chief Product Officer <https://www.instaclustr.com/>*
>
> <https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
><https://www.linkedin.com/company/instaclustr>
>
> Read our latest technical blog posts here
> <https://www.instaclustr.com/blog/>.
>
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
> and Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
>
> I need to insert some millions of records in seconds in Cassandra. Using one
> client with executeAsync with the following configs:
> maxConnectionsPerHost = 5
> maxRequestsPerHost = 32K
> maxAsyncQueue at client side = 100K
>
> I could achieve 25% of the throughput I needed, client CPU is more than 80%,
> and increasing the number of threads causes some executeAsync calls to fail, so the configs
> above are the best the client could handle. Cassandra nodes' CPU is less
> than 30% on average. The data has no locality in terms of partition keys and
> I can't use the createSSTable mechanism. Is there any tuning I'm missing
> on the client side, since the server side is already tuned with the DataStax
> recommendations?
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
> --


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Cassandra client tuning

2018-03-18 Thread Ben Slater
You will probably find grouping writes into small batches improves overall
performance (if you are not doing it already). See the following
presentation for some more info:
https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes

Cheers
Ben

On Sun, 18 Mar 2018 at 19:23 onmstester onmstester <onmstes...@zoho.com>
wrote:

> I need to insert some millions of records in seconds in Cassandra. Using one
> client with executeAsync with the following configs:
> maxConnectionsPerHost = 5
> maxRequestsPerHost = 32K
> maxAsyncQueue at client side = 100K
>
> I could achieve 25% of the throughput I needed, client CPU is more than 80%,
> and increasing the number of threads causes some executeAsync calls to fail, so the configs
> above are the best the client could handle. Cassandra nodes' CPU is less
> than 30% on average. The data has no locality in terms of partition keys and
> I can't use the createSSTable mechanism. Is there any tuning I'm missing
> on the client side, since the server side is already tuned with the DataStax
> recommendations?
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
> --


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: cassandra spark-connector-sqlcontext too many tasks

2018-03-17 Thread Ben Slater
I think that is probably a question for the Spark Connector forum:
https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user
as
it’s much more related to the function of the connector than functionality
of Cassandra itself.

Cheers
Ben

On Sat, 17 Mar 2018 at 21:18 onmstester onmstester <onmstes...@zoho.com>
wrote:

>
> I'm querying a single Cassandra partition using sqlContext and its tempView,
> which creates more than 2000 tasks on Spark and takes about 360 seconds:
>
>
> sqlContext.read().format("org.apache.spark.sql.cassandra").options(ops).load.createOrReplaceTempView("tableName")
>
> But using javaFunctions(sc).cassandraTable().where() it creates only one
> task which responds in 200 ms!
> I'm using exactly the same where clause for both scenarios.
> Spark UI shows about 60 GB input for the sqlContext scenario and only a few KBs
> for the javaFunctions scenario.
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>
> --


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Ben Slater
It is important to make sure you are using the same NTP servers across your
cluster - we used to see relatively frequent NTP issues across our fleet
using default/public NTP servers until (back in 2015) we implemented our
own NTP pool (see
https://www.instaclustr.com/apache-cassandra-synchronization/ which
references some really good and detailed posts from logentries.com on the
potential issues).

Cheers
Ben

On Fri, 9 Mar 2018 at 02:07 Michael Shuler <mich...@pbandjelly.org> wrote:

> As long as your nodes are syncing time using the same method, that
> should be good. Don't mix daemons, however, since they may sync from
> different sources. Whether you use ntpd, openntp, ntpsec, chrony isn't
> really important, since they are all just background daemons to sync the
> system clock. There is nothing Cassandra-specific.
>
> --
> Kind regards,
> Michael
>
> On 03/08/2018 04:15 AM, Kyrylo Lebediev wrote:
> > Hi!
> >
> > Recently Amazon announced launch of Amazon Time Sync Service
> > (
> https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/
> )
> > and now it's AWS-recommended way for time sync on EC2 instances
> > (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html).
> > It's stated there that chrony is faster / more precise than ntpd.
> >
> > Nothing to say correct time sync configuration is very important for any
> > C* setup.
> >
> > Does anybody have positive experience using crony, Amazon Time Sync
> > Service with Cassandra and/or combination of them?
> > Any concerns regarding chrony + Amazon Time Sync Service + Cassandra?
> > Are there any chrony best-practices/custom settings for C* setups?
> >
> > Thanks,
> > Kyrill
> >
>
>
> ---------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
> --


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: backup/restore cassandra data

2018-03-07 Thread Ben Slater
You should be able to follow the same approach(s) as restoring from a
backup as outlined here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_snapshot_restore_t.html#ops_backup_snapshot_restore_t
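
(Very roughly, and treating the linked doc as the authoritative version: install the
same Cassandra version on the replacement node, stop it, copy the old data directory
contents into place keeping the keyspace/table directory layout and file ownership,
then start it - or, for individual tables on a running node, drop the sstables into
the table directory and run nodetool refresh <keyspace> <table>. With RF=1 you will
also want the node to own the same tokens as the old one, which the doc covers.)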

Cheers
Ben

On Thu, 8 Mar 2018 at 17:07 onmstester onmstester <onmstes...@zoho.com>
wrote:

> Would it be possible to copy/paste the Cassandra data directory from one of the
> nodes (whose OS partition is corrupted) and use it in a fresh Cassandra
> node? I've used RF=1 so that's my only chance!
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
> --


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: One time major deletion/purge vs periodic deletion

2018-03-07 Thread Ben Slater
I would say you are better off spreading out the deletes so compactions
have the best chance of actually removing them from disk before they become
a problem. You will likely need to pay close attention to compaction
strategy tuning.

I don’t have any personal experience with it but you may also want to check
out deleting compaction strategy to see if it works for your use case:
https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy
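
As a sketch of the kind of tuning knobs I mean (the values are only a starting point,
not a recommendation, and the table name is a placeholder):

ALTER TABLE my_ks.my_table WITH
    gc_grace_seconds = 86400
    AND compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'unchecked_tombstone_compaction': 'true',
        'tombstone_threshold': '0.2'
    };

A lower gc_grace_seconds lets tombstones be purged sooner (provided you can still
repair within that window), and the tombstone-related compaction sub-properties make
single-sstable compactions more aggressive about actually dropping them.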

Cheers
Ben

On Wed, 7 Mar 2018 at 17:19 Charulata Sharma (charshar) <chars...@cisco.com>
wrote:

> Well it’s not like that. We don’t just purge. There are business rules
> which decide the records to be purged, or archived and then purged, so we
> cannot rely on TTL.
>
>
>
> Thanks,
>
> Charu
>
>
>
> *From: *Jens Rantil <jens.ran...@tink.se>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Tuesday, March 6, 2018 at 12:34 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: One time major deletion/purge vs periodic deletion
>
>
>
> Sounds like you are using Cassandra as a queue. It's an anti-pattern.
> What I would do is rely on TTL for removal of data and
> use the TWCS compaction strategy to handle the removal, and just focus on
> insertion.
>
> On Tue, Mar 6, 2018, 07:39 Charulata Sharma (charshar) <chars...@cisco.com>
> wrote:
>
> Hi,
>
>
>
>   Wanted the community’s feedback on deciding the schedule of Archive
> and Purge job.
>
> Is it better to Purge a large volume of data at regular intervals (like
> run A jobs once in 3 months ) or purge smaller amounts more frequently
> (run the job weekly??)
>
>
>
> Some estimates on the number of deletes performed would be…upto 80-90K
>  rows purged in 3 months vs 10K deletes every week ??
>
>
>
> Thanks,
>
> Charu
>
>
>
> --
>
> Jens Rantil
> Backend Developer @ Tink
>
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> <https://maps.google.com/?q=Wallingatan+5,+111+60+Stockholm,+Sweden=gmail=g>
> For urgent matters you can reach me at +46-708-84 18 32.
>


Re: Cassandra/Spark failing to process large table

2018-03-06 Thread Ben Slater
Hi Faraz

Yes, it likely does mean there is inconsistency in the replicas. However,
you shouldn’t be too freaked out about it - Cassandra is designed to allow
for this inconsistency to occur, and the consistency levels allow you to
achieve consistent results despite replicas not being consistent. To keep
your replicas as consistent as possible (which is still a good thing), you
do need to regularly run repairs (once a week is the standard
recommendation for full repairs). Inconsistency can result from a whole
range of conditions, from nodes being down, to the cluster being
overloaded, to network issues.
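
To make the consistency settings concrete (a sketch only - the host,
keyspace/table and job names are placeholders, and the connector property
names should be confirmed against the connector reference doc quoted later in
this thread for your connector version):

    # cqlsh: set the consistency level for the session before counting
    cqlsh -e "CONSISTENCY QUORUM; SELECT count(*) FROM my_keyspace.my_table;"

    # Spark: pass the connector's read consistency level via spark-submit
    spark-submit --conf spark.cassandra.connection.host=10.128.0.18 \
                 --conf spark.cassandra.input.consistency.level=QUORUM \
                 count_job.py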

Cheers
Ben

On Tue, 6 Mar 2018 at 22:18 Faraz Mateen <fmat...@an10.io> wrote:

> Thanks a lot for the response.
>
> Setting consistency to ALL/TWO started giving me consistent  count results
> on both cqlsh and spark. As expected, my query time has increased by 1.5x (
> Before, it was taking ~1.6 hours but with consistency level ALL, same query
> is taking ~2.4 hours to complete.)
>
> Does this mean my replicas are out of sync? When I first started pushing
> data to cassandra, I had a single node setup. Then I added two more nodes,
> changed replication factor to 2 and ran nodetool repair to distribute data
> to all the nodes. So, according to my understanding the nodes should have
> passively replicated data among themselves to remain in sync.
>
> Do I need to run repairs repeatedly to keep data in sync?
> How can I further debug why my replicas were not in sync before?
>
> Thanks,
> Faraz
>
> On Sun, Mar 4, 2018 at 9:46 AM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
>> Both CQLSH and the Spark Cassandra connector query at consistency level ONE
>> (LOCAL_ONE for the Spark connector) by default, so if there is any inconsistency
>> in your replicas this can result in inconsistent query results.
>>
>> See http://cassandra.apache.org/doc/latest/tools/cqlsh.html and
>> https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md
>>  for
>> info on how to change consistency. If you are unsure of how consistent the
>> on-disk replicas are (eg if you have been writing at CL One or haven’t run
>> repairs) then using consistency level ALL should give you the most
>> consistent results but requires all replicas to be available for the query
>> to succeed. If you are using QUORUM for your writes then querying at QUORUM
>> or LOCAL_QUORUM as appropriate should give you consistent results.
>>
>> Cheers
>> Ben
>>
>> On Sun, 4 Mar 2018 at 00:59 Kant Kodali <k...@peernova.com> wrote:
>>
>>> The fact that cqlsh itself gives different results tells me that this
>>> has nothing to do with spark. Moreover, spark results are monotonically
>>> increasing which seem to be more consistent than cqlsh. so I believe
>>> spark can be taken out of the equation.
>>>
>>>  Now, while you are running these queries is there another process or
>>> thread that is writing also at the same time ? If yes then your results are
>>> fine but If it's not, you may want to try nodetool flush first and then run
>>> these iterations again?
>>>
>>> Thanks!
>>>
>>>
>>> On Fri, Mar 2, 2018 at 11:17 PM, Faraz Mateen <fmat...@an10.io> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I am trying to use spark to process a large cassandra table (~402
>>>> million entries and 84 columns) but I am getting inconsistent results.
>>>> Initially the requirement was to copy some columns from this table to
>>>> another table. After copying the data, I noticed that some entries in the
>>>> new table were missing. To verify that I took count of the large source
>>>> table but I am getting different values each time. I tried the queries on a
>>>> smaller table (~7 million records) and the results were fine.
>>>>
>>>> Initially, I attempted to take count using pyspark. Here is my pyspark
>>>> script:
>>>>
>>>> spark = SparkSession.builder.appName("Datacopy App").getOrCreate()
>>>> df = 
>>>> spark.read.format("org.apache.spark.sql.cassandra").options(table=sourcetable,
>>>>  keyspace=sourcekeyspace).load().cache()
>>>> df.createOrReplaceTempView("data")
>>>> query = ("select count(1) from data " )
>>>> vgDF = spark.sql(query)
>>>> vgDF.show(10)
>>>>
>>>> Spark submit command is as follows:
>>>>
>>>> ~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --master 
>>>> spark://10.128.0.18:7077 --package

Re: Cassandra/Spark failing to process large table

2018-03-03 Thread Ben Slater
.8.0_131]
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>  ~[na:1.8.0_131]
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>  [na:1.8.0_131]
>> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
>>
>> *Versions:*
>>
>>- Cassandra 3.9
>>- Spark 2.1.0
>>- Datastax's spark-cassandra-connector 2.0.1
>>- Scala version 2.11
>>
>> *Cluster:*
>>
>>- Spark setup with 3 workers and 1 master node.
>>- 3 worker nodes also have a cassandra cluster installed.
>>- Each worker node has 8 CPU cores and 40 GB RAM.
>>
>> Any help will be greatly appreciated.
>>
>> Thanks,
>> Faraz
>>
>
> --




Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Ben Slater
conservative, but
>>> > it’s very easy to go too far in the opposite direction.
>>> > >
>>> >
>>> > I appreciate that but when such concerns result in inaction instead of
>>> > resolution that is no good.
>>> >
>>> > >> Software exists to automate tasks for humans, not mechanize humans
>>> > >> to
>>> > administer tasks for a database.  I’m an engineering type.  My job is
>>> > to apply science and technology to solve real world problems.  And
>>> > that’s where I need an organization’s I.T. talent to focus; not in
>>> > crank starting an unfinished database.
>>> > >>
>>> > >
>>> > >And that’s why nobody’s done it - we all have bigger problems we’re
>>> > >being
>>> > paid to solve, and nobody’s felt it necessary. Because it’s not
>>> > necessary, it’s nice, but not required.
>>> > >
>>> >
>>> > Of course you would say that, you're Jeff Jirsa.  In apprenticeship
>>> > speak, you’re a master.  It's the classic challenge of trying to  get
>>> > a master to see the legitimate issues of the apprentices.  I do
>>> > appreciate the time you give to answer posts to the groups , like this
>>> > post.  So I don't want you to take anything the wrong way.  Where it's
>>> > going to bit everyone is in the future adoption rate.  It has to be
>>> addressed.
>>> >
>>> > [snip]
>>> >
>>> > >> Certificate management should be automated.
>>> > >>
>>> > >Stefan (in particular) has done a fair amount of work on this, but
>>> > >I’d
>>> > bet 90% of users don’t use ssl and genuinely don’t care.
>>> > >
>>> >
>>> > I didn't realize.  Could I trouble you for a link so I could get up to
>>> > speed?
>>> >
>>> > >> Cluster wide management should be a big theme in any next major
>>> release.
>>> > >>
>>> > >Na. Stability and testing should be a big theme in the next major
>>> release.
>>> > >
>>> >
>>> > Double Na on that one Jeff.  I think you have a concern there about
>>> > the need to test sufficiently to ensure the stability of the next
>>> > major release.  That makes perfect sense.- for every release,
>>> > especially the major ones.  Continuous improvement is not a phase of
>>> > development for example.  CI should be in everything, in every phase.
>>> > Stability and testing a part of every release not just one.  A major
>>> > release should be a nice step from the previous major release though.
>>> >
>>> > >> What is a major release?  How many major releases could a program
>>> > >> have
>>> > before all the coding for basic stuff like installation, configuration
>>> > and maintenance is included!
>>> > >>
>>> > >> Finish the basic coding of Cassandra, make it easy to use for
>>> > administrators, make is smart, add cluster wide management.  Keep
>>> > Cassandra competitive or it will soon be the old Model T we all
>>> remember fondly.
>>> > >>
>>> > >
>>> > >Let’s keep some perspective. Most of us came to Cassandra from rdbms
>>> > worlds where we were building solutions out of a bunch of master/slave
>>> > MySQL / Postgres type databases. I started using Cassandra 0.6 when I
>>> > needed to store something like 400gb/day in 200whatever on spinning
>>> > disks when 100gb felt like a “big” database, and the thought of
>>> > writing runbooks and automation to automatically pick the most up to
>>> > date slave as the new master, promote it, repoint the other slave to
>>> > the new master, then reformat the old master and add it as a new slave
>>> > without downtime and without potentially deleting the company’s whole
>>> dataset sounded awful.
>>> > Cassandra solved that problem, at the cost of maintaining a few yaml
>>> > (then
>>> > xml) files. Yes there are rough edges - they get slightly less rough
>>> > on each new release. Can we do better? Sure, use your engineering time
>>> > and send some patches. But the basic stuff is the nuts and bolts of
>>> > the
>>> > database: I care way more about streaming and compaction than I’ll
>>> > ever care about installation.
>>> > >
>>> >
>>> > I can relate.  I was studying the enterprise level MS SQL Server
>>> > stuff. I noticed exactly what you described.  I decided maybe I'll
>>> > just do other stuff and wait for things to develop more.  I'm very
>>> > excited about the way Cassandra addresses things.  Streaming and
>>> > compaction - very good.  I'm glad.  Items related to usability are not
>>> optional though.
>>> >
>>> > >> I ask the Committee to compile a list of all such items, make a
>>> > >> plan,
>>> > and commit to including the completed and tested code as part of major
>>> > release 5.0.  I further ask that release 4.0 not be delayed and then
>>> > there be an unusually short skip to version 5.0.
>>> > >>
>>> > >
>>> > >The committers are working their ass off on all sorts of hard
>>> problems.
>>> > Some of those are probably even related to Cassandra. If you have
>>> > idea, open a JIRA. If you have time, send a patch. Or review a patch.
>>> > But don’t expect a bunch of people to set down work on optimizing the
>>> > database to work on packaging and installation, because there’s no ROI
>>> > in it for 99% of the existing committers: we’re working on the
>>> > database to solve problems, and installation isn’t one of those
>>> problems.
>>> >
>>> > I'm sure they are working very hard on all kinds of hard problems.  I
>>> > actually wrote "Committee", not "committers"  There is an obvious
>>> > shortage of contributors when you consider the size of the
>>> > organizations using Cassandra.  That leave the burden on an unfair
>>> > few.  Installation or more generally I would say usability is not that
>>> > big a problem for the big companies out there. Good for them.
>>> >
>>> > Ask a new organization or a modest size organization that is
>>> > struggling to manage their Cassandra cluster that usability is not a
>>> > big problem. It truly is a big problem for many stakeholders of
>>> > Cassandra. It needs to be given a bigger priority.  Hopefully others
>>> will weigh in.
>>> >
>>> > Kenneth Brotman
>>> >
>>> >
>>> > -
>>> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> > <user-unsubscribe@cassandra.apache.org>
>>> > For additional commands, e-mail: user-h...@cassandra.apache.org
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>
> --




Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-12 Thread Ben Slater
We’re seeing evidence across our fleet that AWS has rolled something out in
the last 24 hours that has significantly reduced the performance impacts -
back pretty close to pre-patch levels. Yet to see if the impacts come back
with o/s patching on top of the improved hypervisor.

Cheers
Ben
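
For anyone wanting to check the PCID point raised further down in the quoted
thread, a quick way to see whether an instance exposes the flag (assuming a
Linux guest) is:

    grep -qw pcid /proc/cpuinfo && echo "pcid available" || echo "no pcid"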



On Thu, 11 Jan 2018 at 05:32 Jon Haddad <j...@jonhaddad.com> wrote:

> For what it’s worth, we (TLP) just posted some results comparing pre and
> post meltdown statistics:
> http://thelastpickle.com/blog/2018/01/10/meltdown-impact-on-latency.html
>
>
> On Jan 10, 2018, at 1:57 AM, Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
> m4.xlarge do have PCID to my knowledge, but possibly we need a rather new
> kernel 4.14. But I fail to see how this could help anyway, cause this looks
> highly Amazon Hypervisor patch related and we do not have the production
> instances patched at OS/VM level (yet).
>
> Thomas
>
> *From:* Dor Laor [mailto:d...@scylladb.com <d...@scylladb.com>]
> *Sent:* Dienstag, 09. Jänner 2018 19:30
> *To:* user@cassandra.apache.org
> *Subject:* Re: Meltdown/Spectre Linux patch - Performance impact on
> Cassandra?
>
> Make sure you pick instances with PCID cpu capability, their TLB flush
> overhead is much smaller
>
> On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
> Quick follow up.
>
> Others in AWS reporting/seeing something similar, e.g.:
> https://twitter.com/BenBromhead/status/950245250504601600
>
> So, while we have seen an relative CPU increase of ~ 50% since Jan 4,
> 2018, we now also have applied a kernel update at OS/VM level on a single
> node (loadtest and not production though), thus more or less double patched
> now. Additional CPU impact by OS/VM level kernel patching is more or less
> negligible, so looks highly Hypervisor related.
>
> Regards,
> Thomas
>
> *From:* Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
> *Sent:* Freitag, 05. Jänner 2018 12:09
> *To:* user@cassandra.apache.org
> *Subject:* Meltdown/Spectre Linux patch - Performance impact on Cassandra?
>
> Hello,
>
> has anybody already some experience/results if a patched Linux kernel
> regarding Meltdown/Spectre is affecting performance of Cassandra negatively?
>
> In production, all nodes running in AWS with m4.xlarge, we see up to a 50%
> relative (e.g. AVG CPU from 40% => 60%) CPU increase since Jan 4, 2018,
> most likely correlating with Amazon finished patching the underlying
> Hypervisor infrastructure …
>
> Anybody else seeing a similar CPU increase?
>
> Thanks,
> Thomas
>
> The contents of this e-mail are intended for the named addressee only. It
> contains information that may be confidential. Unless you are the named
> addressee or an authorized designee, you may not copy or use it, or
> disclose it to anyone else. If you received it in error please notify us
> immediately and then destroy it. Dynatrace Austria GmbH (registration
> number FN 91482h) is a company registered in Linz whose registered office
> is at 4040 Linz, Austria, Freistädterstraße 313
> <https://maps.google.com/?q=4040+Linz,+Austria,+Freist%C3%A4dterstra%C3%9Fe+313=gmail=g>
> The contents of this e-mail are intended for the named addressee only. It
> contains information that may be confidential. Unless you are the named
> addressee or an authorized designee, you may not copy or use it, or
> disclose it to anyone else. If you received it in error please notify us
> immediately and then destroy it. Dynatrace Austria GmbH (registration
> number FN 91482h) is a company registered in Linz whose registered office
> is at 4040 Linz, Austria, Freistädterstraße 313
> <https://maps.google.com/?q=4040+Linz,+Austria,+Freist%C3%A4dterstra%C3%9Fe+313=gmail=g>
>
>
> The contents of this e-mail are intended for the named addressee only. It
> contains information that may be confidential. Unless you are the named
> addressee or an authorized designee, you may not copy or use it, or
> disclose it to anyone else. If you received it in error please notify us
> immediately and then destroy it. Dynatrace Austria GmbH (registration
> number FN 91482h) is a company registered in Linz whose registered office
> is at 4040 Linz, Austria, Freistädterstraße 313
> <https://maps.google.com/?q=4040+Linz,+Austria,+Freist%C3%A4dterstra%C3%9Fe+313=gmail=g>
>
>
>


Re: How quickly we can bootstrap

2017-11-17 Thread Ben Slater
Hi Anshu

For quick scaling, we’ve had success with an approach of scaling up the
compute capacity (attached to EBS) rather than scaling out with more nodes
in order to provide relatively quick scale up/down capability. The approach
is implemented as part of our managed service but the concept is generic
enough to work in any virtualised environment. You can find more detail
here if interested:
https://www.instaclustr.com/instaclustr-dynamic-resizing-for-apache-cassandra/
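
For the more conventional add/replace path, the main knobs are the streaming
throttle and (for failed nodes) replacing rather than bootstrapping fresh - a
sketch only, where the values and the dead node's IP are placeholders:

    # on the existing nodes, lift the streaming throttle while the new node joins
    nodetool setstreamthroughput 0     # 0 = unthrottled; the default is 200 Mb/s

    # on a replacement node, stream exactly the dead node's ranges instead of
    # bootstrapping new ranges (set before the first start, e.g. in cassandra-env.sh)
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<dead_node_ip>"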

Cheers
Ben

On Sat, 18 Nov 2017 at 05:02 Anshu Vajpayee <anshu.vajpa...@gmail.com>
wrote:

> Cassandra supports elastic scalability  - meaning on demand we can
> increase or decrease #of nodes as per scaling demand from the application.
>
> Let's consider we have 5 node cluster and each node has data pressure of
> about 3 TB.
>
> Now as per sudden load, we want to add 1 node in the cluster  as quick as
> possible.
>
> Please suggest what would be the fastest method to add the new node on
> cluster? Normal bootstrapping will definitely take time because it needs to
> stream at least 2.5 TB ( 5*3TB/6 nodes) from 5 nodes.  Please
> consider multi-core machines & 10 Gbps cards.
>
> Streaming throughput can help but not much.
>
> The similar requirement can come when we want to replace the failed node
> due to any hardware.
>
> Please suggest any best practice or scenarios to deal with above
> situations.
>
> Scaling is good but how quickly we can scale is another thing to consider.
>
>
>
>
>
>
>
>
> --
> *C*heers,*
> *Anshu V*
>
>
> --




Re: Creating a copy of a C* cluster

2017-08-07 Thread Ben Slater
For minimum disruption to your production cluster, restoring from backups
is probably the best option. However, there is no reason adding a DC,
building and then splitting shouldn’t work if done correctly.
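
If you go the backup route, the rough shape of it is as follows (a sketch
only - keyspace, table and host names are placeholders, and the schema must
already exist on the target cluster):

    # on each source node: snapshot the keyspace you want to copy
    nodetool snapshot -t copy_for_test my_keyspace

    # arrange the snapshot files as <dir>/my_keyspace/my_table/ somewhere that
    # can reach the new cluster, then stream them in
    sstableloader -d <target_node_ip> /tmp/restore/my_keyspace/my_table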

Cheers
Ben

On Tue, 8 Aug 2017 at 07:11 Robert Wille <rwi...@fold3.com> wrote:

> We need to make a copy of a cluster. We’re going to do some testing
> against the copy and then discard it. What’s the best way of doing that? I
> created another datacenter, and then have tried to divorce it from the
> original datacenter, but have had troubles doing so.
>
> Suggestions?
>
> Thanks in advance
>
> Robert
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>


Re: Cassandra UNREACHABLE node

2017-07-18 Thread Ben Slater
and.java:696)
> ~[apache-cassandra-3.9.0.jar:3.9.0]
> at
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:626)
> ~[apache-cassandra-3.9.0.jar:3.9.0]
> at
> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
> ~[apache-cassandra-3.9.0.jar:3.9.0]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:114)
> ~[apache-cassandra-3.9.0.jar:3.9.0]
> at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:190)
> ~[apache-cassandra-3.9.0.jar:3.9.0]
> at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
> ~[apache-cassandra-3.9.0.jar:3.9.0]
> at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
> ~[apache-cassandra-3.9.0.jar:3.9.0]
> No error in /var/log/cassandra/debug.php on 10.0.0.12
>
> Remember 10.0.0.11 & 10.0.0.12 are seed nodes
>
> Please give me some direction to fix this problem.
> Thanks
>
> Regards,
> Shashikant
>


Re: Manual Repairs

2017-06-21 Thread Ben Slater
The closest you can get to this kind of functionality is by breaking up
your repairs by ranges and then you could pause/restart part way through
the set of ranges. There are some basic scripted approaches around for doing
this, but Cassandra Reaper is probably your best bet for getting this kind of
functionality in most circumstances.
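
For reference, the range-at-a-time approach looks roughly like the following
(a sketch only - the token boundaries are made-up examples; in practice a
wrapper script derives them from the ring and checkpoints the last completed
range so it can resume after a pause):

    nodetool repair -st -9223372036854775808 -et -9100000000000000000 my_keyspace
    nodetool repair -st -9100000000000000000 -et -9000000000000000000 my_keyspace
    # ...and so on around the ring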

Cheers
Ben

On Thu, 22 Jun 2017 at 08:27 Mark Furlong <mfurl...@ancestry.com> wrote:

> Can a repair be paused, and if paused can it be restarted from the point
> of the pause, or does it start over?
>
>
>
> *Mark Furlong*
>
> Sr. Database Administrator
>
> *mfurl...@ancestry.com <mfurl...@ancestry.com>*
> M: 801-859-7427 <(801)%20859-7427>
>
> O: 801-705-7115 <(801)%20705-7115>
>
> 1300 W Traverse Pkwy
>
> Lehi, UT 84043
>
>
>
>
>
>
>
>
>
>


Re: Long running compaction on huge hint table.

2017-05-21 Thread Ben Slater
ery. This will cause GC pauses, which
>>>> will cause hints to fail to be delivered, which will cause more hints to be
>>>> stored. This is bad.
>>>> >
>>>> > In 3.0, hints were rewritten to work around this design flaw. In 2.1,
>>>> your most likely corrective course is to use 'nodetool truncatehints' on
>>>> all servers, followed by 'nodetool repair' to deliver the data you lost by
>>>> truncating the hints.
>>>> >
>>>> > NOTE: this is ONLY safe if you wrote with a consistency level
>>>> stronger than CL:ANY. If you wrote this data with CL:ANY, you may lose data
>>>> if you truncate hints.
>>>> >
>>>> > - Jeff
>>>> >
>>>> >> On 2017-05-16 06:50 (-0700), varun saluja <saluj...@gmail.com>
>>>> wrote:
>>>> >> Thanks for update.
>>>> >> I could see lot of io waits. This causing  Gc and mutation drops .
>>>> >> But as i mentioned we do not have high load for now. Hint replays
>>>> are creating such high disk I/O.
>>>> >> compactionstats show very high hint bytes like 780gb around. Is this
>>>> normal?
>>>> >>
>>>> >> Just mentioning we are using flash disks.
>>>> >>
>>>> >> In such case, if i run truncatehints , will it remove or decrease
>>>> size of hints bytes in compaction stats. I can trigger repair therafter.
>>>> >> Please let me know if any recommendation on same.
>>>> >>
>>>> >> Also , table which we dumped from kafka which created this much
>>>> hints and compaction pendings is also dropped today. Because we have to
>>>> redump table again once cluster is stable.
>>>> >>
>>>> >> Regards,
>>>> >> Varun
>>>> >>
>>>> >> Sent from my iPhone
>>>> >>
>>>> >>> On 16-May-2017, at 6:59 PM, Nitan Kainth <ni...@bamlabs.com> wrote:
>>>> >>>
>>>> >>> Yes but it means data has to be replicated using repair.
>>>> >>>
>>>> >>> Hints are out come of unhealthy nodes, focus on finding why you
>>>> have mutation drops, is it node, io or network etc. ideally you shouldn't
>>>> see increasing hints all the time.
>>>> >>>
>>>> >>> Sent from my iPhone
>>>> >>>
>>>> >>>> On May 16, 2017, at 7:58 AM, varun saluja <saluj...@gmail.com>
>>>> wrote:
>>>> >>>>
>>>> >>>> Hi Nitan,
>>>> >>>>
>>>> >>>> Thanks for response.
>>>> >>>>
>>>> >>>> Yes, I could see mutation drops and increase count in
>>>> system.hints. Is there any way , i can proceed to truncate hints like using
>>>> nodetool truncatehints.
>>>> >>>>
>>>> >>>>
>>>> >>>> Regards,
>>>> >>>> Varun Saluja
>>>> >>>>
>>>> >>>>> On 16 May 2017 at 17:52, Nitan Kainth <ni...@bamlabs.com> wrote:
>>>> >>>>> Do you see mutation drops?
>>>> >>>>> Select count from system.hints; is it increasing?
>>>> >>>>>
>>>> >>>>> Sent from my iPhone
>>>> >>>>>
>>>> >>>>>> On May 16, 2017, at 5:52 AM, varun saluja <saluj...@gmail.com>
>>>> wrote:
>>>> >>>>>>
>>>> >>>>>> Hi Experts,
>>>> >>>>>>
>>>> >>>>>> We are facing issue on production cluster. Compaction on
>>>> system.hint table is running from last 2 days.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> pending tasks: 1
>>>> >>>>>>   compaction type   keyspace   table completed
>>>> total  unit   progress
>>>> >>>>>>  Compaction system   hints   20623021829
>>>>  877874092407   bytes  2.35%
>>>> >>>>>> Active compaction remaining time :   0h27m15s
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Active compaction remaining time shows in minutes.  But, this is
>>>> job is running like indefinitely.
>>>> >>>>>>
>>>> >>>>>> We have 3 node cluster V 2.1.7. And we ran  write intensive job
>>>> last week on particular table.
>>>> >>>>>> Compaction on this table finished but hint table size is growing
>>>> continuously.
>>>> >>>>>>
>>>> >>>>>> Can someone Please help me.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Thanks & Regards,
>>>> >>>>>> Varun Saluja
>>>> >>>>>>
>>>> >>>>
>>>> >>
>>>> >
>>>> > -
>>>> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>> > For additional commands, e-mail: user-h...@cassandra.apache.org
>>>> >
>>>>
>>>
>>>
>>
> --
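
Pulling the corrective steps quoted above into command form (a sketch only -
the keyspace name is a placeholder, and as noted it is only safe if writes
used a consistency level stronger than ANY):

    # on every node: drop the accumulated hints
    nodetool truncatehints

    # then repair to re-sync the data the dropped hints would have delivered
    nodetool repair my_keyspace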




Re: Consistency Level vs. Retry Policy when no local nodes are available

2017-03-20 Thread Ben Slater
I think the general assumption is that DC failover happens at the client
app level rather than the Cassandra level due to the potentially very
significant difference in request latency if you move from an app-local DC
to a remote DC. The preferred pattern for most people is that the app in the
failed DC fails and a load balancer above the app redirects traffic to a
different DC.

The other factor is that the fail-back scenario from a failed DC and
LOCAL_* consistencies is potentially complex. Do you want to immediately
start using the new DC when it becomes available (with missing data) or
wait until it catches up on writes (and how do you know when that has
happened)?

Note also QUORUM is a clear majority of replicas across both DCs. Some
people run 3 DCs with RF 3 in each and QUORUM to maintain strong
consistency across DCs even with DC failure.

Cheers
Ben

On Tue, 21 Mar 2017 at 10:00 Shannon Carey <sca...@expedia.com> wrote:

Specifically, this puts us in an awkward position because LOCAL_QUORUM is
desirable so that we don't have unnecessary cross-DC traffic from the
client by default, but we can't use it because it will cause complete
failure if the local DC goes down. And we can't use QUORUM because it would
fail if there's not a quorum in either DC (as would happen if one DC goes
down). So it seems like we are forced to use a lesser consistency such as
ONE or TWO.

-Shannon

From: Shannon Carey <sca...@expedia.com>
Date: Monday, March 20, 2017 at 5:25 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Consistency Level vs. Retry Policy when no local nodes are
available

I am running DSE 5.0, and I have a Java client using the Datastax 3.0.0
client library.

The client is configured to use a DCAwareRoundRobinPolicy wrapped in a
TokenAwarePolicy. Nothing special.

When I run my query, I set a custom retry policy.

I am testing cross-DC failover. I have disabled connectivity to the "local"
DC (relative to my client) in order to perform the test. When I run a query
with the first consistency level set to LOCAL_ONE (or local anything), my
retry policy is never called and I always get this exception:
"com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
tried for query failed (no host was tried)"

getErrors() on the exception is empty.

This is contrary to my expectation that the first attempt would fail and
would allow my RetryPolicy to attempt a different (non-LOCAL) consistency
level. I have no choice but to avoid using any kind of LOCAL consistency
level throughout my applications. Is this expected? Or is there anything I
can do about it? Thanks! It certainly seems like a bug to me or at least
something that should be improved.

-Shannon



Re: Slow repair

2017-03-15 Thread Ben Slater
When you say you’re running repair to “rebalance” do you mean to populate
the new DC? If so, the normal/correct procedure is to use nodetool rebuild
rather than repair. See
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
for
the full details.
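
In outline, the rebuild path from that doc looks like this (a sketch only -
the replication factors and the new DC name are placeholders; the DC names
must match what nodetool status reports):

    # make sure the keyspace replicates to the new DC first
    cqlsh -e "ALTER KEYSPACE test20151222 WITH replication =
              {'class': 'NetworkTopologyStrategy', 'EU': 3, 'US': 3, 'NEWDC': 3};"

    # then, on each node in the new DC, stream a copy of the data from an
    # existing DC instead of repairing
    nodetool rebuild -- EU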

Cheers
Ben

On Wed, 15 Mar 2017 at 21:14 Gábor Auth <auth.ga...@gmail.com> wrote:

> Hi,
>
> We are working with a two DCs Cassandra cluster (EU and US), so that the
> distance is over 160 ms between them. I've added a new DC to this cluster,
> modified the keyspace's replication factor and trying to rebalance it with
> repair but the repair is very slow (over 10-15 minutes per node per
> keyspace with ~40 column families). Is it normal with this network latency
> or something wrong with the cluster or the network connection? :)
>
> [2017-03-15 05:52:38,255] Starting repair command #4, repairing keyspace
> test20151222 with repair options (parallelism: parallel, primary range:
> true, incremental: false, job threads: 1, ColumnFamilies: [], dataCenters:
> [], hosts: [], # of ranges: 32)
> [2017-03-15 05:54:11,913] Repair session
> 988bd850-0943-11e7-9c1f-f5ba092c6aea for range
> [(-3328182031191101706,-3263206086630594139],
> (-449681117114180865,-426983008087217811],
> (-4940101276128910421,-4726878962587262390],
> (-4999008077542282524,-4940101276128910421]] finished (progress: 11%)
> [2017-03-15 05:55:39,721] Repair session
> 9a6fda92-0943-11e7-9c1f-f5ba092c6aea for range
> [(7538662821591320245,7564364667721298414],
> (8095771383100385537,8112071444788258953],
> (-1625703837190283897,-1600176580612824092],
> (-1075557915997532230,-1072724867906442440], (-9152
> 563942239372475,-9123254980705325471],
> (7485905313674392326,7513617239634230698]] finished (progress: 14%)
> [2017-03-15 05:57:05,718] Repair session
> 9de181b1-0943-11e7-9c1f-f5ba092c6aea for range
> [(-6471953894734787784,-6420063839816736750],
> (1372322727565611879,1480899944406172322],
> (1176263633569625668,1177285361971054591],
> (440549646067640682,491840653569315468], (-43128299
> 75221321282,-4177428401237878410]] finished (progress: 17%)
> [2017-03-15 05:58:39,997] Repair session
> a18bc500-0943-11e7-9c1f-f5ba092c6aea for range
> [(5327651902976749177,5359189884199963589],
> (-5362946313988105342,-5348008210198062914],
> (-5756557262823877856,-5652851311492822149],
> (-5400778420101537991,-5362946313988105342], (668
> 2536072120412021,6904193483670147322]] finished (progress: 20%)
> [2017-03-15 05:59:11,791] Repair session
> a44f2ac2-0943-11e7-9c1f-f5ba092c6aea for range
> [(952873612468870228,1042958763135655298],
> (558544893991295379,572114658167804730]] finished (progress: 22%)
> [2017-03-15 05:59:56,197] Repair session
> a5e13c71-0943-11e7-9c1f-f5ba092c6aea for range
> [(1914238614647876002,1961526714897144472],
> (3610056520286573718,3619622957324752442],
> (-3506227577233676363,-3504718440405535976],
> (-4120686433235827731,-4098515820338981500], (56515
> 94158011135924,5668698324546997949]] finished (progress: 25%)
> [2017-03-15 06:00:45,610] Repair session
> a897a9e1-0943-11e7-9c1f-f5ba092c6aea for range
> [(-9007733666337543056,-8979974976044921941]] finished (progress: 28%)
> [2017-03-15 06:01:58,826] Repair session
> a927b4e1-0943-11e7-9c1f-f5ba092c6aea for range
> [(3599745202434925817,3608662806723095677],
> (3390003128426746316,3391135639180043521],
> (3391135639180043521,3529019003015169892]] finished (progress: 31%)
> [2017-03-15 06:03:15,440] Repair session
> aae06160-0943-11e7-9c1f-f5ba092c6aea for range
> [(-7542303048667795773,-7300899534947316960]] finished (progress: 34%)
> [2017-03-15 06:03:17,786] Repair completed successfully
> [2017-03-15 06:03:17,787] Repair command #4 finished in 10 minutes 39
> seconds
>
> Bye,
> Gábor Auth
>
> --




Re: Read after Write inconsistent at times

2017-02-23 Thread Ben Slater
Have you checked that NTP is correctly synched across all nodes in your
cluster?
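
A quick way to check on each node (assuming the classic ntpd tooling is
installed) is:

    ntpq -p      # the offset column should be within a few milliseconds
    ntpstat      # or, where available, a one-line synchronised/unsynchronised summary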

On Fri, 24 Feb 2017 at 17:29 Charulata Sharma (charshar) <chars...@cisco.com>
wrote:

>
> Hi All,
>
> In my application sometimes I cannot read data that just got inserted.
> This happens very intermittently. Both write and read use LOCAL_QUORUM.
>
> We have a cluster of 12 nodes which spans across 2 Data Centers and a RF
> of 3.
>
> Has anyone encountered this problem and if yes what steps have you taken
> to solve it
>
> Thanks,
> Charu
>


Re: Pluggable throttling of read and write queries

2017-02-20 Thread Ben Slater
We’ve actually had several customers where we’ve done the opposite - split
large clusters apart to separate use cases. We found that this allowed us
to better align hardware with use case requirements (for example, using AWS
c3.2xlarge for very hot data at low latency and m4.xlarge for more general
purpose data); we can also tune JVM settings, etc, to suit those use cases.

Cheers
Ben

On Mon, 20 Feb 2017 at 22:21 Oleksandr Shulgin <oleksandr.shul...@zalando.de>
wrote:

> On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma <ve...@uber.com> wrote:
>
> Cassandra is being used on a large scale at Uber. We usually create
> dedicated clusters for each of our internal use cases, however that is
> difficult to scale and manage.
>
> We are investigating the approach of using a single shared cluster with
> 100s of nodes and handle 10s to 100s of different use cases for different
> products in the same cluster. We can define different keyspaces for each of
> them, but that does not help in case of noisy neighbors.
>
> Does anybody in the community have similar large shared clusters and/or
> face noisy neighbor issues?
>
>
> Hi,
>
> We've never tried this approach and given my limited experience I would
> find this a terrible idea from the perspective of maintenance (remember the
> old saying about basket and eggs?)
>
> What potential benefits do you see?
>
> Regards,
> --
> Alex
>
> --



Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Ben Slater
That’s a very good point from Sylvain that I forgot/missed. That said,
we’ve seen plenty of scenarios where overall system throughput is improved
through unlogged batches. One of my colleagues did quite a bit of
benchmarking on this topic for his talk at last year’s C* summit:
http://www.slideshare.net/DataStax/microbatching-highperformance-writes-adam-zegelin-instaclustr-cassandra-summit-2016

On Thu, 9 Feb 2017 at 20:52 Benjamin Roth <benjamin.r...@jaumo.com> wrote:

> Ok got it.
>
> But it's interesting that this is supported:
> DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));
>
> This is technically mostly the same (Token awareness,
> coordination/routing, read performance, ...), right?
>
> 2017-02-09 10:43 GMT+01:00 Sylvain Lebresne <sylv...@datastax.com>:
>
> This is a statement on multiple partitions and there is really no
> optimization the code internally does on that. In fact, I strongly advise
> you to not use a batch but rather simply do a for loop client side and send
> statement individually. That way, your driver will be able to use proper
> token-awareness for each request (while if you send a batch, one
> coordinator will be picked up and will have to forward most statement,
> doing more network hops at the end of the day). The only case where using a
> batch is indeed legit is if you care about all the statement being atomic,
> but in that case it's a logged batch you want.
>
> That's btw more or less why we never bothered implementing that: it's
> totally doable technically, but it's not really such a good idea
> performance wise in practice most of the time, and you can easily work it
> around with a batch if you need atomicity.
>
> Which is not saying it will never be and shouldn't be supported btw, there
> is something to be said for the consistency of the CQL language in general.
> But it's why no-one took time to do it so far.
>
> On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
> Yes, thats the workaround - I'll try that.
>
> Would you agree it would be better for internal optimizations to process
> this within a single statement?
>
> 2017-02-09 10:32 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>
> Yep, that makes it clear. I think an unlogged batch of prepared statements
> with one statement per PK tuple would be roughly equivalent? And probably
> no more complex to generate in the client?
>
> On Thu, 9 Feb 2017 at 20:22 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
>
> Maybe that makes it clear:
>
> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1,
> 3), (2, 3), (3, 4));
>
> If want to delete or select a bunch of records identified by their
> multi-partitionkey tuples.
>
> 2017-02-09 10:18 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>
> Are you looking this to be equivalent to (PK1=1 AND PK2=2) or are you
> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>
> Cheers
> Ben
>
> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
>
> Hi Guys,
>
> CQL says this is not allowed:
>
> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>
> 1. Is there a reason for it? There shouldn't be a performance penalty, it
> is a PK lookup, the same thing works with a single pk column
> 2. Is there a known workaround for it?
>
> It would be much of a help to have it for daily business, IMHO it's a
> waste of resources to run multiple queries just to fetch a bunch of records
> by a PK.
>
> Thanks in advance for any reply
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798 <+61%20437%20929%20798>
>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798 <+61%20437%20929%20798>
>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax

Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Ben Slater
Yep, that makes it clear. I think an unlogged batch of prepared statements
with one statement per PK tuple would be roughly equivalent? And probably
no more complex to generate in the client?
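
Using the tuples from the example quoted below, that unlogged-batch workaround
would look something like this (a sketch only; as Sylvain points out elsewhere
in this thread, individual statements sent from the client keep token
awareness, so only batch if the extra coordinator hop is acceptable):

    cqlsh -e "
      BEGIN UNLOGGED BATCH
        DELETE FROM ks.cf WHERE partitionkey1 = 1 AND partitionkey2 = 2;
        DELETE FROM ks.cf WHERE partitionkey1 = 1 AND partitionkey2 = 3;
        DELETE FROM ks.cf WHERE partitionkey1 = 2 AND partitionkey2 = 3;
        DELETE FROM ks.cf WHERE partitionkey1 = 3 AND partitionkey2 = 4;
      APPLY BATCH;"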

On Thu, 9 Feb 2017 at 20:22 Benjamin Roth <benjamin.r...@jaumo.com> wrote:

> Maybe that makes it clear:
>
> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1,
> 3), (2, 3), (3, 4));
>
> If want to delete or select a bunch of records identified by their
> multi-partitionkey tuples.
>
> 2017-02-09 10:18 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>
> Are you looking this to be equivalent to (PK1=1 AND PK2=2) or are you
> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>
> Cheers
> Ben
>
> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
>
> Hi Guys,
>
> CQL says this is not allowed:
>
> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>
> 1. Is there a reason for it? There shouldn't be a performance penalty, it
> is a PK lookup, the same thing works with a single pk column
> 2. Is there a known workaround for it?
>
> It would be much of a help to have it for daily business, IMHO it's a
> waste of resources to run multiple queries just to fetch a bunch of records
> by a PK.
>
> Thanks in advance for any reply
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798 <+61%20437%20929%20798>
>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Ben Slater
Are you looking for this to be equivalent to (PK1=1 AND PK2=2) or are you
looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?

Cheers
Ben

On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.r...@jaumo.com> wrote:

> Hi Guys,
>
> CQL says this is not allowed:
>
> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>
> 1. Is there a reason for it? There shouldn't be a performance penalty, it
> is a PK lookup, the same thing works with a single pk column
> 2. Is there a known workaround for it?
>
> It would be much of a help to have it for daily business, IMHO it's a
> waste of resources to run multiple queries just to fetch a bunch of records
> by a PK.
>
> Thanks in advance for any reply
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: Current data density limits with Open Source Cassandra

2017-02-08 Thread Ben Slater
The major issue we’ve seen with very high density (we generally say <2TB
per node is best) is manageability - if you need to replace a node or add a
node then restreaming data takes a *long* time and there is a fairly high
chance of a glitch in the universe meaning you have to start again before
it’s done.

Also, if you’re using STCS you can end up with gigantic compactions which
also take a long time and can cause issues.

Heap limitations are mainly related to partition size rather than node
density in my experience.

Cheers
Ben

On Thu, 9 Feb 2017 at 08:20 Hannu Kröger <hkro...@gmail.com> wrote:

> Hello,
>
> Back in the day it was recommended that max disk density per node for
> Cassandra 1.2 was at around 3-5TB of uncompressed data.
>
> IIRC it was mostly because of heap memory limitations? Now that off-heap
> support is there for certain data and 3.x has different data storage
> format, is that 3-5TB still a valid limit?
>
> Does anyone have experience on running Cassandra with 3-5TB compressed
> data ?
>
> Cheers,
> Hannu



Re: [RELEASE] Apache Cassandra 3.10 released

2017-02-03 Thread Ben Slater
I’d like to add my thanks and congrats to everyone who has worked on this
release. It has clearly been tough to get out the door but it has been
awesome to see the commitment to quality.

Cheers
Ben

On Sat, 4 Feb 2017 at 14:09 Edward Capriolo <edlinuxg...@gmail.com> wrote:

>
> On Fri, Feb 3, 2017 at 6:52 PM, Michael Shuler <mich...@pbandjelly.org>
> wrote:
>
> The Cassandra team is pleased to announce the release of Apache
> Cassandra version 3.10.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a new feature and bug fix release[1] on the 3.X series.
> As always, please pay attention to the release notes[2] and Let us
> know[3] if you were to encounter any problem.
>
> This is the last tick-tock feature release of Apache Cassandra. Version
> 3.11.0 will continue bug fixes from this point on the cassandra-3.11
> branch in git.
>
> Enjoy!
>
> [1]: (CHANGES.txt) https://goo.gl/J0VghF
> [2]: (NEWS.txt) https://goo.gl/00KNVW
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>
> Great job all on this release.
>


Re: Query

2016-12-29 Thread Ben Slater
I wasn’t familiar with Gizzard either so I thought I’d take a look. The
first thing on their GitHub readme is:
*NB: This project is currently not recommended as a base for new consumers.*
(And no commits since 2013)

So, Cassandra definitely looks like a better choice as your datastore for a
new project.

Cheers
Ben

On Fri, 30 Dec 2016 at 12:41 Manoj Khangaonkar 
wrote:

> I am not that familiar with gizzard but with gizzard + mysql, you have
> multiple moving parts in the system that need to be managed separately. You'll
> need the mysql expert for mysql and the gizzard expert to manage the
> distributed part. It can be argued that long term this will have higher
> administration costs.
>
> Cassandra's value add is its simple peer to peer architecture that is easy
> to manage - a single database solution that is distributed, scalable,
> highly available etc. In other words, once you gain expertise in Cassandra,
> you get everything in one package.
>
> regards
>
>
>
>
>
> On Thu, Dec 29, 2016 at 4:05 AM, Sikander Rafiq 
> wrote:
>
> Hi,
>
> I'm exploring Cassandra for handling large data sets for mobile app, but
> i'm not clear where it stands.
>
>
> If we use MySQL as  underlying database and Gizzard for building custom
> distributed databases (with arbitrary storage technology) and Memcached for
> highly queried data, then where lies Cassandra?
>
>
> As I have read that Twitter uses both Cassandra and Gizzard. Please
> explain to me where Cassandra fits in.
>
>
> Thanks in advance.
>
>
> Regards,
>
> Sikander
>
>
> Sent from Outlook 
>
>
>
>
> --
> http://khangaonkar.blogspot.com/
>


Re: Cassandra cluster performance

2016-12-21 Thread Ben Slater
Given you’re using replication factor 1 (so each piece of data is only
going to get written to one node) something definitely seems wrong. Some
questions/ideas:
- are there any errors in the Cassandra logs or are you seeing any errors
at the client?
- is your test data distributed across your partition key or is it possible
all your test data is going to a single partition?
- have you tried manually running a few inserts to see if you get any
errors?
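
One way to take your own client code out of the picture (a sketch only - the
node IPs are placeholders) is to drive the cluster with cassandra-stress from
the same client VM:

    # if this also drops to ~1K ops/sec against two nodes, the problem is in
    # the cluster or network rather than in the benchmark client
    cassandra-stress write n=1000000 -node 10.0.0.1,10.0.0.2 -rate threads=50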

Cheers
Ben


On Thu, 22 Dec 2016 at 11:48 Branislav Janosik -T (bjanosik - AAP3 INC at
Cisco) <bjano...@cisco.com> wrote:

> Hi,
>
>
>
> - Consistency level is set to ONE
>
> -  Keyspace definition:
>
> *"CREATE KEYSPACE  IF NOT EXISTS  onem2m " *+
> *"WITH replication = " *+
> *"{ 'class' : 'SimpleStrategy', 'replication_factor' : 1}"*;
>
>
>
> - yes, the client is on separate VM
>
> - In our project we use Cassandra API version 3.0.2 but the database 
> (cluster) is version 3.9
>
> - for 2node cluster:
>
>      first VM: 25 GB RAM, 16 CPUs
>
>  second VM: 16 GB RAM, 16 CPUs
>
>
>
>
>
>
>
> *From: *Ben Slater <ben.sla...@instaclustr.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Wednesday, December 21, 2016 at 2:32 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Cassandra cluster performance
>
>
>
> You would expect some drop when moving from a single node to multiple nodes but on the
> face of it that feels extreme to me (although I’ve never personally tested
> the difference). Some questions that might help provide an answer:
>
> - what consistency level are you using for the test?
>
> - what is your keyspace definition (replication factor most importantly)?
>
> - where are you running your test client (is it a separate box to
> cassandra)?
>
> - what C* version?
>
> - what are specs (CPU, RAM) of the test servers?
>
>
>
> Cheers
>
> Ben
>
>
>
> On Thu, 22 Dec 2016 at 09:26 Branislav Janosik -T (bjanosik - AAP3 INC at
> Cisco) <bjano...@cisco.com> wrote:
>
> Hi all,
>
>
>
> I’m working on a project and we have Java benchmark test for testing the
> performance when using Cassandra database. Create operation on a single
> node Cassandra cluster is about 15K operations per second. Problem we have
> is when I set up cluster with 2 or more nodes (each of them are on separate
> virtual machines and servers), the performance goes down to 1K ops/sec. I
> follow the official instructions on how to set up a multinode cluster – the
> only things I change in Cassandra.yaml file are: change seeds to IP address
> of one node, change listen and rpc address to IP address of the node and
> finally change endpoint snitch to GossipingPropertyFileSnitch. The
> replication factor is set to 1 when having 2-node cluster. I use only one
> datacenter. The cluster seems to be doing fine (I can see nodes
> communicating) and so is the CPU, RAM usage on the machines.
>
>
>
> Does anybody have any ideas? Any help would be very appreciated.
>
>
>
> Thanks!
>
>
>
>


Re: Cassandra cluster performance

2016-12-21 Thread Ben Slater
You would expect some drop when moving from a single node to multiple nodes but on the
face of it that feels extreme to me (although I’ve never personally tested
the difference). Some questions that might help provide an answer:
- what consistency level are you using for the test?
- what is your keyspace definition (replication factor most importantly)?
- where are you running your test client (is it a separate box to
cassandra)?
- what C* version?
- what are specs (CPU, RAM) of the test servers?

Cheers
Ben

On Thu, 22 Dec 2016 at 09:26 Branislav Janosik -T (bjanosik - AAP3 INC at
Cisco)  wrote:

> Hi all,
>
>
>
> I’m working on a project and we have Java benchmark test for testing the
> performance when using Cassandra database. Create operation on a single
> node Cassandra cluster is about 15K operations per second. Problem we have
> is when I set up cluster with 2 or more nodes (each of them are on separate
> virtual machines and servers), the performance goes down to 1K ops/sec. I
> follow the official instructions on how to set up a multinode cluster – the
> only things I change in Cassandra.yaml file are: change seeds to IP address
> of one node, change listen and rpc address to IP address of the node and
> finally change endpoint snitch to GossipingPropertyFileSnitch. The
> replication factor is set to 1 when having 2-node cluster. I use only one
> datacenter. The cluster seems to be doing fine (I can see nodes
> communicating) and so is the CPU, RAM usage on the machines.
>
>
>
> Does anybody have any ideas? Any help would be very appreciated.
>
>
>
> Thanks!
>
>
>


Re: All nodes hosting replicas down

2016-12-18 Thread Ben Slater
And I’m not aware of any case where it’s a good idea to use SimpleStrategy
in Prod (I’d be interested to hear if anyone else knows of one).
NetworkTopologyStrategy behaves the same as SimpleStrategy in the basic
case of one DC and Rack but gives you a good path to migrate to more
sophisticated topologies in the future. If you run on SimpleStrategy then
any future migration to NetworkTopologyStrategy will be painful.
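
For reference, the mechanical part of such a migration is just an ALTER
KEYSPACE plus a full repair - the painful part is getting DC/rack naming and
client consistency levels right first. A rough sketch, assuming a keyspace
called mykeyspace and a single DC named dc1 with RF 3 (all names here are
placeholders):

cqlsh -e "ALTER KEYSPACE mykeyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc1': 3};"
# then run a full repair on every node so replicas match the new placement
nodetool repair -full mykeyspace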

Cheers
Ben

On Mon, 19 Dec 2016 at 07:30 Benjamin Roth  wrote:

> For sensitive data, I'd recommend to use at least RF=3. Especially if you
> plan to use CL QUORUM, for example if you want to ensure consistency.
> If you use QUORUM and RF=2 then a single failing node will make your data
> unavailable.
>
> For more information, please read
> https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
> There are a lot more resources out there that go more into detail when and
> why to use what CL.
>
> 2016-12-18 21:21 GMT+01:00 jean paul :
>
> Thank you so much for answer.
>
> So, in case of a high failure rate in my cluster, do I have to increase the
> replication factor or do I have to use NetworkTopologyStrategy?
> That's it ?
>
> Kindly.
>
>
> 2016-12-18 20:23 GMT+01:00 Matija Gobec :
>
> If you are reading and none of the replicas is online you will get an
> exception on the read (tried x replicas but 0 responded) and your read will
> fail. Writes on the other hand are going to go through only if your write
> consistency is ANY. If your write consistency is ONE or anything upwards,
> then it will fail too.
>
> On Sun, Dec 18, 2016 at 7:47 PM, jean paul  wrote:
>
> Hi,
>
> Please, if we choose a replication factor =2 (simple strategy), so, we
> have two replicas of data on the ring.
>
> What happen in the case of all nodes containing replicas are down ?
>
>
> Thank you so much for help.
>
> Kind regards.
>
>
>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: Single cluster node restore

2016-11-29 Thread Ben Slater
You can have situations where rebuilding a node via streaming is painful
and slow (generally because there is something bad about the data model
like misused secondary indexes or massive partitions). Also, overstreaming
can mean you need more disk space to bootstrap a node than you’ll require
once it’s fully streamed and compacted - this can be hard to work around in
some environments. In this case you might want to restore a single node
from backup.

However, in general you’re right - it’s not something that tends to be done
very often.
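
For anyone who does need to go down that path, a single-node restore from a
snapshot is roughly along these lines - a sketch only, with placeholder
keyspace/table/snapshot names and the default data directory assumed:

# table data directories carry a UUID suffix, e.g. mytable-<uuid>
TABLE_DIR=/var/lib/cassandra/data/mykeyspace/mytable-<uuid>
nodetool drain && sudo service cassandra stop
cp $TABLE_DIR/snapshots/my_snapshot/* $TABLE_DIR/
sudo service cassandra start
# or, with the node left running, copy the files in and then:
# nodetool refresh mykeyspace mytable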

Cheers
Ben

On Wed, 30 Nov 2016 at 04:38 Petr Malik  wrote:

>
> Hi.
>
> I have a question about Cassandra backup-restore strategies.
>
> As far as I understand Cassandra has been designed to survive hardware
> failures by relying on data replication.
>
>
> It seems like people still want backup/restore for case when somebody
> accidentally deletes data or the data gets otherwise corrupted.
>
> In that case restoring all keyspace/table snapshots on all nodes should
> bring it back.
>
>
> I am asking because I often read directions on restoring a single node in
> a cluster. I am just wondering under what circumstances could this be done
> safely.
>
>
> Please correct me if i am wrong but restoring just a single node does not
> really roll back the data as the newer (corrupt) data will be served by
> other replicas and eventually propagated to the restored node. Right?
>
> In fact by doing so one may end up reintroducing deleted data back...
>
>
> Also since Cassandra distributes the data throughout the cluster it is not
> clear on which node any particular (corrupt) data resides and hence which
> to restore.
>
>
> I guess this is a long way of asking whether there is an advantage of
> trying to restore just a single node in a Cassandra cluster as opposed to
> say replacing the dead node and letting Cassandra handle the replication.
>
>
> Thanks.
>


Re: Does recovery continue after truncating a table?

2016-11-26 Thread Ben Slater
By “undocumented limitation”, I meant that “TRUNCATE” is mainly used in
development and testing, not production scenarios, so a sufficient fix (and
certainly a better-than-nothing fix) might be just to document that if you
issue a TRUNCATE while there are still hinted hand-offs pending, the data
those hand-offs replay after the truncate will come back to life. Of course,
an actual fix would be better.

Cheers
Ben

On Sat, 26 Nov 2016 at 21:08 Hiroyuki Yamada <mogwa...@gmail.com> wrote:

> Hi Yuji and Ben,
>
> I tried out this revised script and the same issue occurred to me, too.
> I think it's definitely a bug to be solved asap.
>
> >Ben
> What do you mean "an undocumented limitation" ?
>
> Thanks,
> Hiro
>
> On Sat, Nov 26, 2016 at 3:13 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
> > Nice detective work! Seems to me that it’s at best an undocumented
> > limitation and potentially could be viewed as a bug - maybe log another
> > JIRA?
> >
> > One note - there is a nodetool truncatehints command that could be used to
> > clear out the hints
> > (
> http://cassandra.apache.org/doc/latest/tools/nodetool/truncatehints.html?highlight=truncate
> )
> > . However, it seems to clear all hints on particular endpoint, not just
> for
> > a specific table.
> >
> > Cheers
> > Ben
> >
> > On Fri, 25 Nov 2016 at 17:42 Yuji Ito <y...@imagine-orb.com> wrote:
> >>
> >> Hi all,
> >>
> >> I revised the script to reproduce the issue.
> >> I think the issue happens more frequently than before.
> >> Killing another node is added to the previous script.
> >>
> >>  [script] 
> >> #!/bin/sh
> >>
> >> node1_ip=
> >> node2_ip=
> >> node3_ip=
> >> node2_user=
> >> node3_user=
> >> rows=1
> >>
> >> echo "consistency quorum;" > init_data.cql
> >> for key in $(seq 0 $(expr $rows - 1))
> >> do
> >> echo "insert into testdb.testtbl (key, val) values($key, ) IF
> NOT
> >> EXISTS;" >> init_data.cql
> >> done
> >>
> >> while true
> >> do
> >> echo "truncate the table"
> >> cqlsh $node1_ip -e "truncate table testdb.testtbl" > /dev/null 2>&1
> >> if [ $? -ne 0 ]; then
> >> echo "truncating failed"
> >> continue
> >> else
> >> break
> >> fi
> >> done
> >>
> >> echo "kill C* process on node3"
> >> pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep
> CassandraDaemon |
> >> awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
> >>
> >> echo "insert $rows rows"
> >> cqlsh $node1_ip -f init_data.cql > insert_log 2>&1
> >>
> >> echo "restart C* process on node3"
> >> pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra
> start"
> >>
> >> while true
> >> do
> >> echo "truncate the table again"
> >> cqlsh $node1_ip -e "truncate table testdb.testtbl"
> >> if [ $? -ne 0 ]; then
> >> echo "truncating failed"
> >> continue
> >> else
> >> echo "truncation succeeded!"
> >> break
> >> fi
> >> done
> >>
> >> echo "kill C* process on node2"
> >> pdsh -l $node2_user -R ssh -w $node2_ip "ps auxww | grep
> CassandraDaemon |
> >> awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
> >>
> >> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select
> >> count(*) from testdb.testtbl;"
> >> sleep 10
> >> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select
> >> count(*) from testdb.testtbl;"
> >>
> >> echo "restart C* process on node2"
> >> pdsh -l $node2_user -R ssh -w $node2_ip "sudo /etc/init.d/cassandra
> start"
> >>
> >>
> >> Thanks,
> >> yuji
> >>
> >>
> >> On Fri, Nov 18, 2016 at 7:52 PM, Yuji Ito <y...@imagine-orb.com> wrote:
> >>>
> >>> I investigated source code and logs of killed node.
> >>> I guess that unexpected writes are executed when truncation is being
> >>> executed.
> >>>
> >>> Some writes were executed after flush (the

Re: Does recovery continue after truncating a table?

2016-11-25 Thread Ben Slater
Nice detective work! Seems to me that it’s at best an undocumented
limitation and potentially could be viewed as a bug - maybe log another
JIRA?

One note - there is a nodetool truncatehints command that could be used to
clear out the hints (
http://cassandra.apache.org/doc/latest/tools/nodetool/truncatehints.html?highlight=truncate).
However, it seems to clear all hints on a particular endpoint, not just for
a specific table.
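
For completeness, clearing the hints destined for one endpoint looks like
the following (the address is a placeholder; as noted, it drops all hints
for that endpoint, not just the truncated table's):

nodetool truncatehints 10.1.2.3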

Cheers
Ben

On Fri, 25 Nov 2016 at 17:42 Yuji Ito  wrote:

> Hi all,
>
> I revised the script to reproduce the issue.
> I think the issue happens more frequently than before.
> Killing another node is added to the previous script.
>
>  [script] 
> #!/bin/sh
>
> node1_ip=
> node2_ip=
> node3_ip=
> node2_user=
> node3_user=
> rows=1
>
> echo "consistency quorum;" > init_data.cql
> for key in $(seq 0 $(expr $rows - 1))
> do
> echo "insert into testdb.testtbl (key, val) values($key, ) IF NOT
> EXISTS;" >> init_data.cql
> done
>
> while true
> do
> echo "truncate the table"
> cqlsh $node1_ip -e "truncate table testdb.testtbl" > /dev/null 2>&1
> if [ $? -ne 0 ]; then
> echo "truncating failed"
> continue
> else
> break
> fi
> done
>
> echo "kill C* process on node3"
> pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon |
> awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>
> echo "insert $rows rows"
> cqlsh $node1_ip -f init_data.cql > insert_log 2>&1
>
> echo "restart C* process on node3"
> pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"
>
> while true
> do
> echo "truncate the table again"
> cqlsh $node1_ip -e "truncate table testdb.testtbl"
> if [ $? -ne 0 ]; then
> echo "truncating failed"
> continue
> else
> echo "truncation succeeded!"
> break
> fi
> done
>
> echo "kill C* process on node2"
> pdsh -l $node2_user -R ssh -w $node2_ip "ps auxww | grep CassandraDaemon |
> awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>
> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select
> count(*) from testdb.testtbl;"
> sleep 10
> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select
> count(*) from testdb.testtbl;"
>
> echo "restart C* process on node2"
> pdsh -l $node2_user -R ssh -w $node2_ip "sudo /etc/init.d/cassandra start"
>
>
> Thanks,
> yuji
>
>
> On Fri, Nov 18, 2016 at 7:52 PM, Yuji Ito  wrote:
>
> I investigated source code and logs of killed node.
> I guess that unexpected writes are executed when truncation is being
> executed.
>
> Some writes were executed after flush (the first flush) in truncation and
> these writes could be read.
> These writes were requested as MUTATION by another node for hinted handoff.
> Their data was stored to a new memtable and flushed (the second flush) to
> a new SSTable before snapshot in truncation.
> So, the truncation discarded only old SSTables, not the new SSTable.
> That's because ReplayPosition which was used for discarding SSTable was
> that of the first flush.
>
> I copied some parts of log as below.
> "##" line is my comment.
> The point is that the ReplayPosition is moved forward by the second flush.
> It means some writes are executed after the first flush.
>
> == log ==
> ## started truncation
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:04,612
> ColumnFamilyStore.java:2790 - truncating testtbl
> ## the first flush started before truncation
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:04,612
> ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 591360 (0%)
> on-heap, 0 (0%) off-heap
> INFO  [MemtableFlushWriter:1] 2016-11-17 08:36:04,613 Memtable.java:352 -
> Writing Memtable-testtbl@1863835308(42.625KiB serialized bytes, 2816 ops,
> 0%/0% of on/off-heap limit)
> ...
> DEBUG [MemtableFlushWriter:1] 2016-11-17 08:36:04,973 Memtable.java:386 -
> Completed flushing
> /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/tmp-lb-1-big-Data.db
> (17.651KiB) for commitlog position ReplayPosition(segmentId=1479371760395,
> position=315867)
> ## this ReplayPosition was used for discarding SSTables
> ...
> TRACE [MemtablePostFlush:1] 2016-11-17 08:36:05,022 CommitLog.java:298 -
> discard completed log segments for ReplayPosition(segmentId=1479371760395,
> position=315867), table 562848f0-a556-11e6-8b14-51065d58fdfb
> ## end of the first flush
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028
> ColumnFamilyStore.java:2823 - Discarding sstable data for truncated CF +
> indexes
> ## the second flush before snapshot
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028
> ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 698880 (0%)
> on-heap, 0 (0%) off-heap
> INFO  [MemtableFlushWriter:2] 2016-11-17 08:36:05,029 Memtable.java:352 -
> Writing Memtable-testtbl@1186728207(50.375KiB serialized bytes, 3328 ops,
> 0%/0% of on/off-heap limit)
> ...
> DEBUG [MemtableFlushWriter:2] 

Re: generate different sizes of request from single client

2016-11-24 Thread Ben Slater
If targeting two different tables for the different sizes works then I’ve
submitted a patch for cassandra-stress that allows you to do that:
https://issues.apache.org/jira/browse/CASSANDRA-8780

It would be nice to see someone else test it if you have the appetite to
build it and try it out.

Cheers
Ben

On Fri, 25 Nov 2016 at 16:43 Vladimir Yudovin  wrote:

> >doesn't has any option for mixed request size
> As a workaround you can run two parallel tests with its own size each.
>
> Best regards, Vladimir Yudovin,
> *Winguzone  - Cloud Cassandra Hosting*
>
>
>  On Thu, 24 Nov 2016 16:54:08 -0500 *Vikas Jaiman* wrote 
>
> Hi Vladimir,
>
> It has an option for a mixed read/write workload but doesn't have any option
> for mixed request sizes where I can issue requests of different sizes.
>
> Thanks,
> Vikas
>
> On Thu, Nov 24, 2016 at 7:46 PM, Vladimir Yudovin 
> wrote:
>
>
> You can use the cassandra-stress tool.
> 
> It has options to set different load patterns.
>
> Best regards, Vladimir Yudovin,
> *Winguzone  - Cloud Cassandra Hosting*
>
>
>  On Thu, 24 Nov 2016 13:27:59 -0500 *Vikas Jaiman* wrote 
>
> Hi all,
> I want to generate two different sizes (let's say 1 KB and 10 KB) of
> request from single client for benchmarking Cassandra. Is there any tool
> exist for this type of scenario?
>
> Vikas
>
>
>


Re: failure node rejoin

2016-11-23 Thread Ben Slater
You could certainly log a JIRA for the “failure node rejoin” issue (
https://issues.apache.org/jira/browse/cassandra). It sounds like
unexpected behaviour to me. However, I’m not sure it will be viewed as a
high priority to fix given there is a clear operational work-around.

Cheers
Ben

On Thu, 24 Nov 2016 at 15:14 Yuji Ito <y...@imagine-orb.com> wrote:

> Hi Ben,
>
> I continue to investigate the data loss issue.
> I'm investigating logs and source code and try to reproduce the data loss
> issue with a simple test.
> I also try my destructive test with DROP instead of TRUNCATE.
>
> BTW, I want to discuss the issue of the title "failure node rejoin" again.
>
> Will this issue be fixed? Other nodes should refuse this unexpected rejoin.
> Or should I be more careful to add failure nodes to the existing cluster?
>
> Thanks,
> yuji
>
>
> On Fri, Nov 11, 2016 at 1:00 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> From a quick look I couldn’t find any defects other than the ones you’ve
> found that seem potentially relevant to your issue (if anyone else on the
> list knows of one please chime in). Maybe the next step, if you haven’t
> done so already, is to check your Cassandra logs for any signs of issues
> (ie WARNING or ERROR logs) in the failing case.
>
> Cheers
> Ben
>
> On Fri, 11 Nov 2016 at 13:07 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> I tried 2.2.8 and could reproduce the problem.
> So, I'm investigating some bug fixes of repair and commitlog between 2.2.8
> and 3.0.9.
>
> - CASSANDRA-12508: "nodetool repair returns status code 0 for some errors"
>
> - CASSANDRA-12436: "Under some races commit log may incorrectly think it
> has unflushed data"
>   - related to CASSANDRA-9669, CASSANDRA-11828 (the fix of 2.2 is
> different from that of 3.0?)
>
> Do you know other bug fixes related to commitlog?
>
> Regards
> yuji
>
> On Wed, Nov 9, 2016 at 11:34 AM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> There have been a few commit log bugs around in the last couple of months
> so perhaps you’ve hit something that was fixed recently. Would be
> interesting to know the problem is still occurring in 2.2.8.
>
> I suspect what is happening is that when you do your initial read (without
> flush) to check the number of rows, the data is in memtables and
> theoretically the commitlogs but not sstables. With the forced stop the
> memtables are lost and Cassandra should read the commitlog from disk at
> startup to reconstruct the memtables. However, it looks like that didn’t
> happen for some (bad) reason.
>
> Good news that 3.0.9 fixes the problem so up to you if you want to
> investigate further and see if you can narrow it down to file a JIRA
> (although the first step of that would be trying 2.2.9 to make sure it’s
> not already fixed there).
>
> Cheers
> Ben
>
> On Wed, 9 Nov 2016 at 12:56 Yuji Ito <y...@imagine-orb.com> wrote:
>
> I tried C* 3.0.9 instead of 2.2.
> The data loss problem hasn't happened so far (without `nodetool flush`).
>
> Thanks
>
> On Fri, Nov 4, 2016 at 3:50 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> When I added `nodetool flush` on all nodes after step 2, the problem
> didn't happen.
> Did replay from old commit logs delete rows?
>
> Perhaps, the flush operation just detected that some nodes were down in
> step 2 (just after truncating tables).
> (Insertion and check in step2 would succeed if one node was down because
> consistency levels was serial.
> If the flush failed on more than one node, the test would retry step 2.)
> However, if so, the problem would happen without deleting Cassandra data.
>
> Regards,
> yuji
>
>
> On Mon, Oct 24, 2016 at 8:37 AM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> Definitely sounds to me like something is not working as expected but I
> don’t really have any idea what would cause that (other than the fairly
> extreme failure scenario). A couple of things I can think of to try to
> narrow it down:
> 1) Run nodetool flush on all nodes after step 2 - that will make sure all
> data is written to sstables rather than relying on commit logs
> 2) Run the test with consistency level quorum rather than serial
> (shouldn’t be any different but quorum is more widely used so maybe there
> is a bug that’s specific to serial)
>
> Cheers
> Ben
>
> On Mon, 24 Oct 2016 at 10:29 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Hi Ben,
>
> The test without killing nodes has been working w

Re: cassandra python driver routing requests to one node?

2016-11-13 Thread Ben Slater
What load balancing policies are you using in your client code (
https://datastax.github.io/python-driver/api/cassandra/policies.html)?

Cheers
Ben

On Mon, 14 Nov 2016 at 16:22 Andrew Bialecki 
wrote:

> We have an odd situation where all of a sudden of our cluster started
> seeing a disproportionate number of writes go to one node. We're using the
> Python driver version 3.7.1. I'm not sure if this is a driver issue or
> possibly a network issue causing requests to get routed in an odd way. It's
> not absolute, there are requests going to all nodes.
>
> Tried restarting the problematic node, no luck (those are the quiet
> periods). Tried restarting the clients, also no luck. Checked nodetool
> status and ownership is even across the cluster.
>
> Curious if anyone's seen this behavior before. Seems like the next step
> will be to debug the client and see why it's choosing that node.
>
> [image: Inline image 1]
>
>
> --
> AB
>


Re: failure node rejoin

2016-11-10 Thread Ben Slater
From a quick look I couldn’t find any defects other than the ones you’ve
found that seem potentially relevant to your issue (if anyone else on the
list knows of one please chime in). Maybe the next step, if you haven’t
done so already, is to check your Cassandra logs for any signs of issues
(ie WARNING or ERROR logs) in the failing case.

Cheers
Ben

On Fri, 11 Nov 2016 at 13:07 Yuji Ito <y...@imagine-orb.com> wrote:

> Thanks Ben,
>
> I tried 2.2.8 and could reproduce the problem.
> So, I'm investigating some bug fixes of repair and commitlog between 2.2.8
> and 3.0.9.
>
> - CASSANDRA-12508: "nodetool repair returns status code 0 for some errors"
>
> - CASSANDRA-12436: "Under some races commit log may incorrectly think it
> has unflushed data"
>   - related to CASSANDRA-9669, CASSANDRA-11828 (the fix of 2.2 is
> different from that of 3.0?)
>
> Do you know other bug fixes related to commitlog?
>
> Regards
> yuji
>
> On Wed, Nov 9, 2016 at 11:34 AM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> There have been a few commit log bugs around in the last couple of months
> so perhaps you’ve hit something that was fixed recently. Would be
> interesting to know the problem is still occurring in 2.2.8.
>
> I suspect what is happening is that when you do your initial read (without
> flush) to check the number of rows, the data is in memtables and
> theoretically the commitlogs but not sstables. With the forced stop the
> memtables are lost and Cassandra should read the commitlog from disk at
> startup to reconstruct the memtables. However, it looks like that didn’t
> happen for some (bad) reason.
>
> Good news that 3.0.9 fixes the problem so up to you if you want to
> investigate further and see if you can narrow it down to file a JIRA
> (although the first step of that would be trying 2.2.9 to make sure it’s
> not already fixed there).
>
> Cheers
> Ben
>
> On Wed, 9 Nov 2016 at 12:56 Yuji Ito <y...@imagine-orb.com> wrote:
>
> I tried C* 3.0.9 instead of 2.2.
> The data loss problem hasn't happened so far (without `nodetool flush`).
>
> Thanks
>
> On Fri, Nov 4, 2016 at 3:50 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> When I added `nodetool flush` on all nodes after step 2, the problem
> didn't happen.
> Did replay from old commit logs delete rows?
>
> Perhaps, the flush operation just detected that some nodes were down in
> step 2 (just after truncating tables).
> (Insertion and check in step2 would succeed if one node was down because
> consistency levels was serial.
> If the flush failed on more than one node, the test would retry step 2.)
> However, if so, the problem would happen without deleting Cassandra data.
>
> Regards,
> yuji
>
>
> On Mon, Oct 24, 2016 at 8:37 AM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> Definitely sounds to me like something is not working as expected but I
> don’t really have any idea what would cause that (other than the fairly
> extreme failure scenario). A couple of things I can think of to try to
> narrow it down:
> 1) Run nodetool flush on all nodes after step 2 - that will make sure all
> data is written to sstables rather than relying on commit logs
> 2) Run the test with consistency level quorum rather than serial
> (shouldn't be any different but quorum is more widely used so maybe there
> is a bug that’s specific to serial)
>
> Cheers
> Ben
>
> On Mon, 24 Oct 2016 at 10:29 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Hi Ben,
>
> The test without killing nodes has been working well without data loss.
> I've repeated my test about 200 times after removing data and
> rebuild/repair.
>
> Regards,
>
>
> On Fri, Oct 21, 2016 at 3:14 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
> > Just to confirm, are you saying:
> > a) after operation 2, you select all and get 1000 rows
> > b) after operation 3 (which only does updates and read) you select and
> only get 953 rows?
>
> That's right!
>
> I've started the test without killing nodes.
> I'll report the result to you next Monday.
>
> Thanks
>
>
> On Fri, Oct 21, 2016 at 3:05 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> Just to confirm, are you saying:
> a) after operation 2, you select all and get 1000 rows
> b) after operation 3 (which only does updates and read) you select and
> only get 953 rows?
>
> If so, that would be very unexpected. If you run your tests without
> killing nodes do you get the expected (1,000) rows?
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 17:00 Yuji Ito <y...@imagine-orb.com> wrote:
>
> > Are you cer

Re: failure node rejoin

2016-11-08 Thread Ben Slater
There have been a few commit log bugs around in the last couple of months
so perhaps you’ve hit something that was fixed recently. Would be
interesting to know if the problem is still occurring in 2.2.8.

I suspect what is happening is that when you do your initial read (without
flush) to check the number of rows, the data is in memtables and
theoretically the commitlogs but not sstables. With the forced stop the
memtables are lost and Cassandra should read the commitlog from disk at
startup to reconstruct the memtables. However, it looks like that didn’t
happen for some (bad) reason.

Good news that 3.0.9 fixes the problem so up to you if you want to
investigate further and see if you can narrow it down to file a JIRA
(although the first step of that would be trying 2.2.9 to make sure it’s
not already fixed there).

Cheers
Ben

On Wed, 9 Nov 2016 at 12:56 Yuji Ito <y...@imagine-orb.com> wrote:

> I tried C* 3.0.9 instead of 2.2.
> The data loss problem hasn't happened so far (without `nodetool flush`).
>
> Thanks
>
> On Fri, Nov 4, 2016 at 3:50 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> When I added `nodetool flush` on all nodes after step 2, the problem
> didn't happen.
> Did replay from old commit logs delete rows?
>
> Perhaps, the flush operation just detected that some nodes were down in
> step 2 (just after truncating tables).
> (Insertion and check in step2 would succeed if one node was down because
> consistency levels was serial.
> If the flush failed on more than one node, the test would retry step 2.)
> However, if so, the problem would happen without deleting Cassandra data.
>
> Regards,
> yuji
>
>
> On Mon, Oct 24, 2016 at 8:37 AM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> Definitely sounds to me like something is not working as expected but I
> don’t really have any idea what would cause that (other than the fairly
> extreme failure scenario). A couple of things I can think of to try to
> narrow it down:
> 1) Run nodetool flush on all nodes after step 2 - that will make sure all
> data is written to sstables rather than relying on commit logs
> 2) Run the test with consistency level quorum rather than serial
> (shouldn't be any different but quorum is more widely used so maybe there
> is a bug that’s specific to serial)
>
> Cheers
> Ben
>
> On Mon, 24 Oct 2016 at 10:29 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Hi Ben,
>
> The test without killing nodes has been working well without data loss.
> I've repeated my test about 200 times after removing data and
> rebuild/repair.
>
> Regards,
>
>
> On Fri, Oct 21, 2016 at 3:14 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
> > Just to confirm, are you saying:
> > a) after operation 2, you select all and get 1000 rows
> > b) after operation 3 (which only does updates and read) you select and
> only get 953 rows?
>
> That's right!
>
> I've started the test without killing nodes.
> I'll report the result to you next Monday.
>
> Thanks
>
>
> On Fri, Oct 21, 2016 at 3:05 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> Just to confirm, are you saying:
> a) after operation 2, you select all and get 1000 rows
> b) after operation 3 (which only does updates and read) you select and
> only get 953 rows?
>
> If so, that would be very unexpected. If you run your tests without
> killing nodes do you get the expected (1,000) rows?
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 17:00 Yuji Ito <y...@imagine-orb.com> wrote:
>
> > Are you certain your tests don’t generate any overlapping inserts (by
> PK)?
>
> Yes. The operation 2) also checks the number of rows just after all
> insertions.
>
>
> On Fri, Oct 21, 2016 at 2:51 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK. Are you certain your tests don’t generate any overlapping inserts (by
> PK)? Cassandra basically treats any inserts with the same primary key as
> updates (so 1000 insert operations may not necessarily result in 1000 rows
> in the DB).
>
> On Fri, 21 Oct 2016 at 16:30 Yuji Ito <y...@imagine-orb.com> wrote:
>
> thanks Ben,
>
> > 1) At what stage did you have (or expect to have) 1000 rows (and have
> the mismatch between actual and expected) - at that end of operation (2) or
> after operation (3)?
>
> after operation 3), at operation 4) which reads all rows by cqlsh with
> CL.SERIAL
>
> > 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?
>
> - create keyspace testkeyspace WITH REPLICATION =
> {'class':'SimpleStrategy','replication_factor':3};
> - consistency l

Re: large number of pending compactions, sstables steadily increasing

2016-11-07 Thread Ben Slater
What I’ve seen happen a number of times is that you get into a negative
feedback loop:
not enough capacity to keep up with compactions (often triggered by repair
or compaction hitting a large partition) -> more sstables -> more expensive
reads -> even less capacity to keep up with compactions -> repeat

The way we deal with this at Instaclustr is typically to take the node
offline to let it catch up with compactions. We take it offline by running
nodetool disablegossip + disablethrift + disablebinary, unthrottling
compactions (nodetool setcompactionthroughput 0) and then leaving it to chug
through compactions until it gets close to zero, then reversing the settings
or restarting C* to set things back to normal. This typically resolves the
issues. If you see it happening regularly your cluster probably needs more
processing capacity (or other tuning).
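
As a rough sketch, that sequence on the affected node looks like the
following (the throughput value at the end is the usual yaml default -
adjust it to whatever your cluster normally runs with):

# stop serving client traffic and gossip so the node can focus on compacting
nodetool disablebinary
nodetool disablethrift
nodetool disablegossip
# remove the compaction throughput cap
nodetool setcompactionthroughput 0
# watch pending tasks until they get close to zero
watch -n 60 nodetool compactionstats
# then reverse the settings (or restart C* to pick the yaml values back up)
nodetool setcompactionthroughput 16
nodetool enablegossip
nodetool enablethrift
nodetool enablebinary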

Cheers
Ben

On Tue, 8 Nov 2016 at 02:38 Eiti Kimura  wrote:

> Hey guys,
>
> Do we have any conclusions about this case? Ezra, did you solve your
> problem?
> We are facing a very similar problem here. LeveledCompaction with VNodes
> and looks like a node went to a weird state and start to consume lot of
> CPU, the compaction process seems to be stucked and the number of SSTables
> increased significantly.
>
> Do you have any clue about it?
>
> Thanks,
> Eiti
>
>
>
> J.P. Eiti Kimura
> Plataformas
>
> +55 19 3518  5500
> + 55 19 98232 2792
> skype: eitikimura
> 
>   
> 
>
> 2016-09-11 18:20 GMT-03:00 Jens Rantil :
>
> I just want to chime in and say that we also had issues keeping up with
> compaction once (with vnodes/ssd disks) and I also want to recommend
> keeping track of your open file limit which might bite you.
>
> Cheers,
> Jens
>
>
> On Friday, August 19, 2016, Mark Rose  wrote:
>
> Hi Ezra,
>
> Are you making frequent changes to your rows (including TTL'ed
> values), or mostly inserting new ones? If you're only inserting new
> data, it's probable using size-tiered compaction would work better for
> you. If you are TTL'ing whole rows, consider date-tiered.
>
> If leveled compaction is still the best strategy, one way to catch up
> with compactions is to have less data per partition -- in other words,
> use more machines. Leveled compaction is CPU expensive. You are CPU
> bottlenecked currently, or from the other perspective, you have too
> much data per node for leveled compaction.
>
> At this point, compaction is so far behind that you'll likely be
> getting high latency if you're reading old rows (since dozens to
> hundreds of uncompacted sstables will likely need to be checked for
> matching rows). You may be better off with size tiered compaction,
> even if it will mean always reading several sstables per read (higher
> latency than when leveled can keep up).
>
> How much data do you have per node? Do you update/insert to/delete
> rows? Do you TTL?
>
> Cheers,
> Mark
>
> On Wed, Aug 17, 2016 at 2:39 PM, Ezra Stuetzel 
> wrote:
> > I have one node in my cluster 2.2.7 (just upgraded from 2.2.6 hoping to
> fix
> > issue) which seems to be stuck in a weird state -- with a large number of
> > pending compactions and sstables. The node is compacting about 500gb/day,
> > number of pending compactions is going up at about 50/day. It is at about
> > 2300 pending compactions now. I have tried increasing number of
> compaction
> > threads and the compaction throughput, which doesn't seem to help
> eliminate
> > the many pending compactions.
> >
> > I have tried running 'nodetool cleanup' and 'nodetool compact'. The
> latter
> > has fixed the issue in the past, but most recently I was getting OOM
> errors,
> > probably due to the large number of sstables. I upgraded to 2.2.7 and am
> no
> > longer getting OOM errors, but also it does not resolve the issue. I do
> see
> > this message in the logs:
> >
> >> INFO  [RMI TCP Connection(611)-10.9.2.218] 2016-08-17 01:50:01,985
> >> CompactionManager.java:610 - Cannot perform a full major compaction as
> >> repaired and unrepaired sstables cannot be compacted together. These
> two set
> >> of sstables will be compacted separately.
> >
> > Below are the 'nodetool tablestats' comparing a normal and the
> problematic
> > node. You can see problematic node has many many more sstables, and they
> are
> > all in level 1. What is the best way to fix this? Can I just delete those
> > sstables somehow then run a repair?
> >>
> >> Normal node
> >>>
> >>> keyspace: mykeyspace
> >>>
> >>> Read Count: 0
> >>>
> >>> Read Latency: NaN ms.
> >>>
> >>> Write Count: 31905656
> >>>
> >>> Write Latency: 0.051713177939359714 ms.
> >>>
> >>> Pending Flushes: 0
> >>>
> >>> Table: mytable
> >>>
> >>> 

Re: Is it a memory issue?

2016-11-06 Thread Ben Slater
Yes, it does mean you’re getting ahead of Cassandra’s ability to keep up
although I would have probably expected a higher number of pending
compactions before you got serious issues (I’ve seen numbers in the
thousands).

I notice from the screenshot you provided that you are using secondary
indexes. There are a lot of ways to misuse secondary indexes (vs not very
many ways to use them well). I think it’s possible that what you are seeing
is the result of the secondary index on event time (I assume a very high
cardinality column). This is a good blog on secondary indexes:
http://www.wentnet.com/blog/?p=77
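
If the intent of that index is range queries on event time, a dedicated
time-bucketed table usually behaves far better than an index on a
high-cardinality column. A sketch only - the bucket granularity and column
set below are assumptions, not a drop-in replacement for your schema:

CREATE TABLE eventdata_by_time (
    date int,            -- e.g. a yyyymmdd bucket, like the existing date column
    event_time bigint,
    deviceid int,
    lat decimal,
    lon decimal,
    PRIMARY KEY ((date), event_time, deviceid)
) WITH CLUSTERING ORDER BY (event_time ASC, deviceid ASC);

-- a time-range query then hits one partition per bucket instead of the index:
SELECT * FROM eventdata_by_time
WHERE date = 20161107 AND event_time >= 1478476800000 AND event_time < 1478480400000;

The trade-off is an extra write per event and potentially large daily
partitions, so the bucket may need to be finer than a day at high ingest
rates.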

Cheers
Ben

On Mon, 7 Nov 2016 at 16:29 wxn...@zjqunshuo.com <wxn...@zjqunshuo.com>
wrote:

> Thanks Ben. I stopped inserting and checked compaction status as you
> mentioned. Seems there is lots of compaction work waiting to do. Please see
> below. In this case is it a sign that writting faster than C* can process?
>
> One node,
> [root@iZbp11zpafrqfsiys90kzoZ bin]# ./nodetool compactionstats
> pending tasks: 195
>
>  id   compaction type   keyspace  
>   tablecompleted totalunit   progress
>
>5da60b10-a4a9-11e6-88e9-755b5673a02aCompaction cargts   
> eventdata.eventdata_event_time_idx   1699866872   26536427792   bytes  
> 6.41%
>
>Compaction system  
>   hints 103543795172210360   bytes  0.20%
> Active compaction remaining time :   0h29m48s
>
> Another node,
> [root@iZbp1iqnrpsdhoodwii32bZ bin]# ./nodetool compactionstats
> pending tasks: 84
>
>  id   compaction type   keyspace  
>   table completed totalunit   progress
>
>28a9d010-a4a7-11e6-b985-979fea8d6099Compaction cargts  
>   eventdata 6561414001424412420   bytes 46.06%
>
>7c034840-a48e-11e6-b985-979fea8d6099Compaction cargts   
> eventdata.eventdata_event_time_idx   32098562606   42616107664   bytes 
> 75.32%
> Active compaction remaining time :   0h11m12s
>
>
> *From:* Ben Slater <ben.sla...@instaclustr.com>
> *Date:* 2016-11-07 11:41
> *To:* user <user@cassandra.apache.org>
> *Subject:* Re: Is it a memory issue?
>
> This sounds to me like your writes are getting ahead of what compaction can
> keep up with, which can eventually cause issues. Keep an eye on nodetool
> compactionstats: if the number of pending compactions continually climbs then
> you are writing faster than Cassandra can actually process. If this is
> happening then you need to either add more processing capacity (nodes) to
> your cluster or throttle writes on the client side.
>
> It could also be related to conditions like an individual partition
> growing too big but I’d check for backed up compactions first.
>
> Cheers
> Ben
>
> On Mon, 7 Nov 2016 at 14:17 wxn...@zjqunshuo.com <wxn...@zjqunshuo.com>
> wrote:
>
> Hi All,
> We have one issue on C* testing. At first the inserting was very fast and
> TPS was about 30K/s, but when the size of data rows reached 2 billion, the
> insertion rate decreased very badly and the TPS was 20K/s. When the size of
> rows reached 2.3 billion, the TPS decreased to 0.5K/s, and writing timeout
> come out. At last OOM issue happened in some nodes and C* deamon in some
> nodes crashed.  In production we have about 8 billion rows. My testing
> cluster setting is as below. My question is if the memory is the main
> issue. Do I need increase the memory, and what's the right setting for 
> MAX_HEAP_SIZE
> and HEAP_NEWSIZE?
>
> My cluster setting:
> C* cluster with 3 nodes in Aliyun Cloud
> CPU: 4core
> Memory: 8G
> Disk: 500G
> MAX_HEAP_SIZE=2G
> HEAP_NEWSIZE=500M
>
> My table schema:
>
> CREATE KEYSPACE IF NOT EXISTS cargts WITH REPLICATION = {'class': 
> 'SimpleStrategy','replication_factor':2};
> use cargts;
> CREATE TABLE eventdata (
> deviceId int,
> date int,
> event_time bigint,
> lat decimal,
> lon decimal,
> speed int,
> heading int,
> PRIMARY KEY ((deviceId,date),event_time)
> )
> WITH CLUSTERING ORDER BY (event_time ASC);
> CREATE INDEX ON eventdata (event_time);
>
> Best Regards,
> -Simon Wu
>
>


Re: Is it a memory issue?

2016-11-06 Thread Ben Slater
This sounds to me like your writes are getting ahead of what compaction can
keep up with, which can eventually cause issues. Keep an eye on nodetool
compactionstats: if the number of pending compactions continually climbs then
you are writing faster than Cassandra can actually process. If this is
happening then you need to either add more processing capacity (nodes) to
your cluster or throttle writes on the client side.

It could also be related to conditions like an individual partition growing
too big but I’d check for backed up compactions first.
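
If you want to check the partition size theory as well, nodetool will show
the distribution per table (the keyspace/table names below match the schema
in your mail; on older versions the commands are cfstats/cfhistograms):

nodetool compactionstats
nodetool tablehistograms cargts eventdata
nodetool tablestats cargts.eventdata   # look at "Compacted partition maximum bytes"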

Cheers
Ben

On Mon, 7 Nov 2016 at 14:17 wxn...@zjqunshuo.com 
wrote:

> Hi All,
> We have one issue on C* testing. At first the inserting was very fast and
> TPS was about 30K/s, but when the size of data rows reached 2 billion, the
> insertion rate decreased very badly and the TPS was 20K/s. When the size of
> rows reached 2.3 billion, the TPS decreased to 0.5K/s, and writing timeout
> come out. At last OOM issue happened in some nodes and C* deamon in some
> nodes crashed.  In production we have about 8 billion rows. My testing
> cluster setting is as below. My question is if the memory is the main
> issue. Do I need increase the memory, and what's the right setting for 
> MAX_HEAP_SIZE
> and HEAP_NEWSIZE?
>
> My cluster setting:
> C* cluster with 3 nodes in Aliyun Cloud
> CPU: 4core
> Memory: 8G
> Disk: 500G
> MAX_HEAP_SIZE=2G
> HEAP_NEWSIZE=500M
>
> My table schema:
>
> CREATE KEYSPACE IF NOT EXISTS cargts WITH REPLICATION = {'class': 
> 'SimpleStrategy','replication_factor':2};
> use cargts;
> CREATE TABLE eventdata (
> deviceId int,
> date int,
> event_time bigint,
> lat decimal,
> lon decimal,
> speed int,
> heading int,
> PRIMARY KEY ((deviceId,date),event_time)
> )
> WITH CLUSTERING ORDER BY (event_time ASC);
> CREATE INDEX ON eventdata (event_time);
>
> Best Regards,
> -Simon Wu
>
>


Re: Commercial Support Providers?

2016-11-03 Thread Ben Slater
I can confirm that we do offer support contracts for OSS Apache Cassandra
at Instaclustr (in addition to our managed service) - either drop me an
email directly (signature below) or contact sa...@instaclustr.com and we’d
be happy to discuss details.

Cheers
Ben

On Fri, 4 Nov 2016 at 14:02 Max C  wrote:

> Hello -
>
> We’re rolling out a small cluster at my work (2 DCs of 3 nodes each —
> hosted on-premises), and my boss has asked us to look into commercial
> support offerings.
>
> The main thing we’re looking for is a company that we can call day or
> night if/when things go “kaboom” and I can’t figure out what the problem is
> (ex: an upgrade fails unexpectedly, weird error messages in the logs,
> repairs keep failing, etc).  If they offer their own tested, supported,
> patched, version of Cassandra that would be ideal, and certainly management
> tools like OpsCenter are a bonus.
>
> The obvious choice here is DataStax, and we’re definitely talking to
> them.  Are there any other providers which offer this sort of service?
> Maybe Instaclustr?
>
> Thanks.
>
> - Max


Re: Lightweight transaction inside a batch : request rejected

2016-10-24 Thread Ben Slater
Yep, you would have to select the whole map and then pull out the
particular value you want in your application. I didn’t actually realise
you couldn’t just specify a particular map element in a select like you can
in an update, but it appears there is a long-running JIRA for this:
https://issues.apache.org/jira/browse/CASSANDRA-7396
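
To make that concrete, a small sketch (the table and values are
hypothetical): a single map element can be written directly, but the read
has to pull back the whole map and the application picks out the entry it
needs:

CREATE TABLE item_keys (
    user_id bigint PRIMARY KEY,
    keys map<text, text>    -- item_key -> item_id
);

-- updating a single element works:
UPDATE item_keys SET keys['key-XYZ-123'] = 'i11' WHERE user_id = 1;

-- but there is no SELECT keys['key-XYZ-123'] yet; fetch the whole map instead:
SELECT keys FROM item_keys WHERE user_id = 1;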

Cheers
Ben

On Tue, 25 Oct 2016 at 16:25 Mickael Delanoë <delanoe...@gmail.com> wrote:

> I can't do this, otherwise I won't be able to query the item_id using a
> key with a query like :
> Select * from item_id_by_key where user_id=... and key=
>
> Le 25 oct. 2016 07:15, "Ben Slater" <ben.sla...@instaclustr.com> a écrit :
>
> Move item_id_by_key into a collection field in item table? (Would
> probably be a “better” C* data model anyway.)
>
> On Tue, 25 Oct 2016 at 16:08 Mickael Delanoë <delanoe...@gmail.com> wrote:
>
> Ok, I understand, thanks.
> So now i would like to know if there is some best practices to do what i
> want.
> I.e inserting entries in several tables (with same partition key) only if
> there is not already an entry in the main table.
>
> Keep in mind i wanted to do that inside a single batch because I can have
> 2 concurrent request trying to insert something different but with the same
> primary key in the main table.
>
> If i split the batch in 2 requests(1 with the LWT, 1 with the rest), how
> can i ensure the last batch won't override the previous data and that the
> whole data will be saved (in case of a problem between request1 and
> request2) ?
>
> Le 24 oct. 2016 12:47, "DuyHai Doan" <doanduy...@gmail.com> a écrit :
>
>
>
> "So I guess in that case the Paxos operation does not span multiple table
> but operates only the table that has the condition. Am I wrong?"
>
> --> The fact that you're using a BATCH with LWT means that either ALL
> statements succeed or NONE. And to guarantee this, Paxos ballot must cover
> all statements. In your case since they span on multiple tables it's not
> possible
>
> On Mon, Oct 24, 2016 at 11:34 AM, Mickael Delanoë <delanoe...@gmail.com>
> wrote:
>
> Thanks DuyHai for the info.
> I already see this JIRA, however the use case I describe is slightly
> different from the JIRA as there is only ONE condition on ONE table. Other
> statements of the batch does not have any condition.
> So I guess in that case the Paxos operation does not span multiple table
> but operates only the table that has the condition. Am I wrong?
>
>
>
> 2016-10-24 10:21 GMT+02:00 DuyHai Doan <doanduy...@gmail.com>:
>
> As far as I remember, there is an optimization in Cassandra to manage
> Paxos ballot per table. So asking a Paxos operation to span multiple tables
> (even if same partition key) would require a lot of changes in the current
> impl.
>
> The question has already been raised, you may want to convince the
> committers by adding some comments here:
> https://issues.apache.org/jira/browse/CASSANDRA-10085
>
> On Mon, Oct 24, 2016 at 9:58 AM, Mickael Delanoë <delanoe...@gmail.com>
> wrote:
>
> Hi,
>
> I would like to use lightweight transaction inside a batch but the request
> is rejected by cassandra, however I think this is a use case than could be
> handled without problem.
> Below is what I wanted to do.
>
> I am using cassandra 3.7.
>
> CREATE KEYSPACE test_ksp WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': '1'};
>
> CREATE TABLE test_ksp.item (
> user_id bigint,
> item_id text,
> item_value text,
> item_key1 text,
> item_key2 text,
> PRIMARY KEY ((user_id), item_id));
>
> CREATE TABLE test_ksp.item_id_by_key (
> user_id bigint,
> item_key text,
> item_id text,
> PRIMARY KEY ((user_id), item_key));
>
> USE test_ksp;
>
> BEGIN BATCH
> INSERT INTO item (user_id, item_id, item_value, item_key1, item_key2)
> values (1,'i11','item-C', 'key-XYZ-123', 'key-ABC-789') IF NOT EXISTS;
> INSERT INTO item_id_by_key (user_id, item_key, item_id) VALUES (1,
> 'key-XYZ-123', 'i11');
> INSERT INTO item_id_by_key (user_id, item_key, item_id) VALUES (1,
> 'key-ABC-789', 'i11');
> APPLY BATCH;
>
>
> So as you can see this is a batch that targets 2 tables but with the same
> partition key (i.e the same target nodes). Moreover It uses only ONE
> condition on one table only.
> I don't understand why cassandra returns an error "Batch with conditions
> cannot span multiple tables" in that case.
>
> I understand that if I had used several conditions on different tables it
> could be a problem, but in my case there is only one condition and moreover
> I have always the same partition key for e

Re: Lightweight transaction inside a batch : request rejected

2016-10-24 Thread Ben Slater
Move item_id_by_key into a collection field in item table? (Would probably
be a “better” C* data model anyway.)

On Tue, 25 Oct 2016 at 16:08 Mickael Delanoë  wrote:

> Ok, I understand, thanks.
> So now i would like to know if there is some best practices to do what i
> want.
> I.e inserting entries in several tables (with same partition key) only if
> there is not already an entry in the main table.
>
> Keep in mind i wanted to do that inside a single batch because I can have
> 2 concurrent request trying to insert something different but with the same
> primary key in the main table.
>
> If i split the batch in 2 requests(1 with the LWT, 1 with the rest), how
> can i ensure the last batch won't override the previous data and that the
> whole data will be saved (in case of a problem between request1 and
> request2) ?
>
> Le 24 oct. 2016 12:47, "DuyHai Doan"  a écrit :
>
>
>
> "So I guess in that case the Paxos operation does not span multiple table
> but operates only the table that has the condition. Am I wrong?"
>
> --> The fact that you're using a BATCH with LWT means that either ALL
> statements succeed or NONE. And to guarantee this, Paxos ballot must cover
> all statements. In your case since they span on multiple tables it's not
> possible
>
> On Mon, Oct 24, 2016 at 11:34 AM, Mickael Delanoë 
> wrote:
>
> Thanks DuyHai for the info.
> I already see this JIRA, however the use case I describe is slightly
> different from the JIRA as there is only ONE condition on ONE table. Other
> statements of the batch does not have any condition.
> So I guess in that case the Paxos operation does not span multiple table
> but operates only the table that has the condition. Am I wrong?
>
>
>
> 2016-10-24 10:21 GMT+02:00 DuyHai Doan :
>
> As far as I remember, there is an optimization in Cassandra to manage
> Paxos ballot per table. So asking a Paxos operation to span multiple tables
> (even if same partition key) would require a lot of changes in the current
> impl.
>
> The question has already been raised, you may want to convince the
> committers by adding some comments here:
> https://issues.apache.org/jira/browse/CASSANDRA-10085
>
> On Mon, Oct 24, 2016 at 9:58 AM, Mickael Delanoë 
> wrote:
>
> Hi,
>
> I would like to use lightweight transaction inside a batch but the request
> is rejected by cassandra, however I think this is a use case than could be
> handled without problem.
> Below is what I wanted to do.
>
> I am using cassandra 3.7.
>
> CREATE KEYSPACE test_ksp WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': '1'};
>
> CREATE TABLE test_ksp.item (
> user_id bigint,
> item_id text,
> item_value text,
> item_key1 text,
> item_key2 text,
> PRIMARY KEY ((user_id), item_id));
>
> CREATE TABLE test_ksp.item_id_by_key (
> user_id bigint,
> item_key text,
> item_id text,
> PRIMARY KEY ((user_id), item_key));
>
> USE test_ksp;
>
> BEGIN BATCH
> INSERT INTO item (user_id, item_id, item_value, item_key1, item_key2)
> values (1,'i11','item-C', 'key-XYZ-123', 'key-ABC-789') IF NOT EXISTS;
> INSERT INTO item_id_by_key (user_id, item_key, item_id) VALUES (1,
> 'key-XYZ-123', 'i11');
> INSERT INTO item_id_by_key (user_id, item_key, item_id) VALUES (1,
> 'key-ABC-789', 'i11');
> APPLY BATCH;
>
>
> So as you can see this is a batch that targets 2 tables but with the same
> partition key (i.e the same target nodes). Moreover It uses only ONE
> condition on one table only.
> I don't understand why cassandra returns an error "Batch with conditions
> cannot span multiple tables" in that case.
>
> I understand that if I had used several conditions on different tables it
> could be a problem, but in my case there is only one condition and moreover
> I have always the same partition key for every table inside the batch.
> As there is only one condition, I expected the paxos protocol just act on
> this condition and as the partition keys are all the same, the paxos
> protocol has only to work with the same replica nodes (not span across
> multiple partition).
> In my point of view this is as if the LWT was in a single statement,
> except that after the LWT is accepted a complete batch has to be executed.
>
> Is there someone that could explain why this use case need to be rejected
> by cassandra? And do you think this is something that cassandra could
> handle in a future version ?
>
> Regards,
> Mickaël
>
>
>
>
>
> --
> Mickaël Delanoë
>
>
>
>


Re: failure node rejoin

2016-10-23 Thread Ben Slater
Definitely sounds to me like something is not working as expected but I
don’t really have any idea what would cause that (other than the fairly
extreme failure scenario). A couple of things I can think of to try to
narrow it down:
1) Run nodetool flush on all nodes after step 2 - that will make sure all
data is written to sstables rather than relying on commit logs (see the
sketch below)
2) Run the test with consistency level quorum rather than serial (shouldn’t
be any different but quorum is more widely used so maybe there is a bug
that’s specific to serial)
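
For (1), the pdsh pattern already in your test script would do it, e.g.
(the user and host variables are placeholders, as in the script):

pdsh -l $node_user -R ssh -w $node1_ip,$node2_ip,$node3_ip "nodetool flush testkeyspace"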

Cheers
Ben

On Mon, 24 Oct 2016 at 10:29 Yuji Ito <y...@imagine-orb.com> wrote:

> Hi Ben,
>
> The test without killing nodes has been working well without data loss.
> I've repeated my test about 200 times after removing data and
> rebuild/repair.
>
> Regards,
>
>
> On Fri, Oct 21, 2016 at 3:14 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
> > Just to confirm, are you saying:
> > a) after operation 2, you select all and get 1000 rows
> > b) after operation 3 (which only does updates and read) you select and
> only get 953 rows?
>
> That's right!
>
> I've started the test without killing nodes.
> I'll report the result to you next Monday.
>
> Thanks
>
>
> On Fri, Oct 21, 2016 at 3:05 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> Just to confirm, are you saying:
> a) after operation 2, you select all and get 1000 rows
> b) after operation 3 (which only does updates and read) you select and
> only get 953 rows?
>
> If so, that would be very unexpected. If you run your tests without
> killing nodes do you get the expected (1,000) rows?
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 17:00 Yuji Ito <y...@imagine-orb.com> wrote:
>
> > Are you certain your tests don’t generate any overlapping inserts (by
> PK)?
>
> Yes. The operation 2) also checks the number of rows just after all
> insertions.
>
>
> On Fri, Oct 21, 2016 at 2:51 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK. Are you certain your tests don’t generate any overlapping inserts (by
> PK)? Cassandra basically treats any inserts with the same primary key as
> updates (so 1000 insert operations may not necessarily result in 1000 rows
> in the DB).
>
> On Fri, 21 Oct 2016 at 16:30 Yuji Ito <y...@imagine-orb.com> wrote:
>
> thanks Ben,
>
> > 1) At what stage did you have (or expect to have) 1000 rows (and have
> the mismatch between actual and expected) - at that end of operation (2) or
> after operation (3)?
>
> after operation 3), at operation 4) which reads all rows by cqlsh with
> CL.SERIAL
>
> > 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?
>
> - create keyspace testkeyspace WITH REPLICATION =
> {'class':'SimpleStrategy','replication_factor':3};
> - consistency level is SERIAL
>
>
> On Fri, Oct 21, 2016 at 12:04 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
>
> A couple of questions:
> 1) At what stage did you have (or expect to have) 1000 rows (and have the
> mismatch between actual and expected) - at that end of operation (2) or
> after operation (3)?
> 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?
>
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 13:57 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> I tried to run a rebuild and repair after the failure node rejoined the
> cluster as a "new" node with -Dcassandra.replace_address_first_boot.
> The failure node could rejoined and I could read all rows successfully.
> (Sometimes a repair failed because the node cannot access other node. If
> it failed, I retried a repair)
>
> But some rows were lost after my destructive test repeated (after about
> 5-6 hours).
> After the test inserted 1000 rows, there were only 953 rows at the end of
> the test.
>
> My destructive test:
> - each C* node is killed & restarted at the random interval (within about
> 5 min) throughout this test
> 1) truncate all tables
> 2) insert initial rows (check if all rows are inserted successfully)
> 3) request a lot of read/write to random rows for about 30min
> 4) check all rows
> If operation 1), 2) or 4) fail due to C* failure, the test retry the
> operation.
>
> Does anyone have a similar problem?
> What causes data loss?
> Does the test need any operation when C* node is restarted? (Currently, I
> just restarted C* process)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 2:18 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote

Re: Hadoop vs Cassandra

2016-10-23 Thread Ben Slater
It’s reasonably common to use Cassandra to cover both online and analytics
requirements, particularly using it in conjunction with Spark. You can use
Cassandra’s multi-DC functionality to have online and analytics DCs for a
reasonable degree of workload separation without having to build ETL (or
some other replication) to get data between two environments.
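
At the keyspace level that separation is just a replication setting - a
sketch assuming two DCs named online and analytics (names and replication
factors are placeholders); Spark jobs then read from the analytics DC while
application traffic uses a DC-aware policy pointed at the online DC:

CREATE KEYSPACE mykeyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'online': 3,
  'analytics': 2
};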

On Sun, 23 Oct 2016 at 20:00 Ali Akhtar  wrote:

> By Hadoop do you mean HDFS?
>
>
>
> On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan  wrote:
>
> Hi All,
>
> I read the following comparison between hadoop and cassandra. Seems the
> conclusion that we use hadoop for data lake ( cold data ) and Cassandra for
> hot data (real time data).
>
> http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
>
> My question is, can we just use cassandra to rule them all ?
>
> What we are trying to achieve is to minimize the moving part on our
> system.
>
> Any response would be really appreciated.
>
>
> Cheers
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com 
>
>
>


Re: What is the maximum value of Cassandra Counter Column?

2016-10-23 Thread Ben Slater
http://cassandra.apache.org/doc/latest/cql/types.html?highlight=counter#counters
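
(Per that page, a counter is a 64-bit signed integer, so the maximum value is
9223372036854775807, i.e. Long.MAX_VALUE. A throwaway illustration, with made-up
keyspace and table names:)

cqlsh <<'EOF'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS demo.page_views (page text PRIMARY KEY, views counter);
UPDATE demo.page_views SET views = views + 1 WHERE page = 'home';
SELECT views FROM demo.page_views WHERE page = 'home';
EOF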

On Sun, 23 Oct 2016 at 19:15 Kant Kodali  wrote:

> where does it say counter is implemented as long?
>
> On Sun, Oct 23, 2016 at 1:13 AM, Ali Akhtar  wrote:
>
> Probably:
> https://docs.oracle.com/javase/8/docs/api/java/lang/Long.html#MAX_VALUE
>
> On Sun, Oct 23, 2016 at 1:12 PM, Kant Kodali  wrote:
>
> What is the maximum value of Cassandra Counter Column?
>
>
>
>


Re: failure node rejoin

2016-10-21 Thread Ben Slater
Just to confirm, are you saying:
a) after operation 2, you select all and get 1000 rows
b) after operation 3 (which only does updates and read) you select and only
get 953 rows?

If so, that would be very unexpected. If you run your tests without killing
nodes do you get the expected (1,000) rows?

Cheers
Ben

On Fri, 21 Oct 2016 at 17:00 Yuji Ito <y...@imagine-orb.com> wrote:

> > Are you certain your tests don’t generate any overlapping inserts (by
> PK)?
>
> Yes. The operation 2) also checks the number of rows just after all
> insertions.
>
>
> On Fri, Oct 21, 2016 at 2:51 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK. Are you certain your tests don’t generate any overlapping inserts (by
> PK)? Cassandra basically treats any inserts with the same primary key as
> updates (so 1000 insert operations may not necessarily result in 1000 rows
> in the DB).
>
> On Fri, 21 Oct 2016 at 16:30 Yuji Ito <y...@imagine-orb.com> wrote:
>
> thanks Ben,
>
> > 1) At what stage did you have (or expect to have) 1000 rows (and have
> the mismatch between actual and expected) - at that end of operation (2) or
> after operation (3)?
>
> after operation 3), at operation 4) which reads all rows by cqlsh with
> CL.SERIAL
>
> > 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?
>
> - create keyspace testkeyspace WITH REPLICATION =
> {'class':'SimpleStrategy','replication_factor':3};
> - consistency level is SERIAL
>
>
> On Fri, Oct 21, 2016 at 12:04 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
>
> A couple of questions:
> 1) At what stage did you have (or expect to have) 1000 rows (and have the
> mismatch between actual and expected) - at that end of operation (2) or
> after operation (3)?
> 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?
>
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 13:57 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> I tried to run a rebuild and repair after the failure node rejoined the
> cluster as a "new" node with -Dcassandra.replace_address_first_boot.
> The failure node could rejoined and I could read all rows successfully.
> (Sometimes a repair failed because the node cannot access other node. If
> it failed, I retried a repair)
>
> But some rows were lost after my destructive test repeated (after about
> 5-6 hours).
> After the test inserted 1000 rows, there were only 953 rows at the end of
> the test.
>
> My destructive test:
> - each C* node is killed & restarted at the random interval (within about
> 5 min) throughout this test
> 1) truncate all tables
> 2) insert initial rows (check if all rows are inserted successfully)
> 3) request a lot of read/write to random rows for about 30min
> 4) check all rows
> If operation 1), 2) or 4) fail due to C* failure, the test retry the
> operation.
>
> Does anyone have the similar problem?
> What causes data lost?
> Does the test need any operation when C* node is restarted? (Currently, I
> just restarted C* process)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 2:18 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK, that’s a bit more unexpected (to me at least) but I think the solution
> of running a rebuild or repair still applies.
>
> On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben, Jeff
>
> Sorry that my explanation confused you.
>
> Only node1 is the seed node.
> Node2 whose C* data is deleted is NOT a seed.
>
> I restarted the failure node(node2) after restarting the seed node(node1).
> The restarting node2 succeeded without the exception.
> (I couldn't restart node2 before restarting node1 as expected.)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
> The unstated "problem" here is that node1 is a seed, which implies
> auto_bootstrap=false (can't bootstrap a seed, so it was almost certainly
> setup to start without bootstrapping).
>
> That means once the data dir is wiped, it's going to start again without a
> bootstrap, and make a single node cluster or join an existing cluster if
> the seed list is valid
>
>
>
> --
> Jeff Jirsa
>
>
> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK, sorry - I think understand what you are asking now.
>
> However, I’m still a little confused by your description. I think your
> scenario is:
> 1) Stop C* on all nodes in 

Re: failure node rejoin

2016-10-20 Thread Ben Slater
OK. Are you certain your tests don’t generate any overlapping inserts (by
PK)? Cassandra basically treats any inserts with the same primary key as
updates (so 1000 insert operations may not necessarily result in 1000 rows
in the DB).

On Fri, 21 Oct 2016 at 16:30 Yuji Ito <y...@imagine-orb.com> wrote:

> thanks Ben,
>
> > 1) At what stage did you have (or expect to have) 1000 rows (and have
> the mismatch between actual and expected) - at that end of operation (2) or
> after operation (3)?
>
> after operation 3), at operation 4) which reads all rows by cqlsh with
> CL.SERIAL
>
> > 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?
>
> - create keyspace testkeyspace WITH REPLICATION =
> {'class':'SimpleStrategy','replication_factor':3};
> - consistency level is SERIAL
>
>
> On Fri, Oct 21, 2016 at 12:04 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
>
> A couple of questions:
> 1) At what stage did you have (or expect to have) 1000 rows (and have the
> mismatch between actual and expected) - at that end of operation (2) or
> after operation (3)?
> 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?
>
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 13:57 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> I tried to run a rebuild and repair after the failure node rejoined the
> cluster as a "new" node with -Dcassandra.replace_address_first_boot.
> The failure node could rejoined and I could read all rows successfully.
> (Sometimes a repair failed because the node cannot access other node. If
> it failed, I retried a repair)
>
> But some rows were lost after my destructive test repeated (after about
> 5-6 hours).
> After the test inserted 1000 rows, there were only 953 rows at the end of
> the test.
>
> My destructive test:
> - each C* node is killed & restarted at the random interval (within about
> 5 min) throughout this test
> 1) truncate all tables
> 2) insert initial rows (check if all rows are inserted successfully)
> 3) request a lot of read/write to random rows for about 30min
> 4) check all rows
> If operation 1), 2) or 4) fail due to C* failure, the test retry the
> operation.
>
> Does anyone have the similar problem?
> What causes data lost?
> Does the test need any operation when C* node is restarted? (Currently, I
> just restarted C* process)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 2:18 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK, that’s a bit more unexpected (to me at least) but I think the solution
> of running a rebuild or repair still applies.
>
> On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben, Jeff
>
> Sorry that my explanation confused you.
>
> Only node1 is the seed node.
> Node2 whose C* data is deleted is NOT a seed.
>
> I restarted the failure node(node2) after restarting the seed node(node1).
> The restarting node2 succeeded without the exception.
> (I couldn't restart node2 before restarting node1 as expected.)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
> The unstated "problem" here is that node1 is a seed, which implies
> auto_bootstrap=false (can't bootstrap a seed, so it was almost certainly
> setup to start without bootstrapping).
>
> That means once the data dir is wiped, it's going to start again without a
> bootstrap, and make a single node cluster or join an existing cluster if
> the seed list is valid
>
>
>
> --
> Jeff Jirsa
>
>
> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK, sorry - I think understand what you are asking now.
>
> However, I’m still a little confused by your description. I think your
> scenario is:
> 1) Stop C* on all nodes in a cluster (Nodes A,B,C)
> 2) Delete all data from Node A
> 3) Restart Node A
> 4) Restart Node B,C
>
> Is this correct?
>
> If so, this isn’t a scenario I’ve tested/seen but I’m not surprised Node A
> starts succesfully as there are no running nodes to tell it via gossip that
> it shouldn’t start up without the “replaces” flag.
>
> I think that right way to recover in this scenario is to run a nodetool
> rebuild on Node A after the other two nodes are running. You could
> theoretically also run a repair (which would be good practice after a weird
> failure scenario like this) but rebuild will probably be quicker given you
> know all the data needs to be re-

Re: failure node rejoin

2016-10-20 Thread Ben Slater
A couple of questions:
1) At what stage did you have (or expect to have) 1000 rows (and have the
mismatch between actual and expected) - at that end of operation (2) or
after operation (3)?
2) What replication factor and replication strategy is used by the test
keyspace? What consistency level is used by your operations?


Cheers
Ben

On Fri, 21 Oct 2016 at 13:57 Yuji Ito <y...@imagine-orb.com> wrote:

> Thanks Ben,
>
> I tried to run a rebuild and repair after the failure node rejoined the
> cluster as a "new" node with -Dcassandra.replace_address_first_boot.
> The failure node could rejoined and I could read all rows successfully.
> (Sometimes a repair failed because the node cannot access other node. If
> it failed, I retried a repair)
>
> But some rows were lost after my destructive test repeated (after about
> 5-6 hours).
> After the test inserted 1000 rows, there were only 953 rows at the end of
> the test.
>
> My destructive test:
> - each C* node is killed & restarted at the random interval (within about
> 5 min) throughout this test
> 1) truncate all tables
> 2) insert initial rows (check if all rows are inserted successfully)
> 3) request a lot of read/write to random rows for about 30min
> 4) check all rows
> If operation 1), 2) or 4) fail due to C* failure, the test retry the
> operation.
>
> Does anyone have the similar problem?
> What causes data lost?
> Does the test need any operation when C* node is restarted? (Currently, I
> just restarted C* process)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 2:18 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK, that’s a bit more unexpected (to me at least) but I think the solution
> of running a rebuild or repair still applies.
>
> On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben, Jeff
>
> Sorry that my explanation confused you.
>
> Only node1 is the seed node.
> Node2 whose C* data is deleted is NOT a seed.
>
> I restarted the failure node(node2) after restarting the seed node(node1).
> The restarting node2 succeeded without the exception.
> (I couldn't restart node2 before restarting node1 as expected.)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
> The unstated "problem" here is that node1 is a seed, which implies
> auto_bootstrap=false (can't bootstrap a seed, so it was almost certainly
> setup to start without bootstrapping).
>
> That means once the data dir is wiped, it's going to start again without a
> bootstrap, and make a single node cluster or join an existing cluster if
> the seed list is valid
>
>
>
> --
> Jeff Jirsa
>
>
> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK, sorry - I think understand what you are asking now.
>
> However, I’m still a little confused by your description. I think your
> scenario is:
> 1) Stop C* on all nodes in a cluster (Nodes A,B,C)
> 2) Delete all data from Node A
> 3) Restart Node A
> 4) Restart Node B,C
>
> Is this correct?
>
> If so, this isn’t a scenario I’ve tested/seen but I’m not surprised Node A
> starts succesfully as there are no running nodes to tell it via gossip that
> it shouldn’t start up without the “replaces” flag.
>
> I think that right way to recover in this scenario is to run a nodetool
> rebuild on Node A after the other two nodes are running. You could
> theoretically also run a repair (which would be good practice after a weird
> failure scenario like this) but rebuild will probably be quicker given you
> know all the data needs to be re-streamed.
>
> Cheers
> Ben
>
> On Tue, 18 Oct 2016 at 14:03 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thank you Ben, Yabin
>
> I understood the rejoin was illegal.
> I expected this rejoin would fail with the exception.
> But I could add the failure node to the cluster without the
> exception after 2) and 3).
> I want to know why the rejoin succeeds. Should the exception happen?
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:51 AM, Yabin Meng <yabinm...@gmail.com> wrote:
>
> The exception you run into is expected behavior. This is because as Ben
> pointed out, when you delete everything (including system schemas), C*
> cluster thinks you're bootstrapping a new node. However,  node2's IP is
> still in gossip and this is why you see the exception.
>
> I'm not clear the reasoning why you need to delete C* data directory. That
> is a dangerous action, especially considering that you delete system
> schemas. If in any case the failure node is gone for a while, what you need
> to do is to is remo

Re: failure node rejoin

2016-10-17 Thread Ben Slater
OK, that’s a bit more unexpected (to me at least) but I think the solution
of running a rebuild or repair still applies.
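
Roughly, once the other replicas are back up, something along these lines on the
node that lost its data (a sketch; exact options vary by version and topology):

nodetool rebuild     # re-streams data from the other replicas; a source DC name can be passed
nodetool repair -pr  # and/or repair the node's primary ranges as a follow-up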

On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:

> Thanks Ben, Jeff
>
> Sorry that my explanation confused you.
>
> Only node1 is the seed node.
> Node2 whose C* data is deleted is NOT a seed.
>
> I restarted the failure node(node2) after restarting the seed node(node1).
> The restarting node2 succeeded without the exception.
> (I couldn't restart node2 before restarting node1 as expected.)
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
> The unstated "problem" here is that node1 is a seed, which implies
> auto_bootstrap=false (can't bootstrap a seed, so it was almost certainly
> setup to start without bootstrapping).
>
> That means once the data dir is wiped, it's going to start again without a
> bootstrap, and make a single node cluster or join an existing cluster if
> the seed list is valid
>
>
>
> --
> Jeff Jirsa
>
>
> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> OK, sorry - I think understand what you are asking now.
>
> However, I’m still a little confused by your description. I think your
> scenario is:
> 1) Stop C* on all nodes in a cluster (Nodes A,B,C)
> 2) Delete all data from Node A
> 3) Restart Node A
> 4) Restart Node B,C
>
> Is this correct?
>
> If so, this isn’t a scenario I’ve tested/seen but I’m not surprised Node A
> starts succesfully as there are no running nodes to tell it via gossip that
> it shouldn’t start up without the “replaces” flag.
>
> I think that right way to recover in this scenario is to run a nodetool
> rebuild on Node A after the other two nodes are running. You could
> theoretically also run a repair (which would be good practice after a weird
> failure scenario like this) but rebuild will probably be quicker given you
> know all the data needs to be re-streamed.
>
> Cheers
> Ben
>
> On Tue, 18 Oct 2016 at 14:03 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thank you Ben, Yabin
>
> I understood the rejoin was illegal.
> I expected this rejoin would fail with the exception.
> But I could add the failure node to the cluster without the
> exception after 2) and 3).
> I want to know why the rejoin succeeds. Should the exception happen?
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:51 AM, Yabin Meng <yabinm...@gmail.com> wrote:
>
> The exception you run into is expected behavior. This is because as Ben
> pointed out, when you delete everything (including system schemas), C*
> cluster thinks you're bootstrapping a new node. However,  node2's IP is
> still in gossip and this is why you see the exception.
>
> I'm not clear the reasoning why you need to delete C* data directory. That
> is a dangerous action, especially considering that you delete system
> schemas. If in any case the failure node is gone for a while, what you need
> to do is to is remove the node first before doing "rejoin".
>
> Cheers,
>
> Yabin
>
> On Mon, Oct 17, 2016 at 1:48 AM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> To cassandra, the node where you deleted the files looks like a brand new
> machine. It doesn’t automatically rebuild machines to prevent accidental
> replacement. You need to tell it to build the “new” machines as a
> replacement for the “old” machine with that IP by setting 
> -Dcassandra.replace_address_first_boot=<dead_node_ip>.
> See http://cassandra.apache.org/doc/latest/operating/topo_changes.html
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__cassandra.apache.org_doc_latest_operating_topo-5Fchanges.html=DQMFaQ=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow=KGo0EnUT-Bop-0OnyQJRuFvNOf99S9tWEgziATmNfJ8=YazqmnV8TuuQXt9PDn0kFe6C08b7tQQXrqouXBCVVXE=>
> .
>
> Cheers
> Ben
>
> On Mon, 17 Oct 2016 at 16:41 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Hi all,
>
> A failure node can rejoin a cluster.
> On the node, all data in /var/lib/cassandra were deleted.
> Is it normal?
>
> I can reproduce it as below.
>
> cluster:
> - C* 2.2.7
> - a cluster has node1, 2, 3
> - node1 is a seed
> - replication_factor: 3
>
> how to:
> 1) stop C* process and delete all data in /var/lib/cassandra on node2
> ($sudo rm -rf /var/lib/cassandra/*)
> 2) stop C* process on node1 and node3
> 3) restart C* on node1
> 4) restart C* on node2
>
> nodetool status after 4):
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  

Re: failure node rejoin

2016-10-17 Thread Ben Slater
OK, sorry - I think I understand what you are asking now.

However, I’m still a little confused by your description. I think your
scenario is:
1) Stop C* on all nodes in a cluster (Nodes A,B,C)
2) Delete all data from Node A
3) Restart Node A
4) Restart Node B,C

Is this correct?

If so, this isn’t a scenario I’ve tested/seen but I’m not surprised Node A
starts successfully as there are no running nodes to tell it via gossip that
it shouldn’t start up without the “replaces” flag.

I think the right way to recover in this scenario is to run a nodetool
rebuild on Node A after the other two nodes are running. You could
theoretically also run a repair (which would be good practice after a weird
failure scenario like this) but rebuild will probably be quicker given you
know all the data needs to be re-streamed.

Cheers
Ben

On Tue, 18 Oct 2016 at 14:03 Yuji Ito <y...@imagine-orb.com> wrote:

> Thank you Ben, Yabin
>
> I understood the rejoin was illegal.
> I expected this rejoin would fail with the exception.
> But I could add the failure node to the cluster without the
> exception after 2) and 3).
> I want to know why the rejoin succeeds. Should the exception happen?
>
> Regards,
>
>
> On Tue, Oct 18, 2016 at 1:51 AM, Yabin Meng <yabinm...@gmail.com> wrote:
>
> The exception you run into is expected behavior. This is because as Ben
> pointed out, when you delete everything (including system schemas), C*
> cluster thinks you're bootstrapping a new node. However,  node2's IP is
> still in gossip and this is why you see the exception.
>
> I'm not clear the reasoning why you need to delete C* data directory. That
> is a dangerous action, especially considering that you delete system
> schemas. If in any case the failure node is gone for a while, what you need
> to do is to is remove the node first before doing "rejoin".
>
> Cheers,
>
> Yabin
>
> On Mon, Oct 17, 2016 at 1:48 AM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
> To cassandra, the node where you deleted the files looks like a brand new
> machine. It doesn’t automatically rebuild machines to prevent accidental
> replacement. You need to tell it to build the “new” machines as a
> replacement for the “old” machine with that IP by setting 
> -Dcassandra.replace_address_first_boot=<dead_node_ip>.
> See http://cassandra.apache.org/doc/latest/operating/topo_changes.html.
>
> Cheers
> Ben
>
> On Mon, 17 Oct 2016 at 16:41 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Hi all,
>
> A failure node can rejoin a cluster.
> On the node, all data in /var/lib/cassandra were deleted.
> Is it normal?
>
> I can reproduce it as below.
>
> cluster:
> - C* 2.2.7
> - a cluster has node1, 2, 3
> - node1 is a seed
> - replication_factor: 3
>
> how to:
> 1) stop C* process and delete all data in /var/lib/cassandra on node2
> ($sudo rm -rf /var/lib/cassandra/*)
> 2) stop C* process on node1 and node3
> 3) restart C* on node1
> 4) restart C* on node2
>
> nodetool status after 4):
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens   Owns (effective)  Host ID
> Rack
> DN  [node3 IP]  ? 256  100.0%
>  325553c6-3e05-41f6-a1f7-47436743816f  rack1
> UN  [node2 IP]  7.76 MB  256  100.0%
>  05bdb1d4-c39b-48f1-8248-911d61935925  rack1
> UN  [node1 IP]  416.13 MB  256  100.0%
>  a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
>
> If I restart C* on node 2 when C* on node1 and node3 are running (without
> 2), 3)), a runtime exception happens.
> RuntimeException: "A node with address [node2 IP] already exists,
> cancelling join..."
>
> I'm not sure this causes data lost. All data can be read properly just
> after this rejoin.
> But some rows are lost when I kill C* for destructive tests after
> this rejoin.
>
> Thanks.
>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>
>
>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: failure node rejoin

2016-10-16 Thread Ben Slater
To cassandra, the node where you deleted the files looks like a brand new
machine. It doesn’t automatically rebuild machines to prevent accidental
replacement. You need to tell it to build the “new” machine as a
replacement for the “old” machine with that IP by setting
-Dcassandra.replace_address_first_boot=<dead_node_ip>.
See http://cassandra.apache.org/doc/latest/operating/topo_changes.html.
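
A minimal sketch of wiring that up on the replacement node before its first start
(the IP below stands in for the dead node's address, and the cassandra-env.sh path
depends on how Cassandra was installed):

echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.1.23"' \
  | sudo tee -a /etc/cassandra/cassandra-env.sh
sudo service cassandra start
# Once the node has streamed its data and shows UN in nodetool status, the line
# can be removed again (the flag is only honoured on the first boot anyway).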

Cheers
Ben

On Mon, 17 Oct 2016 at 16:41 Yuji Ito <y...@imagine-orb.com> wrote:

> Hi all,
>
> A failure node can rejoin a cluster.
> On the node, all data in /var/lib/cassandra were deleted.
> Is it normal?
>
> I can reproduce it as below.
>
> cluster:
> - C* 2.2.7
> - a cluster has node1, 2, 3
> - node1 is a seed
> - replication_factor: 3
>
> how to:
> 1) stop C* process and delete all data in /var/lib/cassandra on node2
> ($sudo rm -rf /var/lib/cassandra/*)
> 2) stop C* process on node1 and node3
> 3) restart C* on node1
> 4) restart C* on node2
>
> nodetool status after 4):
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens   Owns (effective)  Host ID
> Rack
> DN  [node3 IP]  ? 256  100.0%
>  325553c6-3e05-41f6-a1f7-47436743816f  rack1
> UN  [node2 IP]  7.76 MB  256  100.0%
>  05bdb1d4-c39b-48f1-8248-911d61935925  rack1
> UN  [node1 IP]  416.13 MB  256  100.0%
>  a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
>
> If I restart C* on node 2 when C* on node1 and node3 are running (without
> 2), 3)), a runtime exception happens.
> RuntimeException: "A node with address [node2 IP] already exists,
> cancelling join..."
>
> I'm not sure this causes data lost. All data can be read properly just
> after this rejoin.
> But some rows are lost when I kill C* for destructive tests after
> this rejoin.
>
> Thanks.
>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Re: ask for help about exmples of Data Types the document shows

2016-09-27 Thread Ben Slater
My best guess is that you need to remove the quotes from around the zip
values (i.e. change it to zip: 20500 rather than zip: '20500') as zip is
defined as an int.
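
With the zips unquoted (and keeping the comma you added between the 'home' and
'work' entries), the statement would look roughly like this - phone numbers elided:

cqlsh <<'EOF'
INSERT INTO counterks.user (name, addresses) VALUES ('z3 Pr3z1den7', {
  'home' : { street: '1600 Pennsylvania Ave NW', city: 'Washington', zip: 20500,
             phones: { 'cell' : { country_code: 1, number: '...' } } },
  'work' : { street: '1600 Pennsylvania Ave NW', city: 'Washington', zip: 20500,
             phones: { 'fax'  : { country_code: 1, number: '...' } } }
});
EOF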

Cheers
Ben

On Wed, 28 Sep 2016 at 14:38 zha...@easemob.com <zha...@easemob.com> wrote:

> Hi, Ben Slater, thank you very much for your reply!
>
> my cassandra version is 3.7, so I think there must be something I
> misunderstand about the frozen type. I added a comma between } and ‘work’; the
> result is shown below. Is there some special form for the "frozen" type?
>
>
> cqlsh:counterks> INSERT INTO user (name, addresses) VALUES ('z3 Pr3z1den7', 
> {'home' : {street: '1600 Pennsylvania Ave NW',city: 'Washington',zip: 
> '20500',phones: { 'cell' : { country_code: 1, number: '202 456-' 
> },'landline' : { country_code: 1, number: '...' } }},'work' : {street: '1600 
> Pennsylvania Ave NW',city: 'Washington',zip: '20500',phones: { 'fax' : { 
> country_code: 1, number: '...' } }}});
>
> InvalidRequest: code=2200 [Invalid query] message="Invalid map literal for 
> addresses: value {city: 'Washington', zip: '20500', street: '1600 
> Pennsylvania Ave NW', phones: {'cell': {number: '202 456-', country_code: 
> 1}, 'landline': {number: '...', country_code: 1}}} is not of type 
> frozen<address>"
>
>
> my create statements about table and type are:
>
>
> cqlsh:counterks> CREATE TYPE phone ( country_code int, number text, );
>
> cqlsh:counterks> CREATE TYPE address ( street text, city text, 
> zip int, phones map<text, frozen<phone>> );
> cqlsh:counterks> CREATE TABLE user (
>  ... name text PRIMARY KEY,
>      ...     addresses map<text, frozen<address>>
>  ... );
>
>
>
> --
> zha...@easemob.com
>
>
> *From:* Ben Slater <ben.sla...@instaclustr.com>
> *Date:* 2016-09-28 11:29
> *To:* user <user@cassandra.apache.org>
> *Subject:* Re: ask for help about exmples of Data Types the document shows
>
> Hi,
>
> I think you are right about the typo in (1). For (2), I think you’re
> missing a comma between } and ‘work’ so the JSON is invalid.
>
> I think reading this JIRA
> https://issues.apache.org/jira/browse/CASSANDRA-7423 that the change
> requiring  a UDT as part of a collection to be explicitly marked as frozen
> is relatively recent (3.6) so the doco may be out date there.
>
> Cheers
> Ben
>
> On Wed, 28 Sep 2016 at 13:12 zha...@easemob.com <zha...@easemob.com>
> wrote:
>
>> hi, everyone, I'm learning Cassandra now , and have some problems about
>> the document of "Data Types" .  I don't know where to report or ask for
>> help, so I'm very sorry if this mail bother you.
>>
>> In the chapter The Cassandra Query Language (CQL)/Data Types (
>> http://cassandra.apache.org/doc/latest/cql/types.html), I'm confused
>> with two examples the document showing below. My enviroment is:
>>
>> CentOS release 6.8 (Final)
>>
>> java version "1.8.0_91"
>> Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
>>
>> Python 2.7.11
>>
>> Cassandra version: 3.7
>> CQL version:
>> [cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
>>
>>
>> 1、 in http://cassandra.apache.org/doc/latest/cql/types.html, when
>> describing type set, the giving example is:
>>
>> CREATE TABLE images (
>> name text PRIMARY KEY,
>> owner text,
>> tags set<text> // A set of text values);INSERT INTO images (name, owner, 
>> tags)
>> VALUES ('cat.jpg', 'jsmith', { 'pet', 'cute' });// Replace the 
>> existing set entirelyUPDATE images SET tags = { 'kitten', 'cat', 'lol' } 
>> WHERE id = 'jsmith';
>>
>> the update cql statement uses "WHERE id = 'jsmith'" while the table
>> images did not define key "id". I think it's just a slip of the pen.
>>
>> 2、 in http://cassandra.apache.org/doc/latest/cql/types.html, when
>> describing “User-Defined Types”,  the giving example is:
>> CREATE TYPE phone (
>>
>> country_code int,
>> number text,)CREATE TYPE address (
>> street text,
>> city text,
>> zip int,
>> phones map<text, phone>)CREATE TABLE user (
>> name text PRIMARY KEY,
>> addresses map<text, frozen<address>>)
>>
>> and when I try to create type address, one error occur:
>>
>> cqlsh:counterks> CREATE TYPE address (
>>  ... street text,
>>  ... city text,
>> 

Re: ask for help about exmples of Data Types the document shows

2016-09-27 Thread Ben Slater
Hi,

I think you are right about the typo in (1). For (2), I think you’re
missing a comma between } and ‘work’ so the JSON is invalid.

I think reading this JIRA
https://issues.apache.org/jira/browse/CASSANDRA-7423 that the change
requiring a UDT as part of a collection to be explicitly marked as frozen
is relatively recent (3.6) so the doco may be out of date there.

Cheers
Ben

On Wed, 28 Sep 2016 at 13:12 zha...@easemob.com <zha...@easemob.com> wrote:

> hi, everyone, I'm learning Cassandra now , and have some problems about
> the document of "Data Types" .  I don't know where to report or ask for
> help, so I'm very sorry if this mail bother you.
>
> In the chapter The Cassandra Query Language (CQL)/Data Types (
> http://cassandra.apache.org/doc/latest/cql/types.html), I'm confused with
> two examples the document showing below. My enviroment is:
>
> CentOS release 6.8 (Final)
>
> java version "1.8.0_91"
> Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
>
> Python 2.7.11
>
> Cassandra version: 3.7
> CQL version:
> [cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
>
>
> 1、 in http://cassandra.apache.org/doc/latest/cql/types.html, when
> describing type set, the giving example is:
>
> CREATE TABLE images (
> name text PRIMARY KEY,
> owner text,
> tags set<text> // A set of text values);INSERT INTO images (name, owner, 
> tags)
> VALUES ('cat.jpg', 'jsmith', { 'pet', 'cute' });// Replace the 
> existing set entirelyUPDATE images SET tags = { 'kitten', 'cat', 'lol' } 
> WHERE id = 'jsmith';
>
> the update cql statement uses "WHERE id = 'jsmith'" while the table
> images did not define key "id". I think it's just a slip of the pen.
>
> 2、 in http://cassandra.apache.org/doc/latest/cql/types.html, when
> describing “User-Defined Types”,  the giving example is:
> CREATE TYPE phone (
>
> country_code int,
> number text,)CREATE TYPE address (
> street text,
> city text,
> zip int,
> phones map<text, phone>)CREATE TABLE user (
> name text PRIMARY KEY,
> addresses map<text, frozen<address>>)
>
> and when I try to create type address, one error occur:
>
> cqlsh:counterks> CREATE TYPE address (
>  ... street text,
>  ... city text,
>  ... zip int,
>  ... phones map<text, phone>
>  ... );
>
> InvalidRequest: code=2200 [Invalid query] message="Non-frozen UDTs are not 
> allowed inside collections: map<text, phone>"
>
> I change the create statement, like:
>
> CREATE TYPE address (
> street text,
> city text,
> zip int,
> phones map<text, frozen<phone>>
> );
>
> it works, and the create table user statement works well. Unfortunately,
> when running the insert statement below, error occur:
>
> INSERT INTO user (name, addresses)
>   VALUES ('z3 Pr3z1den7', {
>   'home' : {
>   street: '1600 Pennsylvania Ave NW',
>   city: 'Washington',
>   zip: '20500',
>   phones: { 'cell' : { country_code: 1, number: '202 
> 456-' },
> 'landline' : { country_code: 1, number: '...' } }
>   }
>   'work' : {
>   street: '1600 Pennsylvania Ave NW',
>   city: 'Washington',
>   zip: '20500',
>   phones: { 'fax' : { country_code: 1, number: '...' } }
>       }
>   })
>
> error:
>
> SyntaxException:  message="line 10:14 mismatched input 'work' expecting '}' (...: '...' } } 
>  }  ['wor]k' :...)">
>
>  Is the any suggestion about the problem 2?
>
> Best wishes for everyone, thank you for your watching !
>
> --
> zha...@easemob.com
>
-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: crash with OOM

2016-09-27 Thread Ben Slater
That is a very large heap size for C* - most installations I’ve seen are
running in the 8-12GB heap range. Apparently G1GC is better for larger
heaps so that may help. However, you are probably better off digging a bit
deeper into what is using all that heap? Massive IN clause lists? Massive
multi-partition batches? Massive partitions?

Especially given it hit two nodes simultaneously, I would be looking for a
rogue query as my first point of investigation.
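
As a purely illustrative starting point in cassandra-env.sh (the numbers are
assumptions for a dedicated node, not a recommendation):

MAX_HEAP_SIZE="12G"
HEAP_NEWSIZE="3G"   # new-gen size; only relevant while the default CMS collector is in use
# If trying G1 instead, the CMS-specific flags in cassandra-env.sh would need to
# be replaced with something like:
# JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=500"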

Cheers
Ben

On Tue, 27 Sep 2016 at 17:49 xutom <xutom2...@126.com> wrote:

>
> Hi, all
> I have a C* cluster with 12 nodes.  My cassandra version is 2.1.14; Just
> now two nodes crashed and client fails to export data with read consistency
> QUORUM. The following are logs of failed nodes:
>
> ERROR [SharedPool-Worker-159] 2016-09-26 20:51:14,124 Message.java:538 -
> Unexpected exception during request; channel = [id: 0xce43a388, /
> 13.13.13.80:55536 :> /13.13.13.149:9042]
> java.lang.AssertionError: null
> at
> org.apache.cassandra.transport.ServerConnection.applyStateTransition(ServerConnection.java:100)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
> at
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:442)
> [apache-cassandra-2.1.14.jar:2.1.14]
> at
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335)
> [apache-cassandra-2.1.14.jar:2.1.14]
> at
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
> [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
> [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
> [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> [na:1.7.0_65]
> at
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
> [apache-cassandra-2.1.14.jar:2.1.14]
> at
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> [apache-cassandra-2.1.14.jar:2.1.14]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> ERROR [SharedPool-Worker-116] 2016-09-26 20:51:14,125
> JVMStabilityInspector.java:117 - JVM state determined to be unstable.
> Exiting forcefully due to:
> java.lang.OutOfMemoryError: Java heap space
> ERROR [SharedPool-Worker-121] 2016-09-26 20:51:14,125
> JVMStabilityInspector.java:117 - JVM state determined to be unstable.
> Exiting forcefully due to:
> java.lang.OutOfMemoryError: Java heap space
> ERROR [SharedPool-Worker-157] 2016-09-26 20:51:14,124 Message.java:538 -
> Unexpected exception during request; channel = [id: 0xce43a388, /
> 13.13.13.80:55536 :> /13.13.13.149:9042]
>
> My server has total 256G memory so I set the MAX_HEAP_SIZE 60G, the config
> in cassandra-env.sh:
> MAX_HEAP_SIZE="60G"
> HEAP_NEWSIZE="20G"
> How to solve such OOM?
>
>
>
>
>
-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Exceptions whenever compaction happens

2016-09-26 Thread Ben Slater
> at
> org.apache.cassandra.db.ReadCommand$1MetricRecording.onClose(ReadCommand.java:506)
> ~[apache-cassandra-3.0.9.jar:3.0.9]
> at
> org.apache.cassandra.db.transform.BasePartitions.runOnClose(BasePartitions.java:70)
> ~[apache-cassandra-3.0.9.jar:3.0.9]
> at
> org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:76)
> ~[apache-cassandra-3.0.9.jar:3.0.9]
> at
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1797)
> ~[apache-cassandra-3.0.9.jar:3.0.9]
> at
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2466)
> ~[apache-cassandra-3.0.9.jar:3.0.9]
> ... 5 common frames omitted
>
>
> We have no idea how to solve this. I am losing sleep over this, please
> help!
>
> Cassandra: 3.0.9 (also happened in 3.0.8)
> Java: Oracle jdk "1.8.0_102"
> jemalloc enabled
>
>
> Regards,
>
> Nikhil Sharma
>
-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Question about replica and replication factor

2016-09-19 Thread Ben Slater
“replica” here means “a node that has a copy of the data for a given
partition”. The scenario being discussed here is CL > 1. In this case,
rather than using up network and processing capacity sending all the data
from all the nodes required to meet the consistency level, Cassandra gets
the full data from one replica and  checksums from the others. Only if the
checksums don’t match the full data does Cassandra need to get full data
from all the relevant replicas.

I think the other point here is, conceptually, you should think of the
coordinator as splitting up any query that hits multiple partitions into a
set of queries, one per partition (there might be some optimisations that
make this not quite physically correct but conceptually it’s about right).
Discussion such as the one you quote above tend to be considering a single
partition read (which is the most common kind of read in most uses of
Cassandra).

Cheers
Ben

On Tue, 20 Sep 2016 at 15:18 Jun Wu <wuxiaomi...@hotmail.com> wrote:

>
>
> Yes, I think for my case, at least two nodes need to be contacted to get
> the full set of data.
>
> But another thing comes up about dynamic snitch. It's the wrapped snitch
> and enabled by default and it'll choose the fastest/closest node to read
> data from. Another post is about this.
>
> http://www.datastax.com/dev/blog/dynamic-snitching-in-cassandra-past-present-and-future
>
>
> The thing is why it's still emphasis only one replica to read data from.
> Below is from the post:
>
> To begin, let’s first answer the most obvious question: what is dynamic
> snitching? To understand this, we’ll first recall what a snitch does. A
> snitch’s function is to determine which datacenters and racks are both
> written to and read from. So, why would that be ‘dynamic?’ This comes into
> play on the read side only (there’s nothing to be done for writes since we
> send them all and then block to until the consistency level is achieved.)
> When doing reads however, Cassandra only asks one node for the actual data,
> and, depending on consistency level and read repair chance, it asks the
> remaining replicas for checksums only. This means that it has a choice of
> however many replicas exist to ask for the actual data, and this is where
> the dynamic snitch goes to work.
>
> Since only one replica is sending the full data we need, we need to chose
> the best possible replica to ask, since if all we get back is checksums we
> have nothing useful to return to the user. The dynamic snitch handles this
> task by monitoring the performance of reads from the various replicas and
> choosing the best one based on this history.
>
> Sent from my iPad
> On Sep 20, 2016, at 00:03, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> If your read operation requires data from multiple partitions and the
> partitions are spread across multiple nodes then the coordinator has the
> job of contacting the multiple nodes to get the data and return to the
> client. So, in your scenario, if you did a select * from table (with no
> where clause) the coordinator would need to contact and execute a read on
> at least one other node to satisfy the query.
>
> Cheers
> Ben
>
> On Tue, 20 Sep 2016 at 14:50 Jun Wu <wuxiaomi...@hotmail.com> wrote:
>
>> Hi Ben,
>>
>> Thanks for the quick response.
>>
>> It's clear about the example for single row/partition. However,
>> normally data are not single row. Then for this case, I'm still confused.
>> http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsRead_c.html
>>
>> The link above gives an example of 10 nodes cluster with RF = 3. But
>> the figure and the words in the post shows that the coordinator only
>> contact/read data from one replica, and operate read repair for the left
>> replicas.
>>
>> Also, how could read accross all nodes in the cluster?
>>
>> Thanks!
>>
>> Jun
>>
>>
>> From: ben.sla...@instaclustr.com
>> Date: Tue, 20 Sep 2016 04:18:59 +
>> Subject: Re: Question about replica and replication factor
>> To: user@cassandra.apache.org
>>
>>
>> Each individual read (where a read is a single row or single partition)
>> will read from one node (ignoring read repairs) as each partition will be
>> contained entirely on a single node. To read the full set of data,  reads
>> would hit at least two nodes (in practice, reads would likely end up being
>> distributed across all the nodes in your cluster).
>>
>> Cheers
>> Ben
>>
>> On Tue, 20 Sep 2016 at 14:09 Jun Wu <wuxiaomi...@hotmail.com> wrote:
>>
>> Hi there,
>>
>> I have a question about the re

Re: Question about replica and replication factor

2016-09-19 Thread Ben Slater
Each individual read (where a read is a single row or single partition)
will read from one node (ignoring read repairs) as each partition will be
contained entirely on a single node. To read the full set of data,  reads
would hit at least two nodes (in practice, reads would likely end up being
distributed across all the nodes in your cluster).

Cheers
Ben

On Tue, 20 Sep 2016 at 14:09 Jun Wu <wuxiaomi...@hotmail.com> wrote:

> Hi there,
>
> I have a question about the replica and replication factor.
>
> For example, I have a cluster of 6 nodes in the same data center.
> Replication factor RF is set to 3  and the consistency level is default 1.
> According to this calculator http://www.ecyrd.com/cassandracalculator/,
> every node will store 50% of the data.
>
> When I want to read all data from the cluster, how many nodes should I
> read from, 2 or 1? Is it 2, because each node has half data? But in the
> calculator it show 1: You are really reading from 1 node every time.
>
>Any suggestions? Thanks!
>
> Jun
>
-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Export/Importing keyspace from a different sized cluster

2016-09-19 Thread Ben Slater
CQLSH COPY FROM / COPY TO? There are some significant performance
improvements in recent versions:
https://issues.apache.org/jira/browse/CASSANDRA-11053
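
A rough per-table sketch (host, keyspace and table names are placeholders, and the
schema needs to exist on the target cluster first):

cqlsh source-node.example.com -e \
  "COPY myks.mytable TO '/tmp/mytable.csv' WITH HEADER = true"
cqlsh target-node.example.com -e \
  "COPY myks.mytable FROM '/tmp/mytable.csv' WITH HEADER = true"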

On Tue, 20 Sep 2016 at 07:49 Justin Sanciangco <jsancian...@blizzard.com>
wrote:

> Hello,
>
>
>
> Assuming I can’t get ports opened from source to target cluster to run
> sstableloader, what methods can I use to load a single keyspace from one
> cluster to another cluster of different size?
>
>
>
> Appreciate the help…
>
>
>
> Thanks,
>
> Justin
>
>
>
-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Streaming Process: How can we speed it up?

2016-09-15 Thread Ben Slater
We’ve successfully used the rsync method you outline quite a few times in
situations where we’ve had clusters that take forever to add new nodes
(mainly due to secondary indexes) and need to do a quick replacement for
one reason or another. As you mention, the main disadvantage we ran into is
that the node doesn’t get cleaned up through the replacement process like a
newly streamed node does (plus the extra operational complexity).
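
For anyone else considering it, the core of the approach is roughly the following
(condensed from the steps quoted below; host names and paths are placeholders):

sudo service cassandra stop && sudo rm -rf /var/lib/cassandra/*   # on the new node
sudo rsync -a old-node:/etc/cassandra/ /etc/cassandra/
sudo rsync -a old-node:/var/lib/cassandra/ /var/lib/cassandra/    # first pass, old node still live
ssh old-node 'sudo service cassandra stop'
sudo rsync -a old-node:/var/lib/cassandra/ /var/lib/cassandra/    # final catch-up pass
# ...then swap the IPs so the new node takes over the old node's address and start Cassandra.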

Cheers
Ben

On Thu, 15 Sep 2016 at 19:47 Vasileios Vlachos <vasileiosvlac...@gmail.com>
wrote:

> Hello and thanks for your responses,
>
> OK, so increasing stream_throughput_outbound_megabits_per_sec makes no
> difference. Any ideas why streaming is limited to only two of the three
> nodes available?
>
> As an alternative to slow streaming I tried this:
>
>   - install C* on a new node, stop the service and delete
> /var/lib/cassandra/*
>  - rsync /etc/cassandra from old node to new node
>  - rsync /var/lib/cassandra from old node to new node
>  - stop C* on the old node
>  - rsync /var/lib/cassandra from old node to new node
>  - move the old node to a different IP
>  - move the new node to the old node's original IP
>  - start C* on the new node (no need for the replace_node option in
> cassandra-env.sh)
>
> This technique has been successful so far for a demo cluster with fewer
> data. The only disadvantage for us is that we were hoping that by streaming
> the SSTables to the new node, tombstones would be discarded (freeing a lot
> of disk space on our live cluster). This is exactly what happened for the
> one node we streamed so far; unfortunately, the slow streaming generates a
> lot of hints which makes recovery a very long process.
>
> Do you guys see any other problems with the rsync method that I've skipped?
>
> Regarding the tombstones issue (if we finally do what I described above),
> I'm thinking sstablsplit. Then compaction should deal with it (I think). I
> have not used sstablesplit in the past, so another thing I'd like to ask is
> if you guys find this a good/bad idea for what I'm trying to do.
>
> Many thanks,
> Vasilis
>
> On Mon, Sep 12, 2016 at 6:42 PM, Jeff Jirsa <jji...@apache.org> wrote:
>
>>
>>
>> On 2016-09-12 09:38 (-0700), daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>> > Re. throughput. That looks slow for jumbo with 10g. Check your networks.
>> >
>> >
>>
>> It's extremely unlikely you'll be able to saturate a 10g link with a
>> single instance cassandra.
>>
>> Faster Cassandra streaming is a work in progress - being able to send
>> more than one file at a time is probably the most obvious area for
>> improvement, and being able to better deal with the CPU / garbage generated
>> on the receiving side is just behind that. You'll likely be able to stream
>> 10-15 MB/s per sending server or cpu core, whichever is less (in a vnode
>> setup, you'll be cpu bound - in a single-token setup, you'll be stream
>> bound).
>>
>>
>>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: ServerError: An unexpected error occurred server side; in cassandra java driver

2016-09-01 Thread Ben Slater
Hi Siddarth,

It would probably help people provide an answer if you let everyone know some
more details like:
- cassandra version and driver version you are using
- query that is being executed when the error occurs
- schema of the table that is being queried

Cheers
Ben

On Thu, 1 Sep 2016 at 21:19 Siddharth Verma <verma.siddha...@snapdeal.com>
wrote:

> Hi,
> Could someone help me out with the following exception in cassandra java
> driver.
> Why did it occur?
> MyClass program is paging on the result set.
>
> com.datastax.driver.core.exceptions.ServerError: An unexpected error
> occurred server side on /10.0.230.25:9042: java.lang.AssertionError:
> [DecoratedKey(3529259302770464040,
> 53444c373134303435333030),min(2177391360409801028)]
> at
> com.datastax.driver.core.exceptions.ServerError.copy(ServerError.java:63)
> at
> com.datastax.driver.core.exceptions.ServerError.copy(ServerError.java:25)
> at
> com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
> at
> com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
> at
> com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:64)
> at com.personal.trial.MyClass.fetchLoop(MyClass.java:63)
> at com.personal.trial.MyClass.run(MyClass.java:85)
> Caused by: com.datastax.driver.core.exceptions.ServerError: An unexpected
> error occurred server side on /10.0.230.25:9042:
> java.lang.AssertionError: [DecoratedKey(3529259302770464040,
> 53444c373134303435333030),min(2177391360409801028)]
> at
> com.datastax.driver.core.Responses$Error.asException(Responses.java:108)
> at
> com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:500)
> at
> com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1012)
> at
> com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:935)
> at
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
> at
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
> at
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
> at
> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
> at
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
> at
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1280)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
> at
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:890)
> at
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:564)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:505)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:419)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:391)
> at
&g

Re: Read Repairs and CL

2016-08-30 Thread Ben Slater
Thanks Sam - a couple of subtleties there that we missed in our review.
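
For anyone wanting to tune the knobs Sam mentions below, the per-table settings
look roughly like this (keyspace, table and values are placeholders only):

cqlsh -e "
  ALTER TABLE myks.mytable
    WITH read_repair_chance = 0.0
    AND dclocal_read_repair_chance = 0.1
    AND speculative_retry = '99percentile';"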

Cheers
Ben

On Tue, 30 Aug 2016 at 19:42 Sam Tunnicliffe <s...@beobal.com> wrote:

> Just to clarify a little further, it's true that read repair queries are
> performed at CL ALL, but this is slightly different to a regular,
> user-initiated query at that CL.
>
> Say you have RF=5 and you issue read at CL ALL, the coordinator will send
> requests to all 5 replicas and block until it receives a response from each
> (or a timeout occurs) before replying to the client. This is the
> straightforward and intuitive case.
>
> If instead you read at CL QUORUM, the # of replicas required for CL is 3,
> so the coordinator only contacts 3 nodes. In the case where a speculative
> retry is activated, an additional replica is added to the initial set. The
> coordinator will still only wait for 3 out of the 4 responses before
> proceeding, but if a digest mismatch occurs the read repair queries are
> sent to all 4. It's this follow up query that the coordinator executes at
> CL ALL, i.e. it requires all 4 replicas to respond to the read repair query
> before merging their results to figure out the canonical, latest data.
>
> You can see that the number of replicas queried/required for read repair
> is different than if the client actually requests a read at CL ALL (i.e.
> here it's 4, not 5), it's the behaviour of waiting for all *contacted*
> replicas to respond which is significant here.
>
> There are addtional considerations when constructing that initial replica
> set (which you can follow in
> o.a.c.Service.AbstractReadExecutor::getReadExecutor), involving the table's
> read_repair_chance, dclocal_read_repair_chance and speculative_retry
> options. THe main gotcha is global read repair (via read_repair_chance)
> which will trigger cross-dc repairs at CL ALL in the case of a digest
> mismatch, even if the requested CL is DC-local.
>
>
> On Sun, Aug 28, 2016 at 11:55 AM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
>> In case anyone else is interested - we figured this out. When C* decides
>> it need to do a repair based on a digest mismatch from the initial reads
>> for the consistency level it does actually try to do a read at CL=ALL in
>> order to get the most up to date data to use to repair.
>>
>> This led to an interesting issue in our case where we had one node in an
>> RF3 cluster down for maintenance (to correct data that became corrupted due
>> to a severe write overload) and started getting occasional “timeout during
>> read query at consistency LOCAL_QUORUM” failures. We believe this due to
>> the case where data for a read was only available on one of the two up
>> replicas which then triggered an attempt to repair and a failed read at
>> CL=ALL. It seems that CASSANDRA-7947 (a while ago) change the behaviour so
>> that C* reports a failure at the originally request level even when it was
>> actually the attempted repair read at CL=ALL which could not read
>> sufficient replicas - a bit confusing (although I can also see how getting
>> CL=ALL errors when you thought you were reading at QUORUM or ONE would be
>> confusing).
>>
>> Cheers
>> Ben
>>
>> On Sun, 28 Aug 2016 at 10:52 kurt Greaves <k...@instaclustr.com> wrote:
>>
>>> Looking at the wiki for the read path (
>>> http://wiki.apache.org/cassandra/ReadPathForUsers), in the bottom
>>> diagram for reading with a read repair, it states the following when
>>> "reading from all replica nodes" after there is a hash mismatch:
>>>
>>> If hashes do not match, do conflict resolution. First step is to read
>>>> all data from all replica nodes excluding the fastest replica (since 
>>>> CL=ALL)
>>>>
>>>
>>>  In the bottom left of the diagram it also states:
>>>
>>>> In this example:
>>>>
>>> RF>=2
>>>>
>>> CL=ALL
>>>>
>>>
>>> The (since CL=ALL) implies that the CL for the read during the read
>>> repair is based off the CL of the query. However I don't think that makes
>>> sense at other CLs. Anyway, I just want to clarify what CL the read for the
>>> read repair occurs at for cases where the overall query CL is not ALL.
>>>
>>> Thanks,
>>> Kurt.
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com
>>>
>> --
>> 
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
>>
>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Bootstrapping multiple C* nodes in AWS

2016-08-30 Thread Ben Slater
Hi Aiman,

Best practice would be to map cassandra racks to AWS availability zones. If
you are following this then you would add one node per AZ to keep the
number of nodes in each rack balanced.

It is technically possible to add multiple nodes simultaneously (at least
joining simultaneously - you need to leave a gap between starting) if your
configuration meets some restrictions (although I can’t remember exactly
what those are) and also requires setting
cassandra.consistent.rangemovement=false. However, in general I’d recommend
adding one node at a time unless you’re really confident in what you are
doing, particularly if you’re working with a production cluster.
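
If you do go down the simultaneous-join path, the flag is just a JVM option on the
joining nodes - a sketch (added to cassandra-env.sh and removed once bootstrapping
finishes):

JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"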

Cheers
Ben

On Tue, 30 Aug 2016 at 16:09 Aiman Parvaiz <ai...@flipagram.com> wrote:

> Hi all
> I am running C* 2.1.12 in AWS EC2 Classic with RF=3 and vnodes(256
> tokens/node). My nodes are distributed in three different availability
> zones. I want to scale up the cluster size, given the data size per node it
> takes around 24 hours to add one node.
>
> I wanted to know if it's safe to add multiple nodes at once in AWS and
> whether I should add them in the same availability zone. Would be grateful to hear
> your experiences here.
>
> Thanks
>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Read Repairs and CL

2016-08-28 Thread Ben Slater
In case anyone else is interested - we figured this out. When C* decides it
needs to do a repair based on a digest mismatch from the initial reads for
the consistency level, it actually tries to do a read at CL=ALL in order
to get the most up-to-date data to use for the repair.

This led to an interesting issue in our case where we had one node in an
RF3 cluster down for maintenance (to correct data that became corrupted due
to a severe write overload) and started getting occasional “timeout during
read query at consistency LOCAL_QUORUM” failures. We believe this is due to
the case where data for a read was only available on one of the two up
replicas which then triggered an attempt to repair and a failed read at
CL=ALL. It seems that CASSANDRA-7947 (a while ago) changed the behaviour so
that C* reports a failure at the originally requested level even when it was
actually the attempted repair read at CL=ALL which could not read
sufficient replicas - a bit confusing (although I can also see how getting
CL=ALL errors when you thought you were reading at QUORUM or ONE would be
confusing).
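
A quick way to gauge how often this repair path is being exercised on a
node is the read repair thread pool counters in nodetool (a sketch; the
exact stage names vary a little between versions):

    nodetool tpstats | grep -i readrepair
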

Cheers
Ben

On Sun, 28 Aug 2016 at 10:52 kurt Greaves <k...@instaclustr.com> wrote:

> Looking at the wiki for the read path (
> http://wiki.apache.org/cassandra/ReadPathForUsers), in the bottom diagram
> for reading with a read repair, it states the following when "reading from
> all replica nodes" after there is a hash mismatch:
>
> If hashes do not match, do conflict resolution. First step is to read all
>> data from all replica nodes excluding the fastest replica (since CL=ALL)
>>
>
>  In the bottom left of the diagram it also states:
>
>> In this example:
>>
> RF>=2
>>
> CL=ALL
>>
>
> The (since CL=ALL) implies that the CL for the read during the read repair
> is based off the CL of the query. However I don't think that makes sense at
> other CLs. Anyway, I just want to clarify what CL the read for the read
> repair occurs at for cases where the overall query CL is not ALL.
>
> Thanks,
> Kurt.
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>
-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Ben Slater
Yep,  that was what I was referring to.

On Thu, 4 Aug 2016 2:24 am Reynald Bourtembourg <
reynald.bourtembo...@esrf.fr> wrote:

> Hi,
>
> Maybe Ben was referring to this issue which has been mentioned recently on
> this mailing list:
> https://issues.apache.org/jira/browse/CASSANDRA-11887
>
> Cheers,
> Reynald
>
>
> On 03/08/2016 18:09, Romain Hardouin wrote:
>
> > Curious why the 2.2 to 3.x upgrade path is risky at best.
> I guess that upgrade from 2.2 is less tested by DataStax QA because DSE4
> used C* 2.1, not 2.2.
> I would say the safest upgrade is 2.1 to 3.0.x
>
> Best,
>
> Romain
>
>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Ben Slater
>> -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:+UseTLAB
>> -XX:CompileCommandFile=/hotspot_compiler
>> -XX:CMSWaitDuration=1
>> -XX:+CMSParallelInitialMarkEnabled
>> -XX:+CMSEdenChunksRecordAlways
>> -XX:CMSWaitDuration=1
>> -XX:+UseCondCardMark
>> -XX:+PrintGCDetails
>> -XX:+PrintGCDateStamps
>> -XX:+PrintHeapAtGC
>> -XX:+PrintTenuringDistribution
>> -XX:+PrintGCApplicationStoppedTime
>> -XX:+PrintPromotionFailure
>> -XX:PrintFLSStatistics=1
>> -Xloggc:/var/log/cassandra/gc.log
>> -XX:+UseGCLogFileRotation
>> -XX:NumberOfGCLogFiles=10
>> -XX:GCLogFileSize=10M
>> -Djava.net.preferIPv4Stack=true
>> -Dcom.sun.management.jmxremote.port=7199
>> -Dcom.sun.management.jmxremote.rmi.port=7199
>> -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin
>> -XX:+UnlockCommercialFeatures
>> -XX:+FlightRecorder
>> -ea
>> -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
>> -XX:+CMSClassUnloadingEnabled
>> -XX:+UseThreadPriorities
>> -XX:ThreadPriorityPolicy=42
>> -Xms2M
>> -Xmx2M
>> -Xmn4096M
>> -XX:+HeapDumpOnOutOfMemoryError
>> -Xss256k
>> -XX:StringTableSize=103
>> -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC
>> -XX:+CMSParallelRemarkEnabled
>> -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=1
>> -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:+UseTLAB
>> -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler
>> -XX:CMSWaitDuration=1
>> -XX:+CMSParallelInitialMarkEnabled
>> -XX:+CMSEdenChunksRecordAlways
>> -XX:CMSWaitDuration=1
>> -XX:+UseCondCardMark
>> -XX:+PrintGCDetails
>> -XX:+PrintGCDateStamps
>> -XX:+PrintHeapAtGC
>> -XX:+PrintTenuringDistribution
>> -XX:+PrintGCApplicationStoppedTime
>> -XX:+PrintPromotionFailure
>> -XX:PrintFLSStatistics=1
>> -Xloggc:/var/log/cassandra/gc.log
>> -XX:+UseGCLogFileRotation
>> -XX:NumberOfGCLogFiles=10
>> -XX:GCLogFileSize=10M
>> -Djava.net.preferIPv4Stack=true
>> -Dcom.sun.management.jmxremote.port=7199
>> -Dcom.sun.management.jmxremote.rmi.port=7199
>> -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin
>> -XX:+UnlockCommercialFeatures
>> -XX:+FlightRecorder
>> -Dlogback.configurationFile=logback.xml
>> -Dcassandra.logdir=/var/log/cassandra
>> -Dcassandra.storagedir=
>> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid
>>
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>
>
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Re : Purging tombstones from a particular row in SSTable

2016-07-27 Thread Ben Slater
.com> wrote:
>>>
>>> What is your GC_grace_seconds set to?
>>>
>>>
>>>
>>> On Wed, Jul 27, 2016 at 1:13 PM, sai krishnam raju potturi <
>>> pskraj...@gmail.com> wrote:
>>>
>>> thanks Vinay and DuyHai.
>>>
>>>
>>>
>>> we are using version 2.0.14. I did "user defined compaction"
>>> following the instructions in the link below, but the tombstones still persist
>>> even after that.
>>>
>>>
>>>
>>> https://gist.github.com/jeromatron/e238e5795b3e79866b83
>>>
>>>
>>>
>>> Also, we changed the tombstone_compaction_interval : 1800
>>> and tombstone_threshold : 0.1, but it did not help.
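
For anyone following along, those thresholds are per-table compaction
subproperties; a sketch of setting them from cqlsh (keyspace, table and
strategy are placeholders, and unchecked_tombstone_compaction is usually
what lets a lone SSTable be considered at all):

    cqlsh -e "ALTER TABLE my_keyspace.my_table
              WITH compaction = {
                'class': 'SizeTieredCompactionStrategy',
                'tombstone_threshold': '0.1',
                'tombstone_compaction_interval': '1800',
                'unchecked_tombstone_compaction': 'true'};"
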
>>>
>>>
>>>
>>> thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jul 27, 2016 at 4:05 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>> This feature is also exposed directly in nodetool from version Cassandra
>>> 3.4
>>>
>>>
>>>
>>> nodetool compact --user-defined 
>>>
>>>
>>>
>>> On Wed, Jul 27, 2016 at 9:58 PM, Vinay Chella <vche...@netflix.com>
>>> wrote:
>>>
>>> You can run file level compaction using JMX to get rid of tombstones in
>>> one SSTable. Ensure you set GC_Grace_seconds such that
>>>
>>>
>>>
>>> current time >= deletion time (tombstone timestamp) + gc_grace_seconds
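
If the deletions are newer than gc_grace_seconds, one option is to lower it
temporarily before forcing the compaction (a sketch; the table name is a
placeholder, and this is only safe if repairs are up to date because
tombstones may otherwise be purged before reaching all replicas):

    cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 3600;"
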
>>>
>>>
>>>
>>> File level compaction
>>>
>>>
>>>
>>> /usr/bin/java -jar cmdline-jmxclient-0.10.3.jar - localhost:${port} \
>>>   org.apache.cassandra.db:type=CompactionManager \
>>>   forceUserDefinedCompaction="'${KEYSPACE}','${SSTABLEFILENAME}'"
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jul 27, 2016 at 11:59 AM, sai krishnam raju potturi <
>>> pskraj...@gmail.com> wrote:
>>>
>>> hi;
>>>
>>>   we have a columnfamily that has around 1000 rows, with one row that is
>>> really huge (a million columns). 95% of the row consists of tombstones. Since
>>> there is just one SSTable, compaction is never going to kick
>>> in. Any way we can get rid of the tombstones in that row?
>>>
>>>
>>>
>>> Neither user-defined compaction nor nodetool compact had any effect. Any ideas,
>>> folks?
>>>
>>>
>>>
>>> thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Is my cluster normal?

2016-07-07 Thread Ben Slater
Hi Yuan,

You might find this blog post a useful comparison:
https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/

Although the focus is on Spark and Cassandra and multi-DC there are also
some single DC benchmarks of m4.xl clusters plus some discussion of how we
went about benchmarking.

Cheers
Ben


On Fri, 8 Jul 2016 at 07:52 Yuan Fang <y...@kryptoncloud.com> wrote:

> Yes, here is my stress test result:
> Results:
> op rate   : 12200 [WRITE:12200]
> partition rate: 12200 [WRITE:12200]
> row rate  : 12200 [WRITE:12200]
> latency mean  : 16.4 [WRITE:16.4]
> latency median: 7.1 [WRITE:7.1]
> latency 95th percentile   : 38.1 [WRITE:38.1]
> latency 99th percentile   : 204.3 [WRITE:204.3]
> latency 99.9th percentile : 465.9 [WRITE:465.9]
> latency max   : 1408.4 [WRITE:1408.4]
> Total partitions  : 100 [WRITE:100]
> Total errors  : 0 [WRITE:0]
> total gc count: 0
> total gc mb   : 0
> total gc time (s) : 0
> avg gc time(ms)   : NaN
> stdev gc time(ms) : 0
> Total operation time  : 00:01:21
> END
>
> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <r...@foundev.pro> wrote:
>
>> Lots of variables you're leaving out.
>>
>> Depends on write size, whether you're using logged batches or not, what
>> consistency level, what RF, whether the writes come in bursts, etc.
>> However, that's all sort of moot for determining "normal"; really you need a
>> baseline, as all those variables end up mattering a huge amount.
>>
>> I would suggest using Cassandra stress as a baseline and go from there
>> depending on what those numbers say (just pick the defaults).
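
A minimal sketch of that kind of baseline run (the node address and
operation count are placeholders; defaults are used for the schema and
workload):

    cassandra-stress write n=1000000 -rate threads=50 -node 10.0.0.1
    cassandra-stress read n=1000000 -rate threads=50 -node 10.0.0.1
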
>>
>> Sent from my iPhone
>>
>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <y...@kryptoncloud.com> wrote:
>>
>> yes, it is about 8k writes per node.
>>
>>
>>
>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>>> Are you saying 7k writes per node? or 30k writes per node?
>>>
>>>
>>> *...*
>>>
>>>
>>>
>>> *Daemeon C.M. Reiydelle*
>>> *USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*
>>>
>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <y...@kryptoncloud.com> wrote:
>>>
>>>> writes 30k/second is the main thing.
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <daeme...@gmail.com>
>>>> wrote:
>>>>
>>>>> Assuming you meant 100k, that is likely for something with 16mb of
>>>>> storage (probably way too small) where the data is more than 64k and
>>>>> hence will not fit into the row cache.
>>>>>
>>>>>
>>>>> *...*
>>>>>
>>>>>
>>>>>
>>>>> *Daemeon C.M. Reiydelle*
>>>>> *USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*
>>>>>
>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <y...@kryptoncloud.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and
>>>>>> 600GB ssd EBS).
>>>>>> I can reach a cluster wide write requests of 30k/second and read
>>>>>> request about 100/second. The cluster OS load constantly above 10. Are
>>>>>> those normal?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Yuan
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798

