Re: Cassandra 2.1 on Xenial

2018-03-18 Thread Cyril Scetbon
Okay, I found that the new dh_python2 helper must be used when building the
package to fix that issue.

—
Cyril Scetbon

> On Mar 18, 2018, at 11:08 AM, Cyril Scetbon  wrote:
> 
> Hey guys,
> 
> Since I still have to use Cassandra 2.1, I have installed it on Ubuntu Xenial,
> and I have an issue with cqlsh. I was able to fix it by installing
> python-support and a fix from 2.1.16. However, I’d like to know if there is a
> way to do it without installing an old package (python-support) on Xenial.
> dh-python is supposed to have replaced python-support; however, cqlsh
> complains when it’s not installed:
> 
> Traceback (most recent call last):
>   File "/usr/bin/cqlsh", line 121, in <module>
>     from cqlshlib import cql3handling, cqlhandling, pylexotron, sslhandling
> ImportError: No module named cqlshlib
> 
> Is there a better way than installing that old package?
> 
> Thanks 
> —
> Cyril Scetbon
> 



Re: Nodetool Repair --full

2018-03-18 Thread kurt greaves
Worth noting that if you have racks == RF, you only need to repair one rack
to repair all the data in the cluster if you *don't* use -pr. Also note
that full repairs on >=3.0 cause anti-compactions and will mark things as
repaired, so once you start repairs you need to keep repairing to ensure
you don't have any zombie data or other problems.
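Kurt's racks == RF point can be illustrated with a toy model. It assumes (for illustration only) one replica of every token range per rack, with nodes inside a rack owning ranges round-robin; this is not Cassandra's real NetworkTopologyStrategy placement, and the names place_replicas and ranges_repaired are hypothetical. Without -pr a node repairs every range it holds a replica of, so the nodes of one rack together touch every range.

```python
# Toy model (hypothetical, simplified): one replica of every token range per
# rack, so with racks == RF each rack holds a full copy of the data.
# This is NOT Cassandra's real NetworkTopologyStrategy placement algorithm.

def place_replicas(num_ranges, racks):
    """Map each range id to {rack: node} with one replica per rack."""
    placement = {}
    for r in range(num_ranges):
        # round-robin within each rack picks the node that owns range r
        placement[r] = {rack: nodes[r % len(nodes)] for rack, nodes in racks.items()}
    return placement


def ranges_repaired(placement, repaired_nodes):
    """Ranges covered when each node in repaired_nodes runs a full repair
    WITHOUT -pr, i.e. repairs every range it holds a replica of."""
    chosen = set(repaired_nodes)
    return {r for r, replicas in placement.items() if chosen & set(replicas.values())}


racks = {"rack1": ["a1", "a2"], "rack2": ["b1", "b2"], "rack3": ["c1", "c2"]}  # RF = 3
placement = place_replicas(12, racks)

# Repairing only the nodes of rack1 touches every range in the cluster.
print(ranges_repaired(placement, racks["rack1"]) == set(range(12)))  # prints True
```

With -pr, by contrast, each node repairs only the ranges it is primary for, so -pr has to run on every node, as Hannu describes in the quoted reply.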

On 17 March 2018 at 15:52, Hannu Kröger  wrote:

> Hi Jonathan,
>
> If you want to repair just one node (for example if it has been down for
> more than 3h), run “nodetool repair -full” on that node. This will bring
> all data on that node up to date.
>
> If you want to repair all data on the cluster, run “nodetool repair -full
> -pr” on each node. This will run full repair on all nodes but it will do it
> so only the primary range for each node is fixed. If you do it on all
> nodes, effectively the whole token range is repaired. You can run the same
> without -pr to get the same effect but it’s not efficient because then you
> are doing the repair RF times on all data instead of just repairing the
> whole data once.
>
> I hope this clarifies,
> Hannu
>
> On 17 Mar 2018, at 17:20, Jonathan Baynes 
> wrote:
>
> Hi Community,
>
> Can someone confirm, as the documentation out on the web is so
> contradictory and vague.
>
> If I call nodetool repair --full, do I need to run it on ALL my nodes, or
> is running it just once sufficient?
>
> Thanks
> J
>
> *Jonathan Baynes*
> DBA
> Tradeweb Europe Limited
> Moor Place  •  1 Fore Street Avenue  •  London EC2Y 9DT
> P +44 (0)20 77760988  •  F +44 (0)20 7776 3201  •  M +44 (0)7884111546
> jonathan.bay...@tradeweb.com
> —
> A leading marketplace for electronic fixed income, derivatives and ETF trading
>
>
> This e-mail may contain confidential and/or privileged information. If you
> are not the intended recipient (or have received this e-mail in error)
> please notify the sender immediately and destroy it. Any unauthorized
> copying, disclosure or distribution of the material in this e-mail is
> strictly forbidden. Tradeweb reserves the right to monitor all e-mail
> communications through its networks. If you do not wish to receive
> marketing emails about our products / services, please let us know by
> contacting us, either by email at contac...@tradeweb.com or by writing to
> us at the registered office of Tradeweb in the UK, which is: Tradeweb
> Europe Limited (company number 3912826), 1 Fore Street Avenue London EC2Y
> 9DT.
> To see our privacy policy, visit our website @ www.tradeweb.com.
>
>
>
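Hannu's description of -pr in the quoted reply above can be checked with a small model. Assumptions (mine, for illustration): one token per node, no vnodes, and SimpleStrategy-like placement where a range lives on its primary node plus the next RF-1 nodes clockwise. The function names are hypothetical.

```python
from collections import Counter

# Toy token ring (hypothetical): N nodes with one token each; range i is
# replicated on node i (its primary) plus the next RF-1 nodes clockwise,
# roughly like SimpleStrategy. Real clusters often use vnodes and NTS.

def replicas(range_id, num_nodes, rf):
    """Replica nodes for a range, primary replica first."""
    return [(range_id + k) % num_nodes for k in range(rf)]


def repair_counts(num_nodes, rf, nodes_run, primary_range_only):
    """Count how often each range is repaired when every node in nodes_run
    runs a full repair, with (-pr) or without primary-range-only mode."""
    counts = Counter()
    for node in nodes_run:
        for r in range(num_nodes):
            reps = replicas(r, num_nodes, rf)
            if primary_range_only:
                if reps[0] == node:      # -pr: only the node's primary range
                    counts[r] += 1
            elif node in reps:           # no -pr: every range the node replicates
                counts[r] += 1
    return counts


n, rf = 6, 3
with_pr = repair_counts(n, rf, range(n), primary_range_only=True)
without_pr = repair_counts(n, rf, range(n), primary_range_only=False)
one_node = repair_counts(n, rf, [0], primary_range_only=False)
print(sorted(with_pr.values()))     # [1, 1, 1, 1, 1, 1]: each range once
print(sorted(without_pr.values()))  # [3, 3, 3, 3, 3, 3]: each range RF times
print(sorted(one_node))             # [0, 4, 5]: every range node 0 holds
```

The last line matches the single-node case: a full repair without -pr on one node covers all the ranges that node replicates.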


Re: Cassandra client tuning

2018-03-18 Thread Ben Slater
“* 1000 statements in each batch” sounds like you are doing batching in
both cases. I wouldn't expect things to get better with larger batch sizes
than that. We’ve generally found something more like 100 is the sweet spot,
but I’m sure it’s data specific.
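A sketch of capping batches at roughly 100 statements, as Ben suggests, while also staying under a size budget. My assumption is that the "batch limit exception" in the quoted message comes from Cassandra's kilobyte-based batch thresholds (batch_size_warn_threshold_in_kb / batch_size_fail_threshold_in_kb in cassandra.yaml, depending on version); the 5 KiB default below and the size_of estimator are hypothetical placeholders, not driver APIs.

```python
def chunk_batches(statements, size_of, max_count=100, max_bytes=5 * 1024):
    """Split statements into batches capped both by statement count and by
    an estimated payload size (size_of is a caller-supplied estimator).

    A single statement larger than max_bytes still gets a batch of its own.
    """
    batches, current, current_bytes = [], [], 0
    for stmt in statements:
        size = size_of(stmt)
        # Close the current batch if adding this statement would break a cap.
        if current and (len(current) >= max_count or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(stmt)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Hypothetical 40-byte statements: the count cap (100) binds before the size cap.
small = chunk_batches(["x" * 40] * 1000, len)
print([len(b) for b in small])  # ten batches of 100
```

With larger statements the byte cap binds first, which is one reason a fixed statement count cannot be tuned once for all data.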

On Sun, 18 Mar 2018 at 21:17 onmstester onmstester 
wrote:

> I'm using a queue of 100 ExecuteAsyncs * 1000 statements in each batch
> = 100K inserts, the same as the 100K insert queue in the non-batch scenario.
> Using more than 1000 statements per batch throws a batch limit exception,
> and some documents recommend not changing batch_size_limit??!
>
> Sent using Zoho Mail 
>
>
> On Sun, 18 Mar 2018 13:14:54 +0330 *Ben Slater* wrote:
>
> When you say batch was worse than async in terms of throughput, are you
> comparing throughput with the same number of threads or something? I would
> have thought that if you have much less CPU usage on the client with
> batching, and your Cassandra cluster doesn’t sound terribly stressed, then
> there is room to increase threads on the client to increase throughput
> (unless you're bottlenecked on IO or something)?
>
> On Sun, 18 Mar 2018 at 20:27 onmstester onmstester 
> wrote:
>
> --
>
>
> *Ben Slater*
> *Chief Product Officer *
>
>    
>
>
> Read our latest technical blog posts here
> .
>
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
> and Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
>
> Input data does not preserve good locality, and I've already tested batch
> insert: it was worse than executeAsync in terms of throughput but used much
> less CPU on the client side.
>
> On Sun, 18 Mar 2018 12:46:02 +0330 *Ben Slater* wrote:
>
>
> You will probably find grouping writes into small batches improves overall
> performance (if you are not doing it already). See the following
> presentation for some more info:
> https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes
>
> Cheers
> Ben
>
> On Sun, 18 Mar 2018 at 19:23 onmstester onmstester 
> wrote:
>
> I need to insert several million records into Cassandra within seconds.
> I'm using one client with asyncExecute and the following configs:
> maxConnectionsPerHost = 5
> maxRequestsPerHost = 32K
> maxAsyncQueue at client side = 100K
>
> I could achieve only 25% of the throughput I needed. Client CPU is above
> 80%, and increasing the number of threads causes some execAsync calls to
> fail, so the configs above are the best the client could handle. Cassandra
> node CPU is below 30% on average. The data has no locality in terms of
> partition keys, and I can't use the createSStable mechanism. Is there any
> client-side tuning I'm missing? The server side is already tuned according
> to DataStax recommendations.
>


Cassandra 2.1 on Xenial

2018-03-18 Thread Cyril Scetbon
Hey guys,

Since I still have to use Cassandra 2.1, I have installed it on Ubuntu Xenial,
and I have an issue with cqlsh. I was able to fix it by installing
python-support and a fix from 2.1.16. However, I’d like to know if there is a
way to do it without installing an old package (python-support) on Xenial.
dh-python is supposed to have replaced python-support; however, cqlsh
complains when it’s not installed:

Traceback (most recent call last):
  File "/usr/bin/cqlsh", line 121, in <module>
    from cqlshlib import cql3handling, cqlhandling, pylexotron, sslhandling
ImportError: No module named cqlshlib

Is there a better way than installing that old package?

Thanks 
—
Cyril Scetbon
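The ImportError means the interpreter running /usr/bin/cqlsh cannot find the cqlshlib package on its search path. A small diagnostic sketch, assuming you run it with the same interpreter cqlsh uses (Python 2.7 for Cassandra 2.1 on this install, an assumption), prints where cqlshlib is found or which directories were searched. The helper name find_module_file is hypothetical; only the standard library is used, and it runs on Python 2.7 and 3. In the follow-up at the top of this page, the actual fix was building the package with dh_python2.

```python
from __future__ import print_function

import sys


def find_module_file(name):
    """Try to import a top-level module; return its file path, or None if
    the current interpreter cannot import it."""
    try:
        module = __import__(name)
    except ImportError:
        return None
    # Built-in modules have no __file__, so fall back to None.
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    path = find_module_file("cqlshlib")
    if path is None:
        print("cqlshlib is not importable; directories searched:")
        for entry in sys.path:
            print("  " + entry)
    else:
        print("cqlshlib found at " + path)
```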



Re: Cassandra client tuning

2018-03-18 Thread onmstester onmstester
I'm using a queue of 100 ExecuteAsyncs * 1000 statements in each batch =
100K inserts, the same as the 100K insert queue in the non-batch scenario.

Using more than 1000 statements per batch throws a batch limit exception, and
some documents recommend not changing batch_size_limit??!


On Sun, 18 Mar 2018 13:14:54 +0330 Ben Slater ben.sla...@instaclustr.com wrote:

When you say batch was worse than async in terms of throughput, are you
comparing throughput with the same number of threads or something? I would
have thought that if you have much less CPU usage on the client with
batching, and your Cassandra cluster doesn’t sound terribly stressed, then
there is room to increase threads on the client to increase throughput
(unless you're bottlenecked on IO or something)?



On Sun, 18 Mar 2018 at 20:27 onmstester onmstester onmstes...@zoho.com 
wrote:




Input data does not preserve good locality, and I've already tested batch
insert: it was worse than executeAsync in terms of throughput but used much
less CPU on the client side.



On Sun, 18 Mar 2018 12:46:02 +0330 Ben Slater ben.sla...@instaclustr.com wrote:

You will probably find grouping writes into small batches improves overall 
performance (if you are not doing it already). See the following presentation 
for some more info: 
https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes



Cheers

Ben




On Sun, 18 Mar 2018 at 19:23 onmstester onmstester onmstes...@zoho.com 
wrote:




I need to insert several million records into Cassandra within seconds. I'm
using one client with asyncExecute and the following configs:

maxConnectionsPerHost = 5
maxRequestsPerHost = 32K
maxAsyncQueue at client side = 100K

I could achieve only 25% of the throughput I needed. Client CPU is above 80%,
and increasing the number of threads causes some execAsync calls to fail, so
the configs above are the best the client could handle. Cassandra node CPU is
below 30% on average. The data has no locality in terms of partition keys,
and I can't use the createSStable mechanism. Is there any client-side tuning
I'm missing? The server side is already tuned according to DataStax
recommendations.


Re: Cassandra client tuning

2018-03-18 Thread Ben Slater
When you say batch was worse than async in terms of throughput, are you
comparing throughput with the same number of threads or something? I would
have thought that if you have much less CPU usage on the client with
batching, and your Cassandra cluster doesn’t sound terribly stressed, then
there is room to increase threads on the client to increase throughput
(unless you're bottlenecked on IO or something)?
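Raising client concurrency only helps if the number of outstanding requests stays bounded; the quoted report says some execAsync calls failed once threads were increased. A minimal sketch of bounding in-flight async writes with a semaphore, written with Python's asyncio as a stand-in for the Java driver's executeAsync (fake_write is a hypothetical placeholder for the real driver call, and max_in_flight is an assumed tuning knob):

```python
import asyncio


async def run_bounded(items, write, max_in_flight):
    """Call write(item) for every item while keeping at most max_in_flight
    calls outstanding; results come back in input order."""
    sem = asyncio.Semaphore(max_in_flight)

    async def one(item):
        async with sem:  # wait for a free slot before sending the request
            return await write(item)

    return await asyncio.gather(*(one(i) for i in items))


async def demo(max_in_flight=8):
    in_flight = 0
    peak = 0

    async def fake_write(x):
        # Hypothetical stand-in for an async driver call such as executeAsync.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)  # simulated server round trip
        in_flight -= 1
        return x * 2

    results = await run_bounded(range(100), fake_write, max_in_flight)
    return results, peak


results, peak = asyncio.run(demo())
print(peak)  # never more than 8
```

For millions of rows, creating one coroutine per row up front (as gather does here) costs memory; a fixed pool of workers fed from a bounded asyncio.Queue is a common alternative. Whether more concurrency raises throughput still depends on where the bottleneck is, which is Ben's question.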

On Sun, 18 Mar 2018 at 20:27 onmstester onmstester 
wrote:

> Input data does not preserve good locality, and I've already tested batch
> insert: it was worse than executeAsync in terms of throughput but used much
> less CPU on the client side.
>
> On Sun, 18 Mar 2018 12:46:02 +0330 *Ben Slater* wrote:
>
> You will probably find grouping writes into small batches improves overall
> performance (if you are not doing it already). See the following
> presentation for some more info:
> https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes
>
> Cheers
> Ben
>
> On Sun, 18 Mar 2018 at 19:23 onmstester onmstester 
> wrote:
>
> I need to insert several million records into Cassandra within seconds.
> I'm using one client with asyncExecute and the following configs:
> maxConnectionsPerHost = 5
> maxRequestsPerHost = 32K
> maxAsyncQueue at client side = 100K
>
> I could achieve only 25% of the throughput I needed. Client CPU is above
> 80%, and increasing the number of threads causes some execAsync calls to
> fail, so the configs above are the best the client could handle. Cassandra
> node CPU is below 30% on average. The data has no locality in terms of
> partition keys, and I can't use the createSStable mechanism. Is there any
> client-side tuning I'm missing? The server side is already tuned according
> to DataStax recommendations.
>


Re: Cassandra client tuning

2018-03-18 Thread onmstester onmstester
Input data does not preserve good locality, and I've already tested batch
insert: it was worse than executeAsync in terms of throughput but used much
less CPU on the client side.



On Sun, 18 Mar 2018 12:46:02 +0330 Ben Slater ben.sla...@instaclustr.com wrote:

You will probably find grouping writes into small batches improves overall 
performance (if you are not doing it already). See the following presentation 
for some more info: 
https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes



Cheers

Ben




On Sun, 18 Mar 2018 at 19:23 onmstester onmstester onmstes...@zoho.com 
wrote:




I need to insert several million records into Cassandra within seconds. I'm
using one client with asyncExecute and the following configs:

maxConnectionsPerHost = 5
maxRequestsPerHost = 32K
maxAsyncQueue at client side = 100K

I could achieve only 25% of the throughput I needed. Client CPU is above 80%,
and increasing the number of threads causes some execAsync calls to fail, so
the configs above are the best the client could handle. Cassandra node CPU is
below 30% on average. The data has no locality in terms of partition keys,
and I can't use the createSStable mechanism. Is there any client-side tuning
I'm missing? The server side is already tuned according to DataStax
recommendations.


Re: Cassandra client tuning

2018-03-18 Thread Ben Slater
You will probably find grouping writes into small batches improves overall
performance (if you are not doing it already). See the following
presentation for some more info:
https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes

Cheers
Ben
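A sketch of one common form of micro-batching: group writes by partition key and cut each group into small batches, so every batch targets a single partition. That this is what Ben means by "small batches" is my assumption, not something stated in the thread or taken from the linked slides; micro_batches and the sensor rows are hypothetical. Note that onmstester reports poor locality, in which case most groups hold a single row and this grouping gains little.

```python
from collections import defaultdict


def micro_batches(rows, partition_key, max_batch=100):
    """Group rows by partition key, then cut each group into chunks of at
    most max_batch rows, so every chunk targets a single partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_key(row)].append(row)

    batches = []
    for key, group in groups.items():
        for start in range(0, len(group), max_batch):
            batches.append((key, group[start:start + max_batch]))
    return batches


# Hypothetical rows: 250 readings spread over 3 sensors (the partition key).
rows = [{"sensor": i % 3, "value": i} for i in range(250)]
batches = micro_batches(rows, lambda row: row["sensor"], max_batch=50)
print([(key, len(chunk)) for key, chunk in batches])
# [(0, 50), (0, 34), (1, 50), (1, 33), (2, 50), (2, 33)]
```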

On Sun, 18 Mar 2018 at 19:23 onmstester onmstester 
wrote:

> I need to insert several million records into Cassandra within seconds.
> I'm using one client with asyncExecute and the following configs:
> maxConnectionsPerHost = 5
> maxRequestsPerHost = 32K
> maxAsyncQueue at client side = 100K
>
> I could achieve only 25% of the throughput I needed. Client CPU is above
> 80%, and increasing the number of threads causes some execAsync calls to
> fail, so the configs above are the best the client could handle. Cassandra
> node CPU is below 30% on average. The data has no locality in terms of
> partition keys, and I can't use the createSStable mechanism. Is there any
> client-side tuning I'm missing? The server side is already tuned according
> to DataStax recommendations.
>


Cassandra client tuning

2018-03-18 Thread onmstester onmstester
I need to insert several million records into Cassandra within seconds. I'm
using one client with asyncExecute and the following configs:

maxConnectionsPerHost = 5
maxRequestsPerHost = 32K
maxAsyncQueue at client side = 100K

I could achieve only 25% of the throughput I needed. Client CPU is above 80%,
and increasing the number of threads causes some execAsync calls to fail, so
the configs above are the best the client could handle. Cassandra node CPU is
below 30% on average. The data has no locality in terms of partition keys,
and I can't use the createSStable mechanism. Is there any client-side tuning
I'm missing? The server side is already tuned according to DataStax
recommendations.

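One way to check whether the in-flight limits above, rather than something else, cap throughput is Little's law: requests in flight = throughput × mean latency. The latency and target figures below are hypothetical, since the thread gives none, and the function names are illustrative.

```python
def throughput_from_concurrency(in_flight, mean_latency_s):
    """Little's law (L = lambda * W) solved for lambda: requests per second."""
    return in_flight / mean_latency_s


def concurrency_needed(target_per_s, mean_latency_s):
    """Little's law solved for L: requests that must be in flight."""
    return target_per_s * mean_latency_s


# Hypothetical numbers, not from the thread: 2 ms mean write latency and a
# target of 1,000,000 writes/s.
latency = 0.002
print(concurrency_needed(1_000_000, latency))         # roughly 2000 in flight
print(throughput_from_concurrency(100_000, latency))  # ceiling implied by a 100K queue
```

If the in-flight limit already implies far more throughput than you observe, as a 100K queue would at any plausible latency, the limit is probably elsewhere, for example the client CPU saturation reported above.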