Re: [Ann] Ansible base Cassandra stress framework for EC2

2015-11-03 Thread Tzach Livyatan
For anyone interested, the framework can now deploy and stress Cassandra
across multiple EC2 regions.
https://github.com/scylladb/cassandra-test-and-deploy

Cheers
Tzach


On Thu, Sep 3, 2015 at 9:21 PM, Tzach Livyatan <tz...@cloudius-systems.com>
wrote:

> I'm pleased to share a framework for running Cassandra stress tests on EC2.
> https://github.com/cloudius-systems/ansible-cassandra-cluster-stress
>
> The framework is a collection of Ansible playbooks and scripts that let
> you:
> - Create a Cassandra cluster (setting server type, version, etc)
> - Launch any number of loaders
> - Run cassandra-stress on all loaders and collect the results
> - Add nodes to a running cluster
> - Stop and start nodes
> - Clean old data from servers before each test
> - Collect and display relevant metrics on a Collectd+Graphite+Tessera
> <http://urbanairship.com/blog/2014/06/30/introducing-tessera-a-graphite-frontend>
>  server
>
> Use cases I tested using this framework:
> * Stress with multiple loaders
> * Out Scale
> <https://github.com/cloudius-systems/cassandra-test-and-deploy/wiki/Testing-Cassandra-Out-Scale>
>  (adding servers under stress)
> * Testing Cassandra Repair
> <https://github.com/cloudius-systems/cassandra-test-and-deploy/wiki/Testing-Cassandra-Repair>
>
> More info in the README
> <https://github.com/cloudius-systems/cassandra-test-and-deploy#ansible-cassandra-cluster-stress>
> and the Wiki
>
> Future plans include adding YCSB, running on other providers, and
> more.
> Contributions and suggestions will be appreciated!
>
> Tzach
>


Re: ScyllaDB, a new open source, Cassandra-compatible NoSQL

2015-09-22 Thread Tzach Livyatan
On Wed, Sep 23, 2015 at 12:20 AM, Minh Do <m...@netflix.com> wrote:

> First glance at their github, it looks like they re-implemented Cassandra
> in C++.  90% components in Cassandra are
> in scylladb, i.e. compaction, repair, CQL, gossip, SStable.
>

True

>
>
> With C++, I believe this helps performance to some extent up to a point
> when compaction has not run yet.
> Then, it will be disk IO to be the dominant factor in the performance
> measurement as the more traffics to a node the more degrading
> the performance is across the cluster.
>
> Also, they only support Thrift protocol so it won't work with Java Driver
> with the new asynchronous protocol.  I doubt their tests
> are truly a fair one.
>

Scylla currently supports only CQL.
For more info, I suggest continuing the discussion on the new Scylla list:
https://groups.google.com/forum/#!forum/scylladb-users





Re: ScyllaDB, a new open source, Cassandra-compatible NoSQL

2015-09-22 Thread Tzach Livyatan
On Wed, Sep 23, 2015 at 12:13 AM, Venkatesh Arivazhagan <
venkey.a...@gmail.com> wrote:

> I came across this article:
> zdnet.com/article/kvm-creators-open-source-fast-cassandra-drop-in-replacement-scylla/
>
> Tzach, I would love to know/understand more about ScyllaDB too. Also the
> benchmark seems to have only 1 DB Server. Do you have benchmark numbers
> where more than 1 DB servers were involved? :)
>
Benchmark specs are here:
http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark/
Yes, it is one server.
More benchmarks, with more servers, will be published soon.





Re: ScyllaDB, a new open source, Cassandra-compatible NoSQL

2015-09-22 Thread Tzach Livyatan
Hi Sachin

On Tue, Sep 22, 2015 at 11:40 PM, Sachin Nikam <skni...@gmail.com> wrote:

> Tzach,
> Can you point to any documentation on scylladb site which talks about
> how/why scylla db performs better than Cassandra while using the same
> architecture?
>
See here:
http://www.scylladb.com/technology/architecture/


> Regards
> Sachin
>
> On Tue, Sep 22, 2015 at 9:18 AM, Tzach Livyatan <
> tz...@cloudius-systems.com> wrote:
>
>> Hello Cassandra users,
>>
>> We are pleased to announce a new member of the Cassandra Ecosystem -
>> ScyllaDB
>> ScyllaDB is a new, open source, Cassandra-compatible NoSQL data store,
>> written with the goal of delivering superior performance and consistent low
>> latency.  Today, ScyllaDB runs 1M tps per server with sub 1ms latency.
>>
>> ScyllaDB  supports CQL, is compatible with Cassandra drivers, and works
>> out of the box with Cassandra tools like cqlsh, Spark connector, nodetool
>> and cassandra-stress. ScyllaDB is a drop-in replacement solution for the
>> Cassandra server side packages.
>>
>> Scylla is implemented using the new shared-nothing Seastar
>> <http://www.seastar-project.org/> framework for extreme performance on
>> modern multicore hardware, and the Data Plane Development Kit (DPDK) for
>> high-speed low-latency networking.
>>
>> Try Scylla Now - http://www.scylladb.com
>>
>> We will be at Cassandra summit 2015, you are welcome to visit our booth
>> to hear more and see a demo.
>> Avi Kivity, our CTO, will host a session on Scylla on Thursday, 1:50 PM -
>> 2:30 PM in rooms M1 - M3.
>>
>> Regards
>> Tzach
>> scylladb
>>
>>
>


ScyllaDB, a new open source, Cassandra-compatible NoSQL

2015-09-22 Thread Tzach Livyatan
Hello Cassandra users,

We are pleased to announce a new member of the Cassandra Ecosystem -
ScyllaDB
ScyllaDB is a new, open source, Cassandra-compatible NoSQL data store,
written with the goal of delivering superior performance and consistent low
latency.  Today, ScyllaDB runs 1M tps per server with sub 1ms latency.

ScyllaDB supports CQL, is compatible with Cassandra drivers, and works out
of the box with Cassandra tools like cqlsh, Spark connector, nodetool and
cassandra-stress. ScyllaDB is a drop-in replacement solution for the
Cassandra server side packages.

Scylla is implemented using the new shared-nothing Seastar
<http://www.seastar-project.org/> framework for extreme performance on
modern multicore hardware, and the Data Plane Development Kit (DPDK) for
high-speed low-latency networking.

Try Scylla Now - http://www.scylladb.com

We will be at Cassandra Summit 2015; you are welcome to visit our booth to
hear more and see a demo.
Avi Kivity, our CTO, will host a session on Scylla on Thursday, 1:50 PM -
2:30 PM in rooms M1 - M3.

Regards
Tzach
scylladb


Re: Repair documentation

2015-09-06 Thread Tzach Livyatan
On Fri, Sep 4, 2015 at 3:50 PM, Marcus Olsson wrote:

> Hi,
>
> While checking the repair documentation at
> http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html
> I noticed the line *Use the -hosts option to list the good nodes to
> use for repairing the bad nodes. Use -h to name the bad nodes.* and
> below there was an example:
>
> *nodetool repair -pr -hosts 10.2.2.20 10.2.2.21* which should
> do *A partitioner range repair of the bad partition on current node using
> the good partitions on 10.2.2.20 or 10.2.2.21* according to the
> documentation.
>
It looks like -pr and -hosts do not mix, and the documentation is out of
date:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/nodetool/Repair.java#L90
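
For illustration only, the conflict can be sketched as a small Python check.
This is a rough sketch inferred from the RuntimeException quoted further down,
not the actual Repair.java code; the function and exception names are
hypothetical.

```python
class RepairOptionError(ValueError):
    """Hypothetical error type for conflicting repair options."""


def validate_repair_options(primary_range, hosts):
    """Reject -pr combined with -hosts (hypothetical helper, not nodetool code).

    primary_range: True when -pr was given.
    hosts: list of addresses passed with -hosts (empty when the flag is absent).
    """
    if primary_range and hosts:
        # Message text copied from the exception reported in this thread.
        raise RepairOptionError(
            "Primary range repair should be performed on all nodes in the cluster."
        )


def is_valid(primary_range, hosts):
    """Return True when the option combination passes the check above."""
    try:
        validate_repair_options(primary_range, hosts)
        return True
    except RepairOptionError:
        return False


if __name__ == "__main__":
    # -pr alone and -hosts alone pass this particular check; together they do not.
    for pr, hosts in [(True, []), (False, ["127.0.0.2"]), (True, ["127.0.0.2"])]:
        print(pr, hosts, is_valid(pr, hosts))
```

The sketch covers only the -pr/-hosts combination; it does not model the
separate "current host must be part of the repair" check mentioned below.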



>
> Is this correctly documented because I don't seem to be getting the right
> results when trying.
>
> I started up a C* 2.1.9 CCM cluster and when running
>
> * repair -h 127.0.0.1 -p 7100 repair -pr 127.0.0.2 127.0.0.3*
>
> I get the error:
>
> *nodetool: Keyspace [127.0.0.3] does not exist.*
>
> ---
>
> When I run it as
>
> * nodetool -h 127.0.0.1 -p 7100 repair -pr -hosts 127.0.0.2*
>
> instead it gives me the error:
> *java.lang.RuntimeException: Primary range repair should be performed on
> all nodes in the cluster.*
> *at
> org.apache.cassandra.tools.NodeTool$Repair.execute(NodeTool.java:1873)*
> *at
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:288)*
> *at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)*
>
> ---
>
> I even tried running it as
>
>
> * repair -h 127.0.0.1 -p 7100 repair -hosts 127.0.0.2 *
> and then I get
> *The current host must be part of the repair*
>
> ---
>
> This seems like either bug(s) or a documentation mistake?
>
> There is also a line in
> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html
> which says that *You can specify which nodes have the good data for
> replacing the outdated data.* which seems to be related(and also the
> reason I tried it out)?
>
> BR
> Marcus Olsson
>


[Ann] Ansible base Cassandra stress framework for EC2

2015-09-03 Thread Tzach Livyatan
I'm pleased to share a framework for running Cassandra stress tests on EC2.
https://github.com/cloudius-systems/ansible-cassandra-cluster-stress

The framework is a collection of Ansible playbooks and scripts that let you:
- Create a Cassandra cluster (setting server type, version, etc)
- Launch any number of loaders
- Run cassandra-stress on all loaders and collect the results
- Add nodes to a running cluster
- Stop and start nodes
- Clean old data from servers before each test
- Collect and display relevant metrics on a Collectd+Graphite+Tessera
<http://urbanairship.com/blog/2014/06/30/introducing-tessera-a-graphite-frontend>
 server

Use cases I tested using this framework:
* Stress with multiple loaders
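* Out Scale
<https://github.com/cloudius-systems/cassandra-test-and-deploy/wiki/Testing-Cassandra-Out-Scale>
 (adding servers under stress)
* Testing Cassandra Repair
<https://github.com/cloudius-systems/cassandra-test-and-deploy/wiki/Testing-Cassandra-Repair>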

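More info in the README
<https://github.com/cloudius-systems/cassandra-test-and-deploy#ansible-cassandra-cluster-stress>
and the Wiki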

Future plans include adding YCSB, running on other providers, and
more.
Contributions and suggestions will be appreciated!

Tzach


cassandra-stress: Not enough replica available for query at consistency LOCAL_ONE (1 required but only 0 alive)

2015-07-28 Thread Tzach Livyatan
I'm running a benchmark on a two-node C* 2.1.8 cluster using cassandra-stress,
with the default consistency level (CL=1).
Stress runs fine for some time, and then starts throwing:

java.io.IOException: Operation x10 on key(s) [36333635504d4b343130]: Error
executing: (UnavailableException): Not enough replica available for query
at consistency LOCAL_ONE (1 required but only 0 alive)

at org.apache.cassandra.stress.Operation.error(Operation.java:216)
at org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:188)
at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:99)
at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:107)
at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:259)
at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:309)

The problem disappears when I decrease the number of client threads, but my
goal is to test max performance, so lowering the bar defeats my purpose.

Is this normal server pushback under too much pressure?
Shouldn't the stress client slow down before this happens?

Thanks
Tzach


Using TTL in cassandra-stress

2015-07-13 Thread Tzach Livyatan
How do I set TTL for cassandra-stress inserts, either in the profile yaml
file (better) or in the command line?

Thanks
Tzach


Re: Fail to add a node to a cluster - Unknown keyspace system_traces

2015-05-19 Thread Tzach Livyatan
More findings on the problem:
1. The problem presents itself when running nodetool status:

$ nodetool status
error: Unknown keyspace system_traces
-- StackTrace --
java.lang.AssertionError: Unknown keyspace system_traces
at org.apache.cassandra.db.Keyspace.init(Keyspace.java:270)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:119)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:96)
...

2. The problem disappears when I create an empty keyspace:

cqlsh> create keyspace temp WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns  Host ID                               Rack
UN  172.31.44.38  118.31 KB  256     ?     e5e97978-b048-46d8-936b-88b544459856  rack1
UN  172.31.44.39  106.48 KB  256     ?     6b021cd9-2fd0-44f4-afea-d01f0b64c45c  rack1

My guess is that system_traces initialization completes only after some data
is inserted.
Until then, any attempt to read from it, whether from nodetool, cqlsh, or
streaming to a new node, will fail.





Fail to add a node to a cluster - Unknown keyspace system_traces

2015-05-18 Thread Tzach Livyatan
I have a dev cluster of two Cassandra 2.1.2 servers on EC2.
When adding a new server, I get a
Streaming error occurred java.lang.AssertionError: Unknown keyspace
system_traces
exception on an existing cluster server (not the new one); full log below.

Indeed, when I run cqlsh against the cluster server, I see the following:
cqlsh> DESCRIBE KEYSPACES;

system_traces  system

cqlsh> use system_traces;
code=2200 [Invalid query] message=Keyspace 'system_traces' does not exist

while
cqlsh> DESCRIBE KEYSPACE system_traces;
does work!

Is it a bug or a feature?

Thanks
Tzach



Full log from adding a node:
INFO  [STREAM-INIT-/172.31.19.130:48054] 2015-05-18 11:36:17,734 StreamResultFuture.java:109 - [Stream #18f25dc0-fd52-11e4-a00a-c1b81a3b68f5 ID#0] Creating new streaming plan for Bootstrap
INFO  [STREAM-INIT-/172.31.19.130:48054] 2015-05-18 11:36:17,735 StreamResultFuture.java:116 - [Stream #18f25dc0-fd52-11e4-a00a-c1b81a3b68f5, ID#0] Received streaming plan for Bootstrap
INFO  [STREAM-INIT-/172.31.19.130:48055] 2015-05-18 11:36:17,736 StreamResultFuture.java:116 - [Stream #18f25dc0-fd52-11e4-a00a-c1b81a3b68f5, ID#0] Received streaming plan for Bootstrap
ERROR [STREAM-IN-/172.31.19.130] 2015-05-18 11:36:17,777 StreamSession.java:472 - [Stream #18f25dc0-fd52-11e4-a00a-c1b81a3b68f5] Streaming error occurred
java.lang.AssertionError: Unknown keyspace system_traces
    at org.apache.cassandra.db.Keyspace.init(Keyspace.java:273) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:122) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:99) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.streaming.StreamSession.getColumnFamilyStores(StreamSession.java:280) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:257) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:488) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:420) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:251) ~[apache-cassandra-2.1.2.jar:2.1.2]


cassandra-stressd - cassandra-stress Daemon Mode

2015-04-30 Thread Tzach Livyatan
I'm trying to use cassandra-stressd (daemon mode),
following the instructions at this link:
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStressDaemon_t.html

Two questions:
1. What does the -h host parameter stand for: the daemon server or the C*
server?
I cannot find this parameter in the source.

2. cassandra-stress does not support the -d option to interact with the
daemon.

I'm using Cassandra 2.1.2 on Ubuntu.

Thanks
Tzach


cassandra-stress - confusing documentation

2015-01-21 Thread Tzach Livyatan
Hi all
I'm using cassandra-stress directly from apache-cassandra-2.1.2/tools/bin
The documentation I found
http://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsCStress_t.html
is either too old or too advanced; either way, it does not match what I use.

In particular, I cannot get the -key populate=1..100 option to work as used in
the two-node example from the link above.

# On Node1
$ cassandra-stress write tries=20 n=100 cl=one -mode native cql3 -schema keyspace=Keyspace1 -key populate=1..100 -log file=~/node1_load.log -node $NODES

# On Node2
$ cassandra-stress write tries=20 n=100 cl=one -mode native cql3 -schema keyspace=Keyspace1 -key populate=101..200 -log file=~/node2_load.log -node $NODES

Can someone please direct me to the right doc, or to a valid example of
using a populate range?

Thanks
Tzach