Re: IO scheduler for SSDs on EC2?

2015-03-15 Thread Dor Laor
On Sun, Mar 15, 2015 at 2:03 PM, Ali Akhtar ali.rac...@gmail.com wrote: I was watching a talk recently on Elasticsearch performance in EC2, and they recommended setting the IO scheduler to noop for SSDs. Is that the case for Cassandra as well, or is it recommended to keep the default

Re: How many vnodes should I use for each node in my cluster?

2016-09-16 Thread Dor Laor
On Fri, Sep 16, 2016 at 11:29 AM, Li, Guangxing wrote: > Hi, > > I have a 3 nodes cluster, each with less than 200 GB data. Currently all > nodes have the default 256 value for num_tokens. My colleague told me that > with the data size I have (less than 200 GB on each

Re: scylladb

2017-03-14 Thread Dor Laor
On Tue, Mar 14, 2017 at 7:43 AM, Eric Evans wrote: > On Sun, Mar 12, 2017 at 4:01 PM, James Carman > wrote: > > Does all of this Scylla talk really even belong on the Cassandra user > > mailing list in the first place? > > I personally

Re: scylladb

2017-03-11 Thread Dor Laor
On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa wrote: > > > On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote: > > Cassandra vs Scylla is a valid comparison because they both are > compatible. Scylla is a drop-in replacement for Cassandra. > > No, they aren't, and no, it isn't >

Re: scylladb

2017-03-11 Thread Dor Laor
On Sat, Mar 11, 2017 at 2:19 PM, Kant Kodali wrote: > My response is inline. > > On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity wrote: > >> There are several issues at play here. >> >> First, a database runs a large number of concurrent operations, each of >>

Re: scylladb

2017-03-12 Thread Dor Laor
is is >> very misleading. The marketing material should really say something like >> "drop in replacement for some workloads" or "aims to be a drop in >> replacement". As is, it doesn't support everything, so it's not a drop in. >> >> >>

Re: scylladb

2017-03-12 Thread Dor Laor
On Sun, Mar 12, 2017 at 6:40 AM, Stefan Podkowinski wrote: > If someone would create a benchmark showing that Cassandra is 10x faster > than Aerospike, would that mean Cassandra is 100x faster than ScyllaDB? > > Joking aside, I personally don't pay a lot of attention to any

Re: scylladb

2017-03-12 Thread Dor Laor
based solution. > > On Sat, Mar 11, 2017 at 10:34 PM Dor Laor <d...@scylladb.com> wrote: > >> On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa <jji...@gmail.com> wrote: >> >> >> >> On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote: >> &

Re: scylladb

2017-03-13 Thread Dor Laor
further questions they are welcome to ask on our mailing list or privately. Cheers, Dor On Mon, Mar 13, 2017 at 12:43 AM, Dor Laor <d...@scylladb.com> wrote: > On Mon, Mar 13, 2017 at 12:17 AM, benjamin roth <brs...@gmail.com> wrote: > >> @Dor,Jeff: >> >> I thin

Re: scylladb

2017-03-13 Thread Dor Laor
ol, IMHO. >>> >>> >>> On Sun, Mar 12, 2017 at 5:04 PM Kant Kodali <k...@peernova.com> wrote: >>> >>> yes. >>> >>> On Sun, Mar 12, 2017 at 2:01 PM, James Carman < >>> ja...@carmanconsulting.com> wrote: >>> >>> Does all of th

Re: scylladb

2017-03-10 Thread Dor Laor
Scylla isn't just about performance too. First, a disclaimer, I am a Scylla co-founder. I respect open source a lot, so you guys are welcome to shush me out of this thread. I only participate to provide value if I can (this is a thread about Scylla and our users are on our mailing list). Scylla

Re: scylladb

2017-03-10 Thread Dor Laor
async engine is ideal for the larger number of round trips the LWT needs. This is with the Linux tcp stack, once we'll use our dpdk one, performance will improve further ;) > > On Fri, Mar 10, 2017 at 10:45 AM, Dor Laor <d...@scylladb.com> wrote: > >> Scylla isn't just about per

Re: Full table scan with cassandra

2017-08-16 Thread Dor Laor
Hi Alex, You probably didn't get the parallelism right. A serial scan has a parallelism of one. If the parallelism isn't large enough, perf will be slow. If the parallelism is too large, Cassandra and the disk will thrash and have too many context switches. So you need to find your cluster's sweet spot. We
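
The trade-off described above can be sketched as a scan driver with a tunable concurrency cap. This is only an illustration, not the code from the thread: `scan_chunks`, `scan_one` and `parallelism` are hypothetical names, and `scan_one` stands in for whatever function reads one slice of the table.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunks(chunks, scan_one, parallelism):
    """Run scan_one over every chunk with at most `parallelism` in flight.

    parallelism=1 is the serial scan; larger values add concurrency until
    the cluster starts to thrash, so the value has to be tuned per cluster.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map returns results in the same order as the input chunks.
        return list(pool.map(scan_one, chunks))


if __name__ == "__main__":
    # Stand-in workload: pretend each chunk id yields that many rows.
    print(scan_chunks(range(8), lambda chunk: chunk, parallelism=4))
```

Timing the same scan at a few `parallelism` values is one simple way to look for the sweet spot.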

Re: Full table scan with cassandra

2017-08-17 Thread Dor Laor
is > not the bottleneck. It is not. > > I expected some kind of elasticity, I see none. Feels like I do something > wrong... > > > > On 17 August 2017 at 00:19, Dor Laor <d...@scylladb.com> wrote: > >> Hi Alex, >> >> You probably didn't get the p

Re: Bootstraping a Node With a Newer Version

2017-05-17 Thread Dor Laor
We've done such an in-place upgrade in the past, but not for real production. However, you're MISSING the point. The root filesystem along with the entire OS should be completely separated from your data directories. It should reside in a different logical volume and thus you can easily change the

Re: EC2 instance recommendations

2017-05-23 Thread Dor Laor
Note that EBS durability isn't perfect, you cannot rely on them entirely: https://aws.amazon.com/ebs/details/ "Amazon EBS volumes are designed for an annual failure rate (AFR) of between 0.1% - 0.2%, where failure refers to a complete or partial loss of the volume, depending on the size and

Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread Dor Laor
Make sure you pick instances with PCID CPU capability; their TLB flush overhead is much smaller. On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas < thomas.steinmau...@dynatrace.com> wrote: > Quick follow up. > > > > Others in AWS reporting/seeing something similar, e.g.: >

Re: Too many open files

2018-01-22 Thread Dor Laor
It's a high number; your compaction may be running behind, and thus many small sstables exist. However, you're also taking the number of network connections into the calculation (everything in *nix is a file). If it makes you feel better my laptop has 40k open files for Chrome.. On Sun, Jan 21, 2018 at

Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread Dor Laor
Tue, Jan 9, 2018 at 11:19 PM, daemeon reiydelle <daeme...@gmail.com> wrote: > Good luck with that. Pcid out since mid 2017 as I recall? > > > Daemeon (Dæmœn) Reiydelle > USA 1.415.501.0198 > > On Jan 9, 2018 10:31 AM, "Dor Laor" <

Re: GDPR, Right to Be Forgotten, and Cassandra

2018-02-09 Thread Dor Laor
I think you're introducing a layer violation. GDPR is a business requirement and compaction is an implementation detail. IMHO it's enough to delete the partition using regular CQL. It's true that it won't be deleted immediately but it will be eventually deleted (welcome to eventual consistency ;).

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Dor Laor
r tends to >> be to use LWT/CAS to guarantee state if you have a data model where it >> matters. >> >> >> -- >> Jeff Jirsa >> >> >> On Mar 8, 2018, at 6:18 PM, Dor Laor <d...@scylladb.com> wrote: >> >> While NTP on the servers is

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Dor Laor
While NTP on the servers is important, make sure that you use client timestamps and not server. Since the last write wins, the data generator should be the one setting its timestamp. On Thu, Mar 8, 2018 at 2:12 PM, Ben Slater wrote: > It is important to make sure you
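
A minimal sketch of why this matters under last-write-wins, assuming each write carries a single numeric timestamp and the highest one wins. The function name and the sample timestamps are made up for illustration, and tie-breaking between equal timestamps is not modeled.

```python
def last_write_wins(writes):
    """Return the value of the write with the highest timestamp.

    writes: iterable of (timestamp, value) pairs for the same cell.
    """
    return max(writes, key=lambda write: write[0])[1]


# The client produced "v1" at t=100 and then "v2" at t=200, but the
# network delivered "v2" first (server clock 1000) and "v1" second (1005).
server_stamped = [(1000, "v2"), (1005, "v1")]  # stamped on arrival
client_stamped = [(200, "v2"), (100, "v1")]    # stamped by the data generator

if __name__ == "__main__":
    print(last_write_wins(server_stamped))  # the stale value survives
    print(last_write_wins(client_stamped))  # the newest value survives
```

With server-side timestamps the arrival order decides the winner; with client timestamps the order in which the data was generated does.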

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Dor Laor
model where it > matters. > > -- > Jeff Jirsa > > > On Mar 8, 2018, at 6:18 PM, Dor Laor <d...@scylladb.com> wrote: > > While NTP on the servers is important, make sure that you use client > timestamps and > not server. Since the last write wins, the data

Re: Cassandra Splitting databases

2019-01-06 Thread Dor Laor
it with a test dataset until you are confident about the commands and their outcome. This example should work with Cassandra: https://www.scylladb.com/2018/03/28/mms-day7-multidatacenter-consistency/ On Sat, Jan 5, 2019 at 5:57 AM R1 J1 wrote: > Dor Laor, > I like your approach. If I re

Re: Migrating from DSE5.1.2 to Opensource cassandra

2018-12-05 Thread Dor Laor
An alternative approach is to form another new cluster, leave the original cluster alive (many times it's a must since it needs to be 24x7 online). Double write to the two clusters and later migrate the data to it, either by taking a snapshot and passing those files to the new cluster or with

Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Dor Laor
altime workload for isolation and low latency guarantees. We addressed this problem elsewhere, beyond this scope. > > > > Sean Durity > > > > *From:* Dor Laor > *Sent:* Friday, January 04, 2019 4:21 PM > *To:* user@cassandra.apache.org > *Subject:* [EX

Re: Cassandra Splitting databases

2019-01-04 Thread Dor Laor
Not sure I understand correctly but if you have one cluster with 2 separate datacenters you can define keyspace A to be on premise with a single DC and keyspace B only on Azure. On Fri, Jan 4, 2019 at 2:23 PM R1 J1 wrote: > We currently have 2 databases (A and B ) on a 6 node cluster. > 3
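
As a rough sketch of that suggestion, the helper below builds `CREATE KEYSPACE` statements that pin each keyspace to one datacenter via `NetworkTopologyStrategy`. The keyspace names `a` and `b`, the datacenter names `onprem` and `azure`, and the replication factor of 3 are all assumptions for illustration; real datacenter names must match what the cluster's snitch reports.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    replicas_per_dc maps a datacenter name to its replication factor;
    datacenters left out of the mapping hold no replicas of the keyspace.
    """
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


if __name__ == "__main__":
    # Hypothetical names: keyspace "a" lives only on premise, "b" only on Azure.
    print(create_keyspace_cql("a", {"onprem": 3}))
    print(create_keyspace_cql("b", {"azure": 3}))
```

The resulting statements can be run from cqlsh or any driver.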

Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-04 Thread Dor Laor
I strongly recommend option B, separate clusters. Reasons: - Networking of node-node is negligible compared to networking within the node - Different scaling considerations Your workload may require 10 Spark nodes and 20 database nodes, so why bundle them? This ratio may also change over

Re: Dynamo autoscaling: does it beat cassandra?

2019-12-09 Thread Dor Laor
The DynamoDB model has several key benefits over Cassandra's. The most notable one is the tablet concept - data is partitioned into 10GB chunks. So scaling happens when such a tablet reaches maximum capacity and it is automatically divided in two. It can happen in parallel across the entire data

Re: Dynamo autoscaling: does it beat cassandra?

2019-12-10 Thread Dor Laor
so benefitting from the decompression. However I’ve started to wonder >> how often sstable compression is worth the performance drag and internal C* >> complexity. If you compare to where a more traditional RDBMS would use >> compression, e.g. Postgres, use of compression is more se

Re: sstableloader: How much does it actually need?

2020-02-05 Thread Dor Laor
Another option is to use the Spark migrator, it reads a source CQL cluster and writes to another. It has a validation stage that compares a full scan and reports the diff: https://github.com/scylladb/scylla-migrator There are many more ways to clone a cluster. My main recommendation is to

Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Dor Laor
Another option instead of raw sstables is to use the Spark Migrator [1]. It reads a source cluster, can make some transformations (like table/column naming) and writes to a target cluster. It's a very convenient tool, OSS and free of charge. [1] https://github.com/scylladb/scylla-migrator On

Re: Disabling Swap for Cassandra

2020-04-16 Thread Dor Laor
-lock-the-pages-of-a-process-in-memory > > > Thanks > Kunal > > On Thu, Apr 16, 2020 at 4:31 PM Dor Laor wrote: >> >> It is good to configure swap for the OS but exempt Cassandra >> from swapping. Why is it good? Since you never know the >> memory utilizat

Re: Disabling Swap for Cassandra

2020-04-16 Thread Dor Laor
It is good to configure swap for the OS but exempt Cassandra from swapping. Why is it good? Since you never know the memory utilization of additional agents and processes you or other admins will run on your server. So do configure a swap partition. You can control the eagerness of the kernel by

Re: What does "PER PARTITION LIMIT" means in cql query in cassandra?

2020-05-07 Thread Dor Laor
In your schema case, for each client_id you will get a single 'when' row. Just one. Even when there are multiple rows (clustering keys) On Thu, May 7, 2020 at 12:14 AM Check Peck wrote: > > I have a scylla table as shown below: > > > cqlsh:sampleks> describe table test; > > > CREATE
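
The behaviour described above can be mimicked in a few lines. This is a sketch that assumes the query used a limit of 1 and that rows arrive grouped by partition key and already sorted in clustering order; the tuple layout `(client_id, when)` is borrowed from the column names in the reply and is otherwise an assumption.

```python
from itertools import groupby, islice


def per_partition_limit(rows, limit):
    """Keep at most `limit` rows from each partition.

    rows: iterable of (partition_key, clustering_key) tuples, grouped by
    partition key and already sorted in clustering order.
    """
    kept = []
    for _, partition_rows in groupby(rows, key=lambda row: row[0]):
        kept.extend(islice(partition_rows, limit))
    return kept


if __name__ == "__main__":
    rows = [("c1", 10), ("c1", 20), ("c2", 5), ("c2", 6), ("c2", 7), ("c3", 1)]
    # With a limit of 1, each client_id contributes a single 'when' row.
    print(per_partition_limit(rows, 1))
```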

Re: CDC Tools

2020-05-27 Thread Dor Laor
If it's helpful, IMO, the approach Cassandra needs to take isn't by tracking the individual node commit log and putting the burden on the client. At Scylla, we had the 'opportunity' to be a latecomer and see what approach Cassandra took and what DynamoDB streams took. We've implemented CDC as a

Re: about the performance of select * from tbl

2022-04-26 Thread Dor Laor
select * reads all of the data from the cluster, so obviously it would be bad to run it as a single query and expect it to return 'fast'. The best way is to divide the data set into chunks which will be selected by the range ownership per node, so you'll be able to query in parallel the entire
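
One way to picture the chunking is to cut the token ring into contiguous ranges and issue one query per range. The sketch below is a simplification under stated assumptions: it assumes the default Murmur3Partitioner (tokens from -2^63 to 2^63-1), splits the ring evenly instead of following each node's actual range ownership as the reply recommends, and uses hypothetical table and column names.

```python
MIN_TOKEN = -(2**63)     # lowest Murmur3Partitioner token
MAX_TOKEN = 2**63 - 1    # highest Murmur3Partitioner token


def split_token_ring(num_chunks):
    """Split the full token range into contiguous inclusive (start, end) pairs."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1
    ranges = []
    start = MIN_TOKEN
    for i in range(num_chunks):
        # Integer arithmetic keeps the boundaries exact; the last chunk
        # always ends at MAX_TOKEN.
        end = MIN_TOKEN + (total * (i + 1)) // num_chunks - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def chunk_query(table, partition_key, start, end):
    """Build the CQL for one chunk; table and key names are placeholders."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({partition_key}) >= {start} AND token({partition_key}) <= {end}"
    )


if __name__ == "__main__":
    for start, end in split_token_ring(4):
        print(chunk_query("ks.tbl", "id", start, end))
```

Each generated query can then be sent concurrently, ideally to a coordinator that owns the range in question.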

Re: Send large blobs

2022-05-31 Thread Dor Laor
On Tue, May 31, 2022 at 4:40 PM Andria Trigeorgi wrote: > Hi, > > I want to write large blobs in Cassandra. However, when I tried to write > more than a 256MB blob, I got the message: > "Error from server: code=2200 [Invalid query] message=\"Request is too > big: length 268435580 exceeds maximum