On Sun, Mar 15, 2015 at 2:03 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
I was watching a talk recently on Elasticsearch performance in EC2, and
they recommended setting the IO scheduler to noop for SSDs. Is that the
case for Cassandra as well, or is it recommended to keep the default
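For what it's worth, the scheduler can be inspected and flipped through sysfs. A minimal sketch, assuming a Linux box and a device named sda (a placeholder; check your own hardware):

    # Hedged sketch: read and set the IO scheduler via sysfs (needs root).
    # 'sda' is a placeholder device name.
    path = '/sys/block/sda/queue/scheduler'
    with open(path) as f:
        print(f.read())       # e.g. "[noop] deadline cfq" - brackets mark the active one
    with open(path, 'w') as f:
        f.write('noop')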
On Fri, Sep 16, 2016 at 11:29 AM, Li, Guangxing
wrote:
> Hi,
>
> I have a 3 nodes cluster, each with less than 200 GB data. Currently all
> nodes have the default 256 value for num_tokens. My colleague told me that
> with the data size I have (less than 200 GB on each
On Tue, Mar 14, 2017 at 7:43 AM, Eric Evans
wrote:
> On Sun, Mar 12, 2017 at 4:01 PM, James Carman
> wrote:
> > Does all of this Scylla talk really even belong on the Cassandra user
> > mailing list in the first place?
>
> I personally
On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa wrote:
>
>
> On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote:
> > Cassandra vs Scylla is a valid comparison because they both are
> compatible. Scylla is a drop-in replacement for Cassandra.
>
> No, they aren't, and no, it isn't
>
On Sat, Mar 11, 2017 at 2:19 PM, Kant Kodali wrote:
> My response is inline.
>
> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity wrote:
>
>> There are several issues at play here.
>>
>> First, a database runs a large number of concurrent operations, each of
>>
>> is is
>> very misleading. The marketing material should really say something like
>> "drop in replacement for some workloads" or "aims to be a drop in
>> replacement". As is, it doesn't support everything, so it's not a drop in.
>>
>>
>>
On Sun, Mar 12, 2017 at 6:40 AM, Stefan Podkowinski wrote:
> If someone would create a benchmark showing that Cassandra is 10x faster
> than Aerospike, would that mean Cassandra is 100x faster than ScyllaDB?
>
> Joking aside, I personally don't pay a lot of attention to any
based solution.
>
> On Sat, Mar 11, 2017 at 10:34 PM Dor Laor <d...@scylladb.com> wrote:
>
>> On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>
>>
>> On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote:
>>
further questions they are welcome to
ask on our mailing list or privately.
Cheers,
Dor
On Mon, Mar 13, 2017 at 12:43 AM, Dor Laor <d...@scylladb.com> wrote:
> On Mon, Mar 13, 2017 at 12:17 AM, benjamin roth <brs...@gmail.com> wrote:
>
>> @Dor,Jeff:
>>
>> I thin
ol, IMHO.
>>>
>>>
>>> On Sun, Mar 12, 2017 at 5:04 PM Kant Kodali <k...@peernova.com> wrote:
>>>
>>> yes.
>>>
>>> On Sun, Mar 12, 2017 at 2:01 PM, James Carman <
>>> ja...@carmanconsulting.com> wrote:
>>>
>>> Does all of th
Scylla isn't just about performance, though.
First, a disclaimer, I am a Scylla co-founder. I respect open source a lot,
so you guys are welcome to shush me out of this thread. I only participate
to provide value if I can (this is a thread about Scylla and our users are
on our mailing list).
Scylla
async engine is ideal
for the larger number
of round trips the LWT needs.
This is with the Linux tcp stack, once we'll use our dpdk one, performance
will improve further ;)
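To make the round-trip point concrete, here is a hedged sketch of an LWT from the Python driver; the table and condition are made up, and each such conditional write costs extra Paxos round trips compared to a plain write:

    # Hypothetical table and columns, purely to illustrate an LWT.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('ks')
    row = session.execute(
        "UPDATE accounts SET balance = 90 WHERE id = 42 IF balance = 100"
    ).one()
    print(row[0])   # the [applied] flag: True only if the condition held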
>
> On Fri, Mar 10, 2017 at 10:45 AM, Dor Laor <d...@scylladb.com> wrote:
>
>> Scylla isn't just about per
Hi Alex,
You probably didn't get the parallelism right. A serial scan has
a parallelism of one. If the parallelism isn't large enough, performance will
be slow.
If parallelism is too large, Cassandra and the disk will thrash and have too
many context switches.
So you need to find your cluster's sweet spot. We
> is
> not the bottleneck. It is not.
>
> I expected some kind of elasticity, I see none. Feels like I do something
> wrong...
>
>
>
> On 17 August 2017 at 00:19, Dor Laor <d...@scylladb.com> wrote:
>
>> Hi Alex,
>>
>> You probably didn't get the p
We've done such an in-place upgrade in the past, but not for real production.
However, you're MISSING the point. The root filesystem, along with the entire
OS, should be completely separated from your data directories. It should
reside
in a different logical volume and thus you can easily change the
Note that EBS durability isn't perfect, you cannot rely on them entirely:
https://aws.amazon.com/ebs/details/
"Amazon EBS volumes are designed for an annual failure rate (AFR) of
between 0.1% - 0.2%, where failure refers to a complete or partial loss of
the volume, depending on the size and
Make sure you pick instances with the PCID CPU capability; their TLB flush
overhead is much smaller.
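A quick, hedged way to check a given instance from Linux:

    # Look for the 'pcid' flag in /proc/cpuinfo.
    with open('/proc/cpuinfo') as f:
        has_pcid = any('pcid' in line.split()
                       for line in f if line.startswith('flags'))
    print('pcid supported:', has_pcid)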
On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:
> Quick follow up.
>
>
>
> Others in AWS reporting/seeing something similar, e.g.:
>
It's a high number; your compaction may be running behind, leaving
many small sstables. However, you're also counting network connections
in that number (everything in *nix is a file). If it makes you feel better,
my laptop has 40k open files for Chrome.
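If you want to see how the count breaks down, a hedged sketch for Linux (the pid is a placeholder for the Cassandra process):

    # Classify a process's open fds into sockets vs. everything else.
    import os
    from collections import Counter

    PID = 12345                                   # placeholder pid
    kinds = Counter()
    for fd in os.listdir(f'/proc/{PID}/fd'):
        target = os.readlink(f'/proc/{PID}/fd/{fd}')
        kinds['socket' if target.startswith('socket:') else 'file'] += 1
    print(kinds)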
On Sun, Jan 21, 2018 at
Tue, Jan 9, 2018 at 11:19 PM, daemeon reiydelle <daeme...@gmail.com>
wrote:
> Good luck with that. PCID has been out since mid-2017, as I recall?
>
>
> Daemeon (Dæmœn) Reiydelle
> USA 1.415.501.0198
>
> On Jan 9, 2018 10:31 AM, "Dor Laor" <
I think you're introducing a layer violation. GDPR is a business
requirement and
compaction is an implementation detail.
IMHO it's enough to delete the partition using regular CQL.
It's true that it won't be deleted immediately, but it will be eventually
deleted (welcome to eventual consistency ;).
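As a hedged illustration (table and key are made up): the delete is an ordinary CQL statement that writes a tombstone, and compaction physically removes the data after gc_grace_seconds:

    # Hypothetical table/key; a plain partition delete is all that's needed here.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('ks')
    # Writes a partition tombstone; the data is purged once compaction
    # runs after gc_grace_seconds (default 10 days).
    session.execute("DELETE FROM users WHERE user_id = %s", (42,))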
r tends to
>> be to use LWT/CAS to guarantee state if you have a data model where it
>> matters.
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Mar 8, 2018, at 6:18 PM, Dor Laor <d...@scylladb.com> wrote:
>>
>> While NTP on the servers is
While NTP on the servers is important, make sure that you use client
timestamps and not server timestamps. Since the last write wins, the data
generator should be the one
setting its timestamp.
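One hedged way to make the client the timestamp authority is CQL's USING TIMESTAMP with a client-generated microsecond value (table and columns are placeholders; recent drivers also send client-side timestamps by default on protocol v3+):

    # Hedged sketch: the data generator stamps its own write.
    import time
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('ks')
    ts = int(time.time() * 1_000_000)     # microseconds, chosen by the client
    session.execute(
        "INSERT INTO events (id, payload) VALUES (%s, %s) USING TIMESTAMP %s",
        (1, 'hello', ts))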
On Thu, Mar 8, 2018 at 2:12 PM, Ben Slater
wrote:
> It is important to make sure you
model where it
> matters.
>
> --
> Jeff Jirsa
>
>
> On Mar 8, 2018, at 6:18 PM, Dor Laor <d...@scylladb.com> wrote:
>
> While NTP on the servers is important, make sure that you use client
> timestamps and
> not server. Since the last write wins, the data
it with a test dataset until you are confident about the
commands and their outcome. This example should work with Cassandra:
https://www.scylladb.com/2018/03/28/mms-day7-multidatacenter-consistency/
On Sat, Jan 5, 2019 at 5:57 AM R1 J1 wrote:
> Dor Laor,
> I like your approach. If I re
An alternative approach is to form a new cluster and leave the original
cluster alive (many times
it's a must since it needs to be 24x7 online). Double write to the two
clusters and later migrate the
data to the new one, either by taking a snapshot and passing those files to the new
cluster or with
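A hedged sketch of the double-write phase (addresses and schema are placeholders):

    # Mirror every write to both clusters while the migration is in flight.
    from cassandra.cluster import Cluster

    old = Cluster(['10.0.0.1']).connect('ks')
    new = Cluster(['10.1.0.1']).connect('ks')

    def write_user(user_id, name):
        stmt = "INSERT INTO users (user_id, name) VALUES (%s, %s)"
        old.execute(stmt, (user_id, name))
        new.execute(stmt, (user_id, name))    # same write, new cluster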
realtime workload for
isolation and low latency guarantees.
We addressed this problem elsewhere, beyond this scope.
>
>
>
> Sean Durity
>
>
>
> *From:* Dor Laor
> *Sent:* Friday, January 04, 2019 4:21 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EX
Not sure I understand correctly but if you have one cluster with 2 separate
datacenters
you can define keyspace A to be on premise with a single DC and keyspace B
only on Azure.
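Concretely, replication is pinned per keyspace with NetworkTopologyStrategy; a hedged sketch with placeholder datacenter names:

    # Keyspace A lives only on premise, keyspace B only in Azure.
    # 'onprem_dc' and 'azure_dc' are placeholder DC names.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()
    session.execute("CREATE KEYSPACE a WITH replication = "
                    "{'class': 'NetworkTopologyStrategy', 'onprem_dc': 3}")
    session.execute("CREATE KEYSPACE b WITH replication = "
                    "{'class': 'NetworkTopologyStrategy', 'azure_dc': 3}")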
On Fri, Jan 4, 2019 at 2:23 PM R1 J1 wrote:
> We currently have 2 databases (A and B ) on a 6 node cluster.
> 3
I strongly recommend option B, separate clusters. Reasons:
- Node-to-node networking is negligible compared to networking within the
node
- Different scaling considerations
Your workload may require 10 Spark nodes and 20 database nodes, so why
bundle them?
This ratio may also change over
The DynamoDB model has several key benefits over Cassandra's.
The most notable one is the tablet concept - data is partitioned into 10GB
chunks. So scaling happens when such a tablet reaches maximum capacity
and it is automatically divided in two. It can happen in parallel across
the entire
data
so benefitting from the decompression. However I’ve started to wonder
>> how often sstable compression is worth the performance drag and internal C*
>> complexity. If you compare to where a more traditional RDBMS would use
>> compression, e.g. Postgres, use of compression is more se
Another option is to use the Spark migrator, it reads a source CQL cluster and
writes to another. It has a validation stage that compares a full scan
and reports the diff:
https://github.com/scylladb/scylla-migrator
There are many more ways to clone a cluster. My main recommendation is
to
Another option instead of raw sstables is to use the Spark Migrator [1].
It reads a source cluster, can make some transformations (like
table/column naming) and
writes to a target cluster. It's a very convenient tool, OSS and free of charge.
[1] https://github.com/scylladb/scylla-migrator
On
-lock-the-pages-of-a-process-in-memory
>
>
> Thanks
> Kunal
>
> On Thu, Apr 16, 2020 at 4:31 PM Dor Laor wrote:
>>
>> It is good to configure swap for the OS but exempt Cassandra
>> from swapping. Why is it good? Since you never know the
>> memory utilizat
It is good to configure swap for the OS but exempt Cassandra
from swapping. Why is it good? Since you never know the
memory utilization of additional agents and processes you or
other admins will run on your server.
So do configure a swap partition.
You can control the eagerness of the kernel by
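For reference, a hedged sketch of the usual knob here, vm.swappiness (Linux, needs root; persist the value in /etc/sysctl.conf to survive reboots):

    # Keep swap as an emergency valve only, not something the kernel
    # reaches for eagerly.
    with open('/proc/sys/vm/swappiness', 'w') as f:
        f.write('1')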
In your schema case, for each client_id you will get a single 'when'
row. Just one. Even when there are multiple rows (clustering keys)
On Thu, May 7, 2020 at 12:14 AM Check Peck wrote:
>
> I have a scylla table as shown below:
>
>
> cqlsh:sampleks> describe table test;
>
>
> CREATE
If it's helpful, IMO, the approach Cassandra needs to take isn't
by tracking the individual node commit log and putting the burden
on the client. At Scylla, we had the 'opportunity' to be a latecomer
and see what approach Cassandra took and what DynamoDB streams
took.
We've implemented CDC as a
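For context, Scylla exposes the CDC log as a regular CQL table; a hedged sketch from memory of its documented syntax (the names are an assumption, verify against current docs):

    # Assumed Scylla syntax: enable CDC on a table, then read its log table.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()
    session.execute("ALTER TABLE ks.t WITH cdc = {'enabled': true}")
    for row in session.execute("SELECT * FROM ks.t_scylla_cdc_log LIMIT 10"):
        print(row)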
select * reads all of the data from the cluster; obviously it would be bad
if you
run a single query and expect it to return 'fast'. The best way is to
divide the data
set into chunks which will be selected by the range ownership per node, so
you'll
be able to query in parallel the entire
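A hedged sketch of that chunked, parallel scan with the Python driver (keyspace, table, and partition key are placeholders; the concurrency value is the "sweet spot" knob to tune):

    # Split the Murmur3 token ring into chunks and scan them concurrently.
    from cassandra.cluster import Cluster
    from cassandra.concurrent import execute_concurrent_with_args

    session = Cluster(['127.0.0.1']).connect('ks')
    stmt = session.prepare(
        "SELECT * FROM t WHERE token(pk) >= ? AND token(pk) <= ?")

    N = 256                                   # number of chunks
    step = 2**64 // N
    bounds = [-2**63 + i * step for i in range(N)] + [2**63]
    chunks = [(lo, hi - 1) for lo, hi in zip(bounds, bounds[1:])]

    # concurrency is the parallelism knob: too low is slow, too high thrashes.
    results = execute_concurrent_with_args(session, stmt, chunks, concurrency=32)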
On Tue, May 31, 2022 at 4:40 PM Andria Trigeorgi
wrote:
> Hi,
>
> I want to write large blobs in Cassandra. However, when I tried to write
> more than a 256MB blob, I got the message:
> "Error from server: code=2200 [Invalid query] message=\"Request is too
> big: length 268435580 exceeds maximum
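That error comes from the native-protocol frame size limit (256 MB by default); rather than raising it, the usual advice is to chunk large blobs across clustering rows. A hedged sketch with a made-up schema:

    # Assumed schema: CREATE TABLE blobs (id uuid, seq int, chunk blob,
    #                                     PRIMARY KEY (id, seq));
    import uuid
    from cassandra.cluster import Cluster

    CHUNK = 1 * 1024 * 1024            # 1 MB per row, far below the frame cap
    session = Cluster(['127.0.0.1']).connect('ks')
    ins = session.prepare("INSERT INTO blobs (id, seq, chunk) VALUES (?, ?, ?)")

    def put_blob(data: bytes) -> uuid.UUID:
        blob_id = uuid.uuid4()
        for seq, off in enumerate(range(0, len(data), CHUNK)):
            session.execute(ins, (blob_id, seq, data[off:off + CHUNK]))
        return blob_id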