A token-aware policy doesn't work for token range queries (at least in the
Java driver 3.x). You need to force the driver to do the read using a
specific token as the routing key. Here is a Java implementation of the token
range scanning algorithm that Spark uses:
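The implementation itself wasn't captured in this snippet. As a self-contained sketch of the core idea, here is one possible version of the range-splitting step, assuming the default Murmur3Partitioner (tokens span -2^63 to 2^63-1). The class and method names are invented for illustration, and the per-range driver queries are omitted:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenRangeSplitter {
    // Murmur3Partitioner token space: [-2^63, 2^63 - 1]
    static final BigInteger MIN_TOKEN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX_TOKEN = BigInteger.valueOf(Long.MAX_VALUE);

    /** Split the full token ring into `splits` contiguous, inclusive [start, end] ranges. */
    static List<BigInteger[]> split(int splits) {
        BigInteger total = MAX_TOKEN.subtract(MIN_TOKEN).add(BigInteger.ONE); // 2^64 tokens
        BigInteger step = total.divide(BigInteger.valueOf(splits));
        List<BigInteger[]> ranges = new ArrayList<>();
        BigInteger start = MIN_TOKEN;
        for (int i = 0; i < splits; i++) {
            // The last range absorbs any remainder so the whole ring is covered.
            BigInteger end = (i == splits - 1) ? MAX_TOKEN
                                               : start.add(step).subtract(BigInteger.ONE);
            ranges.add(new BigInteger[] { start, end });
            start = end.add(BigInteger.ONE);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each range would back one scan query, e.g.:
        //   SELECT ... FROM ks.tbl WHERE token(pk) >= ? AND token(pk) <= ?
        for (BigInteger[] r : split(4)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Setting each range's start token as the routing key on its statement, as described above, would then steer every query to a coordinator that owns that range.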
Yes, use a token-aware policy so the driver will pick a coordinator where
the token (partition) exists. Cheers!
Hi,
I'm going to read all the data in the cluster as fast as possible. I'm aware
that Spark can do such things out of the box, but I wanted to do it at a low
level to see how fast it could be. So:
1. retrieved the token ranges on each node using "nodetool ring" and kept
the distinct ones
The commitlog defaults to periodic mode, which writes a sync marker to the
file and fsyncs the data to disk every 10s by default.
`nodetool flush` will force a sync marker / fsync
Data written since the last fsync will not be replayed on startup and will
be lost.
If you drop the periodic time,
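For reference, the commitlog settings being discussed live in cassandra.yaml (values shown are the 3.x defaults):

```yaml
# Periodic mode (the default): acknowledge writes immediately,
# fsync the commitlog every commitlog_sync_period_in_ms.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Alternative batch mode: group writes and fsync before acknowledging.
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 2
```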
>
> I do "nodetool flush", then snapshot the storage. Meanwhile, the DB is
> under heavy read/write traffic, with lots of writes per second. What's
> the worst that could happen, lose a few writes?
>
Nope, you won't lose anything. Snapshots in C* are the equivalent of a cold
backup in relational
That sounds great! Now here's my question:
I do "nodetool flush", then snapshot the storage. Meanwhile, the DB is
under heavy read/write traffic, with lots of writes per second. What's
the worst that could happen, lose a few writes?
On 2020-11-10 15:59, Jeff Jirsa wrote:
If you want all of the instances to be consistent with each other, this is
much harder, but if you only want a container that can stop and resume, you
don't have to do anything more than flush + snapshot the storage. The data
files in Cassandra should ALWAYS be in a state where the database will
Running Apache Cassandra 3 in Docker. I need to snapshot the storage
volumes. Obviously, I want to be able to re-launch Cassandra from the
snapshots later on. So the snapshots need to be in a consistent state.
With most DBs, the sequence of events is this:
- flush the DB to disk
- "freeze"
Lots of updates to the same rows/columns could theoretically impact read
performance. One way to help counter that would be to use the
LeveledCompactionStrategy to keep the table optimized for reads. It could keep
your nodes busier with compaction – so test it out.
Sean Durity
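The compaction strategy change suggested above is a per-table setting; a minimal CQL sketch, using a placeholder keyspace/table name:

```cql
-- Switch an existing table to LCS (placeholder names; 160 MB is the default target).
ALTER TABLE my_keyspace.last_values
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': 160};
```

Note the change triggers background recompaction of existing SSTables, which is part of why testing it first is advised.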
From: Gábor
Hi,
On Tue, Nov 10, 2020 at 6:29 PM Alex Ott wrote:
> What about using "per partition limit 1" on that table?
>
Oh, it's almost a good solution, but the key is actually ((epoch_day,
name), timestamp), to support more distributed partitioning, so... it is
not good... :/
--
Bye,
Auth Gábor
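For context, a sketch of what the suggested query looks like, assuming a hypothetical table named measurements with primary key ((epoch_day, name), timestamp) and timestamp clustered descending:

```cql
-- Returns the first clustering row of each partition; with
-- CLUSTERING ORDER BY (timestamp DESC) that is the newest row per
-- (epoch_day, name) partition, not the last value per name across
-- all days, which is why it falls short for this schema.
SELECT epoch_day, name, timestamp, value
FROM measurements
PER PARTITION LIMIT 1;
```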
What about using "per partition limit 1" on that table?
On Tue, Nov 10, 2020 at 8:39 AM Gábor Auth wrote:
> Hi,
>
> Short story: storing time series of measurements (key(name, timestamp),
> value).
>
> The problem: get the list of the last `value` of every `name`.
>
> Is there a Cassandra
Hi,
On Tue, Nov 10, 2020 at 5:29 PM Durity, Sean R
wrote:
> Updates do not create tombstones. Deletes create tombstones. The above
> scenario would not create any tombstones. For a full solution, though, I
> would probably suggest a TTL on the data so that old/unchanged data
> eventually gets
Hi,
On Tue, Nov 10, 2020 at 3:18 PM Durity, Sean R
wrote:
My answer would depend on how many “names” you expect. If it is a relatively
small and constrained list (under a few hundred thousand), I would start with
something like:
At the moment, the number
Hi,
On Tue, Nov 10, 2020 at 3:18 PM Durity, Sean R
wrote:
> My answer would depend on how many “names” you expect. If it is a
> relatively small and constrained list (under a few hundred thousand), I
> would start with something like:
>
At the moment, the number of names is more than 10,000
My answer would depend on how many “names” you expect. If it is a relatively
small and constrained list (under a few hundred thousand), I would start with
something like:
Create table last_values (
    arbitrary_partition text, -- use an app name or something static to define the partition
    name
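The rest of the statement was cut off in the archive; a plausible completion in the spirit of the design (the column types and the value column are assumptions):

```cql
CREATE TABLE last_values (
    arbitrary_partition text,  -- an app name or something static to define the partition
    name text,
    value double,              -- assumed; the original snippet is truncated here
    PRIMARY KEY (arbitrary_partition, name)
);
-- Each write upserts the row for its name, so reading the whole
-- partition yields the latest value for every name.
```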