RE: Last stored value metadata table

2020-11-10 Thread Durity, Sean R
Lots of updates to the same rows/columns could theoretically impact read 
performance. One way to help counter that would be to use the 
LeveledCompactionStrategy to keep the table optimized for reads. It could keep 
your nodes busier with compaction – so test it out.
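As a hedged sketch (the `last_values` table name is taken from later in this thread), switching the compaction strategy is a one-line ALTER:

```sql
-- Sketch: move the table to LeveledCompactionStrategy, which suits
-- read-heavy, frequently-overwritten data. Table name assumed from
-- later in the thread; adjust to your own schema.
ALTER TABLE last_values
WITH compaction = {'class': 'LeveledCompactionStrategy'};
```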


Sean Durity

From: Gábor Auth 
Sent: Tuesday, November 10, 2020 11:50 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Last stored value metadata table

Hi,

On Tue, Nov 10, 2020 at 5:29 PM Durity, Sean R
<sean_r_dur...@homedepot.com> wrote:
Updates do not create tombstones. Deletes create tombstones. The above scenario 
would not create any tombstones. For a full solution, though, I would probably 
suggest a TTL on the data so that old/unchanged data eventually gets removed 
(if that is desirable). TTLs can create tombstones, but should not be a major 
problem if expired data is relatively infrequent.

Okay, there are no tombstones (I misused the term), but every updated `value` 
is sitting in memory and on disk until the next compaction... Does it 
degrade the read performance?

--
Bye,
Auth Gábor (https://iotguru.cloud)



The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.


Re: Last stored value metadata table

2020-11-10 Thread Gábor Auth
Hi,

On Tue, Nov 10, 2020 at 6:29 PM Alex Ott  wrote:

> What about using  "per partition limit 1" on that table?
>

Oh, it is almost a good solution, but actually the key is ((epoch_day,
name), timestamp), to support more distributed partitioning, so... it is
not good... :/

-- 
Bye,
Auth Gábor (https://iotguru.cloud)


Re: Last stored value metadata table

2020-11-10 Thread Alex Ott
What about using  "per partition limit 1" on that table?
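As a sketch of that idea, assuming a simplified schema with one partition per `name`, clustered newest-first (not the actual schema from the thread):

```sql
-- Hypothetical schema: one partition per name, newest row first.
CREATE TABLE measurements (
    name text,
    ts timestamp,
    value text,
    PRIMARY KEY ((name), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- One row (the newest) from every partition, i.e. the last value per name:
SELECT name, ts, value FROM measurements PER PARTITION LIMIT 1;
```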

On Tue, Nov 10, 2020 at 8:39 AM Gábor Auth  wrote:

> Hi,
>
> Short story: storing time series of measurements (key(name, timestamp),
> value).
>
> The problem: get the list of the last `value` of every `name`.
>
> Is there a Cassandra friendly solution to store the last value of every
> `name` in a separate metadata table? It will come with a lot of
> tombstones... any other solution? :)
>
> --
> Bye,
> Auth Gábor
>


-- 
With best wishes,
Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)


Re: Last stored value metadata table

2020-11-10 Thread Gábor Auth
Hi,

On Tue, Nov 10, 2020 at 5:29 PM Durity, Sean R 
wrote:

> Updates do not create tombstones. Deletes create tombstones. The above
> scenario would not create any tombstones. For a full solution, though, I
> would probably suggest a TTL on the data so that old/unchanged data
> eventually gets removed (if that is desirable). TTLs can create tombstones,
> but should not be a major problem if expired data is relatively infrequent.
>

Okay, there are no tombstones (I misused the term), but every updated
`value` is sitting in memory and on disk until the next
compaction... Does it degrade the read performance?

-- 
Bye,
Auth Gábor (https://iotguru.cloud)


RE: Last stored value metadata table

2020-11-10 Thread Durity, Sean R

Hi,

On Tue, Nov 10, 2020 at 3:18 PM Durity, Sean R
<sean_r_dur...@homedepot.com> wrote:
My answer would depend on how many “names” you expect. If it is a relatively 
small and constrained list (under a few hundred thousand), I would start with 
something like:

At the moment, the number of names is more than 10,000 but fewer than 100,000.

Create table last_values (
arbitrary_partition text, -- use an app name or something static to define the 
partition
name text,
value text,
last_upd_ts timestamp,
primary key (arbitrary_partition, name));

What is the purpose of the partition key?

--- This keeps the data in one partition so that you can retrieve all of it in 
one query (as you requested). If the partition key is just “name,” then you 
would need a query for each name:
select value, last_upd_ts from last_values where name = ‘name1’; //10,000+ 
queries and you have to know all the names

Since it is a single partition, you want to keep the partition size under 100 
MB (rule of thumb). That is why knowing the size/bounds of the data is 
important.

(NOTE: every insert would just overwrite the last value. You only keep the last 
one.)

This is the behavior that I want. :)

I’m assuming that your data arrives in time series order, so that it is easy to 
just insert the last value into last_values. If you have to read before write, 
that would be a Cassandra anti-pattern that needs a different solution. (Based 
on how regular the data points are, I would look at something time-series 
related with a short TTL.)

Okay, but as far as I know, in this scenario every update of `last_values` 
generates two tombstones, because of the updates to the `value` and 
`last_upd_ts` fields. Maybe I have it wrong?

--- Updates do not create tombstones. Deletes create tombstones. The above 
scenario would not create any tombstones. For a full solution, though, I would 
probably suggest a TTL on the data so that old/unchanged data eventually gets 
removed (if that is desirable). TTLs can create tombstones, but should not be a 
major problem if expired data is relatively infrequent.


--
Bye,
Auth Gábor (https://iotguru.cloud)





Re: Last stored value metadata table

2020-11-10 Thread Gábor Auth
Hi,

On Tue, Nov 10, 2020 at 3:18 PM Durity, Sean R 
wrote:

> My answer would depend on how many “names” you expect. If it is a
> relatively small and constrained list (under a few hundred thousand), I
> would start with something like:
>

At the moment, the number of names is more than 10,000 but fewer than 100,000.

>
> Create table last_values (
>
> arbitrary_partition text, -- use an app name or something static to define
> the partition
>
> name text,
>
> value text,
>
> last_upd_ts timestamp,
>
> primary key (arbitrary_partition, name));
>

What is the purpose of the partition key?

(NOTE: every insert would just overwrite the last value. You only keep the
> last one.)
>

This is the behavior that I want. :)


> I’m assuming that your data arrives in time series order, so that it is
> easy to just insert the last value into last_values. If you have to read
> before write, that would be a Cassandra anti-pattern that needs a different
> solution. (Based on how regular the data points are, I would look at
> something time-series related with a short TTL.)
>

Okay, but as far as I know, in this scenario every update of
`last_values` generates two tombstones, because of the updates to the `value`
and `last_upd_ts` fields. Maybe I have it wrong?

-- 
Bye,
Auth Gábor (https://iotguru.cloud)


RE: Last stored value metadata table

2020-11-10 Thread Durity, Sean R
My answer would depend on how many “names” you expect. If it is a relatively 
small and constrained list (under a few hundred thousand), I would start with 
something like:

Create table last_values (
arbitrary_partition text, -- use an app name or something static to define the 
partition
name text,
value text,
last_upd_ts timestamp,
primary key (arbitrary_partition, name));

(NOTE: every insert would just overwrite the last value. You only keep the last 
one.)
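Since CQL inserts are upserts, each arriving measurement simply replaces the previous row for that name. A sketch (the partition, name, and value literals are illustrative):

```sql
-- Upsert: overwrites any existing row for ('my_app_name', 'sensor-42'),
-- so the table always holds only the latest value per name.
INSERT INTO last_values (arbitrary_partition, name, value, last_upd_ts)
VALUES ('my_app_name', 'sensor-42', '23.5', toTimestamp(now()));
```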

Then your query is easy:
Select name, value, last_upd_ts from last_values where arbitrary_partition = 
‘my_app_name’;

If the list of names is unbounded/large, then I would be asking: does the query 
really need every name/value pair? How else could they be grouped together 
in a reasonable partition? I would use that instead of the arbitrary_partition 
above and run multiple queries (one for each partition) if a massive list is 
actually required.

I’m assuming that your data arrives in time series order, so that it is easy to 
just insert the last value into last_values. If you have to read before write, 
that would be a Cassandra anti-pattern that needs a different solution. (Based 
on how regular the data points are, I would look at something time-series 
related with a short TTL.)
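A sketch of that TTL variant, assuming the same `last_values` table: each write carries a short TTL, so names that stop reporting age out on their own (the seven-day TTL is an arbitrary example):

```sql
-- Row expires 7 days (604800 s) after the last write, unless a newer
-- write refreshes it. Expired rows become tombstones at compaction time.
INSERT INTO last_values (arbitrary_partition, name, value, last_upd_ts)
VALUES ('my_app_name', 'sensor-42', '23.5', toTimestamp(now()))
USING TTL 604800;
```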


Sean Durity

From: Gábor Auth 
Sent: Tuesday, November 10, 2020 2:39 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Last stored value metadata table

Hi,

Short story: storing time series of measurements (key(name, timestamp), value).

The problem: get the list of the last `value` of every `name`.

Is there a Cassandra friendly solution to store the last value of every `name` 
in a separate metadata table? It will come with a lot of tombstones... any 
other solution? :)

--
Bye,
Auth Gábor





Last stored value metadata table

2020-11-09 Thread Gábor Auth
Hi,

Short story: storing time series of measurements (key(name, timestamp),
value).

The problem: get the list of the last `value` of every `name`.

Is there a Cassandra friendly solution to store the last value of every
`name` in a separate metadata table? It will come with a lot of
tombstones... any other solution? :)

-- 
Bye,
Auth Gábor