Re: [EXTERNAL] Re: cold vs hot data

2018-09-19 Thread Alaa Zubaidi (PDF)
This is one of the options we are considering, but it will require more
storage, which is something we are trying to avoid.
We will test the performance of the batch inserts.
Thanks

On Tue, Sep 18, 2018 at 6:35 AM, Durity, Sean R  wrote:

> Wouldn’t you have the same problem with two similar tables with different
> primary keys (e.g., UserByID and UserByName)? This is a very common pattern
> in Cassandra – inserting into multiple tables… That’s what batches are for
> – atomicity.
>
> I don’t understand the additional concern here.
>
> Sean Durity
>
>
>
> *From:* DuyHai Doan 
> *Sent:* Monday, September 17, 2018 4:23 PM
> *To:* user 
> *Subject:* Re: [EXTERNAL] Re: cold vs hot data
>
>
>
> Sean
>
>
>
> Without transactions à la SQL, how can you guarantee atomicity between
> both tables for upserts? I mean, one write could succeed on the hot table
> and fail on the cold table.
>
> The only solution I see is using a logged batch, with a large overhead and
> perf hit on the writes.
>
>
>
> On Mon, Sep 17, 2018 at 8:28 PM, Durity, Sean R <
> sean_r_dur...@homedepot.com> wrote:
>
> An idea:
>
> On initial insert, insert into 2 tables:
> Hot with short TTL
> Cold/archive with a longer (or no) TTL
> Then your hot data is always in the same table, but being expired, and you
> can access the archive table only for the rarer circumstances. You could
> also put the HOT table on a different volume of faster storage. If the
> hot/cold tables are in different keyspaces, you could also have different
> replication (a HOT DC and an archive DC, for example).
>
>
> Sean Durity
>
>
>
> -Original Message-
> From: Mateusz 
> Sent: Friday, September 14, 2018 2:40 AM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] Re: cold vs hot data
>
> On Friday, 14 September 2018 02:46:43 CEST, Alaa Zubaidi (PDF) wrote:
> > The data can grow to +100TB however the hot data will be in most cases
> > less than 10TB but we still need to keep the rest of data accessible.
> > Anyone has this problem?
> > What is the best way to make the cluster more efficient?
> > Is there a way to somehow automatically move the old data to different
> > storage (rack, dc, etc)?
> > Any ideas?
>
> We solved it using lvmcache.
>
> --
> Mateusz
> "(...) I have a brother - serious, a homebody, a penny-pincher, a hypocrite,
> sanctimonious; in short - a pillar of society."
> Nikos Kazantzakis - "Zorba the Greek"
>
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
> 
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>
>

RE: [EXTERNAL] Re: cold vs hot data

2018-09-18 Thread Durity, Sean R
Wouldn’t you have the same problem with two similar tables with different
primary keys (e.g., UserByID and UserByName)? This is a very common pattern in
Cassandra – inserting into multiple tables… That’s what batches are for –
atomicity.
I don’t understand the additional concern here.



Sean Durity



Re: [EXTERNAL] Re: cold vs hot data

2018-09-17 Thread DuyHai Doan
Also, for the record, I remember DataStax having something called Tiered
Storage that moves data around (folders/disk volumes) based on data age.
To be checked.



Re: [EXTERNAL] Re: cold vs hot data

2018-09-17 Thread DuyHai Doan
Sean

Without transactions à la SQL, how can you guarantee atomicity between both
tables for upserts? I mean, one write could succeed on the hot table and
fail on the cold table.

The only solution I see is using a logged batch, with a large overhead and
perf hit on the writes.
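
Such a dual write could be sketched as a logged batch in CQL. This is only an
illustration; the keyspace, table, and column names (hot_ks.events_hot,
cold_ks.events_cold, id/ts/payload) are made up for the example:

```sql
-- Hypothetical dual write: one logged batch covering both the hot and the
-- cold table. The batch log guarantees that either both inserts eventually
-- apply or neither does, at the cost of an extra write to the batch log
-- (the overhead mentioned above).
BEGIN BATCH
  INSERT INTO hot_ks.events_hot  (id, ts, payload) VALUES (?, ?, ?) USING TTL 604800;
  INSERT INTO cold_ks.events_cold (id, ts, payload) VALUES (?, ?, ?);
APPLY BATCH;
```

Note that logged batches provide atomicity (all-or-nothing, eventually) but
not isolation: a reader may briefly see one insert before the other. Because
the two tables almost certainly live on different partitions, this is a
multi-partition batch, which is exactly the expensive case.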


RE: [EXTERNAL] Re: cold vs hot data

2018-09-17 Thread Durity, Sean R
An idea:

On initial insert, insert into 2 tables:
Hot with short TTL
Cold/archive with a longer (or no) TTL
Then your hot data is always in the same table, but being expired, and you can
access the archive table only for the rarer circumstances. You could also put
the HOT table on a different volume of faster storage. If the hot/cold tables
are in different keyspaces, you could also have different replication (a HOT
DC and an archive DC, for example).


Sean Durity
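
A minimal sketch of this layout in CQL, assuming hypothetical keyspace, DC,
and table names (hot_ks, cold_ks, HOT, ARCHIVE, events_hot, events_cold):

```sql
-- Hot keyspace, replicated only to the fast 'HOT' data center.
CREATE KEYSPACE hot_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'HOT': 3};

-- Archive keyspace, replicated only to the 'ARCHIVE' data center.
CREATE KEYSPACE cold_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'ARCHIVE': 3};

-- Hot table: rows expire after 7 days (604800 s) unless a per-write
-- TTL overrides the table default.
CREATE TABLE hot_ks.events_hot (
  id      uuid,
  ts      timestamp,
  payload text,
  PRIMARY KEY (id, ts)
) WITH default_time_to_live = 604800;

-- Cold/archive table: no TTL, data is kept indefinitely.
CREATE TABLE cold_ks.events_cold (
  id      uuid,
  ts      timestamp,
  payload text,
  PRIMARY KEY (id, ts)
);
```

With the keyspaces split this way, the HOT DC's nodes (and their fast storage)
only ever hold the short-TTL data, while the archive DC carries the full
history.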

