Re: [EXTERNAL] Re: cold vs hot data
This is one of the options that we are thinking of, but this will require more storage, which is something that we are trying to avoid. We will test the performance for the batch inserts.

Thanks

On Tue, Sep 18, 2018 at 6:35 AM, Durity, Sean R wrote:
> Wouldn’t you have the same problem with two similar tables with different
> primary keys (e.g., UserByID and UserByName)? This is a very common pattern
> in Cassandra: inserting into multiple tables. That’s what batches are for:
> atomicity.
>
> I don’t understand the additional concern here.
>
> Sean Durity
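The storage concern above can be sized roughly. This is a minimal Python sketch, assuming a replication factor of 3 for both tables (the thread does not state one) and ignoring compression and compaction headroom. The 100 TB and 10 TB figures come from the original question; the function name is hypothetical.

```python
def dual_write_footprint_tb(total_tb, hot_tb, hot_rf, cold_rf):
    """Estimate raw disk usage (TB) when hot rows are written to both tables.

    The cold/archive table holds the full data set; the hot table holds a
    second, TTL-bounded copy of the recent rows.
    """
    if hot_tb > total_tb:
        raise ValueError("hot data cannot exceed the total data set")
    cold_raw = total_tb * cold_rf  # archive copy, replicated
    hot_raw = hot_tb * hot_rf      # extra copy introduced by the hot table
    return {
        "cold_raw_tb": cold_raw,
        "hot_raw_tb": hot_raw,
        "total_raw_tb": cold_raw + hot_raw,
    }


# Figures from the thread: about 100 TB overall, under 10 TB hot.
# Replication factor 3 is an assumption, not something the thread states.
estimate = dual_write_footprint_tb(total_tb=100, hot_tb=10, hot_rf=3, cold_rf=3)
overhead_pct = 100 * estimate["hot_raw_tb"] / estimate["cold_raw_tb"]
print(estimate, f"overhead: {overhead_pct:.0f}%")
```

Under these assumptions the hot table adds about a tenth on top of the archive footprint; a lower replication factor on the archive keyspace would change the ratio.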
RE: [EXTERNAL] Re: cold vs hot data
Wouldn’t you have the same problem with two similar tables with different primary keys (e.g., UserByID and UserByName)? This is a very common pattern in Cassandra: inserting into multiple tables. That’s what batches are for: atomicity.

I don’t understand the additional concern here.

Sean Durity

From: DuyHai Doan
Sent: Monday, September 17, 2018 4:23 PM
To: user
Subject: Re: [EXTERNAL] Re: cold vs hot data

Sean

Without transactions à la SQL, how can you guarantee atomicity between both tables for upserts? I mean, one write could succeed with hot table and fail for cold table.

The only solution I see is using logged batch, with a huge overhead and perf hit for the writes.
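For reference, the logged batch under discussion can be sketched as follows. This is a minimal Python helper that only builds the CQL text and does not talk to a cluster. The keyspace, table, and column names (`hot_ks.events`, `archive_ks.events`, `id`, `payload`) and the one-day TTL are hypothetical. In CQL, `BEGIN BATCH` is logged by default and `BEGIN UNLOGGED BATCH` skips the batchlog.

```python
def batch_cql(statements, logged=True):
    """Wrap CQL statements in a BATCH block (logged unless logged=False)."""
    if not statements:
        raise ValueError("a batch needs at least one statement")
    header = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
    # Normalise each statement so it ends with exactly one semicolon.
    body = "\n".join(f"  {s.strip().rstrip(';')};" for s in statements)
    return f"{header}\n{body}\nAPPLY BATCH;"


# Hypothetical tables: one row written to the hot table (with a TTL)
# and to the archive table in the same logged batch.
inserts = [
    "INSERT INTO hot_ks.events (id, payload) VALUES (?, ?) USING TTL 86400",
    "INSERT INTO archive_ks.events (id, payload) VALUES (?, ?)",
]
logged_batch = batch_cql(inserts)
print(logged_batch)
```

A logged batch guarantees that if part of the batch is applied, the rest eventually will be; it does not provide isolation, and the batchlog write is where the extra write cost comes from.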
Re: [EXTERNAL] Re: cold vs hot data
Also for the record, I remember DataStax having something called Tiered Storage that does move data around (folders/disk volume) based on data age. To be checked.

On Mon, Sep 17, 2018 at 10:23 PM, DuyHai Doan wrote:
> Sean
>
> Without transactions à la SQL, how can you guarantee atomicity between
> both tables for upserts? I mean, one write could succeed with hot table
> and fail for cold table.
>
> The only solution I see is using logged batch, with a huge overhead and
> perf hit for the writes.
Re: [EXTERNAL] Re: cold vs hot data
Sean

Without transactions à la SQL, how can you guarantee atomicity between both tables for upserts? I mean, one write could succeed with hot table and fail for cold table.

The only solution I see is using logged batch, with a huge overhead and perf hit for the writes.

On Mon, Sep 17, 2018 at 8:28 PM, Durity, Sean R wrote:
> An idea:
>
> On initial insert, insert into 2 tables:
> Hot with short TTL
> Cold/archive with a longer (or no) TTL
>
> Then your hot data is always in the same table, but being expired. And you
> can access the archive table only for the more rare circumstances. Then you
> could have the HOT table on a different volume of faster storage. If the
> hot/cold tables are in different keyspaces, then you could also have
> different replication (a HOT DC and an archive DC, for example).
>
> Sean Durity
RE: [EXTERNAL] Re: cold vs hot data
An idea:

On initial insert, insert into 2 tables:
Hot with short TTL
Cold/archive with a longer (or no) TTL

Then your hot data is always in the same table, but being expired. And you can access the archive table only for the more rare circumstances. Then you could have the HOT table on a different volume of faster storage. If the hot/cold tables are in different keyspaces, then you could also have different replication (a HOT DC and an archive DC, for example).

Sean Durity

-----Original Message-----
From: Mateusz
Sent: Friday, September 14, 2018 2:40 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: cold vs hot data

On Friday, 14 September 2018 02:46:43 CEST Alaa Zubaidi (PDF) wrote:
> The data can grow to +100TB however the hot data will be in most cases
> less than 10TB but we still need to keep the rest of data accessible.
> Anyone has this problem?
> What is the best way to make the cluster more efficient?
> Is there a way to somehow automatically move the old data to different
> storage (rack, dc, etc)?
> Any ideas?

We solved it using lvmcache.

--
Mateusz
(...) I have a brother: serious, a homebody, a penny-pincher, a hypocrite, sanctimonious; in short, a pillar of society."
Nikos Kazantzakis - "Zorba the Greek"
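The two-table insert suggested above can be sketched as a pair of CQL statements. This is a minimal Python helper that only formats the statements with bind markers; the table names, column names, and the 30-day hot TTL are hypothetical, since the thread gives no schema.

```python
def dual_insert_cql(hot_table, cold_table, columns, hot_ttl_seconds,
                    cold_ttl_seconds=None):
    """Return (hot_cql, cold_cql): the same INSERT aimed at both tables.

    The hot insert always carries a TTL; the cold insert carries one only
    if cold_ttl_seconds is given (None means the archive row never expires).
    """
    if hot_ttl_seconds <= 0:
        raise ValueError("hot_ttl_seconds must be positive")
    col_list = ", ".join(columns)
    markers = ", ".join("?" for _ in columns)
    template = "INSERT INTO {table} ({cols}) VALUES ({marks})"
    hot_cql = template.format(table=hot_table, cols=col_list, marks=markers)
    hot_cql += f" USING TTL {hot_ttl_seconds}"
    cold_cql = template.format(table=cold_table, cols=col_list, marks=markers)
    if cold_ttl_seconds is not None:
        cold_cql += f" USING TTL {cold_ttl_seconds}"
    return hot_cql, cold_cql


# Hypothetical schema: hot and archive tables live in separate keyspaces,
# so each keyspace can have its own replication settings.
hot_cql, cold_cql = dual_insert_cql(
    hot_table="hot_ks.events",
    cold_table="archive_ks.events",
    columns=["id", "ts", "payload"],
    hot_ttl_seconds=30 * 24 * 3600,  # 30 days, an assumed value
)
print(hot_cql)
print(cold_cql)
```

Putting the two tables in separate keyspaces, as in the names used here, is what allows different replication per tier.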