Re: Cluster sizing for huge dataset

2019-10-04 Thread DuyHai Doan
The problem is that the user wants to access old data using CQL as well, not
spin up a SparkSQL session just to fetch one or two old records.

On 4 Oct 2019 at 12:38, "Cedrick Lunven"  wrote:

> Hi,
>
> If you are using DataStax Enterprise, why not offload cold data to DSEFS
> (an HDFS implementation) in an analytics-friendly storage format such as
> Parquet, and keep only the OLTP data in the Cassandra tables? The
> recommended size for DSEFS can go up to 30TB per node.
>
> I am pretty sure you are already aware of this option and would be curious
> to get your thoughts on this solution and its limitations.
>
> Note: that would also probably help you with your init-load/TWCS issue.
>
> My2c.
> Cedrick
>
> On Tue, Oct 1, 2019 at 11:49 PM DuyHai Doan  wrote:
>
>> The client wants to be able to access cold data (2 years old) in the
>> same cluster, so moving the data to another system is not possible.
>>
>> However, since we're using DataStax Enterprise, we can leverage Tiered
>> Storage and store old data on spinning disks to save on hardware.
>>
>> Regards
>>
>> On Tue, Oct 1, 2019 at 9:47 AM Julien Laurenceau
>>  wrote:
>> >
>> > Hi,
>> > Depending on the use case, you may also consider storage tiering, with
>> > fresh data on a hot tier (Cassandra) and older data on a cold tier
>> > (Spark/Parquet or Presto/Parquet). It would be a lot more complex, but it
>> > may fit the budget more appropriately, and you may be able to reuse some
>> > tech already present in your environment.
>> > You may even subsample while offloading the data from Cassandra, keeping
>> > for instance one point out of 10 for older data, if subsampling makes
>> > sense for your data signal.
>> >
>> > Regards
>> > Julien
>> >
>> > On Mon, 30 Sept 2019 at 22:03, DuyHai Doan  wrote:
>> >>
>> >> Thanks all for your replies
>> >>
>> >> The target deployment is on Azure, so with the nice disk snapshot
>> >> feature, replacing a dead node is easier: no streaming from Cassandra.
>> >>
>> >> As for compaction overhead, using TWCS with a 1-day bucket and removing
>> >> read repair and subrange repair should be sufficient.
>> >>
>> >> Now the only remaining issue is quorum reads, which trigger read repair
>> >> automagically.
>> >>
>> >> Before 4.0 there is unfortunately no flag to turn it off.
>> >>
>> >> On 30 Sept 2019 at 15:47, "Eric Evans"  wrote:
>> >>
>> >> On Sat, Sep 28, 2019 at 8:50 PM Jeff Jirsa  wrote:
>> >>
>> >> [ ... ]
>> >>
>> >> > 2) The 2TB guidance is old and irrelevant for most people; what you
>> >> > really care about is how fast you can replace a failed machine.
>> >> >
>> >> > You’d likely be OK going significantly larger than that if you use a
>> >> > few vnodes, since that’ll help you rebuild faster (you’ll stream from
>> >> > more sources on rebuild).
>> >> >
>> >> > If you don’t want to use vnodes, buy big machines and run multiple
>> >> > Cassandra instances on them - it’s not hard to run 3-4TB per instance
>> >> > and 12-16TB of SSD per machine.
>> >>
>> >> We do this too.  It's worth keeping in mind, though, that you'll still
>> >> have a 12-16TB blast radius in the event of a host failure.  As the
>> >> host density goes up, consider steps to make the host more robust
>> >> (RAID, redundant power supplies, etc.).
>> >>
>> >> --
>> >> Eric Evans
>> >> john.eric.ev...@gmail.com
>> >>
>> >>
>> >>
>>
>>
>>
>
>
> --
>
> Cedrick Lunven | EMEA Developer Advocate Manager
>
> ❓ Ask us your questions: DataStax Community
> Test our new products: DataStax Labs
>


Cassandra Lan Party How-to

2019-10-04 Thread Jérémy SEVELLEC
Hi Cassandra folks,

I've run a bunch of Cassandra LAN parties and I wanted to share my experience
of organizing them.
I hope it inspires new organizers to run their own as well, because it's quite
fun to do!

I've created a how-to:
https://www.unchticafe.fr/2019/10/cassandra-lan-party-how-to.html

The how-to comes with a Cassandra Lan Party Configurer, which eases the setup
for attendees:
https://github.com/jsevellec/cassandra-lan-party-configurer

The app is open source and Apache-licensed. Feel free to use it!

I hope it helps!

Regards,

-- 
Jérémy


Re: Cluster sizing for huge dataset

2019-10-04 Thread Cedrick Lunven
Hi,

If you are using DataStax Enterprise, why not offload cold data to DSEFS
(an HDFS implementation) in an analytics-friendly storage format such as
Parquet, and keep only the OLTP data in the Cassandra tables? The recommended
size for DSEFS can go up to 30TB per node.
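
For illustration, a rough PySpark sketch of that kind of offload - only a
sketch, assuming DSE Analytics with the spark-cassandra-connector enabled; the
keyspace, table and column names ("metrics", "sensor_data", "event_time",
"day") are made up:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("offload-cold-data").getOrCreate()

# Read the cold rows (roughly older than 2 years) from the hypothetical table.
cold = (spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="metrics", table="sensor_data")
        .load()
        .filter(F.col("event_time") < "2017-10-01"))

# Write them to DSEFS as Parquet, partitioned by day so later analytics
# queries can prune old data cheaply.
(cold.withColumn("day", F.to_date("event_time"))
     .write
     .mode("append")
     .partitionBy("day")
     .parquet("dsefs:///cold/sensor_data/"))

The offloaded rows could then be deleted on the Cassandra side, or simply left
to expire via TTL.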

I am pretty sure you are already aware of this option and would be curious
to get your thoughts on this solution and its limitations.

Note: that would also probably help you with your init-load/TWCS issue.

My2c.
Cedrick

On Tue, Oct 1, 2019 at 11:49 PM DuyHai Doan  wrote:

> The client wants to be able to access cold data (2 years old) in the
> same cluster, so moving the data to another system is not possible.
>
> However, since we're using DataStax Enterprise, we can leverage Tiered
> Storage and store old data on spinning disks to save on hardware.
>
> Regards
>
> On Tue, Oct 1, 2019 at 9:47 AM Julien Laurenceau
>  wrote:
> >
> > Hi,
> > Depending on the use case, you may also consider storage tiering, with
> > fresh data on a hot tier (Cassandra) and older data on a cold tier
> > (Spark/Parquet or Presto/Parquet). It would be a lot more complex, but it
> > may fit the budget more appropriately, and you may be able to reuse some
> > tech already present in your environment.
> > You may even subsample while offloading the data from Cassandra, keeping
> > for instance one point out of 10 for older data, if subsampling makes
> > sense for your data signal.
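
A small PySpark sketch of that kind of downsampling during the offload - again
only a sketch, with made-up column names ("sensor_id", "event_time") and the
same hypothetical "metrics.sensor_data" table as above:

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("offload-downsampled").getOrCreate()

# Pull the old rows from the hypothetical Cassandra table.
cold = (spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="metrics", table="sensor_data")
        .load()
        .filter(F.col("event_time") < "2017-10-01"))

# Keep one point out of 10 per series, ordered by time.
w = Window.partitionBy("sensor_id").orderBy("event_time")
downsampled = (cold
               .withColumn("rn", F.row_number().over(w))
               .filter(F.col("rn") % 10 == 1)
               .drop("rn"))

downsampled.write.mode("append").parquet("dsefs:///cold/sensor_data_1in10/")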
> >
> > Regards
> > Julien
> >
> > On Mon, 30 Sept 2019 at 22:03, DuyHai Doan  wrote:
> >>
> >> Thanks all for your replies
> >>
> >> The target deployment is on Azure, so with the nice disk snapshot
> >> feature, replacing a dead node is easier: no streaming from Cassandra.
> >>
> >> As for compaction overhead, using TWCS with a 1-day bucket and removing
> >> read repair and subrange repair should be sufficient (see the sketch
> >> below).
> >>
> >> Now the only remaining issue is quorum reads, which trigger read repair
> >> automagically.
> >>
> >> Before 4.0 there is unfortunately no flag to turn it off.
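
A rough sketch of the TWCS and read-repair settings mentioned above, using the
Python driver against a pre-4.0 cluster; the contact point and the
"metrics.sensor_data" table are hypothetical:

from cassandra.cluster import Cluster

# Hypothetical contact point and table.
session = Cluster(["10.0.0.1"]).connect()

# 1-day TWCS buckets, plus disabling the probabilistic background read repair.
session.execute("""
    ALTER TABLE metrics.sensor_data
    WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                       'compaction_window_unit': 'DAYS',
                       'compaction_window_size': 1}
     AND read_repair_chance = 0.0
     AND dclocal_read_repair_chance = 0.0
""")

# Note: this only turns off the chance-based read repair; the repair triggered
# by a digest mismatch on quorum reads (the remaining issue mentioned above)
# has no off switch before 4.0.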
> >>
> >> On 30 Sept 2019 at 15:47, "Eric Evans"  wrote:
> >>
> >> On Sat, Sep 28, 2019 at 8:50 PM Jeff Jirsa  wrote:
> >>
> >> [ ... ]
> >>
> >> > 2) The 2TB guidance is old and irrelevant for most people; what you
> >> > really care about is how fast you can replace a failed machine.
> >> >
> >> > You’d likely be OK going significantly larger than that if you use a
> >> > few vnodes, since that’ll help you rebuild faster (you’ll stream from
> >> > more sources on rebuild).
> >> >
> >> > If you don’t want to use vnodes, buy big machines and run multiple
> >> > Cassandra instances on them - it’s not hard to run 3-4TB per instance
> >> > and 12-16TB of SSD per machine.
> >>
> >> We do this too.  It's worth keeping in mind, though, that you'll still
> >> have a 12-16TB blast radius in the event of a host failure.  As the
> >> host density goes up, consider steps to make the host more robust
> >> (RAID, redundant power supplies, etc.).
> >>
> >> --
> >> Eric Evans
> >> john.eric.ev...@gmail.com
> >>
> >>
> >>
>
>
>

--

Cedrick Lunven | EMEA Developer Advocate Manager

❓ Ask us your questions: DataStax Community
Test our new products: DataStax Labs