Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-07 Thread Bhuvan Rawal
Thanks for the correction, Jon. (At most 2000 queries *per cluster* for
serving 100 searches.)

On Mon, Mar 7, 2016 at 11:47 PM, Jonathan Haddad  wrote:

> If you're doing 100 searches a second each machine will be serving at most
> 100 requests per second, not 2000.

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-07 Thread Jonathan Haddad
If you're doing 100 searches a second, each machine will be serving at most
100 requests per second, not 2000.
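
To make the arithmetic concrete, here is a minimal sketch in Python, assuming every search fans out to every node exactly once (a full scatter-gather with no pruning). The function name and the dictionary keys are made up for illustration; they are not part of any Cassandra API.

```python
def scatter_gather_load(nodes, searches_per_second):
    """Estimate sub-request rates when each search is sent to every node.

    Assumption: one sub-request per node per search (full fan-out).
    """
    if nodes < 1 or searches_per_second < 0:
        raise ValueError("nodes must be >= 1 and searches_per_second >= 0")
    return {
        # Each node sees one sub-request per search, whatever the cluster size.
        "per_node": searches_per_second,
        # Summed over the whole cluster, the fan-out multiplies by node count.
        "cluster_total": nodes * searches_per_second,
    }


if __name__ == "__main__":
    load = scatter_gather_load(nodes=20, searches_per_second=100)
    print(load)  # {'per_node': 100, 'cluster_total': 2000}
```

Under that assumption the 2000 figure is the cluster-wide total, while each individual node still handles 100 sub-requests per second.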

On Mon, Mar 7, 2016 at 10:13 AM Bhuvan Rawal  wrote:

> Well thats certainly true, there are these points worth discussing here :
>
> 1. Scatter Gather queries - Especially if the cluster size is large. Say
> we have a 20 node cluster, and we are searching 100 times a second. then
> effectively coordinator would be hitting each node 2000 times (20*100) That
> factor will only increase as the number of node goes higher. Im sure having
> a centralized index alleviates that problem.
> 2. High Cardinality (For columns like email / phone number)
> 3. Low Cardinality (Boolean column or any column with limited set of
> available options).
>
> SASI seems to be a good solution for Like queries this doc
> <https://github.com/apache/cassandra/blob/trunk/doc/SASI.md> looks really
> promising. But wouldn't it be better to tackle the use cases of search
> differently than from data storage ones, from a design standpoint?

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-07 Thread Bhuvan Rawal
Well, that's certainly true. There are a few points worth discussing here:

1. Scatter-gather queries - especially if the cluster is large. Say we
have a 20-node cluster and we are searching 100 times a second. Then
effectively the coordinator would be hitting each node 2000 times (20*100).
That factor will only increase as the number of nodes goes higher. I'm sure
having a centralized index alleviates that problem.
2. High cardinality (for columns like email / phone number).
3. Low cardinality (a Boolean column or any column with a limited set of
available options).

SASI seems to be a good solution for LIKE queries; this doc
<https://github.com/apache/cassandra/blob/trunk/doc/SASI.md> looks really
promising. But wouldn't it be better to handle search use cases
separately from data storage ones, from a design standpoint?

On Sun, Mar 6, 2016 at 9:14 PM, Jack Krupansky 
wrote:

> I don't have any direct personal experience with Stratio. It will all
> depend on your queries and your data cardinality - some queries are fine
> with secondary indexes while other are quite poor. Ditto for Lucene and
> Solr.
>
> It is also worth noting that the new SASI feature of Cassandra supports
> keyword and prefix/suffix search. But it doesn't support multi-column ad
> hoc queries, which is what people tend to use Lucene and Solr for. So,
> again, it all depends on your queries and your data cardinality.
>
> -- Jack Krupansky

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-06 Thread Jack Krupansky
I don't have any direct personal experience with Stratio. It will all
depend on your queries and your data cardinality - some queries are fine
with secondary indexes while others are quite poor. Ditto for Lucene and
Solr.

It is also worth noting that the new SASI feature of Cassandra supports
keyword and prefix/suffix search. But it doesn't support multi-column ad
hoc queries, which is what people tend to use Lucene and Solr for. So,
again, it all depends on your queries and your data cardinality.
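
For readers who have not seen SASI yet, the following Python sketch only assembles the CQL text for a SASI index and a matching LIKE query; it does not talk to a cluster. The table and column names (users, email) and the helper names are hypothetical. The index class name and the 'mode' option follow the SASI document linked in this thread. The sketch also assumes that suffix matching needs CONTAINS mode, which is worth verifying against the Cassandra version in use.

```python
SASI_INDEX_CLASS = "org.apache.cassandra.index.sasi.SASIIndex"
SASI_MODES = ("PREFIX", "CONTAINS", "SPARSE")


def sasi_index_cql(table, column, mode="PREFIX"):
    """Build the CQL text that creates a SASI index on one column."""
    if mode not in SASI_MODES:
        raise ValueError(f"unknown SASI mode: {mode}")
    return (
        f"CREATE CUSTOM INDEX ON {table} ({column}) "
        f"USING '{SASI_INDEX_CLASS}' "
        f"WITH OPTIONS = {{'mode': '{mode}'}};"
    )


def like_query_cql(table, column, term, kind="prefix"):
    """Build a LIKE query for a prefix or suffix search.

    Illustration only: real code should bind the term as a parameter
    instead of interpolating it into the statement.
    """
    if kind == "prefix":
        pattern = term + "%"
    elif kind == "suffix":
        pattern = "%" + term
    else:
        raise ValueError(f"unknown search kind: {kind}")
    return f"SELECT * FROM {table} WHERE {column} LIKE '{pattern}';"


if __name__ == "__main__":
    print(sasi_index_cql("users", "email", mode="CONTAINS"))
    print(like_query_cql("users", "email", "gmail.com", kind="suffix"))
```

Each such index covers a single column, which is the limitation described above when compared with multi-column ad hoc search in Lucene or Solr.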

-- Jack Krupansky

On Sun, Mar 6, 2016 at 1:29 AM, Bhuvan Rawal  wrote:

> Yes Jack, we are rolling out with Stratio right now, we will assess the
> performance benefit it yields and can go for ElasticSearch/Solr later.
>
> As per your experience how does Stratio perform vis-a-vis Secondary
> Indexes?
>
> On Sun, Mar 6, 2016 at 11:15 AM, Jack Krupansky 
> wrote:
>
>> You haven't been clear about how you intend to add Solr. You can also use
>> Stratio or Stargate for basic Lucene search if you don't want need full
>> Solr support and want to stick to open source rather than go with DSE
>> Search for Solr.
>>
>> -- Jack Krupansky
>>
>> On Sun, Mar 6, 2016 at 12:25 AM, Bhuvan Rawal 
>> wrote:
>>
>>> Thanks Sean and Nirmallaya.
>>>
>>> @Jack, We are going with DSC right now and plan to use spark and later
>>> solr over the analytics DC. The use case is to have  olap and oltp
>>> workloads separated and not intertwine them, whether it is achieved by
>>> creating a new DC or a new cluster altogether. From Nirmallaya's and Sean's
>>> answer I could understand that its easily achievable by creating a separate
>>> DC, app client will need to be made DC aware and it should not make a
>>> coordinator in dc3. And same goes for spark configuration, it should read
>>> from 3rd DC. Correct me if I'm wrong.
>>>
>>> On Mar 4, 2016 7:55 PM, "Jack Krupansky" 
>>> wrote:
>>> >
>>> > DataStax Enterprise (DSE) should be fine for three or even four data
>>> centers in the same cluster. Or are you talking about some custom Solr
>>> implementation?
>>> >
>>> > -- Jack Krupansky
>>> >
>>> > On Fri, Mar 4, 2016 at 9:21 AM,  wrote:
>>> >>
>>> >> Sure. Just add a new DC. Alter your keyspaces with a new replication
>>> factor for that DC. Run repairs on the new DC to get the data streamed.
>>> Then make sure your clients only connect to the DC(s) that they need.
>>> >>
>>> >>
>>> >>
>>> >> Separation of workloads is one of the key powers of a Cassandra
>>> cluster.
>>> >>
>>> >>
>>> >>
>>> >> You may want to look at different configurations for the analytics
>>> cluster – smaller replication factor, more memory per node, more disk per
>>> node, perhaps less vnodes. Others may chime in with their experience.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> Sean Durity
>>> >>
>>> >>
>>> >>
>>> >> From: Bhuvan Rawal [mailto:bhu1ra...@gmail.com]
>>> >> Sent: Friday, March 04, 2016 3:27 AM
>>> >> To: user@cassandra.apache.org
>>> >> Subject: How to create an additional cluster in Cassandra exclusively
>>> for Analytics Purpose
>>> >>
>>> >>
>>> >>
>>> >> Hi,
>>> >>
>>> >>
>>> >>
>>> >> We would like to create an additional C* data center for batch
>>> processing using spark on CFS. We would like to limit this DC exclusively
>>> for Spark operations and would like to continue the Application Servers to
>>> continue fetching data from OLTP.
>>> >>
>>> >>
>>> >>
>>> >> Is there any way to configure the same?
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> Regards,
>>> >>
>>> >> Bhuvan
>>> >>
>>> >>
>>> >> 
>>> >>
>>> >> The information in this Internet Email is confidential and may be
>>> legally privileged. It is intended solely for the addressee. Access to this
>>> Email by anyone else is unauthorized. If you are not the intended
>>> recipient, any disclosure, copying, distribution or any action taken or
>>> omitted to be taken in reliance on it, is prohibited and may be unlawful.
>>> When addressed to our clients any opinions or advice contained in this
>>> Email are subject to the terms and conditions expressed in any applicable
>>> governing The Home Depot terms of business or client engagement letter. The
>>> Home Depot disclaims all responsibility and liability for the accuracy and
>>> content of this attachment and for any damages or losses arising from any
>>> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
>>> items of a destructive nature, which may be contained in this attachment
>>> and shall not be liable for direct, indirect, consequential or special
>>> damages in connection with this e-mail message or its attachment.
>>> >
>>> >
>>>
>>
>>
>


Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-05 Thread Bhuvan Rawal
Yes Jack, we are rolling out with Stratio right now. We will assess the
performance benefit it yields and can move to Elasticsearch/Solr later.

In your experience, how does Stratio perform vis-a-vis secondary indexes?

On Sun, Mar 6, 2016 at 11:15 AM, Jack Krupansky 
wrote:

> You haven't been clear about how you intend to add Solr. You can also use
> Stratio or Stargate for basic Lucene search if you don't want need full
> Solr support and want to stick to open source rather than go with DSE
> Search for Solr.
>
> -- Jack Krupansky


Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-05 Thread Jack Krupansky
You haven't been clear about how you intend to add Solr. You can also use
Stratio or Stargate for basic Lucene search if you don't need full
Solr support and want to stick to open source rather than go with DSE
Search for Solr.

-- Jack Krupansky

On Sun, Mar 6, 2016 at 12:25 AM, Bhuvan Rawal  wrote:

> Thanks Sean and Nirmallaya.
>
> @Jack, We are going with DSC right now and plan to use spark and later
> solr over the analytics DC. The use case is to have  olap and oltp
> workloads separated and not intertwine them, whether it is achieved by
> creating a new DC or a new cluster altogether. From Nirmallaya's and Sean's
> answer I could understand that its easily achievable by creating a separate
> DC, app client will need to be made DC aware and it should not make a
> coordinator in dc3. And same goes for spark configuration, it should read
> from 3rd DC. Correct me if I'm wrong.


Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-05 Thread Bhuvan Rawal
Thanks Sean and Nirmallaya.

@Jack, we are going with DSC right now and plan to use Spark and later Solr
over the analytics DC. The use case is to keep OLAP and OLTP workloads
separated rather than intertwined, whether that is achieved by creating a
new DC or a new cluster altogether. From Nirmallaya's and Sean's answers I
understand that it's easily achievable by creating a separate DC: the app
client will need to be made DC-aware so that it does not pick a coordinator
in dc3. The same goes for the Spark configuration: it should read from the
3rd DC. Correct me if I'm wrong.
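
A rough sketch of the setup summarized above, in plain Python so it can be run without a cluster. It builds the ALTER KEYSPACE statement that adds a replication factor for the new DC, and shows the idea behind DC-aware routing by filtering candidate coordinators down to the client's local DC. The keyspace name (app_data), the DC names dc1 and dc2, the replication factors and the host addresses are all assumptions for illustration; only dc3 comes from this thread. This is not how any driver implements its load-balancing policy.

```python
def alter_keyspace_cql(keyspace, replicas_per_dc):
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy."""
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in replicas_per_dc.items():
        if not isinstance(rf, int) or rf < 1:
            raise ValueError(f"replication factor for {dc} must be an int >= 1")
        options.append(f"'{dc}': {rf}")
    return (
        "ALTER KEYSPACE " + keyspace
        + " WITH replication = {" + ", ".join(options) + "};"
    )


def local_coordinators(hosts, local_dc):
    """Keep only the hosts in the client's own DC as coordinator candidates."""
    return [address for address, dc in hosts if dc == local_dc]


if __name__ == "__main__":
    # Hypothetical layout: dc1 and dc2 serve OLTP, dc3 is the analytics DC.
    print(alter_keyspace_cql("app_data", {"dc1": 3, "dc2": 3, "dc3": 2}))

    hosts = [("10.0.1.1", "dc1"), ("10.0.2.1", "dc2"), ("10.0.3.1", "dc3")]
    print(local_coordinators(hosts, "dc1"))  # application clients
    print(local_coordinators(hosts, "dc3"))  # Spark jobs
```

With the DataStax Python driver the same intent is usually expressed through DCAwareRoundRobinPolicy(local_dc=...), and new nodes are commonly populated with nodetool rebuild rather than repair; both are worth checking against the versions in use.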

On Mar 4, 2016 7:55 PM, "Jack Krupansky"  wrote:
>
> DataStax Enterprise (DSE) should be fine for three or even four data
centers in the same cluster. Or are you talking about some custom Solr
implementation?
>
> -- Jack Krupansky