Thanks for the correction, Jon. (At most 2000 queries *per cluster*, not per node, for serving 100 searches.)
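A minimal Python sketch of the corrected arithmetic. It assumes the worst case the thread describes, where every search is fanned out to every node exactly once (a full scatter-gather). The function name `scatter_gather_load` is hypothetical. Under that assumption each node sees at most one sub-query per search, and the cluster as a whole handles nodes × searches sub-queries.

```python
from typing import Tuple


def scatter_gather_load(nodes: int, searches_per_sec: int) -> Tuple[int, int]:
    """Worst-case load when every search is fanned out to every node.

    Assumption (hypothetical model): each search touches every node exactly
    once, so each node receives one sub-query per search.
    """
    per_node = searches_per_sec              # each node is hit once per search
    per_cluster = nodes * searches_per_sec   # total sub-queries across the cluster
    return per_node, per_cluster


per_node, per_cluster = scatter_gather_load(nodes=20, searches_per_sec=100)
print(f"per node: {per_node}/s, whole cluster: {per_cluster}/s")
# prints: per node: 100/s, whole cluster: 2000/s
```

Under this model, adding nodes raises the cluster-wide total but does not lower the per-node load, so extra nodes do not add search throughput.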
On Mon, Mar 7, 2016 at 11:47 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> If you're doing 100 searches a second, each machine will be serving at most 100 requests per second, not 2000.
>
> On Mon, Mar 7, 2016 at 10:13 AM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>> Well, that's certainly true. There are a few points worth discussing here:
>>
>> 1. Scatter-gather queries, especially if the cluster is large. Say we have a 20-node cluster and we are searching 100 times a second; then effectively the coordinator would be hitting each node 2000 times (20*100). That factor will only increase as the number of nodes goes up. I'm sure having a centralized index alleviates that problem.
>> 2. High cardinality (for columns like email / phone number).
>> 3. Low cardinality (a boolean column, or any column with a limited set of available options).
>>
>> SASI seems to be a good solution for LIKE queries; this doc <https://github.com/apache/cassandra/blob/trunk/doc/SASI.md> looks really promising. But wouldn't it be better, from a design standpoint, to tackle search use cases differently from data storage ones?
>>
>> On Sun, Mar 6, 2016 at 9:14 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>> I don't have any direct personal experience with Stratio. It will all depend on your queries and your data cardinality: some queries are fine with secondary indexes while others are quite poor. Ditto for Lucene and Solr.
>>>
>>> It is also worth noting that the new SASI feature of Cassandra supports keyword and prefix/suffix search. But it doesn't support multi-column ad hoc queries, which is what people tend to use Lucene and Solr for. So, again, it all depends on your queries and your data cardinality.
>>>
>>> -- Jack Krupansky
>>>
>>> On Sun, Mar 6, 2016 at 1:29 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>> Yes Jack, we are rolling out with Stratio right now. We will assess the performance benefit it yields and can go for Elasticsearch/Solr later.
>>>>
>>>> In your experience, how does Stratio perform vis-a-vis secondary indexes?
>>>>
>>>> On Sun, Mar 6, 2016 at 11:15 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>> You haven't been clear about how you intend to add Solr. You can also use Stratio or Stargate for basic Lucene search if you don't need full Solr support and want to stick to open source rather than go with DSE Search for Solr.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Sun, Mar 6, 2016 at 12:25 AM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>>>> Thanks Sean and Nirmallaya.
>>>>>>
>>>>>> @Jack, we are going with DSC right now and plan to use Spark, and later Solr, over the analytics DC. The use case is to keep OLAP and OLTP workloads separated rather than intertwined, whether that is achieved by creating a new DC or a new cluster altogether. From Nirmallaya's and Sean's answers I understand that it's easily achievable by creating a separate DC: the app client will need to be made DC-aware so that it does not use a coordinator in dc3, and the same goes for the Spark configuration, which should read from the 3rd DC. Correct me if I'm wrong.
>>>>>>
>>>>>> On Mar 4, 2016 7:55 PM, "Jack Krupansky" <jack.krupan...@gmail.com> wrote:
>>>>>> >
>>>>>> > DataStax Enterprise (DSE) should be fine for three or even four data centers in the same cluster. Or are you talking about some custom Solr implementation?
>>>>>> >
>>>>>> > -- Jack Krupansky
>>>>>> >
>>>>>> > On Fri, Mar 4, 2016 at 9:21 AM, <sean_r_dur...@homedepot.com> wrote:
>>>>>> >> Sure. Just add a new DC.
>>>>>> >> Alter your keyspaces with a new replication factor for that DC. Run repairs on the new DC to get the data streamed. Then make sure your clients only connect to the DC(s) that they need.
>>>>>> >>
>>>>>> >> Separation of workloads is one of the key powers of a Cassandra cluster.
>>>>>> >>
>>>>>> >> You may want to look at different configurations for the analytics cluster: smaller replication factor, more memory per node, more disk per node, perhaps fewer vnodes. Others may chime in with their experience.
>>>>>> >>
>>>>>> >> Sean Durity
>>>>>> >>
>>>>>> >> From: Bhuvan Rawal [mailto:bhu1ra...@gmail.com]
>>>>>> >> Sent: Friday, March 04, 2016 3:27 AM
>>>>>> >> To: user@cassandra.apache.org
>>>>>> >> Subject: How to create an additional cluster in Cassandra exclusively for Analytics Purpose
>>>>>> >>
>>>>>> >> Hi,
>>>>>> >>
>>>>>> >> We would like to create an additional C* data center for batch processing using Spark on CFS. We would like to limit this DC exclusively to Spark operations and let the application servers continue fetching data from the OLTP DC.
>>>>>> >>
>>>>>> >> Is there any way to configure this?
>>>>>> >>
>>>>>> >> Regards,
>>>>>> >>
>>>>>> >> Bhuvan
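Sean's first step, altering each keyspace so that it also replicates to the new DC, can be sketched in Python as a helper that builds the CQL statement. This is a minimal sketch, assuming the keyspaces use `NetworkTopologyStrategy`; the keyspace name `shop`, the DC names `dc1`, `dc2`, `dc3`, and the helper name `alter_keyspace_cql` are all hypothetical. The replication map given to `ALTER KEYSPACE` replaces the old one, so the existing OLTP DCs have to be listed alongside the new analytics DC.

```python
from typing import Dict


def alter_keyspace_cql(keyspace: str, rf_per_dc: Dict[str, int]) -> str:
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    rf_per_dc maps each data center name (as reported by the snitch) to its
    replication factor. The map replaces the keyspace's existing one, so the
    OLTP DCs must be listed together with the new analytics DC.
    """
    if not rf_per_dc:
        raise ValueError("at least one data center is required")
    entries = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in rf_per_dc.items():
        if rf < 1:
            raise ValueError(f"replication factor for {dc!r} must be at least 1")
        entries.append(f"'{dc}': {rf}")
    # Doubled braces emit literal { } around the replication map.
    return f"ALTER KEYSPACE {keyspace} WITH replication = {{{', '.join(entries)}}};"


# Hypothetical layout: two OLTP DCs at RF 3, plus an analytics DC (dc3) at RF 2.
print(alter_keyspace_cql("shop", {"dc1": 3, "dc2": 3, "dc3": 2}))
```

After the statement is applied, the remaining steps from Sean's reply still apply: run repairs on the nodes in the new DC so the existing data is streamed there, and keep the application clients pinned to the OLTP DC(s), for example with a DC-aware load-balancing policy such as the DataStax Python driver's `DCAwareRoundRobinPolicy(local_dc=...)` together with `LOCAL_*` consistency levels.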