Re: [External] Re: Cassandra ad hoc search options

vincent gromakowski Tue, 31 Jan 2017 00:58:10 -0800

You can also have a look at https://github.com/strapdata/elassandra



2017-01-31 9:50 GMT+01:00 vincent gromakowski <vincent.gromakow...@gmail.com>:

> The problem with ad hoc queries on Cassandra (with Spark or not) is that
> Cassandra's partition model has to be respected to avoid full-scan
> queries (the link you mentioned explains all of these restrictions). With
> FiloDB, which runs on Cassandra, you can push down predicates on the
> partition key and segment key in arbitrary order, resulting in fewer
> full-scan queries. Another advantage is computed columns, which can also
> prune partitions or segments and so reduce reads based on a sub-part of the
> key (such as a time range of 2 hours or 10 minutes).
> Anyway, it's not magic, and my personal analysis doesn't position FiloDB as
> a fully ad hoc query solution, but it's much better than pure Cassandra. You
> can easily push down predicates on any combination of 1 to 3-5 columns,
> depending on the dataset, whereas in pure Cassandra you need to provide a
> value for the first key to push down a predicate on the second key, then
> the third key...
>
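The key-ordering constraint described above can be sketched in a few lines of Python. This is a simplified model of plain CQL only (no secondary indexes, no ALLOW FILTERING, no FiloDB), and the table layout and column names (`sensor_id`, `day`, `hour`) are hypothetical, chosen purely for illustration. It returns the predicates that could be pushed down to Cassandra; an empty result means the query would need a full scan.

```python
def pushable_predicates(partition_key, clustering_key, predicates):
    """Return the columns whose predicates can be pushed down to Cassandra.

    Simplified model of plain CQL restrictions (an assumption, not a full
    implementation of the CQL rules):
      * every partition key column needs an equality predicate;
      * clustering columns can only be restricted in declaration order;
      * after a range predicate, or a gap, later columns are not pushable.

    `predicates` maps a column name to "eq" or "range".
    """
    # Without the full partition key, Cassandra cannot locate the partition.
    if not all(predicates.get(col) == "eq" for col in partition_key):
        return []

    pushed = list(partition_key)
    for col in clustering_key:
        op = predicates.get(col)
        if op is None:
            break  # gap: later clustering columns cannot be pushed down
        pushed.append(col)
        if op == "range":
            break  # a range must be the last pushed-down restriction
    return pushed


# Hypothetical table: PRIMARY KEY ((sensor_id), day, hour)
pk, ck = ["sensor_id"], ["day", "hour"]

full = pushable_predicates(pk, ck, {"sensor_id": "eq", "day": "eq", "hour": "range"})
gap = pushable_predicates(pk, ck, {"sensor_id": "eq", "hour": "eq"})
scan = pushable_predicates(pk, ck, {"day": "eq"})
```

Here `full` contains all three columns, `gap` contains only `sensor_id` (the `hour` predicate is skipped because `day` is missing), and `scan` is empty because the partition key is not restricted.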
> 2017-01-31 8:56 GMT+01:00 Yu, John <john...@sandc.com>:
>
>> Thanks. I thought you had given up Lucene for Spark, but it seems your
>> Lucene still works.
>>
>>
>>
>> Spark also has a Cassandra connector, and my questions were more towards
>> that.
>>
>> From
>> https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md,
>> it seems there are limitations on how much one can filter the data to
>> support ad hoc queries. Filtering seems mostly limited to clustering
>> columns. In other cases it might result in a full scan, but that’s going
>> to be very slow.
>>
>>
>>
>> Regards,
>>
>> John
>>
>>
>>
>> *From:* siddharth verma [mailto:sidd.verma29.l...@gmail.com]
>> *Sent:* Monday, January 30, 2017 10:20 PM
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: [External] Re: Cassandra ad hoc search options
>>
>>
>>
>> Hi,
>>
>> *Are you using the DataStax connector as well? *
>>
>> Yes, we used it to query the Lucene index.
>>
>>
>>
>> *Does it support querying against any column well (not just clustering
>> columns)?*
>>
>> Yes, it does. We used Lucene specifically for this purpose.
>>
>> (See:
>>
>> 1. https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.10/doc/documentation.rst#searching
>>
>> 2. https://www.youtube.com/watch?v=Hg5s-hXy_-M
>>
>> for more details.)
>>
>>
>>
>> *I’m wondering how it could build the index around them “on-the-fly”*
>>
>> You can build indexes at run time, but it takes time (it took a long time
>> on our cluster, and CPU utilization went through the roof).
>>
>>
>>
>> *did you use Spark for the full set of data or just partial*
>>
>> We weren't allowed to install Spark (a tech decision).
>>
>> Some tech discussions are still going on about the bulk job ecosystem.
>>
>>
>>
>> Hence, as a workaround, we used a faster scan utility.
>>
>> For all ad hoc purposes/scripts, you could do a full scan.
>>
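As a rough illustration of the full-scan approach mentioned above, one common way to scan a whole table is to split the token ring into contiguous sub-ranges and issue one `token()`-restricted query per sub-range, so the work can be spread across workers. This is only a sketch under stated assumptions: it assumes the default Murmur3Partitioner (tokens from -2^63 to 2^63 - 1), the keyspace, table, and column names are hypothetical, and it is not necessarily how any particular scan utility works.

```python
# Token bounds of Murmur3Partitioner (assumed to be the cluster's partitioner).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_ring(splits):
    """Split [MIN_TOKEN, MAX_TOKEN] into `splits` contiguous inclusive ranges."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible tokens
    ranges = []
    lo = MIN_TOKEN
    for i in range(1, splits + 1):
        hi = MIN_TOKEN + (total * i) // splits - 1
        ranges.append((lo, hi))
        lo = hi + 1  # next range starts right after this one
    return ranges


def scan_queries(keyspace, table, partition_key, splits):
    """Build one CQL SELECT per token sub-range (names are hypothetical)."""
    pk = ", ".join(partition_key)
    return [
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({pk}) >= {lo} AND token({pk}) <= {hi}"
        for lo, hi in split_token_ring(splits)
    ]


ranges = split_token_ring(4)
queries = scan_queries("my_ks", "events", ["sensor_id"], 4)
```

Each query in `queries` can then be run independently (for example, one per thread), and together they cover every token exactly once.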
>>
>>
>> I hope it helps.
>>
>>
>>
>> Regards
>>
>>
>>
>>
>>
>> On Tue, Jan 31, 2017 at 4:11 AM, Yu, John <john...@sandc.com> wrote:
>>
>> A follow up question is: did you use Spark for the full set of data or
>> just partial? In our case, I feel we need all the data to support ad hoc
>> queries (with multiple conditional filters).
>>
>>
>>
>> Thanks,
>>
>> John
>>
>>
>>
>> *From:* Yu, John [mailto:john...@sandc.com]
>> *Sent:* Monday, January 30, 2017 12:04 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* RE: [External] Re: Cassandra ad hoc search options
>>
>>
>>
>> Thanks for the input! Are you using the DataStax connector as well? Does
>> it support querying against any column well (not just clustering columns)?
>> I’m wondering how it could build the index around them “on-the-fly”.
>>
>>
>>
>> Regards,
>>
>> John
>>
>>
>>
>> *From:* siddharth verma [mailto:sidd.verma29.l...@gmail.com
>> <sidd.verma29.l...@gmail.com>]
>> *Sent:* Friday, January 27, 2017 12:15 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: [External] Re: Cassandra ad hoc search options
>>
>>
>>
>> Hi
>>
>> We used the Stratio Lucene plugin with C* 3.0.3.
>>
>>
>>
>> It helped with a lot of our read patterns and served well for prefix
>> searches.
>>
>> But it created problems, as repairs failed repeatedly.
>>
>> We might have used it suboptimally; not sure.
>>
>>
>>
>> Later, we had to do away with it and tried to serve most of the read
>> patterns with materialized views (currently on C* 3.0.9).
>>
>>
>>
>> Currently, for ad hoc queries, we use Spark or a full scan.
>>
>>
>>
>> Regards,
>>
>>
>>
>> On Fri, Jan 27, 2017 at 1:03 PM, Yu, John <john...@sandc.com> wrote:
>>
>> Thanks a lot. Would you mind sharing a couple of points where you feel
>> it’s better than the alternatives?
>>
>>
>>
>> Regards,
>>
>> John
>>
>>
>>
>> *From:* Jonathan Haddad [mailto:j...@jonhaddad.com]
>> *Sent:* Thursday, January 26, 2017 2:33 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* [External] Re: Cassandra ad hoc search options
>>
>>
>>
>> > With Cassandra, what are the options for ad hoc query/search similar
>> to RDBMS?
>>
>>
>>
>> Your best options are Spark w/ the DataStax connector or Presto.
>> Cassandra isn't built for ad-hoc queries so you need to use other tools to
>> make it work.
>>
>>
>>
>> On Thu, Jan 26, 2017 at 2:22 PM Yu, John <john...@sandc.com> wrote:
>>
>> Hi All,
>>
>>
>>
>> Hope I can get some help here. We’re using Cassandra for services, and
>> recently we’ve been adding UI support.
>>
>> With Cassandra, what are the options for ad hoc query/search similar to
>> RDBMS? We love the features of Cassandra, but it seems to be a known
>> “weakness” that it doesn’t come with strong support for indexing and ad hoc
>> queries. There have been some recent developments with SASI as part of
>> secondary indexing. However, I heard in a video that it should not be
>> used extensively.
>>
>>
>>
>> Does anyone have much experience with SASI? How does it compare to the
>> Lucene plugin?
>>
>> What is the direction of Apache Cassandra in the search area?
>>
>>
>>
>> We’re also looking into Solr or ElasticSearch integration, but it seems
>> it might take more effort and possibly involve data duplication.
>>
>> For Solr, we don’t have DSE.
>>
>> Sorry if this has been asked before, but I haven’t seen a more complete
>> answer.
>>
>>
>>
>> Thanks!
>>
>> John
>>
>>
>>
>>
>>
>> --
>>
>> Siddharth Verma
>>
>> (Visit https://github.com/siddv29/cfs for a high speed cassandra full
>> table scan)
>>
>>
>
>
