You can also have a look at https://github.com/strapdata/elassandra
2017-01-31 9:50 GMT+01:00 vincent gromakowski <vincent.gromakow...@gmail.com>:
> The problem with ad hoc queries on Cassandra (with Spark or not) is the
> partition model of Cassandra, which needs to be respected to avoid
> full-scan queries (the link you mentioned explains all of them). With
> FiloDB, which works on Cassandra, you can push down predicates on the
> partition key and segment key in an arbitrary order, resulting in fewer
> full-scan queries. Another advantage is the computed columns, which can
> also prune partitions or segments and so reduce the reads based on a
> subpart of the key (like a time range of 2 hours or 10 min).
> Anyway, it's not magic, and my personal analysis doesn't target FiloDB as
> a fully ad hoc query solution, but it's largely better than pure
> Cassandra. You can easily have pushdown predicates on any combination of
> 1 to 3-5 columns, depending on the dataset, compared to pure Cassandra,
> where you need to provide a first key value to push down the second key
> predicate, then the third key...
>
> 2017-01-31 8:56 GMT+01:00 Yu, John <john...@sandc.com>:
>
>> Thanks. I thought you had given up Lucene for Spark, but it seems your
>> Lucene still works.
>>
>> Spark also has a Cassandra connector, and my questions were more towards
>> that.
>>
>> From
>> https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md,
>> it seems there are limitations on how much one can select the data to
>> support ad hoc queries. It seems mostly limited to clustering columns.
>> Maybe in other cases it would result in a full scan, but that’s going to
>> be very slow.
>>
>> Regards,
>>
>> John
>>
>> *From:* siddharth verma [mailto:sidd.verma29.l...@gmail.com]
>> *Sent:* Monday, January 30, 2017 10:20 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: [External] Re: Cassandra ad hoc search options
>>
>> Hi,
>>
>> *Are you using the DataStax connector as well?*
>>
>> Yes, we used it to query on the Lucene index.
>>
>> *Does it support querying against any column well (not just clustering
>> columns)?*
>>
>> Yes, it does. We used Lucene particularly for this purpose.
>>
>> (You can use:
>>
>> 1. https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.10/doc/documentation.rst#searching
>>
>> 2. https://www.youtube.com/watch?v=Hg5s-hXy_-M
>>
>> for more details.)
>>
>> *I’m wondering how it could build the index around them “on-the-fly”*
>>
>> You can build indexes at run time, but it takes time (it took a lot of
>> time on our cluster, and CPU utilization went through the roof).
>>
>> *did you use Spark for the full set of data or just partial*
>>
>> We weren't allowed to install Spark (tech decision).
>>
>> Some tech discussions are going on around the bulk job ecosystem.
>>
>> Hence, as a workaround, we used a faster scan utility.
>>
>> For all the ad hoc purposes/scripts, you could do a full scan.
>>
>> I hope it helps.
>>
>> Regards
>>
>> On Tue, Jan 31, 2017 at 4:11 AM, Yu, John <john...@sandc.com> wrote:
>>
>> A follow-up question is: did you use Spark for the full set of data or
>> just partial? In our case, I feel we need all the data to support ad hoc
>> queries (with multiple conditional filters).
>>
>> Thanks,
>>
>> John
>>
>> *From:* Yu, John [mailto:john...@sandc.com]
>> *Sent:* Monday, January 30, 2017 12:04 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* RE: [External] Re: Cassandra ad hoc search options
>>
>> Thanks for the input! Are you using the DataStax connector as well? Does
>> it support querying against any column well (not just clustering
>> columns)? I’m wondering how it could build the index around them
>> “on-the-fly”.
>>
>> Regards,
>>
>> John
>>
>> *From:* siddharth verma [mailto:sidd.verma29.l...@gmail.com
>> <sidd.verma29.l...@gmail.com>]
>> *Sent:* Friday, January 27, 2017 12:15 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: [External] Re: Cassandra ad hoc search options
>>
>> Hi
>>
>> We used the Stratio Lucene plugin with C* 3.0.3.
>>
>> It helped to solve a lot of read patterns. It served well for prefix
>> queries.
>>
>> But it created problems, as repairs failed repeatedly.
>>
>> We might have used it suboptimally, not sure.
>>
>> Later, we had to do away with it, and tried to serve most of the read
>> patterns with materialised views (currently C* 3.0.9).
>>
>> Currently, for ad hoc queries, we use Spark or a full scan.
>>
>> Regards,
>>
>> On Fri, Jan 27, 2017 at 1:03 PM, Yu, John <john...@sandc.com> wrote:
>>
>> Thanks a lot. Mind sharing a couple of points where you feel it’s better
>> than the alternatives?
>>
>> Regards,
>>
>> John
>>
>> *From:* Jonathan Haddad [mailto:j...@jonhaddad.com]
>> *Sent:* Thursday, January 26, 2017 2:33 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* [External] Re: Cassandra ad hoc search options
>>
>> > With Cassandra, what are the options for ad hoc query/search similar
>> > to RDBMS?
>>
>> Your best options are Spark w/ the DataStax connector or Presto.
>> Cassandra isn't built for ad hoc queries, so you need to use other tools
>> to make it work.
>>
>> On Thu, Jan 26, 2017 at 2:22 PM Yu, John <john...@sandc.com> wrote:
>>
>> Hi All,
>>
>> Hope I can get some help here. We’re using Cassandra for services, and
>> recently we’re adding UI support.
>>
>> With Cassandra, what are the options for ad hoc query/search similar to
>> RDBMS? We love the features of Cassandra, but it seems it’s a known
>> “weakness” that it doesn’t come with strong support for indexing and ad
>> hoc queries. There has been some recent development with SASI as part of
>> secondary index.
>> However, I heard in a video that it should not be used extensively.
>>
>> Does anyone have much experience with SASI? How does it compare to the
>> Lucene plugin?
>>
>> What is the direction of Apache Cassandra in the search area?
>>
>> We’re also looking into Solr or ElasticSearch integration, but it seems
>> it might take more effort, and possibly involve data duplication.
>>
>> For Solr, we don’t have DSE.
>>
>> Sorry if this has been asked before, but I haven’t seen a more complete
>> answer.
>>
>> Thanks!
>>
>> John
>>
>> --
>>
>> Siddharth Verma
>>
>> (Visit https://github.com/siddv29/cfs for a high speed cassandra full
>> table scan)