Many thanks for the clarification. I will look at DSE Search in detail, because being able to use Solr indexes with Spark jobs is a very interesting way to reduce the amount of data to be collected. I had understood that running Spark and Solr in the same data center was not possible.
Best regards,

2015-06-16 16:53 GMT+02:00 Jeremiah D Jordan <jeremiah.jor...@gmail.com>:

> Just an FYI. DSE Search does not run in its own JVM, it runs in the same
> JVM that Cassandra is running in. DSE Search also has integration with
> Spark map/reduce out of the box.
>
> On Jun 16, 2015, at 9:42 AM, Andres de la Peña <adelap...@stratio.com>
> wrote:
>
> Thanks for your interest.
>
> I am not familiar with DSE Search internals, so I can only express some
> impressions. In my opinion, both projects have similarities, but there are
> several key differences:
>
> - DSE Solr, if I'm not wrong, runs in a separate JVM preserving its
>   APIs and interfaces, while Stratio's Lucene index is embedded inside
>   Cassandra and tightly integrated with it. Each has its own set of pros and
>   cons.
> - DSE Search provides several search engine features that are not yet
>   provided by Stratio's Lucene index, such as faceting, highlighting, etc. We
>   are working to bring as many of these features as we can to Apache
>   Cassandra.
> - Stratio's Lucene index filters can be used in conjunction with
>   Cassandra's Spark/Hadoop support in order to speed up table mapping.
>   Perhaps Apache Solr has a good integration with these MapReduce frameworks;
>   I don't know if DSE provides this kind of feature out-of-the-box.
> - Stratio's Lucene index is open source, which is always a good thing.
>
> Finally, I think that they are not mutually exclusive tools and they can
> be used together or separately depending on the scenario.
>
> I hope it helps,
>
> 2015-06-15 18:08 GMT+02:00 Matthew Johnson <matt.john...@algomi.com>:
>
>> Hi Andres,
>>
>> This looks awesome, many thanks for your work on this. Just out of
>> curiosity, how does this compare to the DSE Cassandra with embedded Solr?
>> Do they provide very similar functionality? Is there a list of obvious pros
>> and cons of one versus the other?
>>
>> Thanks!
>> Matthew
>>
>> *From:* Andres de la Peña [mailto:adelap...@stratio.com]
>> *Sent:* 13 June 2015 13:20
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Lucene index plugin for Apache Cassandra
>>
>> Thanks for showing interest.
>>
>> Faceting is not yet supported, but it is in our roadmap. Our goal is to
>> add to Cassandra as many Lucene features as possible.
>>
>> 2015-06-12 18:21 GMT+02:00 Mohammed Guller <moham...@glassbeam.com>:
>>
>> The plugin looks cool. Thank you for open sourcing it.
>>
>> Does it support faceting and other Solr functionality?
>>
>> Mohammed
>>
>> *From:* Andres de la Peña [mailto:adelap...@stratio.com]
>> *Sent:* Friday, June 12, 2015 3:43 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Lucene index plugin for Apache Cassandra
>>
>> I really appreciate your interest.
>>
>> Well, the first recommendation is to not use it unless you need it,
>> because a properly denormalized Cassandra model is almost always preferable
>> to indexing. Lucene indexing is a good option when there is no viable
>> denormalization alternative. This is the case for range queries over
>> multiple dimensions, full-text search or maybe complex boolean predicates.
>> It's also appropriate for Spark/Hadoop jobs mapping a small fraction of the
>> total number of rows in a certain table, if you can pay the cost of
>> indexing.
>>
>> Lucene indexes run inside C*, so users should closely monitor the amount
>> of used memory. It's also a good idea to put the Lucene directory files on
>> a separate disk from those used by C* itself. Additionally, you should
>> consider that the write throughput of indexed tables will be appreciably
>> reduced, maybe to a few thousand rows per second.
>>
>> It's really hard to estimate the amount of resources needed by the index
>> due to the great variety of indexing and querying options that Lucene offers,
>> so the only thing we can suggest is to empirically find the optimal setup
>> for your use case.
>>
>> 2015-06-12 12:00 GMT+02:00 Carlos Rolo <r...@pythian.com>:
>>
>> Seems like an interesting tool!
>>
>> What operational recommendations would you make to users of this tool
>> (Extra hardware capacity, extra metrics to monitor, etc)?
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>> <http://linkedin.com/in/carlosjuzarterolo>*
>> Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Fri, Jun 12, 2015 at 11:07 AM, Andres de la Peña <
>> adelap...@stratio.com> wrote:
>>
>> Unfortunately, we haven't published any benchmarks yet, but we plan
>> to do so as soon as possible. However, you can expect behavior similar
>> to that of Elasticsearch or Solr, with some overhead due to the
>> need to index both Cassandra's row key and the partition's token.
>> You can also take a look at this presentation
>> <http://planetcassandra.org/video-presentations/vp/cassandra-summit-europe-2014/vd/stratio-advanced-search-and-top-k-queries-in-cassandra/>
>> to see how cluster distribution is done.
>>
>> 2015-06-12 0:45 GMT+02:00 Ben Bromhead <b...@instaclustr.com>:
>>
>> Looks awesome, do you have any examples/benchmarks of using these indexes
>> for various cluster sizes e.g. 20 nodes, 60 nodes, 100s+?
>>
>> On 10 June 2015 at 09:08, Andres de la Peña <adelap...@stratio.com>
>> wrote:
>>
>> Hi all,
>>
>> With the release of Cassandra 2.1.6, Stratio is glad to present its open
>> source Lucene-based implementation of C* secondary indexes
>> <https://github.com/Stratio/cassandra-lucene-index> as a plugin that can
>> be attached to Apache Cassandra. Before the above changes, Lucene index was
>> distributed inside a fork of Apache Cassandra, with all the difficulties
>> implied. As of now, the fork is discontinued and new users should use the
>> recently created plugin, which maintains all the features of Stratio
>> Cassandra <https://github.com/Stratio/stratio-cassandra>.
>>
>> Stratio's Lucene index extends Cassandra’s functionality to provide near
>> real-time distributed search engine capabilities like those of Elasticsearch
>> or Solr, including full text search capabilities, free multivariable
>> search, relevance queries and field-based sorting. Each node indexes its
>> own data, so high availability and scalability are guaranteed.
>>
>> We hope this will be useful to the Apache Cassandra community.
>>
>> Regards,
>>
>> --
>> Andrés de la Peña
>>
>> <http://www.stratio.com/>
>> Avenida de Europa, 26. Ática 5. 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
>>
>> --
>> Ben Bromhead
>> Instaclustr | www.instaclustr.com | @instaclustr
>> <http://twitter.com/instaclustr> | (650) 284 9692

--
Andrés de la Peña