Many thanks for the clarification. I will look at DSE Search in detail,
because the option of using Solr indexes with Spark jobs is a very
interesting feature for reducing the amount of data to be collected. I had
understood that running Spark and Solr in the same data center was not
possible.
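
As a rough illustration of why filtering through an index reduces the data a
job has to collect, here is a toy sketch in plain Python. It does not use any
real Spark, Solr or DSE API: the lists standing in for nodes, the row layout
and all function names are assumptions made up for this example.

```python
# Toy model only: plain Python lists stand in for Cassandra nodes, and the
# predicate stands in for an index lookup. Every name here is hypothetical.
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def make_nodes(num_nodes: int, rows_per_node: int) -> List[List[Row]]:
    """Build hypothetical nodes, each holding its own slice of the table."""
    nodes = []
    for n in range(num_nodes):
        start = n * rows_per_node
        nodes.append([{"id": i, "temperature": i % 100}
                      for i in range(start, start + rows_per_node)])
    return nodes


def full_scan(nodes: List[List[Row]], predicate: Predicate) -> Tuple[List[Row], int]:
    """Ship every row to the job, then filter on the job side."""
    transferred = [row for node in nodes for row in node]
    matches = [row for row in transferred if predicate(row)]
    return matches, len(transferred)


def index_filtered_scan(nodes: List[List[Row]], predicate: Predicate) -> Tuple[List[Row], int]:
    """Filter on each node first (as an index would allow), ship only matches."""
    transferred = [row for node in nodes for row in node if predicate(row)]
    return transferred, len(transferred)


def is_hot(row: Row) -> bool:
    return row["temperature"] >= 95


nodes = make_nodes(num_nodes=4, rows_per_node=1000)
scan_result, scan_cost = full_scan(nodes, is_hot)
idx_result, idx_cost = index_filtered_scan(nodes, is_hot)
print(f"full scan transferred {scan_cost} rows, index-filtered transferred {idx_cost}")
```

Both paths return the same rows; only the number of rows moved to the job
differs, and the gap grows as the predicate gets more selective.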

Best regards,


2015-06-16 16:53 GMT+02:00 Jeremiah D Jordan <jeremiah.jor...@gmail.com>:

> Just an FYI: DSE Search does not run in its own JVM; it runs in the same
> JVM that Cassandra is running in. DSE Search also has integration with
> Spark map/reduce out of the box.
>
>
> On Jun 16, 2015, at 9:42 AM, Andres de la Peña <adelap...@stratio.com>
> wrote:
>
> Thanks for your interest.
>
> I am not familiar with DSE Search internals, so I can only express some
> impressions. In my opinion, both projects have similarities, but there are
> several key differences:
>
>    - DSE Solr, if I'm not wrong, runs in a separate JVM preserving its
>    APIs and interfaces, while Stratio's Lucene index is embedded inside
>    Cassandra and tightly integrated with it. Each has its own set of pros and
>    cons.
>    - DSE Search provides several search engine features that are not yet
>    provided by Stratio's Lucene index, such as faceting, highlighting, etc. We
>    are working to bring as many of these features as we can to Apache
>    Cassandra.
>    - Stratio's Lucene index filters can be used in conjunction with
>    Cassandra's Spark/Hadoop support in order to speed up table mapping.
>    Perhaps Apache Solr has good integration with these MapReduce frameworks;
>    I don't know whether DSE provides this kind of feature out of the box.
>    - Stratio's Lucene index is open source, which is always a good thing.
>
> Finally, I think that they are not mutually exclusive tools: they can
> be used together or separately, depending on the scenario.
>
> I hope this helps,
>
> 2015-06-15 18:08 GMT+02:00 Matthew Johnson <matt.john...@algomi.com>:
>
>> Hi Andres,
>>
>>
>> This looks awesome, many thanks for your work on this. Just out of
>> curiosity, how does this compare to the DSE Cassandra with embedded Solr?
>> Do they provide very similar functionality? Is there a list of obvious pros
>> and cons of one versus the other?
>>
>>
>> Thanks!
>>
>> Matthew
>>
>>
>>
>> *From:* Andres de la Peña [mailto:adelap...@stratio.com]
>> *Sent:* 13 June 2015 13:20
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Lucene index plugin for Apache Cassandra
>>
>>
>>
>> Thanks for showing interest.
>>
>>
>> Faceting is not yet supported, but it is in our roadmap. Our goal is to
>> add to Cassandra as many Lucene features as possible.
>>
>>
>> 2015-06-12 18:21 GMT+02:00 Mohammed Guller <moham...@glassbeam.com>:
>>
>> The plugin looks cool. Thank you for open sourcing it.
>>
>>
>> Does it support faceting and other Solr functionality?
>>
>>
>> Mohammed
>>
>>
>> *From:* Andres de la Peña [mailto:adelap...@stratio.com]
>> *Sent:* Friday, June 12, 2015 3:43 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Lucene index plugin for Apache Cassandra
>>
>>
>> I really appreciate your interest.
>>
>>
>> Well, the first recommendation is to not use it unless you need it,
>> because a properly denormalized Cassandra model is almost always preferable
>> to indexing. Lucene indexing is a good option when there is no viable
>> denormalization alternative. This is the case for range queries over
>> multiple dimensions, full-text search or perhaps complex boolean predicates.
>> It's also appropriate for Spark/Hadoop jobs that map only a small fraction
>> of the rows in a given table, if you can pay the cost of indexing.
>>
>>
>> Lucene indexes run inside C*, so users should closely monitor the amount
>> of memory used. It's also a good idea to put the Lucene directory files on
>> a separate disk from those used by C* itself. Additionally, you should
>> expect the write throughput of indexed tables to be appreciably reduced,
>> perhaps to a few thousand rows per second.
>>
>>
>> It's really hard to estimate the amount of resources needed by the index
>> because of the great variety of indexing and querying options that Lucene
>> offers, so the only thing we can suggest is to find the optimal setup for
>> your use case empirically.
>>
>>
>> 2015-06-12 12:00 GMT+02:00 Carlos Rolo <r...@pythian.com>:
>>
>> Seems like an interesting tool!
>>
>> What operational recommendations would you make to users of this tool
>> (Extra hardware capacity, extra metrics to monitor, etc)?
>>
>>
>> Regards,
>>
>>
>> Carlos Juzarte Rolo
>>
>> Cassandra Consultant
>>
>>
>> Pythian - Love your data
>>
>>
>> rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>> <http://linkedin.com/in/carlosjuzarterolo>*
>>
>> Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
>>
>> www.pythian.com
>>
>>
>> On Fri, Jun 12, 2015 at 11:07 AM, Andres de la Peña <
>> adelap...@stratio.com> wrote:
>>
>> Unfortunately, we haven't published any benchmarks yet, but we plan to
>> do so as soon as possible. However, you can expect behavior similar to
>> that of Elasticsearch or Solr, with some overhead due to the need to
>> index both Cassandra's row key and the partition's token.
>> You can also take a look at this presentation
>> <http://planetcassandra.org/video-presentations/vp/cassandra-summit-europe-2014/vd/stratio-advanced-search-and-top-k-queries-in-cassandra/>
>> to see how cluster distribution is done.
>>
>>
>> 2015-06-12 0:45 GMT+02:00 Ben Bromhead <b...@instaclustr.com>:
>>
>> Looks awesome, do you have any examples/benchmarks of using these indexes
>> for various cluster sizes e.g. 20 nodes, 60 nodes, 100s+?
>>
>>
>> On 10 June 2015 at 09:08, Andres de la Peña <adelap...@stratio.com>
>> wrote:
>>
>> Hi all,
>>
>>
>> With the release of Cassandra 2.1.6, Stratio is glad to present its open
>> source Lucene-based implementation of C* secondary indexes
>> <https://github.com/Stratio/cassandra-lucene-index> as a plugin that can
>> be attached to Apache Cassandra. Previously, the Lucene index was
>> distributed inside a fork of Apache Cassandra, with all the difficulties
>> that implies. The fork is now discontinued and new users should use the
>> recently created plugin, which maintains all the features of Stratio
>> Cassandra <https://github.com/Stratio/stratio-cassandra>.
>>
>>
>> Stratio's Lucene index extends Cassandra’s functionality to provide near
>> real-time distributed search engine capabilities like those of Elasticsearch
>> or Solr, including full-text search, free multivariable search, relevance
>> queries and field-based sorting. Each node indexes its own data, so high
>> availability and scalability are guaranteed.
>>
>>
>> We hope this will be useful to the Apache Cassandra community.
>>
>>
>> Regards,
>>
>>
>> --
>>
>>
>> Andrés de la Peña
>>
>>
>>
>> <http://www.stratio.com/>
>> Avenida de Europa, 26. Ática 5. 3ª Planta
>>
>> 28224 Pozuelo de Alarcón, Madrid
>>
>> Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
>>
>>
>>
>>
>>
>> --
>>
>> Ben Bromhead
>>
>> Instaclustr | www.instaclustr.com | @instaclustr
>> <http://twitter.com/instaclustr> | (650) 284 9692
>>
>>
>>
>>
>>
>>
>
>
>
>
>
>


-- 

Andrés de la Peña


<http://www.stratio.com/>
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
