Re: Lucene index plugin for Apache Cassandra
Just an FYI. DSE Search does not run in its own JVM; it runs in the same JVM that Cassandra is running in. DSE Search also has integration with Spark map/reduce out of the box.

On Jun 16, 2015, at 9:42 AM, Andres de la Peña adelap...@stratio.com wrote:

Thanks for your interest. I am not familiar with DSE Search internals, so I can only express some impressions. In my opinion, both projects have similarities, but there are several key differences:

- DSE Solr, if I'm not wrong, runs in a separate JVM preserving its APIs and interfaces, while Stratio's Lucene index is embedded inside Cassandra and tightly integrated with it. Each has its own set of pros and cons.
- DSE Search provides several search engine features that are not yet provided by Stratio's Lucene index, such as faceting, highlighting, etc. We are working to bring as many of these features as we can to Apache Cassandra.
- Stratio's Lucene index filters can be used in conjunction with Cassandra's Spark/Hadoop support in order to speed up table mapping. Perhaps Apache Solr has good integration with these MapReduce frameworks; I don't know if DSE provides this kind of feature out of the box.
- Stratio's Lucene index is open source, which is always a good thing.

Finally, I think that they are not mutually exclusive tools and they can be used together or separately depending on the scenario.

I hope it helps,
Re: Lucene index plugin for Apache Cassandra
Many thanks for the clarification. I will look at DSE Search in detail, because having the option of using Solr indexes with Spark jobs is a very interesting feature to reduce the amount of data to be collected. I understood that running Spark and Solr in the same data center was not possible.

Best regards,

2015-06-16 16:53 GMT+02:00 Jeremiah D Jordan jeremiah.jor...@gmail.com:

Just an FYI. DSE Search does not run in its own JVM; it runs in the same JVM that Cassandra is running in. DSE Search also has integration with Spark map/reduce out of the box.
Re: Lucene index plugin for Apache Cassandra
"I understood that running Spark and Solr in the same data center was not possible."

It was always possible, just not supported. This changed in DSE 4.7; see the docs: http://docs.datastax.com/en/datastax_enterprise/4.7/datastax_enterprise/ana/dseSearchAnalyticsOverview.html

All the best,

Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
http://www.datastax.com/

On Tue, Jun 16, 2015 at 11:17 AM, Andres de la Peña adelap...@stratio.com wrote:

Many thanks for the clarification. I will look at DSE Search in detail, because having the option of using Solr indexes with Spark jobs is a very interesting feature to reduce the amount of data to be collected. I understood that running Spark and Solr in the same data center was not possible.

Best regards,
Re: Lucene index plugin for Apache Cassandra
Thanks for your interest. I am not familiar with DSE Search internals, so I can only express some impressions. In my opinion, both projects have similarities, but there are several key differences:

- DSE Solr, if I'm not wrong, runs in a separate JVM preserving its APIs and interfaces, while Stratio's Lucene index is embedded inside Cassandra and tightly integrated with it. Each has its own set of pros and cons.
- DSE Search provides several search engine features that are not yet provided by Stratio's Lucene index, such as faceting, highlighting, etc. We are working to bring as many of these features as we can to Apache Cassandra.
- Stratio's Lucene index filters can be used in conjunction with Cassandra's Spark/Hadoop support in order to speed up table mapping. Perhaps Apache Solr has good integration with these MapReduce frameworks; I don't know if DSE provides this kind of feature out of the box.
- Stratio's Lucene index is open source, which is always a good thing.

Finally, I think that they are not mutually exclusive tools and they can be used together or separately depending on the scenario.

I hope it helps,

2015-06-15 18:08 GMT+02:00 Matthew Johnson matt.john...@algomi.com:

Hi Andres,

This looks awesome, many thanks for your work on this. Just out of curiosity, how does this compare to the DSE Cassandra with embedded Solr? Do they provide very similar functionality? Is there a list of obvious pros and cons of one versus the other?

Thanks!
Matthew
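The remark in this message about using index filters to speed up Spark/Hadoop table mapping can be illustrated with a toy example. The sketch below is not Stratio's or DSE's code; the sample rows, `build_index`, `full_scan`, and `index_lookup` are hypothetical names made up for illustration. It only shows the general idea that an inverted index lets a job read the matching rows instead of testing every row in the table.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical sample table: each row has an id and a text column.
rows: List[dict] = [
    {"id": 1, "body": "cassandra lucene index"},
    {"id": 2, "body": "spark job over cassandra"},
    {"id": 3, "body": "unrelated note"},
    {"id": 4, "body": "lucene full text search"},
]


def build_index(table: List[dict]) -> Dict[str, Set[int]]:
    """Build a toy inverted index mapping each term to the ids of rows containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for row in table:
        for term in row["body"].split():
            index[term].add(row["id"])
    return index


def full_scan(table: List[dict], term: str) -> List[int]:
    """Without an index, every row is read and tested."""
    return [row["id"] for row in table if term in row["body"].split()]


def index_lookup(index: Dict[str, Set[int]], term: str) -> List[int]:
    """With an index, only the ids of matching rows are returned."""
    return sorted(index.get(term, set()))


index = build_index(rows)
scanned = full_scan(rows, "lucene")
looked_up = index_lookup(index, "lucene")
print(scanned, looked_up)  # [1, 4] [1, 4]
```

Both paths return the same rows; the difference is that the scan touches all four rows while the lookup touches only the two that match, which is why the benefit grows as the matching fraction of the table shrinks.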
RE: Lucene index plugin for Apache Cassandra
Hi Andres,

This looks awesome, many thanks for your work on this. Just out of curiosity, how does this compare to the DSE Cassandra with embedded Solr? Do they provide very similar functionality? Is there a list of obvious pros and cons of one versus the other?

Thanks!
Matthew

*From:* Andres de la Peña [mailto:adelap...@stratio.com]
*Sent:* 13 June 2015 13:20
*To:* user@cassandra.apache.org
*Subject:* Re: Lucene index plugin for Apache Cassandra

Thanks for showing interest. Faceting is not yet supported, but it is on our roadmap. Our goal is to add as many Lucene features as possible to Cassandra.
Re: Lucene index plugin for Apache Cassandra
Thanks for showing interest. Faceting is not yet supported, but it is on our roadmap. Our goal is to add as many Lucene features as possible to Cassandra.

2015-06-12 18:21 GMT+02:00 Mohammed Guller moham...@glassbeam.com:

The plugin looks cool. Thank you for open sourcing it. Does it support faceting and other Solr functionality?

Mohammed
Re: Lucene index plugin for Apache Cassandra
Unfortunately, we haven't published any benchmarks yet, but we plan to do so as soon as possible. However, you can expect behavior similar to that of Elasticsearch or Solr, with some overhead due to the need to index both Cassandra's row key and the partition's token. You can also take a look at this presentation http://planetcassandra.org/video-presentations/vp/cassandra-summit-europe-2014/vd/stratio-advanced-search-and-top-k-queries-in-cassandra/ to see how cluster distribution is done.

2015-06-12 0:45 GMT+02:00 Ben Bromhead b...@instaclustr.com:

Looks awesome, do you have any examples/benchmarks of using these indexes for various cluster sizes, e.g. 20 nodes, 60 nodes, 100s+?

On 10 June 2015 at 09:08, Andres de la Peña adelap...@stratio.com wrote:

Hi all,

With the release of Cassandra 2.1.6, Stratio is glad to present its open source Lucene-based implementation of C* secondary indexes https://github.com/Stratio/cassandra-lucene-index as a plugin that can be attached to Apache Cassandra. Before this change, the Lucene index was distributed inside a fork of Apache Cassandra, with all the difficulties that implies. As of now, the fork is discontinued and new users should use the recently created plugin, which maintains all the features of Stratio Cassandra https://github.com/Stratio/stratio-cassandra.

Stratio's Lucene index extends Cassandra’s functionality to provide near real-time distributed search engine capabilities like those of Elasticsearch or Solr, including full-text search, free multivariable search, relevance queries and field-based sorting. Each node indexes its own data, so high availability and scalability are guaranteed.

We hope this will be useful to the Apache Cassandra community.

Regards,

--
Andrés de la Peña
http://www.stratio.com/
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD*

--
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
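The statement that each node indexes its own data suggests a scatter-gather pattern for relevance (top-k) queries: every node answers from its local index and a coordinator merges the partial results. The following is a minimal sketch of that general pattern only, assuming nothing about how the plugin actually implements it; the node names, scores, and the helpers `local_top_k` and `merge_top_k` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (relevance score, row key)


def local_top_k(hits: List[Hit], k: int) -> List[Hit]:
    """Each node returns only its k best hits from its own local index."""
    return heapq.nlargest(k, hits)


def merge_top_k(per_node: Dict[str, List[Hit]], k: int) -> List[Hit]:
    """The coordinator merges the per-node candidates and keeps the global k best."""
    candidates = [hit for hits in per_node.values() for hit in local_top_k(hits, k)]
    return heapq.nlargest(k, candidates)


# Hypothetical per-node search results for one query.
cluster = {
    "node1": [(0.91, "a"), (0.40, "b"), (0.15, "c")],
    "node2": [(0.88, "d"), (0.87, "e")],
    "node3": [(0.95, "f"), (0.10, "g")],
}

top3 = merge_top_k(cluster, 3)
print(top3)  # [(0.95, 'f'), (0.91, 'a'), (0.88, 'd')]
```

Because every node contributes at most k candidates, the coordinator never has to look at more than k times the number of nodes, regardless of how many rows each node holds.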
Re: Lucene index plugin for Apache Cassandra
Seems like an interesting tool! What operational recommendations would you make to users of this tool (extra hardware capacity, extra metrics to monitor, etc.)?

Regards,

Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: http://linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Fri, Jun 12, 2015 at 11:07 AM, Andres de la Peña adelap...@stratio.com wrote:

Unfortunately, we haven't published any benchmarks yet, but we plan to do so as soon as possible. However, you can expect behavior similar to that of Elasticsearch or Solr, with some overhead due to the need to index both Cassandra's row key and the partition's token. You can also take a look at this presentation http://planetcassandra.org/video-presentations/vp/cassandra-summit-europe-2014/vd/stratio-advanced-search-and-top-k-queries-in-cassandra/ to see how cluster distribution is done.
RE: Lucene index plugin for Apache Cassandra
The plugin looks cool. Thank you for open sourcing it. Does it support faceting and other Solr functionality?

Mohammed
Re: Lucene index plugin for Apache Cassandra
I really appreciate your interest.

Well, the first recommendation is not to use it unless you need it, because a properly denormalized Cassandra model is almost always preferable to indexing. Lucene indexing is a good option when there is no viable denormalization alternative. This is the case for range queries over multiple dimensions, full-text search or maybe complex boolean predicates. It's also appropriate for Spark/Hadoop jobs that map a small fraction of the total number of rows in a given table, if you can pay the cost of indexing.

Lucene indexes run inside C*, so users should closely monitor the amount of memory used. It's also a good idea to put the Lucene directory files on a separate disk from those used by C* itself. Additionally, you should consider that write throughput on indexed tables will be appreciably reduced, maybe to a few thousand rows per second.

It's really hard to estimate the amount of resources needed by the index, due to the great variety of indexing and querying options that Lucene offers, so the only thing we can suggest is to find the optimal setup for your use case empirically.

-- Andrés de la Peña http://www.stratio.com/ Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón, Madrid Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD*
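To make the point about range queries over multiple dimensions more concrete, here is a small Python sketch. It does not use Cassandra or the plugin at all; the rows, column meanings and function names are made up for illustration. It only models the general idea that data kept in one sort order answers a range on that column with a cheap slice, while a range on a second column has to be checked row by row, which is the kind of query where a secondary index can pay off.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows (timestamp, latitude, text), kept sorted by timestamp.
# This stands in for data stored in a single clustering order.
ROWS = sorted([
    (10, 40.4, "a"),
    (20, 41.9, "b"),
    (30, 40.5, "c"),
    (40, 48.8, "d"),
    (50, 40.3, "e"),
])
TIMESTAMPS = [row[0] for row in ROWS]


def slice_by_time(lo, hi):
    """Range on the sorted column: two binary searches, then a contiguous slice."""
    return ROWS[bisect_left(TIMESTAMPS, lo):bisect_right(TIMESTAMPS, hi)]


def range_two_dimensions(t_lo, t_hi, lat_lo, lat_hi):
    """The second range has no supporting sort order, so every sliced row is checked."""
    return [row for row in slice_by_time(t_lo, t_hi) if lat_lo <= row[1] <= lat_hi]


if __name__ == "__main__":
    print(range_two_dimensions(15, 50, 40.0, 41.0))
```

With a wide time range and a narrow latitude range, most of the sliced rows are read only to be discarded, and no single sort order fixes that for both columns at once.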
Re: Lucene index plugin for Apache Cassandra
Looks awesome, do you have any examples/benchmarks of using these indexes for various cluster sizes e.g. 20 nodes, 60 nodes, 100s+?

-- Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
Lucene index plugin for Apache Cassandra
Hi all,

With the release of Cassandra 2.1.6, Stratio is glad to present its open source Lucene-based implementation of C* secondary indexes https://github.com/Stratio/cassandra-lucene-index as a plugin that can be attached to Apache Cassandra. Until now, the Lucene index was distributed inside a fork of Apache Cassandra, with all the difficulties that implies. The fork is now discontinued, and new users should use the recently created plugin, which keeps all the features of Stratio Cassandra https://github.com/Stratio/stratio-cassandra.

Stratio's Lucene index extends Cassandra’s functionality to provide near real-time distributed search engine capabilities similar to those of Elasticsearch or Solr, including full-text search, free multivariable search, relevance queries and field-based sorting. Each node indexes its own data, so high availability and scalability are guaranteed.

We hope this will be useful to the Apache Cassandra community.

Regards,

-- Andrés de la Peña http://www.stratio.com/ Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón, Madrid Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD*
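As a rough illustration of what "each node indexes its own data" can mean for relevance queries, here is a generic scatter-gather sketch in Python. It is an assumption about the general pattern, not a description of how the plugin is actually implemented: every node ranks only its local rows, and a coordinator merges the per-node results into a global top k. Node names, scores and row ids are invented.

```python
import heapq

# Hypothetical per-node search hits as (relevance score, row id).
NODES = {
    "node1": [(0.9, "row-a"), (0.2, "row-b")],
    "node2": [(0.7, "row-c"), (0.8, "row-d")],
    "node3": [(0.1, "row-e")],
}


def local_top_k(hits, k):
    """Each node ranks only the rows it owns and returns its k best hits."""
    return heapq.nlargest(k, hits, key=lambda hit: hit[0])


def merge_top_k(per_node_hits, k):
    """The coordinator merges the partial results into the global top k."""
    all_hits = [hit for hits in per_node_hits for hit in hits]
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])


def search(k):
    """Scatter the query to every node, then gather and merge the answers."""
    return merge_top_k([local_top_k(hits, k) for hits in NODES.values()], k)


if __name__ == "__main__":
    print(search(2))
```

Because every node returns at most k hits, the coordinator never handles more than k times the number of nodes candidates, no matter how many rows each node holds.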