Re: Lucene index plugin for Apache Cassandra

2015-06-16 Thread Jeremiah D Jordan
Just an FYI: DSE Search does not run in its own JVM; it runs in the same JVM
as Cassandra. DSE Search also has integration with Spark map/reduce out of
the box.



Re: Lucene index plugin for Apache Cassandra

2015-06-16 Thread Andres de la Peña
Many thanks for the clarification. I will look at DSE Search in detail,
because the option of using Solr indexes with Spark jobs is a very
interesting way to reduce the amount of data to be collected. I had
understood that running Spark and Solr in the same data center was not
possible.

Best regards,



Re: Lucene index plugin for Apache Cassandra

2015-06-16 Thread Sebastian Estevez

 I understood that running Spark and Solr in the same data center was not
 possible.


It was always possible, just not supported. This changed in DSE 4.7; see the
docs:

http://docs.datastax.com/en/datastax_enterprise/4.7/datastax_enterprise/ana/dseSearchAnalyticsOverview.html

All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com



Re: Lucene index plugin for Apache Cassandra

2015-06-16 Thread Andres de la Peña
Thanks for your interest.

I am not familiar with DSE Search internals, so I can only express some
impressions. In my opinion, both projects have similarities, but there are
several key differences:

   - DSE Solr, if I'm not wrong, runs in a separate JVM, preserving its APIs
   and interfaces, while Stratio's Lucene index is embedded inside Cassandra
   and tightly integrated with it. Each approach has its own pros and cons.
   - DSE Search provides several search engine features that Stratio's
   Lucene index does not yet provide, such as faceting and highlighting. We
   are working to bring as many of these features as we can to Apache Cassandra.
   - Stratio's Lucene index filters can be used in conjunction with
   Cassandra's Spark/Hadoop support to speed up table mapping. Apache Solr
   may well integrate with these MapReduce frameworks, but I don't know
   whether DSE provides this kind of feature out of the box.
   - Stratio's Lucene index is open source, which is always a good thing.

Finally, I think the two tools are not mutually exclusive: they can be used
together or separately, depending on the scenario.

I hope this helps,


RE: Lucene index plugin for Apache Cassandra

2015-06-15 Thread Matthew Johnson
Hi Andres,



This looks awesome, many thanks for your work on this. Just out of
curiosity, how does this compare to DSE Cassandra with embedded Solr?
Do they provide very similar functionality? Is there a list of obvious pros
and cons of one versus the other?



Thanks!

Matthew





*From:* Andres de la Peña [mailto:adelap...@stratio.com]
*Sent:* 13 June 2015 13:20
*To:* user@cassandra.apache.org
*Subject:* Re: Lucene index plugin for Apache Cassandra



Thanks for showing interest.



Faceting is not yet supported, but it is on our roadmap. Our goal is to add
as many Lucene features to Cassandra as possible.



2015-06-12 18:21 GMT+02:00 Mohammed Guller moham...@glassbeam.com:

The plugin looks cool. Thank you for open sourcing it.



Does it support faceting and other Solr functionality?



Mohammed



*From:* Andres de la Peña [mailto:adelap...@stratio.com]
*Sent:* Friday, June 12, 2015 3:43 AM
*To:* user@cassandra.apache.org
*Subject:* Re: Lucene index plugin for Apache Cassandra



I really appreciate your interest.



Well, the first recommendation is not to use it unless you need it, because
a properly denormalized Cassandra model is almost always preferable to
indexing. Lucene indexing is a good option when there is no viable
denormalization alternative, as with range queries over multiple
dimensions, full-text search, or complex boolean predicates. It's also
appropriate for Spark/Hadoop jobs that map only a small fraction of the
rows in a table, if you can pay the cost of indexing.
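
As an illustration of a "range query over multiple dimensions", here is a minimal Python sketch that builds a JSON search expression combining two range conditions under one boolean filter. The key names (`filter`, `type`, `range`, `boolean`, `must`, `lower`, `upper`) follow the plugin's README as I recall it and should be checked against the version in use; the table `places`, its columns, and the dummy column `lucene` are hypothetical.

```python
import json


def range_cond(field, lower, upper):
    """One range condition over a single indexed field."""
    return {"type": "range", "field": field, "lower": lower, "upper": upper}


def build_search(*must):
    """Combine conditions into one boolean filter; every condition must match."""
    return json.dumps({"filter": {"type": "boolean", "must": list(must)}})


# Two dimensions restricted at once (hypothetical columns).
search = build_search(
    range_cond("latitude", 40.0, 41.0),
    range_cond("longitude", -4.0, -3.0),
)

# Assumed usage: the JSON string is compared against the indexed dummy column.
cql = "SELECT * FROM places WHERE lucene = %s LIMIT 100"
print(cql)
print(search)
```

With a CQL driver, `search` would be passed as the bound parameter of the statement rather than spliced into the query text.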



Lucene indexes run inside C*, so users should closely monitor memory usage.
It's also a good idea to put the Lucene directory files on a separate disk
from those used by C* itself. Additionally, you should expect the write
throughput of indexed tables to drop appreciably, maybe to a few thousand
rows per second.



It's really hard to estimate the resources the index needs, given the
great variety of indexing and querying options that Lucene offers, so the
only thing we can suggest is to find the optimal setup for your use case
empirically.
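
One way to approach that empirical tuning is to time the same batch of writes against an indexed and a non-indexed table and compare rows per second. The sketch below is only a generic timing harness, not part of the plugin: `rows_per_second` is a hypothetical helper, and the writer is a stand-in list so the code runs anywhere.

```python
import time


def rows_per_second(write_row, rows):
    """Call write_row for each row and return the observed throughput."""
    start = time.perf_counter()
    count = 0
    for row in rows:
        write_row(row)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


# Stand-in writer: replace sink.append with a real driver call that
# inserts one row into the table under test.
sink = []
rate = rows_per_second(sink.append, ({"id": i} for i in range(10_000)))
print(f"{rate:.0f} rows/s")
```

Running it once per candidate configuration (index options, disk layout, heap size) gives comparable numbers for your own workload.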



2015-06-12 12:00 GMT+02:00 Carlos Rolo r...@pythian.com:

Seems like an interesting tool!

What operational recommendations would you make to users of this tool
(extra hardware capacity, extra metrics to monitor, etc.)?


Regards,



Carlos Juzarte Rolo

Cassandra Consultant



Pythian - Love your data



rolo@pythian | Twitter: cjrolo | LinkedIn: http://linkedin.com/in/carlosjuzarterolo

Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649

www.pythian.com



On Fri, Jun 12, 2015 at 11:07 AM, Andres de la Peña adelap...@stratio.com
wrote:

Unfortunately, we haven't published any benchmarks yet, but we plan to do
so as soon as possible. However, you can expect behavior similar to that
of Elasticsearch or Solr, with some overhead due to the need to index both
Cassandra's row key and the partition's token. You can also take a look at
this presentation
(http://planetcassandra.org/video-presentations/vp/cassandra-summit-europe-2014/vd/stratio-advanced-search-and-top-k-queries-in-cassandra/)
to see how cluster distribution is done.



2015-06-12 0:45 GMT+02:00 Ben Bromhead b...@instaclustr.com:

Looks awesome. Do you have any examples/benchmarks of using these indexes
at various cluster sizes, e.g. 20 nodes, 60 nodes, 100s+?



On 10 June 2015 at 09:08, Andres de la Peña adelap...@stratio.com wrote:

Hi all,



With the release of Cassandra 2.1.6, Stratio is glad to present its open
source Lucene-based implementation of C* secondary indexes
(https://github.com/Stratio/cassandra-lucene-index) as a plugin that can be
attached to Apache Cassandra. Previously, the Lucene index was
distributed inside a fork of Apache Cassandra, with all the difficulties
that implies. The fork is now discontinued, and new users should use the
recently created plugin, which maintains all the features of Stratio
Cassandra (https://github.com/Stratio/stratio-cassandra).



Stratio's Lucene index extends Cassandra’s functionality to provide near
real-time distributed search engine capabilities like those of Elasticsearch
or Solr, including full-text search, free multivariable
search, relevance queries and field-based sorting. Each node indexes its
own data, so high availability and scalability are guaranteed.
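
Since each node indexes only its own data, a relevance query has to combine per-node results at some point. The following Python sketch shows one generic way to merge per-node top-k hits into a global top-k. It is an illustration of the idea only, not the plugin's actual implementation; `merge_top_k` and the `(score, row_id)` hit format are assumptions made for the example.

```python
import heapq


def merge_top_k(per_node_hits, k):
    """Merge per-node lists of (score, row_id) into the k best hits overall."""
    all_hits = (hit for hits in per_node_hits for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])


# Each node returns its own local best matches.
node_a = [(0.9, "a1"), (0.4, "a2")]
node_b = [(0.7, "b1"), (0.6, "b2")]

top = merge_top_k([node_a, node_b], 3)
print(top)  # [(0.9, 'a1'), (0.7, 'b1'), (0.6, 'b2')]
```

Each node only needs to return its local k best hits, because no row outside a node's local top-k can appear in the global top-k.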



We hope this will be useful to the Apache Cassandra community.



Regards,



-- 


Andrés de la Peña



http://www.stratio.com/
Avenida de Europa, 26. Ática 5. 3ª Planta

28224 Pozuelo de Alarcón, Madrid

Tel: +34 91 352 59 42 // @stratiobd https://twitter.com/StratioBD





-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
http://twitter.com/instaclustr | (650) 284 9692






Re: Lucene index plugin for Apache Cassandra

2015-06-13 Thread Andres de la Peña
Thanks for showing interest.

Faceting is not yet supported, but it is on our roadmap. Our goal is to add
as many Lucene features to Cassandra as possible.


Re: Lucene index plugin for Apache Cassandra

2015-06-12 Thread Andres de la Peña
Unfortunately, we haven't published any benchmarks yet, but we plan to do
so as soon as possible. However, you can expect behavior similar to that
of Elasticsearch or Solr, with some overhead due to the need to index both
Cassandra's row key and the partition's token. You can also take a look at
this presentation
(http://planetcassandra.org/video-presentations/vp/cassandra-summit-europe-2014/vd/stratio-advanced-search-and-top-k-queries-in-cassandra/)
to see how cluster distribution is done.




Re: Lucene index plugin for Apache Cassandra

2015-06-12 Thread Carlos Rolo
Seems like an interesting tool!

What operational recommendations would you make to users of this tool
(extra hardware capacity, extra metrics to monitor, etc.)?

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
http://linkedin.com/in/carlosjuzarterolo*
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com






RE: Lucene index plugin for Apache Cassandra

2015-06-12 Thread Mohammed Guller
The plugin looks cool. Thank you for open sourcing it.

Does it support faceting and other Solr functionality?

Mohammed



Re: Lucene index plugin for Apache Cassandra

2015-06-12 Thread Andres de la Peña
I really appreciate your interest.

Well, the first recommendation is not to use it unless you need it, because
a properly denormalized Cassandra model is almost always preferable to
indexing. Lucene indexing is a good option when there is no viable
denormalization alternative. This is the case for range queries over
multiple dimensions, full-text search, or perhaps complex boolean
predicates. It's also appropriate for Spark/Hadoop jobs that map a small
fraction of the rows in a given table, if you can afford the cost of
indexing.

Lucene indexes run inside C*, so users should closely monitor memory usage.
It's also a good idea to put the Lucene directory files on a disk separate
from those used by C* itself. Additionally, you should consider that write
throughput on indexed tables will be appreciably reduced, maybe to a few
thousand rows per second.

It's really hard to estimate the resources the index needs, given the great
variety of indexing and querying options that Lucene offers, so the only
thing we can suggest is to find the optimal setup for your use case
empirically.









Re: Lucene index plugin for Apache Cassandra

2015-06-11 Thread Ben Bromhead
Looks awesome, do you have any examples/benchmarks of using these indexes
for various cluster sizes e.g. 20 nodes, 60 nodes, 100s+?





-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
http://twitter.com/instaclustr | (650) 284 9692


Lucene index plugin for Apache Cassandra

2015-06-10 Thread Andres de la Peña
Hi all,

With the release of Cassandra 2.1.6, Stratio is glad to present its open
source Lucene-based implementation of C* secondary indexes
(https://github.com/Stratio/cassandra-lucene-index) as a plugin that can be
attached to Apache Cassandra. Previously, the Lucene index was distributed
inside a fork of Apache Cassandra, with all the difficulties that implies.
The fork is now discontinued and new users should use the recently created
plugin, which maintains all the features of Stratio Cassandra
(https://github.com/Stratio/stratio-cassandra).

Stratio's Lucene index extends Cassandra's functionality to provide near
real-time distributed search engine capabilities similar to those of
Elasticsearch or Solr, including full-text search, free multivariable
search, relevance queries and field-based sorting. Each node indexes its
own data, so high availability and scalability are guaranteed.
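
To illustrate what "each node indexes its own data" can mean for a
relevance query, here is a minimal sketch of the general scatter-gather
pattern: every node returns its best local hits and a coordinator merges
them. This is an assumption about the general technique, not the plugin's
actual code; the names Hit, search_node and top_k are hypothetical.

```python
import heapq
import itertools
from typing import NamedTuple


class Hit(NamedTuple):
    score: float   # relevance score assigned by the node-local index
    row_key: str   # key of the matching row


def search_node(local_hits: list[Hit], k: int) -> list[Hit]:
    """Hypothetical per-node step: the k best hits from that node's data."""
    # nlargest returns the hits sorted from highest to lowest score.
    return heapq.nlargest(k, local_hits)


def top_k(per_node_hits: list[list[Hit]], k: int) -> list[Hit]:
    """Hypothetical coordinator step: merge the per-node top-k lists."""
    partial = [search_node(hits, k) for hits in per_node_hits]
    # Every partial list is already in descending order, so a lazy k-way
    # merge yields the global order without re-sorting everything.
    merged = heapq.merge(*partial, reverse=True)
    return list(itertools.islice(merged, k))
```

Because no node can contribute more than k rows to the final answer, each
node only has to ship k hits to the coordinator, whatever the table size.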


We hope this will be useful to the Apache Cassandra community.


Regards,

-- 

Andrés de la Peña


http://www.stratio.com/
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD*