[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2013-12-16 Thread Matt Stump (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850175#comment-13850175
 ] 

Matt Stump commented on CASSANDRA-2915:
---

Given that the read-before-write issues still stand for non-numeric fields (as
of Lucene 4.6), are Lucene-based secondary indexes still something we want
committed in the near term? Do we want to wait until incremental update/stacked
segments are available for all field types?

Additionally, Lucene, even with near-real-time search, still imposes a delay
between when a row is added and when it is queryable, which would differ from
existing behavior. Is this something that we can live with?

 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
  Labels: secondary_index

 Secondary indexes (of type KEYS) suffer from a number of limitations in their
 current form:
 - Multiple IndexClauses only work when there is a subset of rows under the
 highest clause
 - One new column family is created per index; this means 10 new CFs for 10
 secondary indexes
 This ticket will use the Lucene library to implement secondary indexes as one
 index per CF, and utilize the Lucene query engine to handle multiple index
 clauses. Also, by using Lucene we get a highly optimized file format.
 There are a few parallels we can draw between Cassandra and Lucene.
 Lucene indexes segments in memory and then flushes them to disk, so we can sync
 our memtable flushes to Lucene flushes. Lucene also has optimize(), which
 correlates to our compaction process, so these can be synced as well.
 We will also need to correlate column validators to Lucene tokenizers, so the
 data can be stored properly. The big win is that once this is done we can
 perform complex queries within a column, like wildcard searches.
 The downside of this approach is that we will need to read before write, since
 documents in Lucene are written as complete documents. For random workloads
 with lots of indexed columns, this means we need to read the document from
 the index, update it, and write it back.
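
 The read-before-write cost described above can be illustrated with a small
 sketch. This is a toy in-memory model, not Lucene's API: ToyIndex,
 update_column, and the read/write counters are hypothetical names, and the
 only assumption carried over from the ticket is that a document can be
 replaced only as a whole.

```python
class ToyIndex:
    """Toy whole-document index: a document can only be replaced, never patched."""

    def __init__(self):
        self.docs = {}    # row key -> dict of field name -> value
        self.reads = 0    # number of full-document reads
        self.writes = 0   # number of full-document writes

    def get(self, key):
        self.reads += 1
        return dict(self.docs.get(key, {}))

    def put(self, key, doc):
        self.writes += 1
        self.docs[key] = dict(doc)

    def update_column(self, key, column, value):
        # Read-before-write: fetch the full document, change one field,
        # then write the whole document back.
        doc = self.get(key)
        doc[column] = value
        self.put(key, doc)


index = ToyIndex()
index.put("user1", {"first": "Ada", "last": "Lovelace"})
index.update_column("user1", "email", "ada@example.com")
```

 In this model every single-column update costs one extra read of the whole
 document, which is the overhead the description warns about for random
 workloads.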



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2013-11-25 Thread Camille Vergara (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832066#comment-13832066
 ] 

Camille Vergara commented on CASSANDRA-2915:


If you're interested in seeing this feature implemented, you should consider 
supporting the fundraiser for bounties on Bountysource: 
https://www.bountysource.com/fundraisers/508.


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2013-11-25 Thread Alex Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832125#comment-13832125
 ] 

Alex Liu commented on CASSANDRA-2915:
-

We may need to use Twitter's real-time search.


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-30 Thread T Jake Luciani (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093748#comment-13093748
]

T Jake Luciani commented on CASSANDRA-2915:
---

bq. I think supporting ORDER BY, GROUP BY, COUNT, LIKE etc are a must.

I don't think GROUP BY and ORDER BY are something we want to support
using secondary indexes. The whole idea of scatter-gather in Cassandra would
be a performance killer and promote bad data-modeling practices.

The goal of this ticket is to support Lucene search features with the current
secondary index API.

We can add LIKE, OR, NOT, and BETWEEN with this.

Lucene based Secondary Indexes
--

Key: CASSANDRA-2915
URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: T Jake Luciani
Assignee: Jason Rutherglen
Labels: secondary_index


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-30 Thread Todd Nine (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094116#comment-13094116
]

Todd Nine commented on CASSANDRA-2915:
--

I agree that ORDER BY could be a performance killer for large data sets. For
large data sets I think that users should make use of de-normalization and
create their own secondary index for efficient querying. However, on small
data sets, which seem to be very common in web systems (ours is about 80% of
the data a user sees), ORDER BY semantics are very important. Most of the data
our users see has a very small result set, 100 rows. I think explicitly
prohibiting these features limits the user too much. Shouldn't they be
supported, leaving it ultimately up to the user to determine which approach
to take in implementing an index for their data?


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-29 Thread Todd Nine (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093263#comment-13093263
]

Todd Nine commented on CASSANDRA-2915:
--

Could we also use this feature as a standard way of building our Lucene
documents? This would accomplish what we want, as well as giving a hook for
more user functionality.

CASSANDRA-1311


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-29 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093298#comment-13093298
]

Jason Rutherglen commented on CASSANDRA-2915:
-

Todd,

Another option is to add a [user optional] class that converts raw Cassandra
columns into a Lucene document. Implicitly, the Cassandra columns do not need
to map to Lucene document fields. This is more of a slight change in the
user's expectations for CQL than a core functional change. E.g., the CQL
submitted to a Lucene secondary index may refer to Lucene fields that do not
exist as columns.


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-29 Thread Ed Anuff (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093341#comment-13093341
]

Ed Anuff commented on CASSANDRA-2915:
-

+1 on having the ability to provide a conversion class for handling
transformations from columns to Lucene documents. It's not uncommon for people
to store objects serialized to JSON or some other serialization format
in columns. CQL will have to catch up with this practice at some point.


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-29 Thread Todd Nine (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093344#comment-13093344
]

Todd Nine commented on CASSANDRA-2915:
--

I think forcing users to install classes for common use cases would cause
issues with adoption. What about creating new CQL commands to handle this?
When creating an index in a db, you would define the fields and the manner in
which they are indexed. Could we do something like the following?

create index [colname] in [colfamily] using [index type 1] as [indexFieldName],
[index type 2] as [indexFieldName], [index type n] as [indexFieldName]?

drop index [indexFieldName] in [colfamily] on [colname]

This way clients such as JPA can update and create indexes, without the need to
install custom classes on Cassandra itself. They also have the ability to
directly reference the field name when using CQL queries.

Assuming that the index class types exist in the Lucene classpath, you get the
1 to many mappings for column to indexing strategy. This would allow more
advanced clients such as the JPA plugin to automatically add indexes to the
document based on indexes defined on persistent fields, without generating any
code the user has to install in the Cassandra runtime. If users want to
install custom analyzers, they still have the option to do so, and would gain
access to it via CQL.


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-28 Thread Todd Nine (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092606#comment-13092606
]

Todd Nine commented on CASSANDRA-2915:
--

I don't necessarily think there is a 1 to 1 relationship between a column and a
Lucene document field. In our case we need to index fields in more
than one manner. For instance, we index users as straight strings (lowercased)
with email, first name and last name columns. However, we also want to tokenize
the email, first and last name columns to allow our customer support people to
perform partial name matching. I think a 1 to N mapping is required for column
to document field to allow this sort of functionality.
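
A minimal sketch of the 1 to N mapping described above, assuming a plain
dictionary stands in for a Lucene document. FIELD_MAPPING, exact_lowercase,
tokenize, build_document, and the field names are all hypothetical and chosen
only for illustration; a real implementation would use Lucene analyzers.

```python
import re


def exact_lowercase(value):
    """Index the whole value as a single lowercased term."""
    return [value.lower()]


def tokenize(value):
    """Split on non-alphanumeric characters to allow partial matching."""
    return [t for t in re.split(r"[^a-z0-9]+", value.lower()) if t]


# Hypothetical mapping: one column feeds several index fields (1 to N).
FIELD_MAPPING = {
    "email": [("email_exact", exact_lowercase), ("email_tokens", tokenize)],
    "first_name": [("first_name_exact", exact_lowercase),
                   ("first_name_tokens", tokenize)],
}


def build_document(row):
    """Turn a row (column -> value) into a document (field -> list of terms)."""
    doc = {}
    for column, value in row.items():
        for field_name, indexer in FIELD_MAPPING.get(column, []):
            doc[field_name] = indexer(value)
    return doc


doc = build_document({"email": "Jane.Doe@example.com", "first_name": "Jane"})
```

Here the email column produces both an exact-match field and a tokenized
field, so a support query for "doe" can match the tokens while an exact lookup
still uses the full lowercased address.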

As far as expiration on columns, is there a system event that we can hook into
to just force a document reindex when a column expires rather than add an
additional field that will need to be sorted from?

As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT,
LIKE, etc. is a must. Most users have become accustomed to this functionality
with RDBMS. If these features cause potential performance problems, I think
this should be documented so that users have enough information to determine
if they can rely on the Lucene index or should build their own index directly.

Lastly, this is a huge feature for the hector-jpa plugin. What can I do to help?


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-11 Thread T Jake Luciani (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083099#comment-13083099
 ] 

T Jake Luciani commented on CASSANDRA-2915:
---

Under the CF dir I imagine

 Lucene based Secondary Indexes
 --

 Key: CASSANDRA-2915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Jason Rutherglen
  Labels: secondary_index
 Fix For: 1.0



[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-10 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082878#comment-13082878
 ] 

Jason Rutherglen commented on CASSANDRA-2915:
-

In which physical directory do we want to place the Lucene indexes?


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-05 Thread T Jake Luciani (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079952#comment-13079952
]

T Jake Luciani commented on CASSANDRA-2915:
---

Another issue we need to work around is expiring columns. We could store the
expiration time in the document and make it a constraint on the Lucene query so
we don't pull expired data.
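
A minimal sketch of that idea, assuming documents are plain dictionaries and
the expiration time is stored in a hypothetical expires_at field. The search
function and the field name are illustrative only; in Lucene the same
constraint would be expressed as an extra range clause on the query.

```python
def search(docs, predicate, now):
    """Return keys of documents that match and have not yet expired.

    A document without an 'expires_at' field is treated as never expiring.
    """
    results = []
    for key, doc in docs.items():
        expires_at = doc.get("expires_at")
        not_expired = expires_at is None or expires_at > now
        # The expiration check is ANDed with the user's own predicate.
        if not_expired and predicate(doc):
            results.append(key)
    return sorted(results)


docs = {
    "a": {"state": "TX", "expires_at": 100},   # already expired at now=200
    "b": {"state": "TX", "expires_at": 300},
    "c": {"state": "TX"},                      # no TTL
    "d": {"state": "CA", "expires_at": 300},
}
hits = search(docs, lambda d: d["state"] == "TX", now=200)
```

Expired entries stay in the index until they are cleaned up, but they are
never returned because every query carries the time constraint.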


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-05 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080002#comment-13080002
]

Jason Rutherglen commented on CASSANDRA-2915:
-

I think it's important to note all of the many SQL-like features Lucene has
[now].

ORDER BY, GROUP BY, COUNT / facet, AND / OR queries, LIKE. This makes Lucene
ideal for CQL and its goals.


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-05 Thread Ryan King (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080008#comment-13080008
]

Ryan King commented on CASSANDRA-2915:
--

Regarding realtime search, hasn't our (Twitter's) realtime search branch been
merged into Lucene trunk? Whenever that's available we should get real realtime
results.


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-05 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080012#comment-13080012
]

Jason Rutherglen commented on CASSANDRA-2915:
-

bq. Regarding realtime search, hasn't our (twitter's) realtime search branch
been merged into lucene trunk?

There's LUCENE-2312. Twitter's RT search is highly specialized (yes, I'm
familiar with it); e.g., Lucene is far too general (think of payloads, phrase
queries, span queries, etc.) for the code Twitter has to be merged into. If
Twitter's search were to be integrated, there would be an awful lot of
refactoring of Lucene required.


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-04 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079489#comment-13079489
]

Jason Rutherglen commented on CASSANDRA-2915:
-

I looked at MessagingService which seems to be more [custom] asynchronous?

I think we could offer a Thrift API? What does CQL use?

I think we'd want to look towards making this [Lucene] play well / integrate
with CQL?


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-04 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079682#comment-13079682
]

Jason Rutherglen commented on CASSANDRA-2915:
-

bq. Would using dynamic composites within cassandra alleviate the need for
Lucene documents?

I think it is hard to duplicate the efficiency of Lucene for dis/conjunction
queries (OR / AND), especially with PFOR implemented (an integer encoding
designed to decode efficiently on today's microprocessors).

We can/will turn off scoring which further makes Lucene a straight query
execution engine, as opposed to a free text search engine. Range queries in
Lucene use a trie system which is highly effective.


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-04 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079684#comment-13079684
]

Jason Rutherglen commented on CASSANDRA-2915:
-

I think the open design question on this one is distributed search, and how a
distributed search client will know which Cassandra servers to send a query to.
Meaning, traditionally a query is sent to N servers whose responses are merged
and X results are returned. We can send a query to all servers however I think
we'd then have duplicate rows/documents returned. How does CQL handle this?


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-04 Thread Todd Nine (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079695#comment-13079695
]

Todd Nine commented on CASSANDRA-2915:
--

I'm quite keen to contribute on this issue, as this will greatly enhance the
functionality of the hector-jpa project. If I can contribute any work, please
let me know.


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-04 Thread T Jake Luciani (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079712#comment-13079712
]

T Jake Luciani commented on CASSANDRA-2915:
---

Todd: once CASSANDRA-2982 is done we can get started. I'm trying to focus on
that right now. In the meantime we need to think of how to link Lucene
analyzers to column_metadata.

Jason: This currently works by executing the query locally; if that does not
return enough results, it moves on to the next node. Since the ring is split, we
know the range of keys to restrict the search to. This avoids dups.
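
The node-by-node scan described above can be sketched as follows. This is a
simplified model, not Cassandra's implementation: scan_ring is a hypothetical
name, integer tokens double as row keys, and each node is represented as a
(primary range, locally stored rows) pair where the stored rows may include
replicas of other ranges.

```python
def scan_ring(nodes, predicate, limit):
    """Query nodes in ring order until `limit` matching rows are found.

    nodes: list of ((low, high), rows), where rows maps token -> value and
    (low, high) is the half-open primary range [low, high) of that node.
    """
    results = []
    for (low, high), rows in nodes:
        for token in sorted(rows):
            # Only rows inside this node's primary range are considered.
            # Replicas of other ranges stored here are skipped, so a row
            # is never returned twice.
            if low <= token < high and predicate(rows[token]):
                results.append(token)
                if len(results) == limit:
                    return results   # enough results; stop visiting nodes
    return results


nodes = [
    ((0, 50), {10: "x", 40: "x", 60: "x"}),            # 60 is a replica
    ((50, 100), {60: "x", 70: "y", 90: "x", 10: "x"}), # 10 is a replica
]
hits = scan_ring(nodes, lambda v: v == "x", limit=3)
```

With limit=2 the first node alone satisfies the query and the second node is
never consulted; with a larger limit the scan continues around the ring.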


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-04 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079735#comment-13079735
]

Jason Rutherglen commented on CASSANDRA-2915:
-

bq. like getLuceneAnalyzer()

There won't always be a 1 to 1 mapping of a column to a field. For example, in
Solr there is copy field, which essentially creates a new field. Also,
Analyzer is for any field; the right per-field class would be Tokenizer.

I strongly believe we need to have an interface that accepts a row and
essentially generates a Lucene Document. This should be the most
straightforward approach that enables just about anything, including using a
Solr schema at some point.
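
A minimal sketch of such an interface, assuming a row is a key plus a
dictionary of columns and a document is a dictionary of fields. RowToDocument,
DefaultConverter, CopyFieldConverter, and the _key and full_name fields are
hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod


class RowToDocument(ABC):
    """Pluggable hook: turns one row into one index document."""

    @abstractmethod
    def convert(self, key, columns):
        """Return a dict of field name -> value for the given row."""


class DefaultConverter(RowToDocument):
    """One field per column, values unchanged, plus the row key."""

    def convert(self, key, columns):
        doc = {"_key": key}
        doc.update(columns)
        return doc


class CopyFieldConverter(DefaultConverter):
    """Adds a derived field that has no backing column."""

    def convert(self, key, columns):
        doc = super().convert(key, columns)
        first = columns.get("first", "")
        last = columns.get("last", "")
        doc["full_name"] = f"{first} {last}".strip()
        return doc


doc = CopyFieldConverter().convert("user1", {"first": "Ada", "last": "Lovelace"})
```

Because the converter sees the whole row, it can emit fields that do not
correspond to any single column, which a per-column analyzer hook cannot do.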


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-04 Thread T Jake Luciani (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079745#comment-13079745
]

T Jake Luciani commented on CASSANDRA-2915:
---

bq. Will read after write be available? I.e., if your mutation for the row key
returns to the client, then the row now has an entry in the Lucene index,
which can immediately be queried to return the results.

Yes. We can use a RAMDirectory() to keep writes real-time.

bq. What about durability, in the event Cassandra crashes, will the Lucene
index retain these indexed values, or will they be lost if commit is not
invoked on the index?

When the memtable is flushed, we will merge the RAMDirectory index into the
FSDirectory index and call reopen().
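
The two-tier arrangement can be sketched with a toy model. This does not use
Lucene: TwoTierIndex is a hypothetical class in which one dictionary stands in
for the in-memory (RAMDirectory) index and another for the on-disk
(FSDirectory) index, and flush() stands in for the merge performed when the
memtable is flushed.

```python
class TwoTierIndex:
    """Toy model: recent writes live in RAM and are merged to disk on flush."""

    def __init__(self):
        self.ram = {}    # recent writes, searchable immediately
        self.disk = {}   # stands in for the durable on-disk index

    def add(self, key, doc):
        self.ram[key] = doc

    def search(self, predicate):
        # Search both tiers; RAM entries shadow older disk entries.
        merged = {**self.disk, **self.ram}
        return sorted(k for k, d in merged.items() if predicate(d))

    def flush(self):
        # Called alongside the memtable flush: merge RAM into disk.
        self.disk.update(self.ram)
        self.ram.clear()


idx = TwoTierIndex()
idx.add("k1", {"city": "Austin"})
before = idx.search(lambda d: d["city"] == "Austin")  # visible before flush
idx.flush()
after = idx.search(lambda d: d["city"] == "Austin")   # still visible after
```

In this model a write is queryable as soon as add() returns, and nothing
reaches the durable tier until flush() runs.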


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-08-04 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079776#comment-13079776
]

Jason Rutherglen commented on CASSANDRA-2915:
-

bq. Yes. We can use a RAMDirectory() to keep writes real-time.

LUCENE-3092 implemented NRTCachingDirectory, which we can use for in-RAM NRT
until LUCENE-2312 is completed.


[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-07-28 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072582#comment-13072582
]

Jonathan Ellis commented on CASSANDRA-2915:
---

Yes. Look at uses of MessagingService.

[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-07-18 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067205#comment-13067205
]

Jason Rutherglen commented on CASSANDRA-2915:
-

Jake, this looks good. We need to specify how configuration parameters are
passed into the Lucene secondary index. This needs to include things like the
local Lucene file path, a class to transform Cassandra CF rows into Lucene
documents, etc.

[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-07-18 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067224#comment-13067224
]

Jonathan Ellis commented on CASSANDRA-2915:
---

bq. For random workloads with lots of indexed columns this means we need to
read the document from the index, update it, and write it back.

Could we go for a deeper level of integration? Instead of storing the data
twice as Cassandra row + Lucene document, use the row as the document Source Of
Truth, and just let Lucene handle the indexes?

[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-07-18 Thread T Jake Luciani (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067228#comment-13067228
]

T Jake Luciani commented on CASSANDRA-2915:
---

bq. We need to specify how configuration parameters are passed into the Lucene
secondary index. This needs to include things like the local Lucene file path,
a class to transform Cassandra CF rows into Lucene documents, etc.

The secondary indexes would go into the data directory defined in
cassandra.yaml. Currently there is a dir per KeySpace, so we can create a subdir
like indexes where the Lucene indexes are stored.

As for transforms, I mentioned column validators. This is meta information
about the contents of columns, see
http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes

This validation_class can be extended to let users map columns to Lucene
analyzers.

The document would be a row: fields would be columns (with analyzers specified
in the column meta-data validation_class)
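
To make the row-to-document idea concrete, here is a minimal sketch in plain
Java with no Lucene dependency. Everything in it is an assumption for
illustration: the class RowToDocumentSketch, its Field type, and the analyzer
labels ("standard", "numeric", "keyword") are hypothetical, and the choice of
which validation_class maps to which analyzer is arbitrary. UTF8Type, LongType
and BytesType are used only as example validator names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: one Cassandra row becomes one document; each column becomes a field
// whose analyzer is chosen from the column's validation_class.
final class RowToDocumentSketch {
    // Minimal stand-in for an indexed field (hypothetical type).
    static final class Field {
        final String name;
        final String value;
        final String analyzer;

        Field(String name, String value, String analyzer) {
            this.name = name;
            this.value = value;
            this.analyzer = analyzer;
        }
    }

    // Hypothetical mapping from validation_class to an analyzer label.
    static String analyzerFor(String validationClass) {
        switch (validationClass) {
            case "UTF8Type": return "standard"; // tokenized text (assumed choice)
            case "LongType": return "numeric";  // numeric field (assumed choice)
            default:         return "keyword";  // raw value as a single term
        }
    }

    // columns: column name -> value; validators: column name -> validation_class.
    // Columns without a declared validator are treated as BytesType.
    static List<Field> toDocument(Map<String, String> columns,
                                  Map<String, String> validators) {
        List<Field> doc = new ArrayList<>();
        for (Map.Entry<String, String> col : columns.entrySet()) {
            String validator = validators.getOrDefault(col.getKey(), "BytesType");
            doc.add(new Field(col.getKey(), col.getValue(), analyzerFor(validator)));
        }
        return doc;
    }
}
```

A real implementation would presumably let users name the analyzer in the
column metadata rather than hard-coding the mapping as done here.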

[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

2011-07-18 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067234#comment-13067234
]

Jonathan Ellis commented on CASSANDRA-2915:
---

Right. I didn't mean to imply this solves read-before-write, only that I'd
like to avoid writing two copies of the base data.
