"Iterating through all of the rows matching an index clause on your cluster is guaranteed to touch N/RF of the nodes in your cluster, because each node only knows about data that is indexed locally."
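The N/RF figure can be checked with a toy model. This is a hypothetical sketch, not Cassandra code, and it rests on two assumptions. First, the ring has N nodes, and each node's range is also replicated to the next RF - 1 nodes clockwise. Second, a node can only answer an index query from its own local index. The code brute-forces the smallest set of nodes whose local indexes together cover every range. When RF divides N that number is N/RF, and otherwise it is ceil(N/RF). It ignores consistency level, failed nodes, and real placement strategies.

```python
from itertools import combinations


def replica_nodes(primary: int, n: int, rf: int) -> set[int]:
    """Assumed toy placement: the range owned by node `primary` is also
    stored on the next rf - 1 nodes clockwise around a ring of n nodes."""
    return {(primary + i) % n for i in range(rf)}


def covers_all_ranges(queried: set[int], n: int, rf: int) -> bool:
    """True if every range has at least one replica among `queried`.

    A node can only answer from its local index, so a range with no queried
    replica would be silently missing from the scan."""
    return all(replica_nodes(r, n, rf) & queried for r in range(n))


def min_nodes_to_scan(n: int, rf: int) -> int:
    """Smallest number of nodes whose local indexes together cover every
    range, found by brute force (only sensible for toy cluster sizes)."""
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if covers_all_ranges(set(subset), n, rf):
                return k
    return n  # not reached for rf >= 1: querying every node always covers


for n, rf in [(6, 3), (12, 3), (10, 2)]:
    print(f"N={n}, RF={rf}: {min_nodes_to_scan(n, rf)} nodes")
```

With N=6 and RF=3, querying nodes 0 and 3 is enough. Each of them holds a replica of three consecutive ranges, so their local indexes together see every row, while any single node leaves some range uncovered.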
On Wed, Feb 9, 2011 at 9:13 AM, <alta...@ceid.upatras.gr> wrote:
> One more question: does each node keep an index of its own values, or
> is the index global?
>
> Alexander
>
>> Thank you very much, this is the information I was looking for. I
>> started adding secondary index functionality to Cassandra myself, and
>> it turns out I am doing almost exactly the same thing. I will try to
>> change my code to use your implementation as well, to compare results.
>>
>> Alexander
>>
>>> Alexander:
>>>
>>> The secondary indexes in 0.7.0 (type KEYS) are stored internally in a
>>> column family, and are kept synchronized with the base data via
>>> locking on the local node, meaning they are always consistent on the
>>> local node. Eventual consistency still applies between nodes, but a
>>> returned result will always match your query.
>>>
>>> This index column family stores a mapping from index values to a
>>> sorted list of matching row keys. When you query for rows between x
>>> and y matching a value z (via the get_indexed_slices call), Cassandra
>>> looks up the slice of columns between x and y in row z of the index
>>> column family. Any matches found there are row keys that satisfy the
>>> index clause, and we then query the base data to return those rows.
>>>
>>> Iterating through all of the rows matching an index clause on your
>>> cluster is guaranteed to touch N/RF of the nodes in your cluster,
>>> because each node only knows about data that is indexed locally.
>>>
>>> Some portions of the indexing implementation are not fully baked yet:
>>> for instance, although the API allows you to specify multiple
>>> columns, only one index will actually be used per query, and the rest
>>> of the clauses will be brute forced.
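The index layout and query path described above can be mirrored in a small single-node model. This is a sketch under stated assumptions, not Cassandra's implementation. `KeysIndexSketch` and `indexed_slice` are hypothetical names, and row keys are compared as plain strings, whereas a real cluster orders them by its partitioner. A single Python lock stands in for the local-node locking. The model shows four steps:

- the index row for value z holds sorted row keys;
- the query slices that row between x and y;
- each hit triggers a read of the base row;
- any extra clauses are brute-forced, because only one index is consulted.

```python
import threading
from bisect import bisect_left, bisect_right, insort


class KeysIndexSketch:
    """Hypothetical single-node model of a KEYS-style secondary index.

    `base` holds the base rows; `index` stands in for the index column
    family: each indexed value z is an index row whose "columns" are the
    sorted keys of base rows with indexed_column == z.
    """

    def __init__(self, indexed_column: str) -> None:
        self.indexed_column = indexed_column
        self.base: dict[str, dict[str, str]] = {}
        self.index: dict[str, list[str]] = {}
        # Stand-in for local-node locking: writes update base data and
        # index together, so a reader never sees them disagree.
        self._lock = threading.Lock()

    def insert(self, key: str, columns: dict[str, str]) -> None:
        with self._lock:
            row = self.base.setdefault(key, {})
            old = row.get(self.indexed_column)
            row.update(columns)
            new = row.get(self.indexed_column)
            if old != new:
                if old is not None:
                    self.index[old].remove(key)  # drop the stale index entry
                if new is not None:
                    insort(self.index.setdefault(new, []), key)

    def indexed_slice(self, value, start, end, other_clauses=()):
        """Rows whose indexed column equals `value`, with start <= key <= end.

        Only this one index is consulted; each (column, value) pair in
        `other_clauses` is checked by brute force against the fetched rows.
        """
        with self._lock:
            keys = self.index.get(value, [])
            # Slice of the index row between start and end (inclusive).
            hits = keys[bisect_left(keys, start):bisect_right(keys, end)]
            results = []
            for key in hits:
                row = self.base[key]  # second step: read the base data
                if all(row.get(col) == val for col, val in other_clauses):
                    results.append((key, dict(row)))
            return results


idx = KeysIndexSketch("state")
idx.insert("alice", {"state": "CA", "age": "30"})
idx.insert("bob", {"state": "CA", "age": "40"})
idx.insert("dave", {"state": "CA", "age": "30"})
print([k for k, _ in idx.indexed_slice("CA", "a", "c")])  # ['alice', 'bob']
print([k for k, _ in idx.indexed_slice("CA", "a", "z", [("age", "30")])])  # ['alice', 'dave']
```

The brute-force loop in `indexed_slice` shows where the cost of extra clauses lands. Every row matching the one indexed clause is read from the base data, even if the other clauses then reject it.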
>>>
>>> A second secondary index implementation has been on the back burner
>>> for a while: it provides an identical API, but does not use a column
>>> family to store the index, and should be more efficient for
>>> append-only data. See
>>> https://issues.apache.org/jira/browse/CASSANDRA-1472
>>>
>>> Thanks,
>>> Stu
>>>
>>> On Wed, Feb 9, 2011 at 2:35 AM, <alta...@ceid.upatras.gr> wrote:
>>>
>>>> Thank you for the links. I did read a bit of the comments on the
>>>> ticket, but I couldn't get much out of it.
>>>>
>>>> I am mainly interested in how the index is stored and partitioned,
>>>> not how it is used. I think the people on the dev list will probably
>>>> be better qualified to answer that. My questions always seem to get
>>>> moved to the user list, usually with good cause, but I think this
>>>> time it belongs on the dev list :) Please move it back, if you can.
>>>>
>>>> Alexander
>>>>
>>>> > AFAIK this was the ticket the original work was done under:
>>>> > https://issues.apache.org/jira/browse/CASSANDRA-1415
>>>> >
>>>> > Also http://www.datastax.com/docs/0.7/data_model/secondary_indexes
>>>> > and http://pycassa.github.com/pycassa/tutorial.html#indexes may help.
>>>> >
>>>> > (Sorry, on reflection the email probably did not need to be moved
>>>> > from dev, my bad.)
>>>> > Aaron
>>>> >
>>>> > On 09 Feb, 2011, at 09:16 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
>>>> >
>>>> > Moving to the user group.
>>>> >
>>>> > On 08 Feb, 2011, at 11:39 PM, alta...@ceid.upatras.gr wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > I'd like some information about how secondary indices work under
>>>> > the hood.
>>>> >
>>>> > 1) Is data stored in some external data structure, or is it stored
>>>> > in an actual Cassandra table, as columns within column families?
>>>> > 2) Is data stored sorted or not? How is it partitioned?
>>>> > 3) How can I access index data?
>>>> >
>>>> > Thanks in advance,
>>>> >
>>>> > Alexander Altanis

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com