Re: How do secondary indices work

Jonathan Ellis Wed, 09 Feb 2011 08:53:11 -0800

"Iterating through all of the rows matching an index clause on your
cluster is guaranteed to touch N/RF of the nodes in your cluster,
because each node only knows about data that is indexed locally."
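
To make that concrete, here is a rough Python sketch of the idea. It is
not Cassandra's actual code: Node, node_for and query are hypothetical
names, and it assumes RF=1, a single indexed column, and row keys sorted
as plain strings rather than by partitioner token. Each node indexes only
the rows it stores, so an index query has to ask every node and merge the
answers.

```python
import bisect
import zlib
from collections import defaultdict


class Node:
    """Hypothetical stand-in for one node; illustration only."""

    def __init__(self):
        self.rows = {}                  # row key -> {column name: value}
        self.index = defaultdict(list)  # indexed value -> sorted row keys

    def put(self, key, column, value):
        # Base data and local index are updated together, so the index
        # on this node always agrees with the rows on this node.
        old = self.rows.get(key, {}).get(column)
        if old is not None:
            self.index[old].remove(key)
        self.rows.setdefault(key, {})[column] = value
        bisect.insort(self.index[value], key)

    def get_indexed_slice(self, value, start, end):
        # Slice the sorted key list stored under `value`, then fetch
        # the matching rows from the base data.
        keys = self.index.get(value, [])
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return {k: self.rows[k] for k in keys[lo:hi]}


def node_for(key, nodes):
    # Toy placement by checksum; a real cluster uses its partitioner.
    return nodes[zlib.crc32(key.encode()) % len(nodes)]


def query(nodes, value, start, end):
    # No node has a global index, so every node is asked (RF=1 here).
    result = {}
    for node in nodes:
        result.update(node.get_indexed_slice(value, start, end))
    return result
```

With a replication factor above 1 you would only need to ask enough nodes
to cover every key range once, which is where the N/RF figure comes from.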


On Wed, Feb 9, 2011 at 9:13 AM,  <alta...@ceid.upatras.gr> wrote:
> One more question: does each node keep an index of their own values, or is
> the index global?
>
> Alexander
>
>> Thank you very much, this is the information I was looking for. I started
>> adding secondary index functionality to Cassandra myself, and it turns out
>> I am doing almost exactly the same thing. I will try to change my code to
>> use your implementation as well to compare results.
>>
>> Alexander
>>
>>> Alexander:
>>>
>>> The secondary indexes in 0.7.0 (type KEYS) are stored internally in a
>>> column family, and are kept synchronized with the base data via locking
>>> on a local node, meaning they are always consistent on the local node.
>>> Eventual consistency still applies between nodes, but a returned result
>>> will always match your query.
>>>
>>> This index column family stores a mapping from index values to a sorted
>>> list of matching row keys. When you query for rows between x and y
>>> matching a value z (via the get_indexed_slices call), Cassandra performs
>>> a lookup to the index column family for the slice of columns in row z
>>> between x and y. If any matches are found in the index, they are row
>>> keys that match the index clause, and we query the base data to return
>>> you those rows.
>>>
>>> Iterating through all of the rows matching an index clause on your
>>> cluster is guaranteed to touch N/RF of the nodes in your cluster,
>>> because each node only knows about data that is indexed locally.
>>>
>>> Some portions of the indexing implementation are not fully baked yet:
>>> for instance, although the API allows you to specify multiple columns,
>>> only one index will actually be used per query, and the rest of the
>>> clauses will be brute forced.
>>>
>>> A second secondary index implementation has been on the back burner for
>>> a while: it provides an identical API, but does not use a column family
>>> to store the index, and should be more efficient for append-only data.
>>> See https://issues.apache.org/jira/browse/CASSANDRA-1472
>>>
>>> Thanks,
>>> Stu
>>>
>>> On Wed, Feb 9, 2011 at 2:35 AM, <alta...@ceid.upatras.gr> wrote:
>>>
>>>> Thank you for the links, I did read a bit in the comments of the
>>>> ticket, but I couldn't get much out of it.
>>>>
>>>> I am mainly interested in how the index is stored and partitioned, not
>>>> how it is used. I think the people in the dev list will probably be
>>>> better qualified to answer that. My questions always seem to get moved
>>>> to the user list, and usually with good cause, but I think this time it
>>>> should be in the dev list :) Please move it back, if you can.
>>>>
>>>> Alexander
>>>>
>>>> > AFAIK this was the ticket the original work was done under
>>>> > https://issues.apache.org/jira/browse/CASSANDRA-1415
>>>> >
>>>> > also  http://www.datastax.com/docs/0.7/data_model/secondary_indexes
>>>> > and  http://pycassa.github.com/pycassa/tutorial.html#indexes may help
>>>> >
>>>> > (sorry, on reflection the email prob did not need to be moved from
>>>> > dev, my bad)
>>>> > Aaron
>>>> >
>>>> > On 09 Feb, 2011, at 09:16 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
>>>> >
>>>> > Moving to the user group.
>>>> >
>>>> >
>>>> >
>>>> > On 08 Feb, 2011,at 11:39 PM, alta...@ceid.upatras.gr wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > I'd like some information about how secondary indices work under the
>>>> > hood.
>>>> >
>>>> > 1) Is data stored in some external data structure, or is it stored in
>>>> > an actual Cassandra table, as columns within column families?
>>>> > 2) Is data stored sorted or not? How is it partitioned?
>>>> > 3) How can I access index data?
>>>> >
>>>> > Thanks in advance,
>>>> >
>>>> > Alexander Altanis
>>>> >
>>>>
>>>
>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
