Re: CQL Select Map using an IN relationship

Laing, Michael Thu, 13 Mar 2014 10:40:42 -0700

Think of them as:

PRIMARY KEY (partition_key[, range_key])


where the partition_key can be compounded as:

(partition_key0 [, partition_key1, ...])

and the optional range_key can be compounded as:

range_key0 [, range_key1 ...]

If you do this: PRIMARY KEY (key1, key2) - then key1 is the partition_key
and key2 is the range_key and queries will work that hash to key1 (the
partition) using = or IN and specify a range on key2.

But if you do this: PRIMARY key ((key1, key2)) then (key1, key2) is the
compound partition key - there is no range key - and you can specify = on
key1 and = or IN on key2 (but not a range).

Anyway that's what I remember! Hope it helps.

ml


On Thu, Mar 13, 2014 at 11:27 AM, David Savage <davemssav...@gmail.com>wrote:

> Great that works, thx! I probably would have never found that...
>
> It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or
> PRIMARY KEY ((key1, key2)), any examples would be welcome if you have the
> time.
>
> Kind regards,
>
> Dave
>
>
> On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael <michael.la...@nytimes.com
> > wrote:
>
>> Create your table like this and it will work:
>>
>> CREATE TABLE test.documents (group text,id bigint,data
>> map<text,text>,PRIMARY KEY ((group, id)));
>>
>> The extra parens catenate 'group' and 'id' into the partition key - IN
>> will work on the last component of a partition key.
>>
>> ml
>>
>>
>> On Thu, Mar 13, 2014 at 10:40 AM, David Savage <davemssav...@gmail.com>wrote:
>>
>>> Nope, upgraded to 2.0.5 and still get the same problem, I actually
>>> simplified the problem a little in my first post, there's a composite
>>> primary key involved as I need to partition ids into groups
>>>
>>> So the full CQL statements are:
>>>
>>> CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy',
>>> 'replication_factor':3};
>>>
>>>
>>> CREATE TABLE test.documents (group text,id bigint,data
>>> map<text,text>,PRIMARY KEY (group, id));
>>>
>>>
>>> INSERT INTO test.documents(id,group,data) VALUES
>>> (0,'test',{'count':'0'});
>>>
>>> INSERT INTO test.documents(id,group,data) VALUES
>>> (1,'test',{'count':'1'});
>>>
>>> INSERT INTO test.documents(id,group,data) VALUES
>>> (2,'test',{'count':'2'});
>>>
>>>
>>> SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2);
>>>
>>>
>>> Thanks for your help.
>>>
>>>
>>> Kind regards,
>>>
>>>
>>> /Dave
>>>
>>>
>>> On Thu, Mar 13, 2014 at 2:00 PM, David Savage <davemssav...@gmail.com>wrote:
>>>
>>>> Hmmm that maybe the problem, I'm currently testing with 2.0.2 which got
>>>> dragged in by the cassandra unit library I'm using for testing [1] I will
>>>> try to fix my build dependencies and retry, thx.
>>>>
>>>> /Dave
>>>>
>>>> [1] https://github.com/jsevellec/cassandra-unit
>>>>
>>>>
>>>> On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael <
>>>> michael.la...@nytimes.com> wrote:
>>>>
>>>>> I have no problem doing this w 2.0.5 - what version of C* are you
>>>>> using? Or maybe I don't understand your data model... attach 'creates' if
>>>>> you don't mind.
>>>>>
>>>>> ml
>>>>>
>>>>>
>>>>> On Thu, Mar 13, 2014 at 9:24 AM, David Savage 
>>>>> <davemssav...@gmail.com>wrote:
>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> Thanks for the help, unfortunately I'm not sure that's the problem,
>>>>>> the id is the primary key on the documents table and the timestamp
>>>>>> is the primary key on the eventlog table
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>> On Thursday, 13 March 2014, Peter Lin <wool...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> it's not clear to me if your "id" column is the KEY or just a
>>>>>>> regular column with secondary index.
>>>>>>>
>>>>>>> queries that have IN on non primary key columns isn't supported yet.
>>>>>>> not sure if that answers your question.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 13, 2014 at 7:12 AM, David Savage <
>>>>>>> davemssav...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> I'm experimenting using cassandra and have run across an error
>>>>>>>> message which I need a little more information on.
>>>>>>>>
>>>>>>>> The use case I'm experimenting with is a series of document updates
>>>>>>>> (documents being an arbitrary map of key value pairs), I would like to 
>>>>>>>> find
>>>>>>>> the latest document updates after a specified time period. I don't 
>>>>>>>> want to
>>>>>>>> store many copies of the documents (one per update) as the updates are
>>>>>>>> often only to single keys in the map so that would involve a lot of
>>>>>>>> duplicated data.
>>>>>>>>
>>>>>>>> The solution I've found that seems to fit best in terms of
>>>>>>>> performance is to have two tables.
>>>>>>>>
>>>>>>>> One that has an event log of timeuuid -> docid and a second that
>>>>>>>> stores the documents themselves stored by docid -> map<string, 
>>>>>>>> string>. I
>>>>>>>> then run two queries, one to select ids that have changed after a 
>>>>>>>> certain
>>>>>>>> time:
>>>>>>>>
>>>>>>>> SELECT id FROM eventlog WHERE timestamp>=minTimeuuid($minimumTime)
>>>>>>>>
>>>>>>>> and then a second to select the actual documents themselves
>>>>>>>>
>>>>>>>> SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7…)
>>>>>>>>
>>>>>>>> However this then explodes on query with the error message:
>>>>>>>>
>>>>>>>> "Cannot restrict PRIMARY KEY part id by IN relation as a collection
>>>>>>>> is selected by the query"
>>>>>>>>
>>>>>>>> Detective work lead me to these lines in
>>>>>>>> org.apache.cassandra.cql3.statementsSelectStatement:
>>>>>>>>
>>>>>>>>                     // We only support IN for the last name and for
>>>>>>>> compact storage so far
>>>>>>>>                     // TODO: #3885 allows us to extend to non
>>>>>>>> compact as well, but that remains to be done
>>>>>>>>                     if (i != stmt.columnRestrictions.length - 1)
>>>>>>>>                         throw new
>>>>>>>> InvalidRequestException(String.format("PRIMARY KEY part %s cannot be
>>>>>>>> restricted by IN relation", cname));
>>>>>>>>                     else if (stmt.selectACollection())
>>>>>>>>                         throw new
>>>>>>>> InvalidRequestException(String.format("Cannot restrict PRIMARY KEY 
>>>>>>>> part %s
>>>>>>>> by IN relation as a collection is selected by the query", cname));
>>>>>>>>
>>>>>>>> It seems like #3885 will allow support for the first IF block
>>>>>>>> above, but I don't think it will allow the second, am I correct?
>>>>>>>>
>>>>>>>> Any pointers on how I can work around this would be greatly
>>>>>>>> appreciated.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>>
>>>>>>>> Dave
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: CQL Select Map using an IN relationship

Reply via email to