> Which would lead to the storage engine storing rows similar to this (AFAIK): > > 8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d > - basic_info:propertyset_val = { firstName:"john", lastName:"smith", ...} > - contact_info:propertyset_val = { address: {streetAddr:"1 infinite loop", > postalCode: ""}, ... } > - meal_prefs:propertyset_val = { … } If the internal layout ye wish to see, cast ye fate into the cassandra-cli
cqlsh:dev> insert into user (id, propertyset_key, propertyset_val) values (8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d, 'monkey', 'purple'); cqlsh:dev> select * from user; id | propertyset_key | propertyset_val --------------------------------------+-----------------+----------------- 8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d | monkey | purple CLI... [default@dev] get user[8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d]; => (column=monkey:, value=, timestamp=1365482931142000) => (column=monkey:propertyset_val, value=707572706c65, timestamp=1365482931142000) Returned 2 results. Elapsed time: 32 msec(s). [default@dev] The extra column is there to scare away the range ghosts that come and stay and stare at you in the night. [default@dev] get user2[8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d]; => (column=, value=, timestamp=1365492971141000) => (column=property_sets:6d6f6e6b6579, value=707572706c65, timestamp=1365492971141000) Returned 2 results. > so are these user table definitions equivalent from the storage engine > perspective? > but is it possible to accomplish that somehow with the first definition? You can cqlsh:dev> select * from user where id = 8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d and propertyset_key = 'monkey'; id | propertyset_key | propertyset_val --------------------------------------+-----------------+----------------- 8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d | monkey | purple But if you try to do it without the primary key are asking for a range scan and will be scolded. cqlsh:dev> select * from user where propertyset_key = 'monkey'; Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING > Or should I create separate column families for static and dynamic properties > instead? One downside to collection types is that they must be read in their entirety, I'd in this case I would avoid them so you can read particular items. CQL 3 does not do well with mixed static / dynamic tables, if you can make a static CF and select against it specifying the columns you want you'll get the best performance. Hope that helps. ----------------- Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 7/04/2013, at 8:11 PM, Marko Asplund <marko.aspl...@gmail.com> wrote: > Hi, > > I'm currently designing a backend service that would store user profile > information for different applications. Most of the properties in a user > profile would be unknown to the service and specified by the applications > using the service, so the properties would need to be added dynamically. > > I was planning to use CQL3 and a dynamic column family defined something like > this: > > CREATE TABLE user ( > id UUID, > propertyset_key TEXT, > propertyset_val TEXT, > PRIMARY KEY (id, propertyset_key) > ); > > There would be N (assuming < 50) property sets associated with a user. > The property set values would be complex object graphs represented as JSON. > > Which would lead to the storage engine storing rows similar to this (AFAIK): > > 8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d > - basic_info:propertyset_val = { firstName:"john", lastName:"smith", ...} > - contact_info:propertyset_val = { address: {streetAddr:"1 infinite loop", > postalCode: ""}, ... } > - meal_prefs:propertyset_val = { ... } > - ... > > Any comments on this design? > > Another option would be to use the Cassandra map type for storing property > sets like this: > > CREATE TABLE user ( > id UUID, > property_sets MAP<text, text>, > PRIMARY KEY (id) > ); > > Based on the documentation I understood that each map element would > internally be stored as separate a column, so are these user table > definitions equivalent from the storage engine perspective? > > I'm using Astyanax which seems to support Cassandra collections. > > With the second definition, it should be possible to later migrate a dynamic > property e.g. job_title to a static property, so that I could execute CQL > queries like this: > > SELECT * FROM user WHERE job_title = 'developer'; > > but is it possible to accomplish that somehow with the first definition? > Or should I create separate column families for static and dynamic properties > instead? > > > marko