Re: Document storage

2012-03-29 Thread Drew Kutcharian
I'm actually doing something almost the same. I serialize my objects into byte[] using Jackson's SMILE format, compress them using Snappy, and then store the byte[] in Cassandra. I actually created a simple Cassandra Type for this but I hit a wall with cassandra-cli:
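The client-side pipeline described here (object → serialized bytes → compressed bytes → column value) can be sketched as follows. This is a minimal Python sketch with stdlib stand-ins: `json` replaces Jackson's SMILE encoding and `zlib` replaces Snappy. Neither substitution is what Drew used, and `encode_value`/`decode_value` are hypothetical helper names. The point is only that the value Cassandra stores is an opaque blob produced and consumed entirely on the client.

```python
import json
import zlib
from typing import Any, Dict


def encode_value(obj: Dict[str, Any]) -> bytes:
    """Serialize and compress on the client; the result is an opaque column value."""
    # Stand-in for Jackson SMILE: compact UTF-8 JSON text
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    # Stand-in for Snappy: zlib (DEFLATE) compression
    return zlib.compress(raw)


def decode_value(blob: bytes) -> Dict[str, Any]:
    """Reverse encode_value after reading the column back."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    profile = {"firstName": "ben", "skills": ["java", "javascript", "html"]}
    blob = encode_value(profile)
    assert decode_value(blob) == profile
    print(len(blob), "bytes to store")
```

Because the whole object is one value, any change means rewriting the entire blob. That fits an object that is always read or written as a whole, but not partial updates.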

Re: Document storage

2012-03-29 Thread Ben McCann
Sounds awesome, Drew. Mind sharing your custom type? I just wrote a basic JSON type and did the validation the same way you did, but I don't have any SMILE support yet. It seems that if your type were committed to the Cassandra codebase then the issue you ran into of the CLI only supporting

Re: Document storage

2012-03-29 Thread Jake Luciani
Is there a reason you would prefer a JSONType over CASSANDRA-3647? It would seem the only thing a JSON type offers you is validation. 3647 takes it much further by deconstructing a JSON document using composite columns to flatten the document out, with the ability to access and update portions

Re: Document storage

2012-03-29 Thread Ben McCann
Could you explain further how I would use CASSANDRA-3647? There's still very little documentation on composite columns and it was not clear to me whether they could be used to store document oriented data. Say for example that I had a document like: user: { firstName: 'ben', skills:

Re: Document storage

2012-03-29 Thread Rick Branson
Ben, You can create a materialized path for each field in the document: { [user, firstName]: ben, [user, skills, TimeUUID]: java, [user, skills, TimeUUID]: javascript, [user, skills, TimeUUID]: html, [user, education, school]: cmu, [user, education, major]: computer science } This way each
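Rick's materialized-path layout can be sketched as a recursive flattening step. This is a minimal Python sketch under stated assumptions: the document is a nested dict/list structure, each composite column name is modeled as a tuple, and `uuid.uuid1()` (a version-1, time-based UUID) stands in for the TimeUUID component of list elements. `materialize_paths` is a hypothetical name, and writing the resulting columns to Cassandra is left to whichever client library you use.

```python
import uuid
from typing import Any, Dict, Tuple

# A composite column name, e.g. ("user", "education", "school")
Path = Tuple[Any, ...]


def materialize_paths(doc: Any, prefix: Path = ()) -> Dict[Path, Any]:
    """Flatten a nested document into {materialized path: leaf value}.

    Dict keys become path components; each list element gets a fresh
    version-1 (time-based) UUID as its component, standing in for a
    TimeUUID, so elements can be appended without renumbering.
    """
    columns: Dict[Path, Any] = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            columns.update(materialize_paths(value, prefix + (key,)))
    elif isinstance(doc, list):
        for item in doc:
            columns.update(materialize_paths(item, prefix + (uuid.uuid1(),)))
    else:
        # Scalar leaf: one column whose name is the full path
        columns[prefix] = doc
    return columns


if __name__ == "__main__":
    doc = {"user": {"firstName": "ben",
                    "skills": ["java", "javascript", "html"],
                    "education": {"school": "cmu", "major": "computer science"}}}
    for path, value in materialize_paths(doc).items():
        print(path, "->", value)
```

Because each field or list element is its own column, one element can be added, updated, or deleted without rewriting the rest of the document. Cassandra's TimeUUIDType compares by the UUID's embedded timestamp, which is what keeps list elements in insertion order when the row is read back.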

RE: Document storage

2012-03-29 Thread Jeremiah Jordan
It's not clear what 3647 actually is; there is no code attached and no real example in it. Aside from that, the reason this would be useful to me (if we could get indexing of attributes working) is that I already have my data in JSON/Thrift/ProtoBuff. Depending on how large the data is, it isn't

Re: Document storage

2012-03-29 Thread Tyler Patterson
Would there be interest in adding a JsonType? What about checking that data inserted into a JsonType is valid JSON? How would you do it, and would the overhead be something we are concerned about, especially if the JSON string is large?

Re: Document storage

2012-03-29 Thread Ben McCann
Creating materialized paths may well be a possible solution. If that were the solution the community were to agree upon then I would like it to be a standardized and well-documented best practice. I asked how to store a list of values on the user

Re: Document storage

2012-03-29 Thread Edward Capriolo
The issue with these super complex types is that to do anything useful with them you would need either scanners or coprocessors. As it stands right now, complex data like JSON is fairly opaque to Cassandra. Getting Cassandra to natively speak protobuffs or whatever flavor-of-the-week serialization

Re: Document storage

2012-03-29 Thread Jonathan Ellis
On Thu, Mar 29, 2012 at 9:57 AM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: It's not clear what 3647 actually is, there is no code attached, and no real example in it. Aside from that, the reason this would be useful to me (if we could get indexing of attributes working), is that

RE: Document storage

2012-03-29 Thread Jeremiah Jordan
But it isn't special case logic. The current AbstractType and Indexing of Abstract types for the most part would already support this. Someone just has to write the code for JSONType or ProtoBuffType. The problem isn't writing the code to break objects up, the problem is encode/decode time.

Re: Document storage

2012-03-29 Thread Drew Kutcharian
Hi Ben, Sure, there's nothing really to it, but I'll email it to you. As for why I'm using Snappy on the type instead of sstable_compression: when you set sstable_compression, the compression happens on the Cassandra nodes, and I see two advantages with my approach: 1. Saving extra

Re: Document storage

2012-03-29 Thread Drew Kutcharian
I agree with Edward here, the simpler we keep the core the better. I think all the ser/deser and conversions should happen on the client side. -- Drew On Mar 29, 2012, at 8:36 AM, Edward Capriolo wrote: The issue with these super complex types is to do anything useful with them you would

Re: Document storage

2012-03-29 Thread Drew Kutcharian
I think this is a much better approach because that gives you the ability to update or retrieve just parts of objects efficiently, rather than making column values just blobs with a bunch of special case logic to introspect them. Which feels like a big step backwards to me. Unless your

Re: Document storage

2012-03-29 Thread Jonathan Ellis
On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian d...@venarc.com wrote: I think this is a much better approach because that gives you the ability to update or retrieve just parts of objects efficiently, rather than making column values just blobs with a bunch of special case logic to

Re: Document storage

2012-03-29 Thread Drew Kutcharian
Yes, I meant the row header index. What I have done is that I'm storing an object (i.e. UserProfile) where you read or write it as a whole (a user updates their user details in a single page in the UI). So I serialize that object into a binary JSON using SMILE format. I then compress it using

Re: Document storage

2012-03-29 Thread Ben McCann
Jonathan, I asked Brian about his REST API (https://groups.google.com/forum/?fromgroups#!topic/virgil-users/oncBas9C8Us) and he said he does not take the json objects and split them because the client libraries do not agree on implementations. This was exactly my concern as well with this solution.

Re: Document storage

2012-03-29 Thread Jonathan Ellis
On Thu, Mar 29, 2012 at 2:06 PM, Ben McCann b...@benmccann.com wrote:  As far as I can tell, Cassandra doesn't support maps and lists in a standardized way today, which is the root of my problem. I'm pretty serious about adding those for 1.2, for what that's worth. (If you want to jump in and

Re: Document storage

2012-03-29 Thread Brian O'Neill
Jonathan, I was actually going to take this up with Nate McCall a few weeks back. I think it might make sense to get the client development community together (Netflix w/ Astyanax, Hector, Pycassa, Virgil, etc.) I agree whole-heartedly that it shouldn't go into the database for all the reasons

Re: Document storage

2012-03-29 Thread Ben McCann
Thanks Jonathan. The only reason I suggested JSON was because it already has support for lists. Native support for lists in Cassandra would more than satisfy me. Are there any existing proposals or a bug I can follow? I'm not familiar with the Cassandra codebase, so I'm not entirely sure how

Re: Document storage

2012-03-29 Thread Brian O'Neill
Jonathan, We store JSON as our column values. I'd love to see support for maps and lists. If I get some time this weekend, I'll take a look to see what is required. It doesn't seem like it would be that hard. -brian Brian O'Neill Lead Architect, Software Development Health Market

Re: Document storage

2012-03-29 Thread Jonathan Ellis
I kind of hijacked https://issues.apache.org/jira/browse/CASSANDRA-3647 (Sylvain suggests we start with (non-nested) lists, maps, and sets. I agree that this is a great 80/20 approach to the problem) but we could split it out to another ticket. On Thu, Mar 29, 2012 at 2:24 PM, Ben McCann

Re: Document storage

2012-03-29 Thread Ben McCann
Cool. How were you thinking we should store the data? As a standardized composite column (e.g. potentially a list as [fieldName, TimeUUID]: fieldValue and a set as [fieldName, fieldValue]:)? Or as a new column type? On Thu, Mar 29, 2012 at 12:35 PM, Jonathan Ellis jbel...@gmail.com wrote:
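Ben's two proposed composite-column encodings can be sketched side by side. This is a minimal Python sketch of the proposal as written in the message, not code from the Cassandra codebase: composite column names are modeled as tuples, `uuid.uuid1()` stands in for a TimeUUID, and all helper names are hypothetical.

```python
import uuid
from typing import Any, Dict, Iterable, List, Set, Tuple

# A composite column name: (fieldName, second component)
Column = Tuple[Any, Any]


def encode_list(field: str, values: Iterable[Any]) -> Dict[Column, Any]:
    # [fieldName, TimeUUID]: fieldValue -- one column per element,
    # so duplicates are kept and the UUID timestamp records the order
    return {(field, uuid.uuid1()): v for v in values}


def encode_set(field: str, values: Iterable[Any]) -> Dict[Column, bytes]:
    # [fieldName, fieldValue]: <empty> -- the element lives in the column
    # name, so writing it twice hits the same column and duplicates collapse
    return {(field, v): b"" for v in values}


def decode_list(columns: Dict[Column, Any], field: str) -> List[Any]:
    cells = [(name[1], value) for name, value in columns.items() if name[0] == field]
    # Order by the UUID's embedded 60-bit timestamp (UUID.time),
    # mirroring how a TimeUUID comparator would sort the columns
    return [value for _, value in sorted(cells, key=lambda cell: cell[0].time)]


def decode_set(columns: Dict[Column, Any], field: str) -> Set[Any]:
    return {name[1] for name in columns if name[0] == field}


if __name__ == "__main__":
    row: Dict[Column, Any] = {}
    row.update(encode_list("skills", ["java", "javascript", "java"]))
    row.update(encode_set("tags", ["db", "nosql", "db"]))
    print(decode_list(row, "skills"))
    print(decode_set(row, "tags"))
```

The set encoding needs no read-before-write to stay duplicate-free, since adding an existing element simply rewrites the same column. The list encoding keeps duplicates and insertion order at the cost of one UUID per element.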