Re: Document storage

2012-03-30 Thread Daniel Doubleday
seems to be contrary to all our tests / learnings. Cheers, Daniel From dev list: Re: Document storage On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian d...@venarc.com wrote: I think this is a much better approach because that gives you the ability to update or retrieve just parts of objects

Re: Document storage

2012-03-30 Thread Brian O'Neill
, Daniel From dev list: Re: Document storage On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian d...@venarc.com wrote: I think this is a much better approach because that gives you the ability to update or retrieve just parts of objects efficiently, rather than making column values just blobs

Re: Document storage

2012-03-30 Thread Sylvain Lebresne
not resist because your statement seems to be contrary to all our tests / learnings. Cheers, Daniel From dev list: Re: Document storage On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian d...@venarc.com wrote: I think this is a much better approach because that gives you the ability

Re: Document storage

2012-03-30 Thread Ben McCann
your statement seems to be contrary to all our tests / learnings. Cheers, Daniel From dev list: Re: Document storage On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian d...@venarc.com wrote: I think this is a much better approach because that gives you the ability to update

Re: Document storage

2012-03-29 Thread Drew Kutcharian
I'm actually doing something almost the same. I serialize my objects into byte[] using Jackson's SMILE format, then compress it using Snappy then store the byte[] in Cassandra. I actually created a simple Cassandra Type for this but I hit a wall with cassandra-cli:

Re: Document storage

2012-03-29 Thread Ben McCann
Sounds awesome Drew. Mind sharing your custom type? I just wrote a basic JSON type and did the validation the same way you did, but I don't have any SMILE support yet. It seems that if your type were committed to the Cassandra codebase then the issue you ran into of the CLI only supporting

Re: Document storage

2012-03-29 Thread Jake Luciani
Is there a reason you would prefer a JSONType over CASSANDRA-3647? It would seem the only thing a JSON type offers you is validation. 3647 takes it much further by deconstructing a JSON document using composite columns to flatten the document out, with the ability to access and update portions

Re: Document storage

2012-03-29 Thread Ben McCann
Could you explain further how I would use CASSANDRA-3647? There's still very little documentation on composite columns and it was not clear to me whether they could be used to store document oriented data. Say for example that I had a document like: user: { firstName: 'ben', skills:

Re: Document storage

2012-03-29 Thread Rick Branson
Ben, You can create a materialized path for each field in the document: { [user, firstName]: ben, [user, skills, TimeUUID]: java, [user, skills, TimeUUID]: javascript, [user, skills, TimeUUID]: html, [user, education, school]: cmu, [user, education, major]: computer science } This way each
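A rough sketch of the materialized-path idea above, written in plain Python rather than against any Cassandra client API. The function name `flatten` and the use of `uuid.uuid1()` as a stand-in for a TimeUUID are assumptions for illustration; the thread does not specify an implementation, and list items that are themselves objects are not handled here.

```python
import uuid


def flatten(doc, prefix=()):
    """Yield (path, value) pairs, one per leaf field of a nested document."""
    for key, value in doc.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Nested object: extend the path and recurse.
            yield from flatten(value, path)
        elif isinstance(value, list):
            # List: one column per element, keyed by a time-based UUID
            # (assumption: uuid1 stands in for Cassandra's TimeUUID).
            for item in value:
                yield path + (uuid.uuid1(),), item
        else:
            yield path, value


doc = {
    "user": {
        "firstName": "ben",
        "skills": ["java", "javascript", "html"],
        "education": {"school": "cmu", "major": "computer science"},
    }
}

# Each key is a tuple that could map onto a composite column name.
columns = dict(flatten(doc))
```

Because every leaf gets its own path, a single field such as `("user", "firstName")` can in principle be read or overwritten without touching the rest of the document.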

RE: Document storage

2012-03-29 Thread Jeremiah Jordan
. -Jeremiah From: Jake Luciani [jak...@gmail.com] Sent: Thursday, March 29, 2012 7:44 AM To: dev@cassandra.apache.org Subject: Re: Document storage Is there a reason you would prefer a JSONType over CASSANDRA-3647? It would seem the only thing a JSON type

RE: Document storage

2012-03-29 Thread Jeremiah Jordan
: Document storage Is there a reason you would prefer a JSONType over CASSANDRA-3647? It would seem the only thing a JSON type offers you is validation. 3647 takes it much further by deconstructing a JSON document using composite columns to flatten the document out, with the ability to access and update

Re: Document storage

2012-03-29 Thread Tyler Patterson
Would there be interest in adding a JsonType? What about checking that data inserted into a JsonType is valid JSON? How would you do it, and would the overhead be something we are concerned about, especially if the JSON string is large?
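One possible shape for the validation asked about above, sketched in Python with the standard `json` module. The function name `validate_json` is hypothetical, and this is not how any Cassandra type is implemented; it only illustrates that a straightforward check means parsing the whole value, so its cost grows with the size of the JSON string.

```python
import json


def validate_json(raw: bytes) -> None:
    """Raise ValueError if raw is not a UTF-8 encoded JSON document."""
    try:
        # A full parse is the simplest validity check; the parsed
        # result is thrown away.
        json.loads(raw.decode("utf-8"))
    except ValueError as exc:
        # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError.
        raise ValueError(f"not valid JSON: {exc}") from exc
```

A streaming parser could avoid building the full object tree, but it would still have to read every byte of the value on each insert.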

Re: Document storage

2012-03-29 Thread Ben McCann
Creating materialized paths may well be a possible solution. If that were the solution the community were to agree upon then I would like it to be a standardized and well-documented best practice. I asked how to store a list of values on the user

Re: Document storage

2012-03-29 Thread Edward Capriolo
The issue with these super complex types is that to do anything useful with them you would need either scanners or coprocessors. As it stands right now, complex data like JSON is fairly opaque to Cassandra. Getting Cassandra to natively speak protobufs or whatever flavor-of-the-week serialization

Re: Document storage

2012-03-29 Thread Jonathan Ellis
On Thu, Mar 29, 2012 at 9:57 AM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: It's not clear what 3647 actually is; there is no code attached, and no real example in it. Aside from that, the reason this would be useful to me (if we could get indexing of attributes working), is that

RE: Document storage

2012-03-29 Thread Jeremiah Jordan
the majority of the data I pulled back that doesn't belong to o1 and o5 -Jeremiah From: Jonathan Ellis [jbel...@gmail.com] Sent: Thursday, March 29, 2012 11:23 AM To: dev@cassandra.apache.org Subject: Re: Document storage On Thu, Mar 29, 2012 at 9:57 AM, Jeremiah

Re: Document storage

2012-03-29 Thread Drew Kutcharian
Hi Ben, Sure, there's nothing really to it, but I'll email it to you. As for why I'm using Snappy on the type instead of sstable_compression: when you set sstable_compression the compression happens on the Cassandra nodes, and I see two advantages with my approach: 1. Saving extra

Re: Document storage

2012-03-29 Thread Drew Kutcharian
I agree with Edward here, the simpler we keep the core the better. I think all the ser/deser and conversions should happen on the client side. -- Drew On Mar 29, 2012, at 8:36 AM, Edward Capriolo wrote: The issue with these super complex types is to do anything useful with them you would

Re: Document storage

2012-03-29 Thread Drew Kutcharian
I think this is a much better approach because that gives you the ability to update or retrieve just parts of objects efficiently, rather than making column values just blobs with a bunch of special case logic to introspect them. Which feels like a big step backwards to me. Unless your

Re: Document storage

2012-03-29 Thread Jonathan Ellis
On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian d...@venarc.com wrote: I think this is a much better approach because that gives you the ability to update or retrieve just parts of objects efficiently, rather than making column values just blobs with a bunch of special case logic to

Re: Document storage

2012-03-29 Thread Drew Kutcharian
Yes, I meant the row header index. What I have done is that I'm storing an object (i.e. UserProfile) where you read or write it as a whole (a user updates their user details in a single page in the UI). So I serialize that object into a binary JSON using SMILE format. I then compress it using
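A minimal sketch of the serialize-then-compress approach described above. It swaps in the Python standard library's `json` and `zlib` for Jackson's SMILE format and Snappy, which are what the message actually uses; the names `pack`, `unpack`, and `profile` are hypothetical.

```python
import json
import zlib


def pack(obj) -> bytes:
    """Serialize an object to compact JSON, then compress it on the client."""
    # Assumption: JSON + zlib stand in for SMILE + Snappy.
    encoded = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(encoded)


def unpack(blob: bytes):
    """Reverse of pack: decompress, then parse."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


profile = {"firstName": "ben", "skills": ["java", "javascript", "html"]}
blob = pack(profile)  # opaque bytes, stored as a single column value
```

The trade-off is the one debated in the thread: the object is read and written as a whole, so the server sees only an opaque blob and cannot update or index individual fields.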

Re: Document storage

2012-03-29 Thread Ben McCann
Jonathan, I asked Brian about his REST API https://groups.google.com/forum/?fromgroups#!topic/virgil-users/oncBas9C8Us and he said he does not take the JSON objects and split them because the client libraries do not agree on implementations. This was exactly my concern as well with this solution.

Re: Document storage

2012-03-29 Thread Jonathan Ellis
On Thu, Mar 29, 2012 at 2:06 PM, Ben McCann b...@benmccann.com wrote: As far as I can tell, Cassandra doesn't support maps and lists in a standardized way today, which is the root of my problem. I'm pretty serious about adding those for 1.2, for what that's worth. (If you want to jump in and

Re: Document storage

2012-03-29 Thread Brian O'Neill
Jonathan, I was actually going to take this up with Nate McCall a few weeks back. I think it might make sense to get the client development community together (Netflix w/ Astyanax, Hector, Pycassa, Virgil, etc.) I agree whole-heartedly that it shouldn't go into the database for all the reasons

Re: Document storage

2012-03-29 Thread Ben McCann
Thanks Jonathan. The only reason I suggested JSON was because it already has support for lists. Native support for lists in Cassandra would more than satisfy me. Are there any existing proposals or a bug I can follow? I'm not familiar with the Cassandra codebase, so I'm not entirely sure how

Re: Document storage

2012-03-29 Thread Brian O'Neill
Jonathan, We store JSON as our column values. I'd love to see support for maps and lists. If I get some time this weekend, I'll take a look to see what is required. It doesn't seem like it would be that hard. -brian Brian O'Neill Lead Architect, Software Development Health Market

Re: Document storage

2012-03-29 Thread Jonathan Ellis
I kind of hijacked https://issues.apache.org/jira/browse/CASSANDRA-3647 (Sylvain suggests we start with (non-nested) lists, maps, and sets. I agree that this is a great 80/20 approach to the problem) but we could split it out to another ticket. On Thu, Mar 29, 2012 at 2:24 PM, Ben McCann

Re: Document storage

2012-03-29 Thread Ben McCann
Cool. How were you thinking we should store the data? As a standardized composite column (e.g. potentially a list as [fieldName, TimeUUID]: fieldValue and a set as [fieldName, fieldValue]:)? Or as a new column type? On Thu, Mar 29, 2012 at 12:35 PM, Jonathan Ellis jbel...@gmail.com wrote:
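To make the two composite-column layouts in the question concrete, here is a small Python sketch. It models columns as a dictionary from name tuples to values; `list_columns`, `set_columns`, and `read_list` are hypothetical helpers, and `uuid.uuid1()` is assumed as a stand-in for a TimeUUID. Nothing here reflects a decision made in the thread.

```python
import uuid


def list_columns(field, values):
    """List layout: [fieldName, TimeUUID] -> fieldValue."""
    # Each element gets its own time-based UUID, so duplicates are kept.
    return {(field, uuid.uuid1()): value for value in values}


def set_columns(field, values):
    """Set layout: [fieldName, fieldValue] -> empty value."""
    # The element lives in the column name, so duplicates collapse.
    return {(field, value): b"" for value in values}


def read_list(columns):
    """Return list values ordered by the UUID timestamp component."""
    # Assumption: the system clock did not step backward between inserts.
    ordered = sorted(columns.items(), key=lambda kv: kv[0][1].time)
    return [value for _, value in ordered]


skills_list = list_columns("skills", ["java", "html", "java"])
skills_set = set_columns("skills", ["java", "html", "java"])
```

The list layout keeps three entries while the set layout keeps two, which is the practical difference between the two encodings.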

Re: Document storage

2012-03-28 Thread Ben McCann
Any thoughts? I'd like to submit a patch, but only if it will be accepted. Thanks, Ben On Wed, Mar 28, 2012 at 8:58 AM, Ben McCann b...@benmccann.com wrote: Hi, I was wondering if it would be interesting to add some type of document-oriented data type. I've found it somewhat awkward to

Re: Document storage

2012-03-28 Thread Jeremy Hanna
I don't speak for the project, but you might give it a day or two for people to respond and/or perhaps create a jira ticket. Seems like that's a reasonable data type that would get some traction - a json type. However, what would validation look like? That's one of the main reasons there are

Re: Document storage

2012-03-28 Thread Jeremiah Jordan
Sounds interesting to me. I looked into adding protocol buffer support at one point, and it didn't look like it would be too much work. The tricky part was I also wanted to add indexing support for attributes of the inserted protocol buffers. That looked a little trickier, but still not

Re: Document storage

2012-03-28 Thread Tatu Saloranta
On Wed, Mar 28, 2012 at 6:59 PM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: Sounds interesting to me. I looked into adding protocol buffer support at one point, and it didn't look like it would be too much work. The tricky part was I also wanted to add indexing support for

Re: Document storage

2012-03-28 Thread Edward Capriolo
Some work I did stores JSON blobs in columns. The question on JSON type is how to sort it. On Wed, Mar 28, 2012 at 7:35 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: I don't speak for the project, but you might give it a day or two for people to respond and/or perhaps create a jira

Re: Document storage

2012-03-28 Thread Ben McCann
I don't imagine sort is a meaningful operation on JSON data. As long as the sorting is consistent I would think that should be sufficient. On Wed, Mar 28, 2012 at 8:51 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Some work I did stores JSON blobs in columns. The question on JSON type is