Re: Document storage

2012-05-28 Thread Brian O'Neill
something like the map and list idea of CASSANDRA-3647 will probably be a more natural fit to the current CQL API. > > -- > > Sylvain > > > > All these reads make the hot dataset. If it fits the page cache you're fine. If i

Re: Document storage

2012-03-30 Thread Ben McCann
> > > > All these reads make the hot dataset. If it fits the page cache you're fine. If it doesn't you need to buy more iron. > > > > Really could not resist because your statement seems to be contrary to all our tests / learnings. > > > > Cheers, > &

Re: Document storage

2012-03-30 Thread Sylvain Lebresne
n't you need to buy more iron. > > Really could not resist because your statement seems to be contrary to all > our tests / learnings. > > Cheers, > Daniel > > From dev list: > > Re: Document storage > On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian wrote

Re: Document storage

2012-03-30 Thread Brian O'Neill
. > >Really could not resist because your statement seems to be contrary to >all our tests / learnings. > >Cheers, >Daniel > >From dev list: > >Re: Document storage >On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian wrote: >>> I think this is a much better ap

Re: Document storage

2012-03-30 Thread Daniel Doubleday
Really could not resist because your statement seems to be contrary to all our tests / learnings. Cheers, Daniel From dev list: Re: Document storage On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian wrote: >> I think this is a much better approach because that gives you the >> ab

Re: Document storage

2012-03-29 Thread Ben McCann
Cool. How were you thinking we should store the data? As a standardized composite column (e.g. potentially a list as ["fieldName", ]: "fieldValue" and a set as ["fieldName", "fieldValue" ]:"")? Or as a new column type? On Thu, Mar 29, 2012 at 12:35 PM, Jonathan Ellis wrote: > I kind of hija
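
The two composite-column layouts Ben asks about can be sketched roughly as follows. This is only an illustration in plain Python, not Cassandra code: the helper names `encode_list` and `encode_set` are hypothetical, and the second component of the list layout is elided in the message, so an integer position is assumed here purely for the example.

```python
def encode_list(field, values):
    """List layout: [field, <position>] -> value.

    Assumption: the message elides the second name component, so an
    integer position is used here to keep elements distinct and ordered.
    """
    return {(field, position): value for position, value in enumerate(values)}


def encode_set(field, values):
    """Set layout: [field, value] -> "" (the value lives in the column name)."""
    return {(field, value): "" for value in values}


def decode_list(columns):
    """Rebuild the list by reading columns in column-name order."""
    return [columns[name] for name in sorted(columns)]


def decode_set(columns):
    """Rebuild the set from the second component of each column name."""
    return {name[1] for name in columns}


list_columns = encode_list("skills", ["java", "javascript", "html"])
set_columns = encode_set("skills", ["java", "html", "java"])

for name in sorted(list_columns):
    print(list(name), ":", repr(list_columns[name]))
for name in sorted(set_columns):
    print(list(name), ":", repr(set_columns[name]))
```

In the set layout a duplicate value maps to the same column name, so it collapses into a single column; in the list layout duplicates survive because the position keeps the names distinct.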

Re: Document storage

2012-03-29 Thread Jonathan Ellis
I kind of hijacked https://issues.apache.org/jira/browse/CASSANDRA-3647 ("Sylvain suggests we start with (non-nested) lists, maps, and sets. I agree that this is a great 80/20 approach to the problem") but we could split it out to another ticket. On Thu, Mar 29, 2012 at 2:24 PM, Ben McCann wrote:

Re: Document storage

2012-03-29 Thread Brian O'Neill
Jonathan, We store JSON as our column values. I'd love to see support for maps and lists. If I get some time this weekend, I'll take a look to see what is required. It doesn't seem like it would be that hard. -brian Brian O'Neill Lead Architect, Software Development Health Market Scienc

Re: Document storage

2012-03-29 Thread Ben McCann
Thanks Jonathan. The only reason I suggested JSON was because it already has support for lists. Native support for lists in Cassandra would more than satisfy me. Are there any existing proposals or a bug I can follow? I'm not familiar with the Cassandra codebase, so I'm not entirely sure how he

Re: Document storage

2012-03-29 Thread Brian O'Neill
Jonathan, I was actually going to take this up with Nate McCall a few weeks back. I think it might make sense to get the client development community together (Netflix w/ Astyanax, Hector, Pycassa, Virgil, etc.) I agree whole-heartedly that it shouldn't go into the database for all the reasons

Re: Document storage

2012-03-29 Thread Jonathan Ellis
On Thu, Mar 29, 2012 at 2:06 PM, Ben McCann wrote: > As far as I can tell, Cassandra > doesn't support maps and lists in a standardized way today, which is the > root of my problem. I'm pretty serious about adding those for 1.2, for what that's worth. (If you want to jump in and help code that up

Re: Document storage

2012-03-29 Thread Ben McCann
Jonathan, I asked Brian about his REST API and he said he does not take the JSON objects and split them because the client libraries do not agree on implementations. This was exactly my concern as well with this solution.

Re: Document storage

2012-03-29 Thread Drew Kutcharian
Yes, I meant the "row header index". What I have done is that I'm storing an object (i.e. UserProfile) where you read or write it as a whole (a user updates their user details in a single page in the UI). So I serialize that object into a binary JSON using SMILE format. I then compress it using

Re: Document storage

2012-03-29 Thread Jonathan Ellis
On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian wrote: >> I think this is a much better approach because that gives you the >> ability to update or retrieve just parts of objects efficiently, >> rather than making column values just blobs with a bunch of special >> case logic to introspect them.

Re: Document storage

2012-03-29 Thread Drew Kutcharian
> I think this is a much better approach because that gives you the > ability to update or retrieve just parts of objects efficiently, > rather than making column values just blobs with a bunch of special > case logic to introspect them. Which feels like a big step backwards > to me. Unless your

Re: Document storage

2012-03-29 Thread Drew Kutcharian
I agree with Edward here, the simpler we keep the core the better. I think all the ser/deser and conversions should happen on the client side. -- Drew On Mar 29, 2012, at 8:36 AM, Edward Capriolo wrote: > The issue with these super complex types is to do anything useful with > them you would e

Re: Document storage

2012-03-29 Thread Drew Kutcharian
Hi Ben, Sure, there's nothing really to it, but I'll email it to you. The reason I'm using Snappy on the type instead of sstable_compression is that when you set sstable_compression the compression happens on the Cassandra nodes and I see two advantages with my approach: 1. Saving extra

RE: Document storage

2012-03-29 Thread Jeremiah Jordan
en throwing away the majority of the data I pulled back that doesn't belong to o1 and o5 -Jeremiah From: Jonathan Ellis [jbel...@gmail.com] Sent: Thursday, March 29, 2012 11:23 AM To: dev@cassandra.apache.org Subject: Re: Document storage On Thu, Mar 29,

Re: Document storage

2012-03-29 Thread Jonathan Ellis
On Thu, Mar 29, 2012 at 9:57 AM, Jeremiah Jordan wrote: > It's not clear what 3647 actually is, there is no code attached, and no real > example in it. > > Aside from that, the reason this would be useful to me (if we could get > indexing of attributes working), is that I already have my data in

Re: Document storage

2012-03-29 Thread Edward Capriolo
The issue with these super complex types is to do anything useful with them you would either need scanners or co processors. As it stands right now complex data like json is fairly opaque to Cassandra. Getting Cassandra to natively speak protobuffs or whatever flavor of the week serialization fram

Re: Document storage

2012-03-29 Thread Ben McCann
Creating materialized paths may well be a possible solution. If that were the solution the community were to agree upon then I would like it to be a standardized and well-documented best practice. I asked how to store a list of values on the user list

Re: Document storage

2012-03-29 Thread Tyler Patterson
> > > Would there be interest in adding a JsonType? What about checking that data inserted into a JsonType is valid JSON? How would you do it, and would the overhead be something we are concerned about, especially if the JSON string is large?
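
One way to picture the validation Tyler asks about is a full parse of the incoming bytes. The sketch below is plain Python and does not use Cassandra's type API; the function name `is_valid_json` is hypothetical, and rejecting `NaN`/`Infinity` is an assumption about how strict such a type would want to be.

```python
import json


def _reject_constant(name):
    # Python's json module accepts NaN/Infinity by default; strict JSON does not.
    raise ValueError("non-standard JSON constant: " + name)


def is_valid_json(raw: bytes) -> bool:
    """Return True if raw parses as a single JSON value.

    The check is a complete parse, so its cost grows with the size of
    the input; that is the overhead the question is about.
    """
    try:
        json.loads(raw, parse_constant=_reject_constant)
    except (ValueError, RecursionError):
        # ValueError covers malformed JSON and undecodable bytes;
        # RecursionError covers pathologically deep nesting.
        return False
    return True


print(is_valid_json(b'{"user": {"skills": ["java", "html"]}}'))
print(is_valid_json(b'{"user": '))
```

Because the parsed result is thrown away, a large document pays the parse cost on every write without gaining anything beyond the yes/no answer.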

RE: Document storage

2012-03-29 Thread Jeremiah Jordan
ubject: Re: Document storage Is there a reason you would prefer a JSONType over CASSANDRA-3647? It would seem the only thing a JSON type offers you is validation. 3647 takes it much further by deconstructing a JSON document using composite columns to flatten the document out, with the ability to access

RE: Document storage

2012-03-29 Thread Jeremiah Jordan
.attr1.subattr2. -Jeremiah From: Jake Luciani [jak...@gmail.com] Sent: Thursday, March 29, 2012 7:44 AM To: dev@cassandra.apache.org Subject: Re: Document storage Is there a reason you would prefer a JSONType over CASSANDRA-3647? It would seem the only th

Re: Document storage

2012-03-29 Thread Rick Branson
Ben, You can create a "materialized path" for each field in the document: { ["user", "firstName"]: "ben", ["user", "skills", ]: "java", ["user", "skills", ]: "javascript", ["user", "skills", ]: "html", ["user", "education", "school"]: "cmu", ["user", "education", "major"]: "computer science" }
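
The flattening Rick describes can be sketched as a small recursive walk. This is an illustration only, in plain Python rather than any Cassandra client: `materialize_paths` is a hypothetical helper, and because the list entries in the message elide their last path component, an integer position is assumed here to keep list elements distinct.

```python
import json


def materialize_paths(value, prefix=()):
    """Flatten a nested document into {path tuple: leaf value}."""
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        # Assumption: list position stands in for the elided path component.
        children = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in children:
        flat.update(materialize_paths(child, prefix + (key,)))
    return flat


document = {
    "user": {
        "firstName": "ben",
        "skills": ["java", "javascript", "html"],
        "education": {"school": "cmu", "major": "computer science"},
    }
}

columns = materialize_paths(document)
for path, leaf in columns.items():
    print(json.dumps(list(path)), ":", json.dumps(leaf))
```

Each leaf becomes its own column, so a single field can be read or rewritten without touching the rest of the document. Note that this sketch drops empty objects and empty lists, since they have no leaves to emit.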

Re: Document storage

2012-03-29 Thread Ben McCann
Could you explain further how I would use CASSANDRA-3647? There's still very little documentation on composite columns and it was not clear to me whether they could be used to store document oriented data. Say for example that I had a document like: user: { firstName: 'ben', skills: ['java',

Re: Document storage

2012-03-29 Thread Jake Luciani
Is there a reason you would prefer a JSONType over CASSANDRA-3647? It would seem the only thing a JSON type offers you is validation. 3647 takes it much further by deconstructing a JSON document using composite columns to flatten the document out, with the ability to access and update portions of

Re: Document storage

2012-03-29 Thread Ben McCann
Sounds awesome Drew. Mind sharing your custom type? I just wrote a basic JSON type and did the validation the same way you did, but I don't have any SMILE support yet. It seems that if your type were committed to the Cassandra codebase then the issue you ran into of the CLI only supporting built

Re: Document storage

2012-03-28 Thread Drew Kutcharian
I'm actually doing something almost the same. I serialize my objects into byte[] using Jackson's SMILE format, then compress it using Snappy then store the byte[] in Cassandra. I actually created a simple Cassandra Type for this but I hit a wall with cassandra-cli: https://issues.apache.org/jir
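
The serialize-then-compress pipeline Drew describes looks roughly like this. The sketch deliberately swaps in Python standard-library stand-ins: text JSON via `json` instead of Jackson's SMILE, and `zlib` instead of Snappy. The names `pack` and `unpack` are hypothetical, and the result is simply the byte string that would be stored as the column value.

```python
import json
import zlib


def pack(obj) -> bytes:
    """Serialize an object and compress it on the client side."""
    serialized = json.dumps(obj, separators=(",", ":"), sort_keys=True)
    return zlib.compress(serialized.encode("utf-8"))


def unpack(blob: bytes):
    """Reverse of pack: decompress, then deserialize."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


profile = {"firstName": "ben", "skills": ["java", "javascript", "html"]}
blob = pack(profile)
print(len(blob), "bytes stored")
print(unpack(blob) == profile)
```

The stored value is opaque to the database, so this suits objects that are always read and written as a whole; updating one field means unpacking and repacking the entire blob.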

Re: Document storage

2012-03-28 Thread Ben McCann
I don't imagine sort is a meaningful operation on JSON data. As long as the sorting is consistent I would think that should be sufficient. On Wed, Mar 28, 2012 at 8:51 PM, Edward Capriolo wrote: > Some work I did stores JSON blobs in columns. The question on JSON > type is how to sort it. > > O

Re: Document storage

2012-03-28 Thread Edward Capriolo
Some work I did stores JSON blobs in columns. The question on JSON type is how to sort it. On Wed, Mar 28, 2012 at 7:35 PM, Jeremy Hanna wrote: > I don't speak for the project, but you might give it a day or two for people > to respond and/or perhaps create a jira ticket.  Seems like that's a >

Re: Document storage

2012-03-28 Thread Tatu Saloranta
On Wed, Mar 28, 2012 at 6:59 PM, Jeremiah Jordan wrote: > Sounds interesting to me.  I looked into adding protocol buffer support at > one point, and it didn't look like it would be too much work.  The tricky > part was I also wanted to add indexing support for attributes of the inserted > prot

Re: Document storage

2012-03-28 Thread Jeremiah Jordan
Sounds interesting to me. I looked into adding protocol buffer support at one point, and it didn't look like it would be too much work. The tricky part was I also wanted to add indexing support for attributes of the inserted protocol buffers. That looked a little trickier, but still not impos

Re: Document storage

2012-03-28 Thread Jeremy Hanna
I don't speak for the project, but you might give it a day or two for people to respond and/or perhaps create a jira ticket. Seems like that's a reasonable data type that would get some traction - a json type. However, what would validation look like? That's one of the main reasons there are

Re: Document storage

2012-03-28 Thread Ben McCann
Any thoughts? I'd like to submit a patch, but only if it will be accepted. Thanks, Ben On Wed, Mar 28, 2012 at 8:58 AM, Ben McCann wrote: > Hi, > > I was wondering if it would be interesting to add some type of > document-oriented data type. > > I've found it somewhat awkward to store documen