[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390820#comment-14390820 ] Sylvain Lebresne commented on CASSANDRA-7970: - Alright, +1, thanks. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3, docs-impacting Fix For: 3.0 Attachments: 7970-trunk-v1.txt, 7970-trunk-v2.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382768#comment-14382768 ] Tyler Hobbs commented on CASSANDRA-7970: I've pushed some new commits to my branch to address your comments. I also merged in the latest trunk and added support for the new date and time types. bq. So, AbstractType.fromJSONObject would return a Term Done (over a few commits). The only hangup was that with collections and tuples, we need to avoid serializing elements in {{fromJSONObject}} because this can happen at prepare-time when we don't know the protocol version. Accordingly, Those classes return a DelayedValue instead of a terminal Value. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3, docs-impacting Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369330#comment-14369330 ] Sylvain Lebresne commented on CASSANDRA-7970: - bq. It's just to assume the system keyspace if a keyspace isn't set on the function. Ok. I still think the whole dealing with keyspace in {{FunctionName}} is fragile but it's somewhat outside this ticket so I've created CASSANDRA-8994. bq. However, I'm not clear on what you're suggesting. Can you elaborate? So, {{AbstractType.fromJSONObject}} would return a {{Term}}. For basic types, it would be a {{Constants.Value}} but for say a list, it would be a {{Lists.Value}} (containing the unserialized collection). Then {{Json.ColumnValue}} would just be a {{Term.Raw}} (not a {{Term.Terminal}}) and it's {{prepare}} would return the result of {{fromJSONObject}}. The end result being that {{Lists.Appender.doAppend}} would always get a {{Lists.Value}} and we won't need {{Lists.getElementsFromValue}}. bq. We have this problem in other parts of the code as well Right, and we should fix those some day, but that's another story :) bq. Plus, the purpose of {{ExecutionException}} is pretty clear right now: it's for errors that occur while executing a user-defined function. It's meant for errors while executing functions in general, not necessarily user-defined ones and I'm absolutely convinced we'll have to use it in native functions sooner or later. And functions in general can be used in select clauses, in which case they'll error out during execution, not during validation, so we shouldn't use IRE for them (as mentioned by Jonathan in [that comment|https://issues.apache.org/jira/browse/CASSANDRA-5910?focusedCommentId=13746044page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13746044]). Now, you could say that {{FromJsonFct}} cannot currently be used in select clauses, but 1) I'm hoping we can fix that at some point and 2) I'd really prefer consider those JSON functions as normal functions as much as possible, so I do prefer saying those errors are errors during the execution of a function rather than type errors of a sort. Lastly, a micro nits: you could make {{JSON_IDENTIFIER}} public in {{Json.java}} and use it in {{Selection}}. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3, docs-impacting Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350767#comment-14350767 ] Robert Stupp commented on CASSANDRA-7970: - bq. DecimalType and IntegerType quote their values IMO we should leave it as non-quoted as Sylvain suggests. With Java there shouldn't be any problem with (modern) JSON libs. Not sure about other languages' libs. The only issue I can imagine is accidentally passing a decimal for an integer (e.g. {{5.0}} instead of {{5}} or {{5E+03}} instead of {{5000}}) - json spec does only know numbers and does not distinguish between decimals and integers. That might be something we should ensure (i.e. allow a decimal value for integer values) - not sure whether the patch supports that. If it does, forgive me for not knowing ;) JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3, docs-impacting Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350804#comment-14350804 ] Tyler Hobbs commented on CASSANDRA-7970: The current patch will accept integers for float, double, and decimal types. (Thanks for the reminder, though: I needed to disallow floats and overflow for int and bigint types. I pushed a commit to do this.) JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3, docs-impacting Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350751#comment-14350751 ] Tyler Hobbs commented on CASSANDRA-7970: Okay, I've updated the [branch|https://github.com/apache/cassandra/compare/trunk...thobbs:7970-trunk-v2] to address your changes, with roughly one commit per comment to make it easier to follow. Some responses: bq. It's not entirely this patch fault but I don't understand why we need {{FunctionName.equalsNativeFunction}} It's just to assume the system keyspace if a keyspace isn't set on the function. I've modified the function to make that more correct and a little clearer. Let me know if that makes sense to you. bq. If we made fromJSONObject return a Term, we'd avoid serializing collection to re-deserialize them right away. I too would like to avoid the unnecessary serializations and deserializations. We have this problem in other parts of the code as well (IN value lists, multi-column relations, and collections in column conditions, iirc). However, I'm not clear on what you're suggesting. Can you elaborate? bq. {{FromJsonFct}} should probably use the new {{ExecutionException}} I'm not sure about this. Errors in {{FromJsonFct}} are type errors of a sort (at least if you consider JSON's native type formatting) or incorrect CQL literals (which also result in an IRE). I think sticking with an IRE for type errors with native functions would be more consistent from a user's point of view. Plus, the purpose of {{ExecutionException}} is pretty clear right now: it's for errors that occur while executing a user-defined function. We would lose that if we use it here. bq. For INSERT JSON, I think having a ParsedInsertJson separate from ParsedInsert makes things a bit cleaner. \[...\] How do you feel about it? I definitely agree that the separate Insert classes are an improvement. The Json Term refactoring is less of a win, because it still has some odd parts (classes that are both Raw and Terminal), but the old version had odd parts too. I'm fine with the new setup, so I made those changes with a few things renamed for clarity (e.g. SimplePrepared - PreparedLiteral). bq. Any particular reason for having DecimalType and IntegerType quote their values? Well, I had some concerns around loss of precision and overflows if the client-side JSON decoder attempted to store them as, say, Java doubles and longs. However, I think it would be reasonable to say that the client should ensure their JSON decoder handles these issues. So for now, I've switched them to non-string representations. Let me know if you thing the client-side decoding issues warrant keeping them as strings. bq. Out of curiosity, was there a profound reason to make Term.Terminal take the protocol version only? This was part of what I was attempting to address with the CollectionSerializer.Format code (now reverted). Some parts of the code need to specify a format/protocol version without any QueryOptions present (or it doesn't make sense to use QueryOptions). It's a bit weird to pass around a protocol version when we're not actually dealing with the native protocol, which is why I looked at switching to a format enum. I understood why we were using QueryOptions to begin with, but this seemed like the least invasive change. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3, docs-impacting Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”:
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14333620#comment-14333620 ] Sylvain Lebresne commented on CASSANDRA-7970: - bq. I attempted to improve the handling of collection serialization formats \[...\]. I have mixed feelings about the results. Me too :). I do think some clarification around this can be done, but I'm not entirely sure having a special version for collection serialization is necessarily the best way to go. Overall, I'd prefer leaving such refactor out of this ticket. Other remarks on the patch: * For the textile doc, I'd split out a new JSON section and only link to that from the {{INSERT}} and {{SELECT}} doc. Documenting JSON in the middle of the statements feels slightly confusing, it would feel clearer to me to concentrate it in a separate section, one that also start by stating what JSON support is all about (i.e. really just a convenience for getting data from/to a JSON form). Also, the doc is fully up to date with the patch (mention that {{VALUES}} for {{INSERT}} is optional when using JSON for instance) - {{cql3handling.py}} is not up to date on the {{INSERT}} syntax. * When doing {{INSERT INTO foo JSON ?}}, I'd maybe use {{\[json\]}} as receiver name to be consistent with what we do for {{LIMIT}}, {{TIMESTAMP}} and {{TTL}}. Also, UpdateStatement.ParsedInsert.jsonReceiverSpec is unused. Same for {{SELECT}} (for consistency with {{\[applied\]}}). * Should make {{FromJsonFct/toJsonFct}} extends {{NativeScalarFunction}} directly. * It's not entirely this patch fault but I don't understand why we need {{FunctionName.equalsNativeFunction}} (why not just use equals)? The only difference is that if {{this}} doesn't have a keyspace, it doesn't check the other name doesn't either, but that just feels wrong to me tbh. * For UDT, I think we need to care about case sensitivity in {{fromJSONObject}}. * In TupleType/UDTType.fromJSONObject, we should force the format to V3. It's also a tad weird to use CollectionSerializer.writeValue to write the UDT. Why not use the TupleValue.buildValue() function? Or my next suggestion. * If we made {{fromJSONObject}} return a {{Term}}, we'd avoid serializing collection to re-deserialize them right away. We'd also leave the serialization of UDT/Tuple to UserTypes/TupleTypes.DelayedValues (doesn't hurt to concentrate it in one place as much as possible) and save a bunch of changes (in ColumnCondition, Lists, Sets and Maps). * I'd prefer limiting the json pollution in {{Selection}}. We (mostly) really need to bother about JSON when writing the ResultSet, so we could simply pass the isJson to the {{resultSetBuilder}} method (somehow it makes more sense to me to call {{rowToJson}} in the builder anyway). The one other place we need it is {{getResultMetadata}}, but we could also pass a {{isJson}} flag there and either build the metadata if it's json (not a big deal) or find a simple way to cache the json metadata per-table (since it's always the same thing except for the keyspace and table name). * {{FromJsonFct}} should probably use the new {{ExecutionException}} from CASSANDRA-8528 rather than {{InvalidRequestException}}. * For {{INSERT JSON}}, I think having a ParsedInsertJson separate from ParsedInsert makes things a bit cleaner. I also feel that Json.java could encapsulate a bit better the necessary specificity of {{INSERT JSON}}. As I'm not sure how to express clearly what I mean (and because I wasn't sure that what I had in mind would really work), I've push a commit at [here|https://github.com/pcmanus/cassandra/commits/7970] shows what I have in mind. It's not really different from the initial approach but it feels cleaner overall. How do you feel about it? * Any particular reason for having {{DecimalType}} and {{IntegerType}} quote their values (they use the default {{toJSONString}}). I would have expected them to translate to JSON numbers since as far as I can tell JSON doesn't impose any size limit on its numbers. * Maybe we could abstract slighty our use of jackson (put the helpers we need in Json.java maybe?), so that 1) we have only one place to change if we upgrade jackson and the API change (or we want to change of library) and 2) we save creating multiple ObjectMapper or JsonStringEncoder objects. Also two minor nits: * The patch makes ResultSet.COUNT_COLUMN public for no reason it seems. * Out of curiosity, was there a profound reason to make Term.Terminal take the protocol version only (instead of the full QueryOptions)? If it's just because it's the only thing used, that's fine, but just fyi passing QueryOptions was done so we don't have to change everything again if a future Term needs something else than the protocol version. JSON support for CQL Key: CASSANDRA-7970 URL:
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328097#comment-14328097 ] Tyler Hobbs commented on CASSANDRA-7970: bq. I'd actually have a preference for not allowing the column name declaration at all as it doesn't buy us anything There is a minor difference in behavior. If the JSON value map does not contain an entry for a column listed in the column names, null will be inserted for that column. Omitting the column names is currently equivalent to listing them all. An example of the current behavior may clarify this: {code:sql} CREATE TABLE foo (a int, b int, c int, d int, PRIMARY KEY (a, b)); INSERT INTO foo JSON '{a: 0, b: 0, c: 0, d: 0}'; SELECT * FROM foo; a | b | c | d ---+---+---+--- 0 | 0 | 0 | 0 INSERT INTO foo JSON '{a: 0, b: 0, c: 0}'; SELECT * FROM foo; a | b | c | d ---+---+---+-- 0 | 0 | 0 | null {code} Since column {{d}} didn't have a value in the json map, null was inserted. However, if we explicitly list the column names (and leave out {{d}}), we can avoid inserting null: {code:sql} INSERT INTO foo JSON '{a: 0, b: 0, c: 0, d: 0}'; INSERT INTO foo (a, b, c) JSON '{a: 0, b: 0, c: 0}'; SELECT * FROM foo; a | b | c | d ---+---+---+--- 0 | 0 | 0 | 0 {code} bq. Even if some of our literals can be of multiple types (typically numeric literals), they will always translate to the same thing in JSON anyway so that shouldn't be a problem. As for bind markers, we can do what we do for other functions when their is an ambiguity and require the user to provide a type-cast. Is it just that it's not convenient to do with the current code, or is there something more fundamental I'm missing? Hmm, I think what you suggest should be possible, although it will probably require a fair amount of code changes, especially for handling literals (which would probably bypass the whole AbstractType system). Let me play around with this and get back to you. Would you like to get support for this and the {{fromJson()}} changes in this patch, or would you be willing to push this to another ticket? JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3, docs-impacting Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328152#comment-14328152 ] Sylvain Lebresne commented on CASSANDRA-7970: - bq. There is a minor difference in behavior. I see. But the thing is that we don't really have a way to implement such difference except at the root level. Meaning that if you have: {noformat} INSERT INTO foo JSON '{a: 0, b : { c : 1, d : 2}}' {noformat} where {{b}} is a non-frozen UDT (which we don't have yet but will eventually) with more than columns {{c}} and {{d}}, we'll have to choose the behavior once and for all (either it replace {{b}} completely, setting it's other column to null, which is what I think it should do, or it let those other column untouched). Hence I'm not entirely sure having a special case for the root level is all that consistent/useful. Further, while C* doesn't strongly enforce it, we've tried to make it clear that {{INSERT}} should be used for actual inserts and you should use {{UPDATE}} for updates. While if you care about the difference between those two variants, it feels to me that you're using an {{INSERT}} where you should be using an {{UPDATE}}. So I still would feel more confortable removing the 'with column name' variant from the initial version, we can always revisit later if it feels truly useful. bq. would you be willing to push this to another ticket? That. My suggestion is indeed a bit involved and is to some extend orthogonal to this issue (it's more about improving the type system). Let's indeed stick to simple rejection when we don't know how to handle it for now. We can scratch my itch later :) JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3, docs-impacting Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328357#comment-14328357 ] Tyler Hobbs commented on CASSANDRA-7970: bq. So I still would feel more confortable removing the 'with column name' variant from the initial version, we can always revisit later if it feels truly useful. That's reasonable. I've updated my branch to remove support for column names + JSON. bq. Let's indeed stick to simple rejection when we don't know how to handle it for now. Okay, I've opened CASSANDRA-8837 for a more complete solution. In the meantime, I pushed a (slightly hacky) commit to make {{fromJson()}} results always weakly assignable, forcing the user to typecast when it's used as the argument of an overloaded function. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3, docs-impacting Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322549#comment-14322549 ] Sylvain Lebresne commented on CASSANDRA-7970: - Sorry for the lack of communication. I'll have a good look at this but that might only be next week if that's ok (got a long flight next Sunday so I'll review it then). That said a couple of early remarks (that might be somewhat off since I haven't checked the patch) based on the comments on this ticket so far. bq. I've made the column name declaration optional with doing INSERT JSON I'd actually have a preference for not allowing the column name declaration at all as it doesn't buy us anything and imo having 2 forms is more confusing than anything. Even if we later want to allow both {{VALUES}} and {{JSON}} (which I'm actually kind of against but we can argue later since we've at least agreed on postoning that option), we can introduce back the names declaration later. bq. toJson() can only be used in the selection clause of a SELECT statement, because it can accept any type and the exact argument type must be known. Not 100% sure I see where the problem is on this one, at least in theory. Even if some of our literals can be of multiple types (typically numeric literals), they will always translate to the same thing in JSON anyway so that shouldn't be a problem. As for bind markers, we can do what we do for other functions when their is an ambiguity and require the user to provide a type-cast. Is it just that it's not convenient to do with the current code, or is there something more fundamental I'm missing? bq. fromJson() can only be used in INSERT/UPDATE/DELETE statements because the receiving type must be known in order to parse the JSON correctly. That one I understand, but I'm not sure a per-statement restriction is necessary the most appropriate because I suppose there is a problem with functions too since we allow overloading (namely, we can have 2 {{foo}} method, one taking a {{listtext}} as argument, and the other taking a {{int}}, so {{foo(fromJson(z))}} would be problematic). So the most logical way to handle this for me would be to generalize slightly the notion of some type that we already have due to bind marker. Typically, both a bind marker type and {{fromJson}} return type would be some type, and when the type checker encounter one and can't resolve it to a single type, it would reject it asking the user to type-cast explicitely. Similarly, {{toJon()}} argument could be some type. Again, we already do this for bind markers, it's a just a bit adhoc so it would just be a matter of generalizing it a bit. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3, docs-impacting Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321365#comment-14321365 ] Robert Stupp commented on CASSANDRA-7970: - Latest contents on your branch LGTM /cc [~slebresne] JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3 Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319732#comment-14319732 ] Robert Stupp commented on CASSANDRA-7970: - bq. JsonTest.testCaseSensitivity() An INSERT with an upper-case 'K' would be fine IMO Hm - and the CQL doc is missing ({{doc/cql3/CQL.textile}}). I've started using the label {{cql3.3}} for tickets that adds changes to CQL syntax. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3 Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320441#comment-14320441 ] Tyler Hobbs commented on CASSANDRA-7970: Okay, I've added a test case and updated the CQL syntax docs on my branch. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Labels: client-impacting, cql3.3 Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316308#comment-14316308 ] Robert Stupp commented on CASSANDRA-7970: - Beside some last points from my side. I’ve compared your branch against it’s origin in trunk, since it does not merge without conflicts against current trunk - but that’s fine for a ”quick look”. * {{ColumnCondition.CollectionBound#valueAppliesTo}}, {{Lists.getElementsFromValue}}, {{Sets.getElementsFromValue}} and {{Maps.Putter}} do similar things. Any chance to consolidate everything/most into a single function/few functions? * in {{Json#handleCaseSensitivity}} you can remove the two ArrayLists and the Map.put/remove sequences when you iterate iterate over {{for (String mapKey : new HashSet(valueMap.keySet()))}} * not sure, if you already have, but a unit test to test all combinations in {{Json#handleCaseSensitivity}} would be great * Can you remove {{Selection#validateSelectors}}? It’s unused. * Can you extract a method for the two switch-statements at the end of {{ParsedInsert#prepareInternal}}? * In {{SetType.fromJSONObject}} you could add a simple {{if (!buffers.add(…)) throw SomeMeaningfulDuplicateElementException}} (and remove the size-check after the loop). * in {{UserType.fromJSONObject}} you can safely replace {{buffers}} with an ordinary array Regarding the JSON library: I’ve created CASSANDRA-8785 (software not maintained for many years does feel right). JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316339#comment-14316339 ] Robert Stupp commented on CASSANDRA-7970: - Heh - CASSANDRA-8785 is no longer a problem. The JSON library used in C* (json-simple 1.1) is quite aged: 1.1 is dated 08/2009 and the newest version 1.1.1 is dated 03/2012. That library uses very old Java APIs (e.g. StringBuffer). On [this page|http://eclipsesource.com/blogs/2013/04/18/minimal-json-parser-for-java/] is some JSON parser performance comparison. IMO Jackson feels like a nice alternative to json-simple for this one (maybe a follow-up ticket) so we could completely get rid of that json-simple. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316950#comment-14316950 ] Tyler Hobbs commented on CASSANDRA-7970: The branch has been updated to address your comments and merge in the latest trunk. bq. not sure, if you already have, but a unit test to test all combinations in Json#handleCaseSensitivity would be great There is a unit test: {{JsonTest.testCaseSensitivity()}}. I believe that covers all combinations; let me know if I've missed something. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304148#comment-14304148 ] Tyler Hobbs commented on CASSANDRA-7970: bq. But I can imagine situations where some values are collected via JSON and enriched by the application code \[...\] That's a fair point. If users ask for it, I would probably be okay adding it, but let's take the wait and see approach for now. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304159#comment-14304159 ] Robert Stupp commented on CASSANDRA-7970: - bq. both allows VALUES and JSON ... let's take the wait and see approach for now. Agree JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304029#comment-14304029 ] Tyler Hobbs commented on CASSANDRA-7970: Thanks for the comments, [~snazy]. I've updated the branch with one commit per comment. A few responses: bq. Can you add a INSERT JSON variant that tackles tuples and types? I assume you mean a unit test. If so, that's done. bq. Was wondering if we really need to mention the column names since they are contained in the JSON Good point, I've made the column name declaration optional with doing INSERT JSON. bq. The Java Driver tries to contact a coordinator that owns the target partition \[...\] We currently have the same problem if any functions are used for the insert/update. I don't think it's too big of a deal, and I'm not sure that we have any good solutions right now. bq. Maybe we can discuss a follow-up that both allows {{VALUES}} and {{JSON}} I would personally be -1 on that. I don't think it's worth making the query language more complicated for a networking-related optimization. Plus, if folks are splitting up JSON to do that anyway, they might as well just use VALUES all the way. bq. would be nicer (and eliminate one String instance) to replace sb.append(JSONValue.escape(columnName)) with something like JSONValue.escape(columnName, sb) I would like to do that as well, but JSONValue doesn't support anything like that. (It's from the JSON.simple library, not our code.) We could always implement our own escape method, but I'm not sure that's worth it. bq. ReversedType : not sure whether the impl is correct - doesn’t it need to reverse the binary representation? It doesn't. The only thing ReversedType changes is the compare() method. See fromString(), for example. bq. FromJsonFct + ToJsonFct : would be nicer to have a CHM instead of synchronized HashMap. Done. We should probably also do this for the CollectionType subclasses. bq. Also these maps in these classes have AbstractType as the key and are never evicted - means: a changed user type would remain forever in these maps and if a UDT Hmm, that's a good point. It's a pretty minor concern, so I would be in favor of a follow-up ticket (especially considering possible AbstractType changes). JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304072#comment-14304072 ] Robert Stupp commented on CASSANDRA-7970: - Heh - yes, I meant a unit test. bq. follow-up that both allows VALUES and JSON Understand your point. Sure - it would be too much syntactic foo just for a partition key. But I can imagine situations where some values are collected via JSON and enriched by the application code. Maybe some IoT data from sensors that can be passed to the table as is in JSON plus some enrichment done by the application (e.g. adding location of the sensors). For example sensors that emit {{wind_direction}} and {{wind_speed}} in JSON format - the collecting application could enrich it with {{sample_timestamp}} and {{gps_coordinates}} using conventional {{VALUES}} syntax. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14300120#comment-14300120 ] Robert Stupp commented on CASSANDRA-7970: - I took a short look at the patch and played around a little bit with this. Some comments: * A test with null values in {{testFromJsonFct()}} and {{testToJsonFct()}} would be nice (toJson handles null correctly, fromJson throws an NPE for {{INSERT INTO tojson (k, asciival ) VALUES ( 0, fromJson(null) );}}) * Can you add a {{INSERT JSON}} variant that tackles tuples and types? * I tried {{insert into some_table (k, asciival ) JSON \{k: 0, asciival: foobar\};}} in cqlsh - it complains with a syntax error (obviously my bad). Was wondering if we really need to mention the column names since they are contained in the JSON. Couldn't a {{insert into some_table JSON ?;}} with json {{\{k: 0, asciival: foobar\}}} also work? * The Java Driver tries to contact a coordinator that owns the target partition. This gets complicated with {{INSERT JSON}} since the driver would have to parse the JSON before it actually knows the correct node. Maybe we can discuss a follow-up that both allows {{VALUES}} and {{JSON}} - e.g. {{INSERT INTO some_table (partitionKey, clusteringKey) VALUES (1, 2) JSON ?}}, where the {{JSON ?}} part contains more columns. * missing license header in {{Json.java}} * In {{FunctionCall}} you exchanged „fun“ with „fun.name()“ eliminating the function signature in the exception message * {{Selection.rowToJson}} : use Collections.singleton instead of {{Arrays.asList}} * {{Selection.rowToJson}} : would be nicer (and eliminate one String instance) to replace sb.append(JSONValue.escape(columnName)) with something like JSONValue.escape(columnName, sb); (make escape write to sb) * Similar for {{AbstractType.toJSONString}} - let the implementation write to the string builder * {{ReversedType}} : not sure whether the impl is correct - doesn’t it need to reverse the binary representation? * {{FromJsonFct}} + {{ToJsonFct}} : would be nicer to have a CHM instead of synchronized HashMap. Also these maps in these classes have {{AbstractType}} as the key and are never evicted - means: a changed user type would remain forever in these maps and if a UDT. This might get irrelevant when {{AbstractType}} is removed - so not sure, whether we should fix this now - maybe open a follow-up-ticket for this? A simple fix would be to 'statically cache' function instances for primitive types but always create a new instance of these functions for tuples + UDTs ; then you could pre-create the function instances completely eliminating the need for CHM or synchronized. Altogether: really nice work :) JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 Attachments: 7970-trunk-v1.txt JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284484#comment-14284484 ] Tyler Hobbs commented on CASSANDRA-7970: [~lhartzman] that's a fair point. We had similar debates about supporting other (Cassandra) types that JSON doesn't support natively, like UUIDs. The decision was that if we didn't support those types, users would end up using strings instead. The downside to this is a lack of type safety, validation, and correct sorting. Additionally, functions become incompatible and drivers (including cqlsh) no longer have proper type information. I think the same argument can be applied here. Users will want to use maps keyed by ints, uuids, frozen sets, etc. If we don't support that, they'll simply declare the keys to be strings instead, and the same problems around losing type information will come up. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267409#comment-14267409 ] Sylvain Lebresne commented on CASSANDRA-7970: - I think it might be worth clarifying what we're talking about here because it appears I don't have the same understanding as some of you guys. I'm not at all in favor of making CQL weakly typed. We've remove the accept string for everything in CQL early on and I'm still convinced it was a good thing. My understanding of Tyler's suggestion is that it's only related to JSON support. It's to say that if we have the table: {noformat} CREATE TABLE foo ( c1 int PRIMARY KEY, c2 float, c3 mapint, timestamp, ) {noformat} then we'll agree to map to it a json literal like {noformat} { 'c1' : '3', 'c2' : '4.2', 'c3' : { '4', '2011-02-03 04:05'} } {noformat} and this even though the JSON uses strings everywhere. And I think it's ok to do this because JSON is not really a typed thing in the first place: it has different kind of literals, but it's not typed per se, and it doesn't support all the literals that CQL supports anyway (typically uuid literals, which is why we will have to at least accept string for uuids). But I think it's ok to do this for the JSON mapping (which again, I just see as considering JSON as untyped/weakly typed) without going as far as making CQL itself weakly typed. But if we disagree on that part, and you guys think it would be horribly inconsistent to accept that kind of thing in the JSON translation without weakening the CQL typing, then count me as a *strong* -1 on the whole thing. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267412#comment-14267412 ] Sylvain Lebresne commented on CASSANDRA-7970: - Or to clarify, if Tyler's everywhere means everywhere in CQL, then I'm a strong -1. I had understood it as systematically in the JSON translation, which I'm fine with. If I misunderstood, my bad. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267422#comment-14267422 ] Robert Stupp commented on CASSANDRA-7970: - -1 on accepting strings for anything _outside_ of JSON *map keys* (object names). So don't allow it on JSON object values - just on [JSON object names|http://www.json.org/] {noformat} { 'c1' : 3, 'c2' : 4.2, 'c3' : { '4', '2011-02-03 04:05'} } {noformat} JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267611#comment-14267611 ] Aleksey Yeschenko commented on CASSANDRA-7970: -- I'm sure Tyler's 'everywhere' meant 'everywhere in JSON', in which case I'm okay with it - we already have to accept it b/c a JSON key can only be a string, and we'll need to do type-casting there anyway. The alternative would be to limit tojson/fromjson to the subset of schemas w/ types strictly matching to JSON literals' types - when keys are strings, and values are strings/numbers/booleans/lists/UDTs. I will not fight for that option anymore, though. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268341#comment-14268341 ] Robert Stupp commented on CASSANDRA-7970: - [~thobbs] I was a bit confused about the discussion because I thought there were plans to allow string encoded numbers everywhere in CQL. Guess, now I got it :) TLDR - I'm okay with accepting numbers in strings in JSON: I think that we should only use string representation where the JSON spec gives us no other option (that's on 'object names' / 'map keys'). OK, it's a different thing whether we _accept_ numbers in strings or whether we _produce_. IMO we should produce numbers (and not numbers as strings) for JSON object values. Unfortunately I've no better argument than follow the spec. But OTOH it does not matter whether number parsing in antlr fails or whether a Float.parseFloat fails and JSON spec does not force us to not encode numbers in strings. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267969#comment-14267969 ] Tyler Hobbs commented on CASSANDRA-7970: Yes, by everywhere, I meant everywhere in JSON. It sounds like everybody is roughly in agreement with accepting string representations of ints and floats anywhere in a JSON string (which I think is reasonable, given that we accept string representations of other types within JSON anyway). The only object is [~snazy]. Can you expand on why you think it's a bad idea? JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267990#comment-14267990 ] Les Hartzman commented on CASSANDRA-7970: - Reading through the proposals, I can understand the idea of providing some flexibility. But I think that it would be wrong to allow for numeric values to be implicitly converted to strings. It seems that this came up from the idea of converting a mapint, text structure into JSON. But what was the purpose of this enhancement to begin with? It seemed to me that the purpose of this was to allow for the storage and retrieval of JSON in a columnar format as opposed to a CLOB. If the focus remains solely on the reading and writing of 'normal' JSON, there is no issue. I think that this issue has expanded beyond what was intended: it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266973#comment-14266973 ] Aleksey Yeschenko commented on CASSANDRA-7970: -- Also, sqlite. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266964#comment-14266964 ] Jonathan Ellis commented on CASSANDRA-7970: --- I could go either way. On the one hand, my experience with weak typing (js, perl) leads me to think it is a misfeature, so we should limit its scope to the json context where the limited spec forces us to do so. But I'm not sure if that objection really applies to a non-procedural language like SQL. Are there other weakly typed SQL dialects for comparison? JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267037#comment-14267037 ] Jonathan Ellis commented on CASSANDRA-7970: --- I'm a firm -1 on going as far as sqlite: {code} sqlite select 'foo' + 0; 0 {code} But only accepting strings if they are reasonable number-ish representations seems at least *more* reasonable. (I *think* this is the mysql approach.) JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267160#comment-14267160 ] Jeremiah Jordan commented on CASSANDRA-7970: The idea of allowing this in a prepared statement seems very wrong to me, as long as we are thinking this only happens for unprepared statements, and we don't do best guess at the formatting. Abc 123 and 123 abc both get rejected out right, I guess I can see it being useful. Though I really think doing the str to int client side is the right way to go. I can't really see why you need it outside of json. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264342#comment-14264342 ] Sylvain Lebresne commented on CASSANDRA-7970: - bq. are you okay with accepting strings in place of ints/floats/etc everywhere (for the sake of consistency)? I'm no Jonathan, but it would make sense to me to do that. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14262451#comment-14262451 ] Tyler Hobbs commented on CASSANDRA-7970: {quote} I think we should apply Postel's principle here. If we have a mapint, text and someone gives us {'1': 'foo'} we should turn it into an integer rather than rejecting it. Especially since, as you point out, there's really no other option absent transform functions that we're not implementing initially. {quote} [~jbellis] to check, if we do that, are you okay with accepting strings in place of ints/floats/etc everywhere (for the sake of consistency)? JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: New Feature Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200974#comment-14200974 ] Tyler Hobbs commented on CASSANDRA-7970: There's one problem that I forgot about: the JSON spec only accepts strings for map keys. I think that leaves us with two options: # Accept a string version of anything (including ints, floats, etc). For example, if we're expecting a float when decoding JSON and the JSON decoder returns a String, we should try to create a float from it. For {{toJson()}}, we would convert non-strings to strings when they're map keys. # Don't allow {{toJson()}} and {{fromJson()}} to be used on maps with non-string keys. If we have transform functions at some point, those could be used to support these kinds of maps. Option 2 would be my preference. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200982#comment-14200982 ] Aleksey Yeschenko commented on CASSANDRA-7970: -- bq. Option 2 would be my preference. Likewise. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200981#comment-14200981 ] Joshua McKenzie commented on CASSANDRA-7970: Option 2 seems consistent from a JSON ecosystem perspective as well. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200992#comment-14200992 ] Jonathan Ellis commented on CASSANDRA-7970: --- I think we should apply Postel's principle here. If we have a mapint, text and someone gives us {'1': 'foo'} we should turn it into an integer rather than rejecting it. Especially since, as you point out, there's really no other option absent transform functions that we're not implementing initially. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166841#comment-14166841 ] Aleksey Yeschenko commented on CASSANDRA-7970: -- +1 as well JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165276#comment-14165276 ] Sylvain Lebresne commented on CASSANDRA-7970: - I'll note that there is imo 2 slightly separate case to handle. Not only do we want to translate full JSON objects to CQL rows and vice-versa as in the example of the description. But we should also allow to convert CQL values to JSON and vice-versa, because well, the row to/from json will really just be a little extension of the cql value to/from json. Concretely, we'd want something along the lines of: {noformat} UPDATE users SET addresses = addresses + fromJson($${work : { ... }}$$) WHERE id = ... {noformat} and I don't see why we wouldn't allow stuffs like: {noformat} SELECT id, toJson(addresses) FROM users; {noformat} since again, we will have to implement such functions internally. For those cases, I do think the functional syntax work better than a specific syntax because well, those are really plain old functions. That said, I'd be personally be fine with adding toJson/fromJson functions for the conversion of CQL value to and from json, but having specific syntax like the one in the description to apply the conversion to full rows (I do agree that the functional syntax is somewhat inconsistent with normal functions when it comes to applying it to row). JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166189#comment-14166189 ] Jonathan Ellis commented on CASSANDRA-7970: --- bq. I'd be personally be fine with adding toJson/fromJson functions for the conversion of CQL value to and from json, but having specific syntax like the one in the description to apply the conversion to full rows +1 JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157097#comment-14157097 ] Tyler Hobbs commented on CASSANDRA-7970: We had some discussion about implementing this as a function instead of a syntax extension, I would like to document my thoughts on the two choices. h3. Syntax Approach Inserts: {{INSERT INTO users (id, name, address) JSON $$\{'id': ...\}$$;}} Selects: {{SELECT JSON id, address FROM users;}} or {{SELECT id, address FROM users WITH JSON FORMATTING;}} Possible future enhancements such as default values and transformation functions can be handled with syntax: * {{INSERT INTO users JSON ? WITH JSON DEFAULTS = \{field: default\}}} * {{INSERT INTO users JSON ? WITH JSON TRANSFORMS = \{field: fn\}}} Problems: * With SELECT, the number and type of selected columns changes, which is somewhat quirky. * It's not possible to select both normal columns and a JSON document that contains some of the columns. * Functions cannot be applied to the JSON document. (For example, {{concat()}}, if we consider aggregation functions.) h3. Functional Approach Inserting: {{INSERT INTO users (id, name, address) VALUES jsonToRow($${'id': ...$$);}} Selecting: {{SELECT rowToJson(id, address) FROM users;}} Possible future enhancements such as default values and transformation functions can be handled with arguments: {{INSERT INTO users VALUES jsonToRow(..., \{field: default\}, \{field: fn\})}} Problems: * When selecting, field names could get strange with function calls and subfields (e.g. {{SELECT rowToJson(k, sin(v)) ...}}). We might eventually need to offer a way to change field names. This could be handled with another optional argument or perhaps {{AS}} syntax inside the function call. Although I think the syntax option is somewhat cleaner, the function approach is more powerful, doesn't require expanding the language, and is probably a bit more consistent with the rest of CQL. For those reasons, I think we should go with json functions. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157218#comment-14157218 ] Jonathan Ellis commented on CASSANDRA-7970: --- bq. With SELECT, the number and type of selected columns changes, which is somewhat quirky. I guess I don't see how this is more quirky than number and type of selected columns changing in a non-json select. bq. It's not possible to select both normal columns and a JSON document that contains some of the columns. True, but that seems like a reasonable constraint for the design goals here. bq. Functions cannot be applied to the JSON document. (For example, concat(), if we consider aggregation functions.) Also a reasonable constraint for the design goals. Here's my problem with the row_to_json approach: it inherently requires special casing the function machinery, which is Bad. What I mean by this is, all other function calls should have the same behavior whether called on a literal or a column with the same value. Thus, sin(2) and sin(a) where a=2 are semantically equivalent. But row_to_json(a) has to cheat and generate {code}{'a': 2}{code}. (Unless you are saying that row_to_json(a) would generate just the literal 2, in which case I have to say that having to spell out all the json fields is insufficiently usable to be a viable candidate.) Postgresql gets around this with subselects and the concept of a row constructor: http://hashrocket.com/blog/posts/faster-json-generation-with-postgresql I'm okay with the postgresql approach in a lot of respects, but it introduces a ton of complexity that I don't really want to block for. Nor am I sure that we want to open up the subquery can of worms for this, which will inevitably lead to people pushing for them to be generalized. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157224#comment-14157224 ] Jonathan Ellis commented on CASSANDRA-7970: --- One nit on the special syntax: since it's already special, we wouldn't need to require string delimiters on the json literal. Could just be implicit. JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7970) JSON support for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157292#comment-14157292 ] Tyler Hobbs commented on CASSANDRA-7970: bq. I guess I don't see how this is more quirky than number and type of selected columns changing in a non-json select. We don't normally do that in a non-json select, though. (The one exception I can think of is selecting duplicate columns, like {{SELECT a, b, a FROM}}.) With {{SELECT JSON a, b, c}}, you get back one column. I'm not saying this is terrible behavior, but it's definitely an exception to the way things normally work. bq. What I mean by this is, all other function calls should have the same behavior whether called on a literal or a column with the same value. Thus, sin(2) and sin(a) where a=2 are semantically equivalent. But row_to_json(a) has to cheat and generate... Currently we don't allow selecting constants or function calls, so technically that's not an issue yet, but I guess it's only a matter of time until we do support that. {{rowToJson()}} would use the natural identifiers of selections as field names for where possible (columns, subfields, etc) and use the constant or function call as the field name otherwise. So for example, {{SELECT rowToJson(2, now(), a) FROM ...}} would return \{'2': 2, 'now()': 1234567, 'a': 'foo'\}. This could be overridden with additional arguments or {{AS}} syntax. bq. (Unless you are saying that row_to_json(a) would generate just the literal 2, in which case I have to say that having to spell out all the json fields is insufficiently usable to be a viable candidate.) Nope, agreed. bq. Postgresql gets around this with subselects and the concept of a row constructor I agree that subselects are not something we want to introduce. Row constructors might internally simplify some of the typing issues, but from an API perspective they're roughly equivalent to my proposal (unless I'm missing some advantage). JSON support for CQL Key: CASSANDRA-7970 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970 Project: Cassandra Issue Type: Bug Components: API Reporter: Jonathan Ellis Assignee: Tyler Hobbs Fix For: 3.0 JSON is popular enough that not supporting it is becoming a competitive weakness. We can add JSON support in a way that is compatible with our performance goals by *mapping* JSON to an existing schema: one JSON documents maps to one CQL row. Thus, it is NOT a goal to support schemaless documents, which is a misfeature [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON document from a service or a user into a CQL row, with all the validation that entails. Since we are not looking to support schemaless documents, we will not be adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will map the JSON to UDT, collections, and primitive CQL types. Here's how this might look: {code} CREATE TYPE address ( street text, city text, zip_code int, phones settext ); CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses maptext, address ); INSERT INTO users JSON {‘id’: 4b856557-7153, ‘name’: ‘jbellis’, ‘address’: {“home”: {“street”: “123 Cassandra Dr”, “city”: “Austin”, “zip_code”: 78747, “phones”: [2101234567]}}}; SELECT JSON id, address FROM users; {code} (We would also want to_json and from_json functions to allow mapping a single column's worth of data. These would not require extra syntax.) [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/ [2] https://blog.compose.io/schema-less-is-usually-a-lie/ [3] http://dl.acm.org/citation.cfm?id=2481247 -- This message was sent by Atlassian JIRA (v6.3.4#6332)