The other JSON format is officially called JSONL (JSON Lines). Can the next version of Drill include jsonl by default in the extensions of the json format under Storage Plugins?
http://jsonlines.org/

From:

"json": {
  "type": "json",
  "extensions": [ "json" ]
},

To:

"json": {
  "type": "json",
  "extensions": [ "json", "jsonl" ]
},

After working with both JSON and JSONL, I find JSONL much easier to work with using other tools and programming languages. A simple Linux grep command can be used to find data, but trying to grep a JSON file with no line breaks just returns a wall of text.

-----Original Message-----
From: Paul Rogers [mailto:[email protected]]
Sent: Monday, August 27, 2018 5:47 PM
To: [email protected]
Subject: Re: RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

Hi David,

JSON files are never splittable: there is no single-character way to find the start of a JSON record within a file. Drill is supposed to support two JSON formats: the array format from the earlier post, and the non-JSON (but very common) list-of-objects format in this example.

Thanks,
- Paul

On Monday, August 27, 2018, 5:38:32 PM PDT, Lee, David <[email protected]> wrote:

Get rid of the opening and closing brackets and see if you can turn the commas between records into newlines. I think the file needs to be splittable to reduce memory overhead compared with parsing one giant string.

{"var1": "foo", "var2":"bar"}
{"var1": "fo", "var2": "baz"}
{"var1": "f2o", "var2": "baz2"}
{"var1": "f3o", "var2": "baz3"}
{"var1": "f4o", "var2": "baz4"}
{"var1": "f5o", "var2": "baz5"}

-----Original Message-----
From: scott [mailto:[email protected]]
Sent: Monday, August 27, 2018 4:59 PM
To: [email protected]
Subject: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

Hi All,

I'm getting an error querying some of my json files. The error I'm getting is:

Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record.
Current token was START_ARRAY

The json files are in array format, like

[ { "var1": "foo", "var2": "bar"},{"var1": "fo", "var2": "baz"}]

I found a ticket that indicates this format is not supported by Drill yet, DRILL-1755 <https://jira.apache.org/jira/browse/DRILL-1755>, but I find it hard to believe there is no workaround or solution since this was reported 4 years back. Does anyone have a solution or workaround to this problem?

Thanks,
Scott
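
The workaround suggested in this thread (drop the enclosing brackets and put one record per line) could be scripted roughly as follows. This is only a minimal sketch in Python, not anything Drill provides: the function name json_array_to_jsonl and the file names are hypothetical, and it assumes the array file is small enough to load into memory in one go.

```python
import json
from pathlib import Path


def json_array_to_jsonl(src: Path, dst: Path) -> int:
    """Rewrite a file holding one top-level JSON array as JSON Lines.

    Hypothetical helper for illustration; returns the number of records written.
    """
    # Assumption: the whole array fits in memory. A very large file would
    # need a streaming parser instead.
    records = json.loads(src.read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError(f"{src} does not contain a top-level JSON array")

    with dst.open("w", encoding="utf-8") as out:
        for record in records:
            # json.dumps without indent emits no raw newlines, so each
            # record occupies exactly one line of the output file.
            out.write(json.dumps(record) + "\n")
    return len(records)


if __name__ == "__main__":
    # Hypothetical file names; replace with real paths.
    count = json_array_to_jsonl(Path("data.json"), Path("data.jsonl"))
    print(f"wrote {count} records")
```

The output has one object per line, which is the list-of-objects layout shown earlier in the thread, and it can be searched line by line with grep.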
