Nope, which is why I use Python with pyarrow to convert JSON to Parquet these days. Hopefully Arrow / parquet-cpp will support Parquet dictionaries within a couple of months.
Related Arrow issue: https://issues.apache.org/jira/browse/ARROW-1644

All these types of JSON structures are problematic for any engine that learns the schema from the data, like Drill.

File ABC.json is fine:

[
  {"address": "1600 Pennsylvania Avenue", "zip_code": "20500"}
]

But file XYZ.json will bomb:

[
  {"address": "10 Downing Street", "zip_code": null}
]

There is no way to figure out what datatype zip_code is in the second file. I think Drill by default will save this as a BOOLEAN type, so you end up with a zip_code column holding both string and boolean values, which creates chaos and results in an exception.

The only clean way to solve these problems is to stop using schema learning and inject a schema (https://json-schema.org/) into the query somehow. I just gave up trying to use Drill to work with JSON and now use Python to read the JSON and generate Parquet datasets, which I can then use in Drill, etc.

-----Original Message-----
From: Dweep Sharma <[email protected]>
Sent: Friday, March 8, 2019 1:56 AM
To: [email protected]
Subject: reg: Json to Parquet

Hi,

I have a CTAS query that converts JSON to Parquet format and sometimes encounter this error:

(org.apache.parquet.schema.InvalidSchemaException) Cannot write a schema with an empty group: optional group address

I guess this happens when Drill encounters a field like "address" : {} (empty object).

Is there a way to handle this?

Thanks,
-Dweep
