Nope, which is why I use Python with pyarrow to convert JSON to Parquet these 
days. Hopefully arrow / parquet-cpp will support Parquet dictionaries within a 
couple of months.

https://issues.apache.org/jira/browse/ARROW-1644

All these types of JSON structures are problematic for any JSON schema-learning 
engine like Drill.

File ABC.json is fine, but...
[
{"address": "1600 Pennsylvania Avenue", "zip_code": "20500"}
]

File XYZ.json will bomb:
[
{"address": "10 Downing Street", "zip_code": null}
]

There is no way to figure out what datatype zip_code is in the second file, 
because its only value is null. I think Drill by default will save this as a 
BOOLEAN type, and now you have a zip_code column with both string and boolean 
values, which creates chaos and will result in an exception.
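
To make the conflict concrete, here is a rough sketch in plain Python of what 
per-file type inference has to work with. infer_type and infer_file_schema are 
made-up helpers for illustration only; this is not how Drill is actually 
implemented.

```python
import json

# The two example files from this message, inlined as strings.
ABC = '[{"address": "1600 Pennsylvania Avenue", "zip_code": "20500"}]'
XYZ = '[{"address": "10 Downing Street", "zip_code": null}]'


def infer_type(value):
    """Map a JSON value to a coarse type name; null carries no type information."""
    if value is None:
        return "unknown"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "complex"


def infer_file_schema(text):
    """Infer {column: set of type names} from the records of one file."""
    schema = {}
    for record in json.loads(text):
        for key, value in record.items():
            schema.setdefault(key, set()).add(infer_type(value))
    return schema


abc_schema = infer_file_schema(ABC)
xyz_schema = infer_file_schema(XYZ)
print(abc_schema["zip_code"])  # {'string'}
print(xyz_schema["zip_code"])  # {'unknown'}
```

Looking at XYZ.json alone, nothing says zip_code is a string, so any engine 
that learns the schema per file has to guess.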

The only clean way to solve these problems is to stop using schema learning and 
inject a schema (e.g. https://json-schema.org/) into the query somehow.

I just gave up trying to use Drill to work with JSON and now use Python to read 
JSON and generate Parquet datasets, which I can then use in Drill, etc.
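
A minimal sketch of that workflow, assuming pyarrow 7.0 or later (for 
pa.Table.from_pylist) and assuming both columns should be strings. The helper 
names (load_records, normalize, write_parquet) and the COLUMNS tuple are 
placeholders for illustration, not my actual script.

```python
import json

# Assumed columns for the address example; adjust to the real data.
COLUMNS = ("address", "zip_code")


def load_records(path):
    """Read a JSON file that contains a list of objects."""
    with open(path) as f:
        return json.load(f)


def normalize(record):
    """Keep only the declared columns; missing values and empty objects become None."""
    row = {}
    for name in COLUMNS:
        value = record.get(name)
        if value == {}:  # an empty object would otherwise become an empty group
            value = None
        row[name] = value
    return row


def write_parquet(records, out_path):
    """Write the records to Parquet with an explicit schema instead of an inferred one."""
    # Imported here so the helpers above work without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    schema = pa.schema([("address", pa.string()), ("zip_code", pa.string())])
    table = pa.Table.from_pylist([normalize(r) for r in records], schema=schema)
    pq.write_table(table, out_path)
```

Usage would be something like write_parquet(load_records("XYZ.json"), 
"XYZ.parquet"). Because the schema is declared up front, a null zip_code is 
written as a null string rather than a guessed type.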


-----Original Message-----
From: Dweep Sharma <[email protected]> 
Sent: Friday, March 8, 2019 1:56 AM
To: [email protected]
Subject: reg: Json to Parquet


Hi,

I have a CTAS query that converts JSON to Parquet format and encounter this 
error sometimes

 (org.apache.parquet.schema.InvalidSchemaException) Cannot write a schema with 
an empty group: optional group address

I guess this happens when Drill encounters a field like "address" : {} (empty 
object).

Is there a way to handle this?

Thanks,
-Dweep
