Hi Vimal,

You've stumbled across one of the more frustrating bits of Drill. Drill is "schema-free", meaning that the only information Drill has about the structure of your data is the data itself. In your case, the JSON reader can infer that "abc" is a MAP (Drill's term; Hive would call it a STRUCT). Each file is read in a different "fragment". One fragment says that "abc" is an empty MAP; another says that it has some fields. These schemas are merged later in the query.
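To make the fragment behavior concrete, here is a toy Python model of per-file schema inference followed by a later merge. It is not Drill's actual code: the `infer`, `merge`, and `empty_groups` helpers are hypothetical, and the scalar type names are only chosen to resemble Drill's. It shows that the first file, looked at alone, yields an empty group for "abc", while the merged schema does not.

```python
from typing import Any, Dict, List, Union

# Toy schema (hypothetical): a type-name string for scalars,
# or a dict of field name -> schema for a MAP (Parquet "group").
Schema = Union[str, Dict[str, "Schema"]]


def infer(value: Any) -> Schema:
    """Infer a schema from one JSON value, using only the value itself."""
    if isinstance(value, dict):
        return {k: infer(v) for k, v in value.items()}  # {} -> empty MAP
    if isinstance(value, bool):  # check bool before int (bool is an int subclass)
        return "BIT"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "FLOAT8"
    if isinstance(value, str):
        return "VARCHAR"
    if value is None:
        return "INT"  # no information at all, so guess a type
    raise TypeError(f"unsupported value: {value!r}")


def merge(a: Schema, b: Schema) -> Schema:
    """Merge two fragment schemas; MAPs are unioned field by field."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for k, v in b.items():
            out[k] = merge(out[k], v) if k in out else v
        return out
    if a == b:
        return a
    raise ValueError(f"schema change: {a} vs {b}")


def empty_groups(schema: Schema, path: str = "") -> List[str]:
    """Return the paths of MAPs with no fields (what the Parquet writer rejects)."""
    found: List[str] = []
    if isinstance(schema, dict):
        if not schema and path:
            found.append(path)
        for k, v in schema.items():
            found.extend(empty_groups(v, f"{path}.{k}" if path else k))
    return found


frag1 = infer({"id": 1, "abc": {}})             # first file: "abc" is empty
frag2 = infer({"id": 2, "abc": {"x": "hello"}})  # second file: "abc" has a field
print(empty_groups(frag1))   # ['abc']
merged = merge(frag1, frag2)
print(merged)                # {'id': 'BIGINT', 'abc': {'x': 'VARCHAR'}}
print(empty_groups(merged))  # []
```

A writer that starts on the first file sees only `frag1`; the complete schema exists only after every file has been read.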
If you had a null value instead, Drill would not know that "abc" is a map and would guess INT as the type. So it is good that you have an empty object: it avoids that ambiguity.

It sounds like the issue is in the Parquet writer: it can't write an empty group. Why is the group empty? Because, when writing the first file with the empty group, the Parquet writer has no way to predict that your "abc" field will eventually include a non-empty group. In fact, when the non-empty group does appear, the Parquet schema must change. I'm not sure what Parquet will do in that case: you may end up with some files with one schema and other files with another.

What you want, of course, is for Drill to combine your files to create a single schema for Parquet, setting fields to null when they are missing. Drill can't currently do that effectively because it would involve predicting the future, which Drill cannot do.

Does anyone have more direct knowledge of how Parquet handles this case?

Thanks,
- Paul

On Fri, Sep 18, 2020 at 4:10 AM Vimal Jain <[email protected]> wrote:
> Hi,
> I am trying to convert my JSON data into Parquet format using CTAS query
> like below :-
>
> create table ds2.root.`parquetOutput` as select * from
> TABLE(ds1.root.`jsonInput/` (type =>'json'));
>
> But it fails with error :-
>
> Error: SYSTEM ERROR: InvalidSchemaException: Cannot write a schema with an
> empty group: optional group abc {}
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: fa3c0390-0093-4c4a-9b32-098d5cc68c7e on
> ip-172-30-3-153.ec2.internal:31010] (state=,code=0)
>
> So can someone explain what is the issue here, can't my jsons have a key
> "abc" with value as empty object "{}" ?
> It's empty in some json files in ds1 but in some there is a value.
> Any help to resolve this would be appreciated.
>
> Thanks and Regards,
> Vimal Jain
