Thanks, Paul, for the quick response. From your reply, it looks like this has more to do with Parquet than with Drill? I will post this question in the Parquet community group as well to see if we can get an answer.

*Thanks and Regards,*
*Vimal Jain*

On Fri, Sep 18, 2020 at 10:45 PM Paul Rogers <[email protected]> wrote:

> Hi Vimal,
>
> You've stumbled across one of the more frustrating bits of Drill. Drill is
> "schema-free", meaning that the only information which Drill has to read
> your data is the data itself. In your case, the JSON reader can infer that
> "abc" is a MAP (Drill's term, Hive would call it a STRUCT.) Each file is
> read in a different "fragment". One fragment says that "abc" is an empty
> MAP, another says that it has some schema. These are merged sometime later
> in the query.
>
> If you had had a null value instead, Drill won't know that "abc" is a map
> and would have guessed INT as the type. So, good that you have an empty
> object, it avoids ambiguity.
>
> Sounds like the issue is in the Parquet writer: that it has some limitation
> on an empty group. Why is the group empty? Because, when writing the first
> file with the empty group, the Parquet writer has no way to predict that
> your "abc" field will eventually include a non-empty group. In fact, when
> the non-empty group does appear, the Parquet schema must change. Not sure
> what Parquet will do in that case: you may end up with some files with one
> schema, other files with another schema.
>
> What you want, of course, is for Drill to combine your files to create a
> single schema for Parquet, setting fields to null when they are missing.
> Drill can't currently do that effectively because it involves predicting
> the future, which Drill cannot do.
>
> Does anyone have more direct knowledge of how Parquet handles this case?
>
> Thanks,
>
> - Paul
>
> On Fri, Sep 18, 2020 at 4:10 AM Vimal Jain <[email protected]> wrote:
>
> > Hi,
> > I am trying to convert my JSON data into Parquet format using CTAS query
> > like below :-
> >
> > *create table ds2.root.`parquetOutput` as select * from
> > TABLE(ds1.root.`jsonInput/` (type =>'json'));*
> >
> > But it fails with error :-
> >
> > *Error: SYSTEM ERROR: InvalidSchemaException: Cannot write a schema with
> > an empty group: optional group abc {}*
> >
> > *Fragment 0:0*
> >
> > *Please, refer to logs for more information.*
> >
> > *[Error Id: fa3c0390-0093-4c4a-9b32-098d5cc68c7e on
> > ip-172-30-3-153.ec2.internal:31010] (state=,code=0)*
> >
> > So can someone explain what is the issue here, can't my jsons have a key
> > "abc" with value as empty object "{}" ?
> > It's empty in some json files in ds1 but in some there is a value.
> > Any help to resolve this would be appreciated.
> >
> > *Thanks and Regards,*
> > *Vimal Jain*
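
One possible workaround, sketched here rather than taken from the thread, is to pre-process the JSON so that the Parquet writer never sees an empty group: drop every key whose value is an empty object before running the CTAS. The helper names `prune_empty_objects` and `clean_json_lines` are hypothetical, and the sketch assumes one JSON record per line. Whether the CTAS then succeeds still depends on how Drill handles files in which "abc" is missing altogether, which this example does not test.

```python
import json


def prune_empty_objects(value):
    """Recursively drop keys whose value is an empty object ({})."""
    if isinstance(value, dict):
        pruned = {}
        for key, child in value.items():
            cleaned = prune_empty_objects(child)
            # Skip a key only when its cleaned value is an empty dict;
            # nulls, empty lists and scalars are kept as they are.
            if isinstance(cleaned, dict) and not cleaned:
                continue
            pruned[key] = cleaned
        return pruned
    if isinstance(value, list):
        return [prune_empty_objects(item) for item in value]
    return value


def clean_json_lines(lines):
    """Yield cleaned JSON text for each non-blank line (one record per line)."""
    for line in lines:
        if line.strip():
            yield json.dumps(prune_empty_objects(json.loads(line)))
```

For example, a record such as `{"id": 1, "abc": {}}` would be rewritten without the "abc" key, while a record whose "abc" holds real fields is left unchanged. The cleaned lines could be written to a new directory and the CTAS pointed at that directory instead of `jsonInput/`.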
