Hi Vimal,

You've stumbled across one of the more frustrating bits of Drill. Drill is
"schema-free", meaning that the only information which Drill has to read
your data is the data itself. In your case, the JSON reader can infer that
"abc" is a MAP (Drill's term; Hive would call it a STRUCT). Each file is
read in a different "fragment". One fragment says that "abc" is an empty
MAP, another says that it has some schema. These are merged sometime later
in the query.
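
To make the per-file inference concrete, here is a toy sketch in Python. It
is not Drill's actual code; infer_type and the two sample records are
made up for illustration. It only shows how two readers that each see one
file arrive at two different descriptions of "abc".

```python
import json

def infer_type(value):
    """Return a toy type description for one JSON value (hypothetical helper)."""
    if isinstance(value, dict):
        # A MAP is described by its members; {} yields an empty MAP.
        return {key: infer_type(child) for key, child in value.items()}
    return type(value).__name__

# Two hypothetical input files, each read independently.
file_a = json.loads('{"id": 1, "abc": {}}')
file_b = json.loads('{"id": 2, "abc": {"x": "hello"}}')

schema_a = infer_type(file_a)  # "abc" is an empty MAP here
schema_b = infer_type(file_b)  # "abc" has one member here

print(schema_a)
print(schema_b)
```

Neither reader is wrong; each reports only what its own file contains.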

If you had a null value instead, Drill wouldn't know that "abc" is a map
and would have guessed INT as the type. So it's good that you have an empty
object: it avoids that ambiguity.

Sounds like the issue is in the Parquet writer, which apparently cannot
write an empty group. Why is the group empty? Because, when writing the first
file with the empty group, the Parquet writer has no way to predict that
your "abc" field will eventually include a non-empty group. In fact, when
the non-empty group does appear, the Parquet schema must change. Not sure
what Parquet will do in that case: you may end up with some files with one
schema, other files with another schema.

What you want, of course, is for Drill to combine your files to create a
single schema for Parquet, setting fields to null when they are missing.
Drill can't currently do that effectively because it would have to predict
the future: know, while writing the first file, what later files contain.
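
For illustration only, here is what that "combine first, then write" idea
looks like as a two-pass toy in Python, run outside Drill. All names
(infer, merge_schema, conform) and the sample records are hypothetical, and
type conflicts are simply ignored. The point is that the union schema is
only known after every record has been seen.

```python
def infer(record):
    """Toy schema: nested dicts of field name -> type name."""
    return {k: infer(v) if isinstance(v, dict) else type(v).__name__
            for k, v in record.items()}

def merge_schema(left, right):
    """Union two toy schemas; nested maps are merged recursively."""
    merged = dict(left)
    for name, right_type in right.items():
        left_type = merged.get(name)
        if isinstance(left_type, dict) and isinstance(right_type, dict):
            merged[name] = merge_schema(left_type, right_type)
        elif name not in merged:
            merged[name] = right_type
        # Otherwise keep the existing type; conflicts are ignored in this sketch.
    return merged

def conform(record, schema):
    """Copy record so every schema field is present, None where missing."""
    out = {}
    for name, field_type in schema.items():
        value = record.get(name)
        if isinstance(field_type, dict):
            out[name] = conform(value or {}, field_type)
        else:
            out[name] = value
    return out

# Hypothetical records standing in for the JSON files.
records = [{"id": 1, "abc": {}}, {"id": 2, "abc": {"x": "hello"}}]

# Pass 1: scan everything to build the union schema.
schema = {}
for record in records:
    schema = merge_schema(schema, infer(record))

# Pass 2: rewrite every record against that schema.
conformed = [conform(r, schema) for r in records]
print(schema)
print(conformed)
```

After pass 2 no record has an empty "abc", but getting there needed a full
scan before the first row could be written.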

Does anyone have more direct knowledge of how Parquet handles this case?

Thanks,

- Paul

On Fri, Sep 18, 2020 at 4:10 AM Vimal Jain <[email protected]> wrote:

> Hi,
> I am trying to convert my JSON data into Parquet format using CTAS query
> like below :-
>
> *create table ds2.root.`parquetOutput` as select * from
> TABLE(ds1.root.`jsonInput/` (type =>'json'));*
>
> But it fails with error :-
>
> *Error: SYSTEM ERROR: InvalidSchemaException: Cannot write a schema with an
> empty group: optional group abc {}
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: fa3c0390-0093-4c4a-9b32-098d5cc68c7e on
> ip-172-30-3-153.ec2.internal:31010] (state=,code=0)*
>
> So can someone explain what is the issue here, can't my jsons have a key
> "abc" with value as empty object "{}" ?
> It's empty in some json files in ds1 but in some there is a value.
> Any help to resolve this would be appreciated.
>
> *Thanks and Regards,*
> *Vimal Jain*
>
