Erol,
yes indeed Drill is internally creating the schema for Json data.  The top
level field's data type can be found using the TYPEOF(column) function that
Gautam mentioned earlier.
However, I understand you are looking for the nested schema as well, so I
would recommend the following approach:

- Run a CTAS and write a small sample of the table out in Parquet format.
- Run parquet-tools [1] with the 'schema' option to explore the schema.
- NOTE: Although this will show the Parquet schema, it at least will give
an idea of the original JSON schema.

Here's an example:

*// My original JSON data consisting of maps and arrays: *

0: jdbc:drill:drillbit=10.10.101.41> select * from t2.json;

*+-----+---------------+-----------------------------------+*

*| **a1 ** | **     b1      ** | **               c1                ** |*

*+-----+---------------+-----------------------------------+*

*| *10 * | *{"id":[1,2]} * | *{"d1":5,"e1":{"f1":["CA","TX"]}} * |*

*+-----+---------------+-----------------------------------+*



0: jdbc:drill:drillbit=10.10.101.41> create table dfs.tmp.tt3 as select *
from t2.json;

*+-----------+----------------------------+*

*| **Fragment ** | **Number of records written ** |*

*+-----------+----------------------------+*

*| *0_0      * | *1                         * |*

*+-----------+----------------------------+*


*// Analyze using parquet-tools*

[root@aman1 ~]# java -jar  parquet-tools/target/parquet-tools-1.9.0.jar
schema tt3/0_0_0.parquet

message root {

  optional int64 a1;

  optional group b1 {

    repeated int64 id;

  }

  optional group c1 {

    optional int64 d1;

    optional group e1 {

      repeated binary f1 (UTF8);

    }

  }

}

The meaning of these types is defined here [2].


[1] http://parquet.apache.org/downloads/
[2] https://github.com/apache/parquet-format/blob/master/LogicalTypes.md

On Thu, Mar 1, 2018 at 6:46 PM, Erol Akarsu <eaka...@gmail.com> wrote:

> Boaz,
>
> Thanks for full explanation.
> I got impression that Apache Drill is internally creating json schema for
> input type. I thought we would be able to look deep into that schema,
> Maybe it is not in "describe" level but it is API level we can achieve it.
> That do you think?
>
> Erol Akarsu
>
> On Thu, Mar 1, 2018 at 7:04 PM, Boaz Ben-Zvi <bben-...@mapr.com> wrote:
>
> > From the docs (https://drill.apache.org/docs/describe/): “Currently,
> > DESCRIBE does not support tables created in a file system.”
> > Seems that it only works well for Hive and HBase tables.
> >
> > The create view statement does not explore the actual schema of the
> > query’s table(s); it only parses and keeps the query.
> > Hence it can not tell what ‘*’ (or any list of column names) means:
> >
> >  0: jdbc:drill:zk=local> create view pview as select * from
> > dfs.`/data/PARQUET-1M.parquet`;
> > +-------+-------------------------------------------------------+
> > |  ok   |                        summary                        |
> > +-------+-------------------------------------------------------+
> > | true  | View 'pview' created successfully in 'dfs.tmp' schema  |
> > +-------+-------------------------------------------------------+
> > 1 row selected (0.274 seconds)
> > 0: jdbc:drill:zk=local> describe pview;
> > +--------------+---------------+--------------+
> > | COLUMN_NAME  |   DATA_TYPE   | IS_NULLABLE  |
> > +--------------+---------------+--------------+
> > | **           | DYNAMIC_STAR  | YES          |
> > +--------------+---------------+--------------+
> > 1 row selected (0.419 seconds)
> > 0: jdbc:drill:zk=local> create view pview2 as select
> > row_count,just_garbage from  dfs.`/data/PARQUET-1M.parquet`;
> > +-------+---------------------------------------------------------+
> > |  ok   |                         summary                         |
> > +-------+---------------------------------------------------------+
> > | true  | View 'pview2' created successfully in 'dfs.tmp' schema  |
> > +-------+---------------------------------------------------------+
> > 1 row selected (0.3 seconds)
> > 0: jdbc:drill:zk=local> describe pview2;
> > +---------------+------------+--------------+
> > |  COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> > +---------------+------------+--------------+
> > | row_count     | ANY        | YES          |
> > | just_garbage  | ANY        | YES          |
> > +---------------+------------+--------------+
> > 2 rows selected (0.533 seconds)
> > 0: jdbc:drill:zk=local>
> >
> > Thanks,
> >
> >       Boaz
> >
> > On 3/1/18, 3:05 PM, "Padma Penumarthy" <ppenumar...@mapr.com> wrote:
> >
> >     Not sure why it is not showing the fields. It does not work for me
> > either.
> >     Does anyone know more ? Is this broken ?
> >
> >     Thanks
> >     Padma
> >
> >     > On Mar 1, 2018, at 2:54 PM, Erol Akarsu <eaka...@gmail.com> wrote:
> >     >
> >     > Somehow, after "user dfs.tmp", I was able to create view. But
> > "describe"
> >     > for view does not give much information. I was expecting "describe"
> > command
> >     > would give type  definitions of fields " employee_id  |
> > full_name     |
> >     > first_name  | last_name  | position_id  |   position_title    |
> > store_id  |
> >     > depart ". But it does give a very generic field type.
> >     >
> >     >
> >     > 0: jdbc:drill:zk=local> create view mydonuts2 as SELECT * FROM
> >     > cp.`employee.json` LIMIT 3;
> >     > +-------+---------------------------------------------------
> > ---------+
> >     > |  ok   |                          summary
> >  |
> >     > +-------+---------------------------------------------------
> > ---------+
> >     > | true  | View 'mydonuts2' created successfully in 'dfs.tmp' schema
> > |
> >     > +-------+---------------------------------------------------
> > ---------+
> >     > 1 row selected (0.283 seconds)
> >     > 0: jdbc:drill:zk=local> describe mydonuts2;
> >     > +--------------+------------+--------------+
> >     > | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> >     > +--------------+------------+--------------+
> >     > | *            | ANY        | YES          |
> >     > +--------------+------------+--------------+
> >     > 1 row selected (0.388 seconds)
> >     > 0: jdbc:drill:zk=local>  SELECT * FROM cp.`employee.json` LIMIT 3;
> >     > +--------------+------------------+-------------+-----------
> > -+--------------+---------------------+-----------+--------+
> >     > | employee_id  |    full_name     | first_name  | last_name  |
> > position_id
> >     > |   position_title    | store_id  | depart |
> >     > +--------------+------------------+-------------+-----------
> > -+--------------+---------------------+-----------+--------+
> >     > | 1            | Sheri Nowmer     | Sheri       | Nowmer     | 1
> >     > | President           | 0         | 1      |
> >     > | 2            | Derrick Whelply  | Derrick     | Whelply    | 2
> >     > | VP Country Manager  | 0         | 1      |
> >     > | 4            | Michael Spence   | Michael     | Spence     | 2
> >     > | VP Country Manager  | 0         | 1      |
> >     > +--------------+------------------+-------------+-----------
> > -+--------------+---------------------+-----------+--------+
> >     > 3 rows selected (0.579 seconds)
> >     >
> >     > On Thu, Mar 1, 2018 at 3:18 PM, Erol Akarsu <eaka...@gmail.com>
> > wrote:
> >     >
> >     >> Padma,
> >     >>
> >     >> I have not created any user. I just installed the system and run
> > drill
> >     >> with  "sqlline.bat -u "jdbc:drill:zk=local"
> >     >> Therefore, what is shortest procedure to achieve what you have
> > described
> >     >> in previous email?
> >     >>
> >     >> Thanks
> >     >>
> >     >> Erol Akarsu
> >     >>
> >     >> On Thu, Mar 1, 2018 at 3:00 PM, Padma Penumarthy <
> > ppenumar...@mapr.com>
> >     >> wrote:
> >     >>
> >     >>> Check if you have permissions to root directory or not.
> >     >>> You may have to specify the complete directory path (for which
> you
> > have
> >     >>> permissions for) in the create view command.
> >     >>>
> >     >>> For example:
> >     >>>
> >     >>> 0: jdbc:drill:zk=local> create view dfs.root.`/Users/ppenumarthy/
> > parquet/test-view`
> >     >>> as select * from dfs.root.`/Users/ppenumarthy/
> > parquet/0_0_0.parquet`;
> >     >>> +-------+---------------------------------------------------
> >     >>> -------------------------------------+
> >     >>> |  ok   |                                        summary
> >     >>>                       |
> >     >>> +-------+---------------------------------------------------
> >     >>> -------------------------------------+
> >     >>> | true  | View '/Users/ppenumarthy/parquet/test-view' created
> >     >>> successfully in 'dfs.root' schema  |
> >     >>> +-------+---------------------------------------------------
> >     >>> -------------------------------------+
> >     >>> 1 row selected (0.148 seconds)
> >     >>> 0: jdbc:drill:zk=local>
> >     >>>
> >     >>>
> >     >>> Thanks
> >     >>> Padma
> >     >>>
> >     >>> On Mar 1, 2018, at 11:37 AM, Erol Akarsu <eaka...@gmail.com
> > <mailto:eaka
> >     >>> r...@gmail.com>> wrote:
> >     >>>
> >     >>> Padma,
> >     >>>
> >     >>> I have changed dfs storage plugin through web interface as below.
> > But I am
> >     >>> getting same error response.
> >     >>>
> >     >>> {
> >     >>> "type": "file",
> >     >>> "enabled": true,
> >     >>> "connection": "file:///",
> >     >>> "config": null,
> >     >>> "workspaces": {
> >     >>>   "root": {
> >     >>>     "location": "/",
> >     >>>     "writable": true,
> >     >>>     "defaultInputFormat": null,
> >     >>>     "allowAccessOutsideWorkspace": true
> >     >>>   },
> >     >>>   "tmp": {
> >     >>>     "location": "/tmp",
> >     >>>     "writable": true,
> >     >>>     "defaultInputFormat": null,
> >     >>>     "allowAccessOutsideWorkspace": true
> >     >>>   }
> >     >>> },
> >     >>>
> >     >>> On Thu, Mar 1, 2018 at 1:15 PM, Padma Penumarthy <
> > ppenumar...@mapr.com
> >     >>> <mailto:ppenumar...@mapr.com>>
> >     >>> wrote:
> >     >>>
> >     >>> Make "writable": true for the workspace (dfs.root) in the storage
> > plugin
> >     >>> configuration.
> >     >>>
> >     >>> Thanks
> >     >>> Padma
> >     >>>
> >     >>>
> >     >>> On Mar 1, 2018, at 10:10 AM, Erol Akarsu <eaka...@gmail.com
> > <mailto:eaka
> >     >>> r...@gmail.com><mailto:eaka
> >     >>> r...@gmail.com<mailto:r...@gmail.com>>> wrote:
> >     >>>
> >     >>> Thanks Padma.
> >     >>>
> >     >>> I am getting problem while creating view
> >     >>>
> >     >>> 0: jdbc:drill:zk=local> create view mydonuts as SELECT * FROM
> >     >>> cp.`employee.json` LIMIT 3;
> >     >>> Error: VALIDATION ERROR: Root schema is immutable. Creating or
> > dropping
> >     >>> tables/views is not allowed in root schema.Select a schema using
> > 'USE
> >     >>> schema' command.
> >     >>>
> >     >>>
> >     >>> [Error Id: 68a31047-5a4e-4768-8722-55648d9a80f6 on
> > DESKTOP-8OANV3A:31010]
> >     >>> (state=,code=0)
> >     >>> 0: jdbc:drill:zk=local>
> >     >>>
> >     >>> On Thu, Mar 1, 2018 at 12:49 PM, Padma Penumarthy <
> > ppenumar...@mapr.com
> >     >>> <mailto:ppenumar...@mapr.com>
> >     >>> <mailto:ppenumar...@mapr.com>>
> >     >>> wrote:
> >     >>>
> >     >>> Try creating a view and use describe.
> >     >>>
> >     >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__drill.
> >     >>> apache.org_docs_describe_&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=
> >     >>> XVqW14B9eGK9QR_fKKCb5H5LxKnqNMmq1U7RdNlhq1c&
> > m=blTmu-WQJa5RUrxqG46o20B-a-
> >     >>> UP0yCHKGqcBo1ETlI&s=SxKqC4w_I5bbnNAtIvzO_9qptLP-
> KsHi0iIbjtzyrc8&e=
> >     >>>
> >     >>> Thanks
> >     >>> Padma
> >     >>>
> >     >>>
> >     >>> On Mar 1, 2018, at 9:22 AM, Erol Akarsu <eaka...@gmail.com
> <mailto:
> > eaka
> >     >>> r...@gmail.com><mailto:eaka
> >     >>> r...@gmail.com<mailto:r...@gmail.com>><mailto:eaka
> >     >>> r...@gmail.com<mailto:r...@gmail.com><mailto:r...@gmail.com>>>
> wrote:
> >     >>>
> >     >>> When Use limit 0 query,  I am getting only field names. I am
> > looking for
> >     >>> json schema for input that will describe input type
> >     >>>
> >     >>> 0: jdbc:drill:> select * from `clicks/clicks.json` limit 0;
> >     >>>
> >     >>> +-----------+-------+-------+------------+-------------+
> >     >>> | trans_id  | date  | time  | user_info  | trans_info  |
> >     >>> +-----------+-------+-------+------------+-------------+
> >     >>> +-----------+-------+-------+------------+-------------+
> >     >>> No
> >     >>>
> >     >>> On Thu, Mar 1, 2018 at 11:24 AM, Erol Akarsu <eaka...@gmail.com
> > <mailto:
> >     >>> eaka...@gmail.com><mailto:
> >     >>> eaka...@gmail.com<mailto:eaka...@gmail.com>><mailto:
> >     >>> eaka...@gmail.com<mailto:eaka...@gmail.com><mailto:eakarsu@
> > gmail.com>>>
> >     >>> wrote:
> >     >>>
> >     >>> I am sorry Sorabh
> >     >>> Can you give an example? I am still learning Drill
> >     >>> Thanks
> >     >>>
> >     >>> On Thu, Mar 1, 2018 at 11:11 AM Sorabh Hamirwasia <
> > shamirwa...@mapr.com
> >     >>> <mailto:shamirwa...@mapr.com>
> >     >>> <mailto:shamirwa...@mapr.com>
> >     >>> <mailto:shamirwa...@mapr.com>>
> >     >>> wrote:
> >     >>>
> >     >>> Hi Erol,
> >     >>>
> >     >>> You can run limit 0 query from client to retrieve just the schema
> > for
> >     >>> your input.
> >     >>>
> >     >>>
> >     >>> Thanks,
> >     >>> Sorabh
> >     >>>
> >     >>> ________________________________
> >     >>> From: Erol Akarsu <eaka...@gmail.com<mailto:eaka...@gmail.com
> > ><mailto:
> >     >>> eaka...@gmail.com><mailto:
> >     >>> eaka...@gmail.com<mailto:eaka...@gmail.com>>>
> >     >>> Sent: Thursday, March 1, 2018 5:28:52 AM
> >     >>> To: user@drill.apache.org<mailto:user@drill.apache.org><mailto:u
> >     >>> s...@drill.apache.org><mailto:u
> >     >>> s...@drill.apache.org<mailto:s...@drill.apache.org>>
> >     >>> Subject: Accessing underlying scheme of input
> >     >>>
> >     >>> I know Apache drill is creating a json schema for input data file
> > or hdfs
> >     >>> input before user query on it.
> >     >>> I like to know whether or not Apache drill has API that will help
> > user to
> >     >>> obtain that  derived schema for say an json file or excel file or
> > hive
> >     >>> input.
> >     >>> I appreciate your help
> >     >>>
> >     >>> Erol Akarsu
> >     >>>
> >     >>> Sent from Mail for Windows 10
> >     >>>
> >     >>> --
> >     >>>
> >     >>> Erol Akarsu
> >     >>>
> >     >>>
> >     >>>
> >     >>>
> >     >>> --
> >     >>>
> >     >>> Erol Akarsu
> >     >>>
> >     >>>
> >     >>>
> >     >>>
> >     >>> --
> >     >>>
> >     >>> Erol Akarsu
> >     >>>
> >     >>>
> >     >>>
> >     >>>
> >     >>> --
> >     >>>
> >     >>> Erol Akarsu
> >     >>>
> >     >>>
> >     >>
> >     >>
> >     >> --
> >     >>
> >     >> Erol Akarsu
> >     >>
> >     >>
> >     >
> >     >
> >     > --
> >     >
> >     > Erol Akarsu
> >
> >
> >
> >
>
>
> --
>
> Erol Akarsu
>

Reply via email to