Looks like the formatting got messed up for the query output. Here it is
in simple text format:
// My original JSON data consisting of maps and arrays:
0: jdbc:drill:drillbit=10.10.101.41> select * from t2.json;
+-----+---------------+-----------------------------------+
| a1 | b1 | c1 |
+-----+---------------+-----------------------------------+
| 10 | {"id":[1,2]} | {"d1":5,"e1":{"f1":["CA","TX"]}} |
+-----+---------------+-----------------------------------+
0: jdbc:drill:drillbit=10.10.101.41> create table dfs.tmp.tt3 as select *
from t2.json;
+-----------+----------------------------+
| Fragment | Number of records written |
+-----------+----------------------------+
| 0_0 | 1 |
+-----------+----------------------------+
// Analyze using parquet-tools
[root@aman1 ~]# java -jar parquet-tools/target/parquet-tools-1.9.0.jar
schema tt3/0_0_0.parquet
message root {
optional int64 a1;
optional group b1 {
repeated int64 id;
}
optional group c1 {
optional int64 d1;
optional group e1 {
repeated binary f1 (UTF8);
}
}
}
On Fri, Mar 2, 2018 at 11:48 AM, Aman Sinha <[email protected]> wrote:
> Erol,
> yes indeed Drill is internally creating the schema for Json data. The top
> level field's data type can be found using the TYPEOF(column) function that
> Gautam mentioned earlier.
> However, I understand you are looking for the nested schema as well, so I
> would recommend the following approach:
>
> - Run a CTAS and write a small sample of the table out in Parquet format.
> - Run parquet-tools [1] with the 'schema' option to explore the schema.
> - NOTE: Although this will show the Parquet schema, it at least will give
> an idea of the original JSON schema.
>
> Here's an example:
>
> *// My original JSON data consisting of maps and arrays: *
>
> 0: jdbc:drill:drillbit=10.10.101.41> select * from t2.json;
>
> *+-----+---------------+-----------------------------------+*
>
> *| **a1 ** | ** b1 ** | ** c1 ** |*
>
> *+-----+---------------+-----------------------------------+*
>
> *| *10 * | *{"id":[1,2]} * | *{"d1":5,"e1":{"f1":["CA","TX"]}} * |*
>
> *+-----+---------------+-----------------------------------+*
>
>
>
> 0: jdbc:drill:drillbit=10.10.101.41> create table dfs.tmp.tt3 as select *
> from t2.json;
>
> *+-----------+----------------------------+*
>
> *| **Fragment ** | **Number of records written ** |*
>
> *+-----------+----------------------------+*
>
> *| *0_0 * | *1 * |*
>
> *+-----------+----------------------------+*
>
>
> *// Analyze using parquet-tools*
>
> [root@aman1 ~]# java -jar parquet-tools/target/parquet-tools-1.9.0.jar
> schema tt3/0_0_0.parquet
>
> message root {
>
> optional int64 a1;
>
> optional group b1 {
>
> repeated int64 id;
>
> }
>
> optional group c1 {
>
> optional int64 d1;
>
> optional group e1 {
>
> repeated binary f1 (UTF8);
>
> }
>
> }
>
> }
>
> The meaning of these types is defined here [2].
>
>
> [1] http://parquet.apache.org/downloads/
> [2] https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
>
> On Thu, Mar 1, 2018 at 6:46 PM, Erol Akarsu <[email protected]> wrote:
>
>> Boaz,
>>
>> Thanks for full explanation.
>> I got impression that Apache Drill is internally creating json schema for
>> input type. I thought we would be able to look deep into that schema,
>> Maybe it is not in "describe" level but it is API level we can achieve it.
>> That do you think?
>>
>> Erol Akarsu
>>
>> On Thu, Mar 1, 2018 at 7:04 PM, Boaz Ben-Zvi <[email protected]> wrote:
>>
>> > From the docs (https://drill.apache.org/docs/describe/): “Currently,
>> > DESCRIBE does not support tables created in a file system.”
>> > Seems that it only works well for Hive and HBase tables.
>> >
>> > The create view statement does not explore the actual schema of the
>> > query’s table(s); it only parses and keeps the query.
>> > Hence it can not tell what ‘*’ (or any list of column names) means:
>> >
>> > 0: jdbc:drill:zk=local> create view pview as select * from
>> > dfs.`/data/PARQUET-1M.parquet`;
>> > +-------+-------------------------------------------------------+
>> > | ok | summary |
>> > +-------+-------------------------------------------------------+
>> > | true | View 'pview' created successfully in 'dfs.tmp' schema |
>> > +-------+-------------------------------------------------------+
>> > 1 row selected (0.274 seconds)
>> > 0: jdbc:drill:zk=local> describe pview;
>> > +--------------+---------------+--------------+
>> > | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
>> > +--------------+---------------+--------------+
>> > | ** | DYNAMIC_STAR | YES |
>> > +--------------+---------------+--------------+
>> > 1 row selected (0.419 seconds)
>> > 0: jdbc:drill:zk=local> create view pview2 as select
>> > row_count,just_garbage from dfs.`/data/PARQUET-1M.parquet`;
>> > +-------+---------------------------------------------------------+
>> > | ok | summary |
>> > +-------+---------------------------------------------------------+
>> > | true | View 'pview2' created successfully in 'dfs.tmp' schema |
>> > +-------+---------------------------------------------------------+
>> > 1 row selected (0.3 seconds)
>> > 0: jdbc:drill:zk=local> describe pview2;
>> > +---------------+------------+--------------+
>> > | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
>> > +---------------+------------+--------------+
>> > | row_count | ANY | YES |
>> > | just_garbage | ANY | YES |
>> > +---------------+------------+--------------+
>> > 2 rows selected (0.533 seconds)
>> > 0: jdbc:drill:zk=local>
>> >
>> > Thanks,
>> >
>> > Boaz
>> >
>> > On 3/1/18, 3:05 PM, "Padma Penumarthy" <[email protected]> wrote:
>> >
>> > Not sure why it is not showing the fields. It does not work for me
>> > either.
>> > Does anyone know more ? Is this broken ?
>> >
>> > Thanks
>> > Padma
>> >
>> > > On Mar 1, 2018, at 2:54 PM, Erol Akarsu <[email protected]>
>> wrote:
>> > >
>> > > Somehow, after "user dfs.tmp", I was able to create view. But
>> > "describe"
>> > > for view does not give much information. I was expecting
>> "describe"
>> > command
>> > > would give type definitions of fields " employee_id |
>> > full_name |
>> > > first_name | last_name | position_id | position_title |
>> > store_id |
>> > > depart ". But it does give a very generic field type.
>> > >
>> > >
>> > > 0: jdbc:drill:zk=local> create view mydonuts2 as SELECT * FROM
>> > > cp.`employee.json` LIMIT 3;
>> > > +-------+---------------------------------------------------
>> > ---------+
>> > > | ok | summary
>> > |
>> > > +-------+---------------------------------------------------
>> > ---------+
>> > > | true | View 'mydonuts2' created successfully in 'dfs.tmp'
>> schema
>> > |
>> > > +-------+---------------------------------------------------
>> > ---------+
>> > > 1 row selected (0.283 seconds)
>> > > 0: jdbc:drill:zk=local> describe mydonuts2;
>> > > +--------------+------------+--------------+
>> > > | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
>> > > +--------------+------------+--------------+
>> > > | * | ANY | YES |
>> > > +--------------+------------+--------------+
>> > > 1 row selected (0.388 seconds)
>> > > 0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` LIMIT 3;
>> > > +--------------+------------------+-------------+-----------
>> > -+--------------+---------------------+-----------+--------+
>> > > | employee_id | full_name | first_name | last_name |
>> > position_id
>> > > | position_title | store_id | depart |
>> > > +--------------+------------------+-------------+-----------
>> > -+--------------+---------------------+-----------+--------+
>> > > | 1 | Sheri Nowmer | Sheri | Nowmer | 1
>> > > | President | 0 | 1 |
>> > > | 2 | Derrick Whelply | Derrick | Whelply | 2
>> > > | VP Country Manager | 0 | 1 |
>> > > | 4 | Michael Spence | Michael | Spence | 2
>> > > | VP Country Manager | 0 | 1 |
>> > > +--------------+------------------+-------------+-----------
>> > -+--------------+---------------------+-----------+--------+
>> > > 3 rows selected (0.579 seconds)
>> > >
>> > > On Thu, Mar 1, 2018 at 3:18 PM, Erol Akarsu <[email protected]>
>> > wrote:
>> > >
>> > >> Padma,
>> > >>
>> > >> I have not created any user. I just installed the system and run
>> > drill
>> > >> with "sqlline.bat -u "jdbc:drill:zk=local"
>> > >> Therefore, what is shortest procedure to achieve what you have
>> > described
>> > >> in previous email?
>> > >>
>> > >> Thanks
>> > >>
>> > >> Erol Akarsu
>> > >>
>> > >> On Thu, Mar 1, 2018 at 3:00 PM, Padma Penumarthy <
>> > [email protected]>
>> > >> wrote:
>> > >>
>> > >>> Check if you have permissions to root directory or not.
>> > >>> You may have to specify the complete directory path (for which
>> you
>> > have
>> > >>> permissions for) in the create view command.
>> > >>>
>> > >>> For example:
>> > >>>
>> > >>> 0: jdbc:drill:zk=local> create view
>> dfs.root.`/Users/ppenumarthy/
>> > parquet/test-view`
>> > >>> as select * from dfs.root.`/Users/ppenumarthy/
>> > parquet/0_0_0.parquet`;
>> > >>> +-------+---------------------------------------------------
>> > >>> -------------------------------------+
>> > >>> | ok | summary
>> > >>> |
>> > >>> +-------+---------------------------------------------------
>> > >>> -------------------------------------+
>> > >>> | true | View '/Users/ppenumarthy/parquet/test-view' created
>> > >>> successfully in 'dfs.root' schema |
>> > >>> +-------+---------------------------------------------------
>> > >>> -------------------------------------+
>> > >>> 1 row selected (0.148 seconds)
>> > >>> 0: jdbc:drill:zk=local>
>> > >>>
>> > >>>
>> > >>> Thanks
>> > >>> Padma
>> > >>>
>> > >>> On Mar 1, 2018, at 11:37 AM, Erol Akarsu <[email protected]
>> > <mailto:eaka
>> > >>> [email protected]>> wrote:
>> > >>>
>> > >>> Padma,
>> > >>>
>> > >>> I have changed dfs storage plugin through web interface as
>> below.
>> > But I am
>> > >>> getting same error response.
>> > >>>
>> > >>> {
>> > >>> "type": "file",
>> > >>> "enabled": true,
>> > >>> "connection": "file:///",
>> > >>> "config": null,
>> > >>> "workspaces": {
>> > >>> "root": {
>> > >>> "location": "/",
>> > >>> "writable": true,
>> > >>> "defaultInputFormat": null,
>> > >>> "allowAccessOutsideWorkspace": true
>> > >>> },
>> > >>> "tmp": {
>> > >>> "location": "/tmp",
>> > >>> "writable": true,
>> > >>> "defaultInputFormat": null,
>> > >>> "allowAccessOutsideWorkspace": true
>> > >>> }
>> > >>> },
>> > >>>
>> > >>> On Thu, Mar 1, 2018 at 1:15 PM, Padma Penumarthy <
>> > [email protected]
>> > >>> <mailto:[email protected]>>
>> > >>> wrote:
>> > >>>
>> > >>> Make "writable": true for the workspace (dfs.root) in the
>> storage
>> > plugin
>> > >>> configuration.
>> > >>>
>> > >>> Thanks
>> > >>> Padma
>> > >>>
>> > >>>
>> > >>> On Mar 1, 2018, at 10:10 AM, Erol Akarsu <[email protected]
>> > <mailto:eaka
>> > >>> [email protected]><mailto:eaka
>> > >>> [email protected]<mailto:[email protected]>>> wrote:
>> > >>>
>> > >>> Thanks Padma.
>> > >>>
>> > >>> I am getting problem while creating view
>> > >>>
>> > >>> 0: jdbc:drill:zk=local> create view mydonuts as SELECT * FROM
>> > >>> cp.`employee.json` LIMIT 3;
>> > >>> Error: VALIDATION ERROR: Root schema is immutable. Creating or
>> > dropping
>> > >>> tables/views is not allowed in root schema.Select a schema using
>> > 'USE
>> > >>> schema' command.
>> > >>>
>> > >>>
>> > >>> [Error Id: 68a31047-5a4e-4768-8722-55648d9a80f6 on
>> > DESKTOP-8OANV3A:31010]
>> > >>> (state=,code=0)
>> > >>> 0: jdbc:drill:zk=local>
>> > >>>
>> > >>> On Thu, Mar 1, 2018 at 12:49 PM, Padma Penumarthy <
>> > [email protected]
>> > >>> <mailto:[email protected]>
>> > >>> <mailto:[email protected]>>
>> > >>> wrote:
>> > >>>
>> > >>> Try creating a view and use describe.
>> > >>>
>> > >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__drill.
>> > >>> apache.org_docs_describe_&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=
>> > >>> XVqW14B9eGK9QR_fKKCb5H5LxKnqNMmq1U7RdNlhq1c&
>> > m=blTmu-WQJa5RUrxqG46o20B-a-
>> > >>> UP0yCHKGqcBo1ETlI&s=SxKqC4w_I5bbnNAtIvzO_9qptLP-KsHi0iIbjtzy
>> rc8&e=
>> > >>>
>> > >>> Thanks
>> > >>> Padma
>> > >>>
>> > >>>
>> > >>> On Mar 1, 2018, at 9:22 AM, Erol Akarsu <[email protected]
>> <mailto:
>> > eaka
>> > >>> [email protected]><mailto:eaka
>> > >>> [email protected]<mailto:[email protected]>><mailto:eaka
>> > >>> [email protected]<mailto:[email protected]><mailto:[email protected]>>>
>> wrote:
>> > >>>
>> > >>> When Use limit 0 query, I am getting only field names. I am
>> > looking for
>> > >>> json schema for input that will describe input type
>> > >>>
>> > >>> 0: jdbc:drill:> select * from `clicks/clicks.json` limit 0;
>> > >>>
>> > >>> +-----------+-------+-------+------------+-------------+
>> > >>> | trans_id | date | time | user_info | trans_info |
>> > >>> +-----------+-------+-------+------------+-------------+
>> > >>> +-----------+-------+-------+------------+-------------+
>> > >>> No
>> > >>>
>> > >>> On Thu, Mar 1, 2018 at 11:24 AM, Erol Akarsu <[email protected]
>> > <mailto:
>> > >>> [email protected]><mailto:
>> > >>> [email protected]<mailto:[email protected]>><mailto:
>> > >>> [email protected]<mailto:[email protected]><mailto:eakarsu@
>> > gmail.com>>>
>> > >>> wrote:
>> > >>>
>> > >>> I am sorry Sorabh
>> > >>> Can you give an example? I am still learning Drill
>> > >>> Thanks
>> > >>>
>> > >>> On Thu, Mar 1, 2018 at 11:11 AM Sorabh Hamirwasia <
>> > [email protected]
>> > >>> <mailto:[email protected]>
>> > >>> <mailto:[email protected]>
>> > >>> <mailto:[email protected]>>
>> > >>> wrote:
>> > >>>
>> > >>> Hi Erol,
>> > >>>
>> > >>> You can run limit 0 query from client to retrieve just the
>> schema
>> > for
>> > >>> your input.
>> > >>>
>> > >>>
>> > >>> Thanks,
>> > >>> Sorabh
>> > >>>
>> > >>> ________________________________
>> > >>> From: Erol Akarsu <[email protected]<mailto:[email protected]
>> > ><mailto:
>> > >>> [email protected]><mailto:
>> > >>> [email protected]<mailto:[email protected]>>>
>> > >>> Sent: Thursday, March 1, 2018 5:28:52 AM
>> > >>> To: [email protected]<mailto:[email protected]><mailto:
>> u
>> > >>> [email protected]><mailto:u
>> > >>> [email protected]<mailto:[email protected]>>
>> > >>> Subject: Accessing underlying scheme of input
>> > >>>
>> > >>> I know Apache drill is creating a json schema for input data
>> file
>> > or hdfs
>> > >>> input before user query on it.
>> > >>> I like to know whether or not Apache drill has API that will
>> help
>> > user to
>> > >>> obtain that derived schema for say an json file or excel file
>> or
>> > hive
>> > >>> input.
>> > >>> I appreciate your help
>> > >>>
>> > >>> Erol Akarsu
>> > >>>
>> > >>> Sent from Mail for Windows 10
>> > >>>
>> > >>> --
>> > >>>
>> > >>> Erol Akarsu
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>>
>> > >>> Erol Akarsu
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>>
>> > >>> Erol Akarsu
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>>
>> > >>> Erol Akarsu
>> > >>>
>> > >>>
>> > >>
>> > >>
>> > >> --
>> > >>
>> > >> Erol Akarsu
>> > >>
>> > >>
>> > >
>> > >
>> > > --
>> > >
>> > > Erol Akarsu
>> >
>> >
>> >
>> >
>>
>>
>> --
>>
>> Erol Akarsu
>>
>
>