[jira] [Updated] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories
[ https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka updated DRILL-4614:
---
    Attachment: data.json

> Drill must appoint one data type per one column for self-describing data while querying directories
>
>                 Key: DRILL-4614
>                 URL: https://issues.apache.org/jira/browse/DRILL-4614
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.6.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>             Fix For: 1.7.0
>
>         Attachments: data.json
>
>
> When Drill selects data from a directory and detects data types on the fly,
> a single field can end up with several data types.
> For example:
> 1. Create an input file as follows
> 20K rows with the following -
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following -
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case a Parquet table is created as a folder with two files.
> 3. Select the data
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select is a mix of EXPR$0 and EXPR$0.*
> This happens because Drill defines the column data type per file.
> The result is the same with JSON files.
> Since the streaming aggregate does not support schema changes, this issue
> makes it impossible to use aggregate functions on the query results.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
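Step 1 of the reproduction above can be scripted. The following is a minimal sketch, not the reporter's own tooling: the function names `repro_lines` and `write_repro_file` are hypothetical, and the row counts (20,000 and 200) and the output name `data.json` are taken from the issue text.

```python
import json
from typing import Iterator

# Row shape copied from the issue description.
BASE_ROW = {
    "some": "yes",
    "others": {"other": "true", "all": "false", "sometimes": "yes"},
}


def repro_lines(base_rows: int = 20_000, extra_rows: int = 200) -> Iterator[str]:
    """Yield one JSON document per line: base rows first, then extended rows."""
    base = json.dumps(BASE_ROW, separators=(",", ":"))
    # The extended rows add the "additional" field inside "others".
    extended_row = {
        **BASE_ROW,
        "others": {**BASE_ROW["others"], "additional": "last entries only"},
    }
    extended = json.dumps(extended_row, separators=(",", ":"))
    for _ in range(base_rows):
        yield base
    for _ in range(extra_rows):
        yield extended


def write_repro_file(path: str, base_rows: int = 20_000, extra_rows: int = 200) -> int:
    """Write the input file and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for line in repro_lines(base_rows, extra_rows):
            fh.write(line + "\n")
            count += 1
    return count
```

Calling `write_repro_file("data.json")` writes the input used by the CTAS statement in step 2. Placing the extended rows last mirrors the "last entries only" value in the report.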
[jira] [Updated] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories
[ https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka updated DRILL-4614:
---
    Attachment: DRILL-3551.json
[jira] [Updated] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories
[ https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka updated DRILL-4614:
---
    Attachment:     (was: DRILL-3551.json)
[jira] [Updated] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories
[ https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka updated DRILL-4614:
---
    Description:
When Drill selects data from a directory and detects data types on the fly, a single field can end up with several data types.
For example:
1. Create an input file as follows
20K rows with the following -
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
200 rows with the following -
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
2. CTAS as follows
{code:sql}
CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
{code}
In this case a Parquet table is created as a folder with two files.
3. Select the data
{code}
select t.others.additional from dfs.`tmp`.`tp` t
{code}
*The result of the select is a mix of EXPR$0 and EXPR$0.*
This happens because Drill defines the column data type per file.
The result is the same with JSON files.
Since the streaming aggregate does not support schema changes, this issue makes it impossible to use aggregate functions on the query results.

  was:
When Drill selects data from a directory and detects data types on the fly, a single field can end up with several data types.
For example:
1. Create an input file as follows
20K rows with the following -
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
200 rows with the following -
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
2. CTAS as follows
{code:sql}
CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
{code}
In this case a Parquet table is created as a folder with two files.
3. Select the data
{code}
select t.others.additional from dfs.`tmp`.`tp` t
{code}
The result of the select is a mix of EXPR$0 and EXPR$0.
This happens because Drill defines the column data type per file.
The result is the same with JSON files.
Since the streaming aggregate does not support schema changes, this issue makes it impossible to use aggregate functions on the query results.
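The statement that the column data type is defined per file can be illustrated with a small model. This is a sketch under stated assumptions and not Drill's reader code: it assumes each file is typed independently from its own rows and that a file which never contains the field gets a placeholder type. The names `PLACEHOLDER`, `lookup`, `column_types`, and `has_schema_change`, and the file names in the usage note, are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for whatever type a reader assigns to a column
# that does not occur anywhere in a given file.
PLACEHOLDER = "PLACEHOLDER"

Record = Dict[str, Any]


def lookup(record: Record, path: str) -> Any:
    """Follow a dotted path such as 'others.additional'; None if absent."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def infer_type_per_file(records: List[Record], path: str) -> str:
    """Type of the first non-missing value in this file, else the placeholder."""
    for record in records:
        value = lookup(record, path)
        if value is not None:
            return type(value).__name__
    return PLACEHOLDER


def column_types(files: Dict[str, List[Record]], path: str) -> Dict[str, str]:
    """Infer the column type independently for every file in the directory."""
    return {name: infer_type_per_file(rows, path) for name, rows in files.items()}


def has_schema_change(files: Dict[str, List[Record]], path: str) -> bool:
    """True when the files disagree on the type of the column."""
    return len(set(column_types(files, path).values())) > 1
```

With one file holding only base rows and a second file holding some rows with `additional`, `column_types(files, "others.additional")` reports two different types, which is the mixed result described in the report. A single type chosen per column across the whole directory would make `has_schema_change` return False for every column.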