[jira] [Assigned] (DRILL-5550) SELECT non-existent column produces empty required VARCHAR

2019-09-11 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5550:
---

Assignee: (was: Prasad Nagaraj Subramanya)

> SELECT non-existent column produces empty required VARCHAR
> --
>
> Key: DRILL-5550
> URL: https://issues.apache.org/jira/browse/DRILL-5550
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Priority: Minor
> Fix For: Future
>
>
> Drill's CSV column reader supports two forms of files:
> * Files with column headers as the first line of the file.
> * Files without column headers.
> The CSV storage plugin specifies which format to use for files accessed via 
> that storage plugin config.
> Suppose we have a CSV file with headers:
> {code}
> a,b,c
> 10,foo,bar
> {code}
> Suppose we configure a storage plugin to use headers:
> {code}
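> // Format config for comma-delimited text files.
> TextFormatConfig csvFormat = new TextFormatConfig();
> csvFormat.fieldDelimiter = ',';
> // Do not discard the first line; read it as the header row instead.
> csvFormat.skipFirstLine = false;
> csvFormat.extractHeader = true;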
> {code}
> (The above can also be done using JSON when running Drill as a server.)
> Execute the following query:
> {code}
> SELECT a, c, d FROM dfs.data.`example.csv`
> {code}
> Results:
> {code}
> a,c,d
> 10,bar,
> {code}
> The actual type of column {{d}} is non-nullable VARCHAR.
> This is inconsistent with other parts of Drill in two ways, one of which may be a bug.
> Most other parts of Drill use a nullable INT for "missing" columns.
> 1. For CSV it makes sense for the data type to be VARCHAR, since all CSV 
> columns are of that type.
> 2. It may *not* make sense for the column to be non-nullable and blank rather 
> than nullable and NULL. In SQL, NULL means that the data is unknown, which is 
> the case here.
> In the future, we may want to use some other indication for a missing column. 
> Until then, the requested change is to make the type of a missing CSV column 
> a nullable VARCHAR set to value NULL.
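
The difference between the reported behavior and the requested one can be sketched in plain Java, without any Drill classes. This is only an illustration under stated assumptions: the class `MissingColumnSketch`, the method `project`, and the flag `nullForMissing` are hypothetical names invented here, and a Java `null` stands in for SQL NULL. It is not how Drill's CSV reader is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Drill.
class MissingColumnSketch {

    /**
     * Projects one CSV data row onto the requested column names.
     * A requested column that is absent from the header becomes either
     * an empty string (the behavior reported in this issue) or null
     * (standing in for the requested nullable VARCHAR set to NULL).
     */
    static List<String> project(String[] header, String[] row,
                                List<String> requested, boolean nullForMissing) {
        List<String> headerList = Arrays.asList(header);
        List<String> out = new ArrayList<>();
        for (String name : requested) {
            int idx = headerList.indexOf(name);
            if (idx >= 0) {
                out.add(row[idx]);                      // column exists in the file
            } else {
                out.add(nullForMissing ? null : "");    // column is missing
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] header = "a,b,c".split(",");
        String[] row = "10,foo,bar".split(",");
        List<String> requested = Arrays.asList("a", "c", "d");

        // Reported behavior: d is an empty, non-null value.
        System.out.println(project(header, row, requested, false)); // [10, bar, ]
        // Requested behavior: d is NULL.
        System.out.println(project(header, row, requested, true));  // [10, bar, null]
    }
}
```

With the empty-string form, a consumer cannot tell a missing column from a column that is present but blank; with the null form it can.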



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (DRILL-5550) SELECT non-existent column produces empty required VARCHAR

2018-10-30 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-5550:
--

Assignee: Prasad Nagaraj Subramanya



