[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-29 Thread Paul Rogers (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875631#comment-16875631 ]

Paul Rogers edited comment on DRILL-7308 at 6/30/19 1:55 AM:
-

Modified the {{SchemaBuilder}} class to do exactly what I said we don't want to 
do: it avoids setting the precision if the precision is zero. This allows the 
(wrong) code in the REST feature to work. Still, the incorrect code should 
change as explained above to avoid breaking the next time someone sets a 
precision of 0.
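To show why that workaround is fragile, here is a hypothetical, much-simplified sketch (not Drill's actual {{SchemaBuilder}}; class and method names are illustrative only) of a builder that silently skips the setter when the precision is zero, leaving the Protobuf-style field "unset":

```java
// Hypothetical sketch, NOT Drill's real SchemaBuilder: the workaround treats
// a precision of 0 as "do not set", mimicking an unset Protobuf field.
class TypeSketch {
  private Integer precision; // null means "never set", like Protobuf hasPrecision()

  TypeSketch setPrecision(int p) {
    if (p != 0) {        // the workaround: silently ignore a precision of 0
      this.precision = p;
    }
    return this;
  }

  boolean hasPrecision() { return precision != null; }

  int getPrecision() { return precision == null ? 0 : precision; }
}
```

With this behavior, code that keys off {{hasPrecision()}} alone happens to work, but only until someone legitimately needs to record a precision of 0, which is exactly the breakage risk noted above.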

Also removed the empty schema batch so that simple queries return just one 
batch of data.

The result is that the broken code in the REST call should work for simple 
one-batch queries. Nothing I can do, however, will fix the fact that the schema 
will be repeated for every batch; fixing that will require changes to the REST 
code itself.



> Incorrect Metadata from text file queries
> -
>
> Key: DRILL-7308
> URL: https://issues.apache.org/jira/browse/DRILL-7308
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
> Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
>
> I'm noticing some strange behavior with the newest version of Drill.  If you 
> query a CSV file, you get the following metadata:
> {code:sql}
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> {code}
> {code:json}
> {
>   "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>   "columns": [
>     "domain"
>   ],
>   "rows": [
>     { "domain": "thedataist.com" }
>   ],
>   "metadata": [
>     "VARCHAR(0, 0)",
>     "VARCHAR(0, 0)"
>   ],
>   "queryState": "COMPLETED",
>   "attemptedAutoLimit": 0
> }
> {code}
> There are two issues here:
> 1.  VARCHAR now has precision
> 2.  There are twice as many columns as there should be.
> Additionally, if you query a regular CSV, without the columns extracted, you 
> get the following:
> {code:json}
>  "rows": [
>  { 
>       "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"     }
>   ],
>    "metadata": [
>      "VARCHAR(0, 0)",
>      "VARCHAR(0, 0)"
>    ],
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-29 Thread Paul Rogers (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872033#comment-16872033 ]

Paul Rogers edited comment on DRILL-7308 at 6/30/19 1:54 AM:
-

Recall that Drill can return not only multiple batches, but multiple "result 
sets": runs of batches with different schemas.


A more sophisticated REST solution would handle this case. I can't find any 
ProtoBuf field that says that the schema changed. Instead, we'd have to reuse 
code from elsewhere which compares the current schema to the previous one. 
Ideally, in that case, we'd create a new JSON element for the second schema. 
Something like:

{code:json}
{ "resultSets": [
    { "rows": ...,
      "schema": ...
    },
    { "rows": ...,
      "schema": ...
    } ]
}
{code}

It is easy to create such a case. Simply create two CSV files, one with two 
columns, the other with three. Use just a simple {{SELECT * FROM yourTable}} 
query. You will get two data batches, each with a distinct schema.

The current implementation will give just the first schema and all rows, with 
varying schemas. (Actually, the current implementation will list the two 
columns, then the three columns, duplicating the first two, but we want to fix 
that...)

This is yet another reason to use a provisioned schema: with such a schema we 
can guarantee that the entire query will return a single, consistent schema 
regardless of the variation across files.

A quick & dirty solution is to clear and rebuild the schema objects on every 
batch. That way, the value sent to the user will reflect the last schema which, 
if you are lucky, will be valid for the initial batches as well as later 
batches.
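That "last schema wins" workaround can be sketched in a few lines (a hypothetical, self-contained illustration; the real code lives in Drill's REST layer and these names are invented):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the quick & dirty approach: clear and rebuild the
// column/metadata lists on every incoming batch, so the values finally sent
// to the client reflect only the schema of the last batch received.
class SchemaAccumulator {
  final List<String> columns = new ArrayList<>();
  final List<String> metadata = new ArrayList<>();

  void onBatch(String[] batchColumns, String[] batchTypes) {
    columns.clear();   // discard whatever earlier batches contributed
    metadata.clear();
    for (String c : batchColumns) columns.add(c);
    for (String t : batchTypes) metadata.add(t);
  }
}
```

Note the design trade-off: if the schema grows across batches (two columns, then three), the client sees only the final three-column schema, which may or may not describe the earlier rows.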

It is a known open, unresolved issue that Drill does not attempt to merge 
schema changes, and that unmerged schema changes cannot be handled by ODBC or 
JDBC clients. We can assume, however, that the users of the REST API won't have 
messy data and won't run into this issue.





[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-29 Thread Paul Rogers (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872964#comment-16872964 ]

Paul Rogers edited comment on DRILL-7308 at 6/29/19 6:13 PM:
-

[~cgivre], the problem here is that the code shown earlier is counting on a 
Protobuf implementation detail that is not actually part of the Drill schema 
specification (to the degree there is such a specification). For VarChar, a 
precision of 0 means that the user requested {{VARCHAR}}, while a precision of, 
say, 10 means the user requested {{VARCHAR(10)}}. The scale field is never 
valid for {{VARCHAR}}.

The output of {{VARCHAR(0, 0)}} is not a problem with the code that generated 
the schema. Instead, it is a problem with the way the REST code attempts to 
generate a type name from the schema structures. More precisely, the REST code 
incorrectly assumes that the {{isSet()}} methods are a valid way to check for a 
zero value.

The Protobuf issue is that, unlike a regular Java object, if we never actually 
write to the precision field, then the value is unset. If we write, even if we 
write 0, the value is set. We certainly don't want to litter our code with 
things like:

{code:java}
if (precision != 0) { schemaBuilder.setPrecision(precision); }
{code}

So, code that uses the schema objects should do the following to determine 
whether the value differs from the default: ask if the value is set and, if so, 
ask if it is non-zero. As it turns out, the unset value is 0, so in this case 
there is actually no need to ask whether the value is set.

Taking a step back, the type formatting code should not even be in the REST 
API. The proper place for it is in {{Types}}. In fact, {{Types}} already has 
the desired function: {{getExtendedSqlTypeName()}}. However, this function only 
formats decimals; we need to add a case clause for VARCHAR.

Note that {{getExtendedSqlTypeName()}} exposes the *SQL name* for types. The 
current REST implementation exposes the internal Drill name. That is, 
{{getExtendedSqlTypeName()}} will report, say, {{DOUBLE}} while the REST code 
will report {{Float8}}. This is probably a bug since the documentation explains 
the SQL types, not the internal types.

That said, I actually have not seen any places in Drill where we set or use the 
VARCHAR width. So, no point in trying to format it. In this case, you can just 
use {{getExtendedSqlTypeName()}} directly as-is. Or, if we want to display the 
width, add the required code to that function.

Please file a separate JIRA for the UDF issue. Please provide an attachment or 
link to a sample UDF. I'll see if I can track down that CSV-specific issue in 
case it relates to the EVF.






[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871990#comment-16871990 ]

Paul Rogers edited comment on DRILL-7308 at 6/25/19 5:27 AM:
-

The width issue appears to have been introduced with this commit: "DRILL-6847: 
Add Query Metadata to RESTful Interface" (which, ahem, [~cgivre], was your 
PR...). In {{WebUserConnection}}:

{code:java}
// For DECIMAL type
if (col.getType().hasPrecision()) {
  dataType.append("(");
  dataType.append(col.getType().getPrecision());

  if (col.getType().hasScale()) {
    dataType.append(", ");
    dataType.append(col.getType().getScale());
  }

  dataType.append(")");
} else if (col.getType().hasWidth()) {
  // Case for VARCHAR columns with specified width
  dataType.append("(");
  dataType.append(col.getType().getWidth());
  dataType.append(")");
}
{code}

I did not debug the code, but it appears that {{hasPrecision()}} and 
{{hasScale()}} simply report whether the field was set; they do *not* tell us 
whether the field is zero.

Also, about a year or so ago, Drill moved {{VARCHAR}} width to the precision 
field, so the supposed {{VARCHAR}} code block is a no-op.

The correct code would be something like:

{code:java}
// For DECIMAL and VARCHAR types
if (col.getType().hasPrecision() && col.getType().getPrecision() > 0) {
  dataType.append("(");
  dataType.append(col.getType().getPrecision());

  if (col.getType().hasScale() && col.getType().getScale() > 0) {
    dataType.append(", ");
    dataType.append(col.getType().getScale());
  }

  dataType.append(")");
}
{code}





[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871992#comment-16871992 ]

Paul Rogers edited comment on DRILL-7308 at 6/25/19 4:41 AM:
-

This issue points out an unfortunate reality (IMHO): the lack of unit tests for 
the REST API. We have nothing, other than vigilant users, to track down issues 
such as this one.

I believe that unit tests can be easily created: use a {{ClusterTest}} and set 
the config option (?) to enable the web server. Use a web client of some sort 
to fire a request. Either compare the results against a golden file, or just 
test for the bits of interest (such as, for DRILL-6847, testing against each 
type and mode, with and without width/precision, and verifying just that part 
of the result).

The unit test would also allow very easy debugging. It seems the best we can do 
at present is build all of Drill, start it, and connect a remote debugger. This 
is so cumbersome that folks will avoid stepping through code to see if it works.





[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871945#comment-16871945 ]

Paul Rogers edited comment on DRILL-7308 at 6/25/19 3:08 AM:
-

According to the screen shot, this is the REST API: method POST with the query 
as payload, and a URL ending in {{/query.json}}.


