[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875692#comment-16875692
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-506999568
 
 
   When running the full tests, the following failed in `java-exec`:
   
   ```
   [ERROR] Errors: 
   [ERROR]   TestDynamicUDFSupport.testReRegisterTheSameJarWithDifferentContent:600->BaseTestQuery.testRunAndReturn:340 » Rpc
   ```
   
   When running this unit test in Eclipse, two tests failed: `testDropFunction` 
and `testSuccessfulRegistrationAfterSeveralRetryAttempts`. Then, 
`testConcurrentRemoteRegistryUpdateWithDuplicates` hung forever.
   
   These tests (and one other) fail about 50% of the time on my builds. For 
example:
   
   ```
   [ERROR] Errors: 
   [ERROR]   TestPStoreProviders.verifyZkStore:67 » NoSuchElement
   ```
   
   The workaround seems to be to rebuild all of Drill. Roughly, the test passes 
once after a clean build, but fails if run a second time or after a code 
change. I am not sure this is the exact pattern, but something like it happens.
   
   The result is that it is hard to tell if my code broke something or if the 
tests are just flaky. I wonder, is there something we can do to stabilize these 
tests? All other tests run fine if I rerun them a second time on the same build 
or after I make a small code change.
   
   Anyway, after doing a full rebuild and retest, this commit does pass all 
unit tests.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Disable "fast schema" batch for new scan framework
> --
>
> Key: DRILL-7306
> URL: https://issues.apache.org/jira/browse/DRILL-7306
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
>  The EVF framework is set up to return a "fast schema" empty batch with only 
> schema as its first batch because, when the code was written, it seemed 
> that's how we wanted operators to work. However, DRILL-7305 notes that many 
> operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast 
> schema" batch, this ticket asks to disable the feature in the new scan 
> framework. The feature is disabled with a config option; it can be re-enabled 
> if ever it is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-29 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875631#comment-16875631
 ] 

Paul Rogers commented on DRILL-7308:


Modified the {{SchemaBuilder}} class to do exactly what I said we don't want to 
do: it avoids setting the precision if the precision is zero. This allows the 
(wrong) code in this feature to work. The incorrect code should change.

Also removed the empty schema batch so that simple queries return just one 
batch of data.

The result is that the broken code in the REST call should work for simple 
one-batch queries. Nothing I can do, however, will fix the fact that the schema 
will be repeated for every batch; fixing that will require changes to the REST 
code itself.

> Incorrect Metadata from text file queries
> -
>
> Key: DRILL-7308
> URL: https://issues.apache.org/jira/browse/DRILL-7308
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
> Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
>
> I'm noticing some strange behavior with the newest version of Drill.  If you 
> query a CSV file, you get the following metadata:
> {code:sql}
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> {code}
> {code:json}
> {
>   "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>   "columns": [
>     "domain"
>   ],
>   "rows": [
>     { "domain": "thedataist.com" }
>   ],
>   "metadata": [
>     "VARCHAR(0, 0)",
>     "VARCHAR(0, 0)"
>   ],
>   "queryState": "COMPLETED",
>   "attemptedAutoLimit": 0
> }
> {code}
> There are two issues here:
> 1.  VARCHAR now has precision
> 2.  There are twice as many columns as there should be.
> Additionally, if you query a regular CSV, without the columns extracted, you 
> get the following:
> {code:json}
>   "rows": [
>     { "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]" }
>   ],
>   "metadata": [
>     "VARCHAR(0, 0)",
>     "VARCHAR(0, 0)"
>   ],
> {code}





[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-29 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872033#comment-16872033
 ] 

Paul Rogers edited comment on DRILL-7308 at 6/30/19 1:54 AM:
-

Recall that Drill can return not only multiple batches, but multiple "result 
sets": runs of batches with different schemas.


A more sophisticated REST solution would handle this case. I can't find any 
ProtoBuf field that says that the schema changed. Instead, we'd have to reuse 
code from elsewhere which compares the current schema to the previous one. 
Ideally, in that case, we'd create a new JSON element for the second schema. 
Something like:

{code:json}
{ "resultSets": [
    { "rows": ...,
      "schema": ...
    },
    { "rows": ...,
      "schema": ...
    } ]
}
{code}
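
The schema comparison described above could be sketched roughly as follows. This is only an illustration: {{ResultSetGrouper}}, {{Batch}} and {{SchemaRun}} are hypothetical names, not Drill classes, and a batch is reduced to a list of column names plus a list of rows. A new result set starts whenever the columns of a batch differ from those of the previous batch.

```java
import java.util.ArrayList;
import java.util.List;

final class ResultSetGrouper {

    /** Hypothetical batch: the column names plus the rows it carries. */
    static final class Batch {
        final List<String> columns;
        final List<String> rows;

        Batch(List<String> columns, List<String> rows) {
            this.columns = columns;
            this.rows = rows;
        }
    }

    /** One run of consecutive batches that share a schema. */
    static final class SchemaRun {
        final List<String> columns;
        final List<String> rows = new ArrayList<>();

        SchemaRun(List<String> columns) {
            this.columns = columns;
        }
    }

    /** Starts a new run whenever a batch's columns differ from the previous batch's. */
    static List<SchemaRun> group(List<Batch> batches) {
        List<SchemaRun> runs = new ArrayList<>();
        SchemaRun current = null;
        for (Batch batch : batches) {
            if (current == null || !current.columns.equals(batch.columns)) {
                current = new SchemaRun(batch.columns);
                runs.add(current);
            }
            current.rows.addAll(batch.rows);
        }
        return runs;
    }
}
```

Each {{SchemaRun}} would then map to one element of the {{resultSets}} array, with its own {{schema}} and {{rows}}.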

It is easy to create such a case. Simply create two CSV files, one with two 
columns, the other with three. Use just a simple {{SELECT * FROM yourTable}} 
query. You will get two data batches, each with a distinct schema.

The current implementation will give just the first schema and all rows, with 
varying schemas. (Actually, the current implementation will list the two 
columns, then the three columns, duplicating the first two, but we want to fix 
that...)

This is yet another reason to use a provisioned schema: with such a schema we 
can guarantee that the entire query will return a single, consistent schema 
regardless of the variation across files.

A quick & dirty solution is to clear and rebuild the schema objects on every 
batch. That way, the value sent to the user will reflect the last schema which, 
if you are lucky, will be valid for the initial batches as well as later 
batches.
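
A minimal sketch of the difference between accumulating and rebuilding the schema, using a hypothetical {{RestColumnList}} class as a stand-in for whatever the REST code keeps per query (the class and method names are assumptions made for illustration only):

```java
import java.util.ArrayList;
import java.util.List;

final class RestColumnList {
    private final List<String> columns = new ArrayList<>();

    /** Appends without clearing: a second batch repeats the columns of the first. */
    void appendBatchSchema(List<String> batchColumns) {
        columns.addAll(batchColumns);
    }

    /** Quick and dirty fix: forget the previous schema, keep only the latest one. */
    void replaceWithBatchSchema(List<String> batchColumns) {
        columns.clear();
        columns.addAll(batchColumns);
    }

    /** Returns a copy so callers cannot modify the internal list. */
    List<String> columns() {
        return new ArrayList<>(columns);
    }
}
```

With a two-column batch followed by a three-column batch, appending yields five entries, while replacing yields only the three columns of the last batch.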

It is a known, unresolved issue that Drill does not attempt to merge 
schema changes, and that unmerged schema changes cannot be handled by ODBC or 
JDBC clients. We can assume, however, that the users of the REST API won't have 
messy data and won't run into this issue.


was (Author: paul.rogers):
Recall that Drill can return not only multiple batches, but multiple "result 
sets": runs of batches with different schemas.


A more sophisticated REST solution would handle this case. I can't find any 
ProtoBuf field that says that the schema changed. Instead, we'd have to reuse 
code from elsewhere which compares the current schema to the previous one. 
Ideally, in that case, we'd create a new JSON element for the second schema. 
Something like:

{code:json}
{ resultSets: [
    { "rows": ...
  "schema": ...
    }, 
    { "rows": ...
  "schema": ...
    } ]
}
{code}

It is easy to create such a case. Simply create two CSV files, one with 2 
columns, the other with three. Use just a simple \{{SELECT * FROM yourTable}} 
query. You will get two data batches, each with a distinct schema.

The current implementation will give just the first schema and all rows, with 
varying schemas. (Actually, the current implementation will list the two 
columns, then the three columns, duplicating the first two, but we want to fix 
that...)

This is yet another reason to use a provisioned schema: with such a schema we 
can guarantee that the entire query will return a single, consistent schema 
regardless of the variation across files.



[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-29 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875631#comment-16875631
 ] 

Paul Rogers edited comment on DRILL-7308 at 6/30/19 1:55 AM:
-

Modified the {{SchemaBuilder}} class to do exactly what I said we don't want to 
do: it avoids setting the precision if the precision is zero. This allows the 
(wrong) code in the REST feature to work. Still, the incorrect code should 
change as explained above to avoid breaking the next time someone sets a 
precision of 0.

Also removed the empty schema batch so that simple queries return just one 
batch of data.

The result is that the broken code in the REST call should work for simple 
one-batch queries. Nothing I can do, however, will fix the fact that the schema 
will be repeated for every batch; fixing that will require changes to the REST 
code itself.


was (Author: paul.rogers):
Modified the {{SchemaBuilder}} class to do exactly what I said we don't want to 
do: it avoids setting the precision if the precision is zero. This allows the 
(wrong) code in this feature to work. The incorrect code should change.

Also removed the empty schema batch so that simple queries return just one 
batch of data.

The result is that the broken code in the REST call should work for simple 
one-batch queries. Nothing I can do, however, will fix the fact that the schema 
will be repeated for every batch; fixing that will require changes to the REST 
code itself.



[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875630#comment-16875630
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-506999568
 
 
   When running the full tests, the following failed in `java-exec`:
   
   ```
   [ERROR] Errors: 
   [ERROR]   TestDynamicUDFSupport.testReRegisterTheSameJarWithDifferentContent:600->BaseTestQuery.testRunAndReturn:340 » Rpc
   ```
   
   When running this unit test in Eclipse, two tests failed: `testDropFunction` 
and `testSuccessfulRegistrationAfterSeveralRetryAttempts`. Then, 
`testConcurrentRemoteRegistryUpdateWithDuplicates` hung forever.
   
   These tests (and one other that I can't recall) fail about 50% of the time 
on my builds.
   
   The workaround seems to be to rebuild all of Drill. Roughly, the test passes 
once after a clean build, but fails if run a second time or after a code 
change. I am not sure this is the exact pattern, but something like it happens.
   
   The result is that it is hard to tell if my code broke something or if the 
tests are just flaky. I wonder, is there something we can do to stabilize these 
tests? All other tests run fine if I rerun them a second time on the same build 
or after I make a small code change.
   
   Anyway, after doing a full rebuild and retest, this commit does pass all 
unit tests.
 





[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875606#comment-16875606
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-506989680
 
 
   Addressed the TestEmptyInputSql failure. The code now recognizes two cases:
   
   1. Empty results: the reader provided a schema, but had no rows. (This is 
the case that failed.)
   2. Null results: the reader provided neither rows nor schema. Previously, 
the code always followed this path, even when a schema was available.
   
   Changed the query builder row set code to return an empty row set if the 
output contains only an empty batch and contains a schema. The code continues 
to return no row set if the result is null. (Oddly, Drill will return a batch 
with no rows and no schema if the reader returns no batches at all.)
   
   Will address other issues in separate commits.
 





[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875632#comment-16875632
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-506989680
 
 
   Addressed the `TestEmptyInputSql` failure. The code now recognizes two cases:
   
   1. Empty results: the reader provided a schema, but had no rows. (This is 
the case that failed.)
   2. Null results: the reader provided neither rows nor schema. Previously, 
the code always followed this path, even when a schema was available.
   
   Changed the query builder row set code to return an empty row set if the 
output contains only an empty batch and contains a schema. The code continues 
to return no row set if the result is null. (Oddly, Drill will return a batch 
with no rows and no schema if the reader returns no batches at all.)
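
The two cases above can be told apart with a check along these lines. This is a sketch only: `QueryResultKind` and its `classify` method are hypothetical names, not the query builder's actual API, and the schema is reduced to a list of column names.

```java
import java.util.List;

final class QueryResultKind {

    enum Kind { NULL_RESULT, EMPTY_RESULT, DATA }

    /**
     * Classifies a result from its schema and row count.
     * A missing or column-less schema with no rows is a null result;
     * a schema with no rows is an empty result.
     */
    static Kind classify(List<String> schemaColumns, int rowCount) {
        if (rowCount > 0) {
            return Kind.DATA;
        }
        boolean hasSchema = schemaColumns != null && !schemaColumns.isEmpty();
        return hasSchema ? Kind.EMPTY_RESULT : Kind.NULL_RESULT;
    }
}
```

Under this scheme, a caller would return an empty row set for `EMPTY_RESULT` and no row set for `NULL_RESULT`.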
   
   Will address other issues in separate commits.
 





[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-29 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872964#comment-16872964
 ] 

Paul Rogers edited comment on DRILL-7308 at 6/29/19 6:13 PM:
-

[~cgivre], the problem here is that the code shown earlier is counting on a 
Protobuf implementation detail that is not actually a part of the Drill schema 
specification (to the degree there is such a specification). For VarChar, a 
precision of 0 means that the user requested {{VARCHAR}}, while a precision of, 
say, 10 means the user requested {{VARCHAR(10)}}. The scale field is never 
valid for {{VARCHAR}}.

The output of {{VARCHAR(0, 0)}} is not a problem with the code that generated 
the schema. Instead, it is a problem with the way that the REST code attempts 
to generate a type name from the schema structures. To be more precise, the 
REST code assumes that the {{isSet()}} methods are the correct way to check 
for a 0 value. They are not.

The Protobuf issue is that, unlike a regular Java object, if we never actually 
write to the precision field, then the value is unset. If we write, even if we 
write 0, the value is set. We certainly don't want to litter our code with 
things like:

{code:java}
if (precision != 0) { schemaBuilder.setPrecision(precision); }
{code}

So, the code that uses the schema objects should do the following to determine 
if the value is other than the default: both ask if the value is set, and if 
so, ask if the value is non-zero. As it turns out, the unset value is 0, so 
there is actually no need to ask if the value is set in this case.
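
To illustrate the point, here is a small sketch that uses a hypothetical stand-in class, {{FieldType}}, in place of the Protobuf-generated type. Only the "has"/"get" accessor pair mirrors the Protobuf convention; the class, its factory methods and {{VarcharNames}} are invented for this example.

```java
/** Hypothetical stand-in for a Protobuf-generated type message. */
final class FieldType {
    private final boolean precisionSet;
    private final int precision;

    private FieldType(boolean precisionSet, int precision) {
        this.precisionSet = precisionSet;
        this.precision = precision;
    }

    /** Precision was never written, so it reads back as the default, 0. */
    static FieldType withoutPrecision() {
        return new FieldType(false, 0);
    }

    /** Precision was written explicitly, possibly as 0. */
    static FieldType withPrecision(int precision) {
        return new FieldType(true, precision);
    }

    boolean hasPrecision() {
        return precisionSet;
    }

    int getPrecision() {
        return precision;
    }
}

final class VarcharNames {

    /** The faulty check: treats "was set" as if it meant "is non-zero". */
    static String formatBySetFlag(FieldType type) {
        return type.hasPrecision()
            ? "VARCHAR(" + type.getPrecision() + ")"
            : "VARCHAR";
    }

    /** Tests the value itself; an unset field already reads as 0. */
    static String formatByValue(FieldType type) {
        int precision = type.getPrecision();
        return precision > 0 ? "VARCHAR(" + precision + ")" : "VARCHAR";
    }
}
```

For a type whose precision was explicitly written as 0, {{formatBySetFlag}} produces {{VARCHAR(0)}} while {{formatByValue}} produces plain {{VARCHAR}}.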

Taking a step back, the type formatting code should not even be in the REST 
API. The proper place for it is in {{Types}}. In fact, {{Types}} already has 
the desired function: {{getExtendedSqlTypeName()}}. However, this function only 
formats decimals; we need to add a case clause for VARCHAR.

Note that {{getExtendedSqlTypeName()}} exposes the *SQL name* for types. The 
current REST implementation exposes the internal Drill name. That is, 
{{getExtendedSqlTypeName()}} will report, say, {{DOUBLE}} while the REST code 
will report {{Float8}}. This is probably a bug since the documentation explains 
the SQL types, not the internal types.

That said, I actually have not seen any places in Drill where we set or use the 
VARCHAR width. So, no point in trying to format it. In this case, you can just 
use {{getExtendedSqlTypeName()}} directly as-is. Or, if we want to display the 
width, add the required code to that function.
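
As a rough sketch of what such a formatter might look like with a VARCHAR case added: the code below is not the actual {{Types}} implementation. {{SqlTypeNames}}, its {{InternalType}} enum and the name mapping are assumptions made for illustration, and only a handful of types are covered.

```java
final class SqlTypeNames {

    /** Hypothetical subset of internal type names. */
    enum InternalType { INT, FLOAT8, VARCHAR, VARDECIMAL }

    /** Maps an internal type to the SQL name a user would expect to see. */
    static String sqlName(InternalType type) {
        switch (type) {
            case INT:        return "INTEGER";
            case FLOAT8:     return "DOUBLE";
            case VARCHAR:    return "VARCHAR";
            case VARDECIMAL: return "DECIMAL";
            default:         throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    /** Adds precision (and scale, for decimals) only when the precision is non-zero. */
    static String extendedName(InternalType type, int precision, int scale) {
        String name = sqlName(type);
        switch (type) {
            case VARDECIMAL:
                return precision > 0 ? name + "(" + precision + ", " + scale + ")" : name;
            case VARCHAR:
                // Scale is never valid for VARCHAR, so it is ignored here.
                return precision > 0 ? name + "(" + precision + ")" : name;
            default:
                return name;
        }
    }
}
```

Keeping this logic in one place means the REST code would only call the formatter, rather than inspecting the schema structures itself.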

Please file a separate JIRA for the UDF issue. Please provide an attachment or 
link to a sample UDF. I'll see if I can track down that CSV-specific issue in 
case it relates to the EVF.


was (Author: paul.rogers):
[~cgivre], the problem here is that the code shown earlier is counting on a 
Protobuf implementation detail that is not actually a part of the Drill schema 
specification (to the degree there is such a specification.) For VarChar, a 
precision of 0 means that the user requested {{VARCHAR}}, while a precision of, 
say, 10 means the user requested {{VARCHAR(10}}. The scale is never valid for 
{{VARCHAR}}, it is an artifact of the incorrect way the above code was written.

The Protobuf issue is that, unlike a regular Java object, if we never actually 
write to the precision field, then the value is unset. If we write, even if we 
write 0, the value is set. We certainly don't want to litter our code with 
things like:

{code:java}
if (precision != 0) { schemaBuilder.setPrecision(precision); }
{code}

So, we should ask if the precision is set and non-zero.

In fact, the type formatting code should not even be in the REST API. The 
proper place for it is in {{Types}}. In fact, that class already has the 
desired function: {{getExtendedSqlTypeName()}}. However, this function only 
formats decimals; we need to add a case clause for VARCHAR.

That said, I actually have not seen any places in Drill where we set or use the 
VARCHAR width. So, no point in trying to format it. In this case, you can just 
use {{getExtendedSqlTypeName()}} directly as-is.

Please file a separate JIRA for the UDF issue. Please provide an attachment or 
link to a sample UDF. I'll see if I can track down that CSV-specific issue in 
case it relates to the EVF.


[jira] [Commented] (DRILL-6225) Add support for boost 1.68 with openSSL 1.1.0/1.1.1 support

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875509#comment-16875509
 ] 

ASF GitHub Bot commented on DRILL-6225:
---

arina-ielchiieva commented on issue #1817: DRILL-6225: Add support for boost 
1.68 with openSSL 1.1.0/1.1.1 support
URL: https://github.com/apache/drill/pull/1817#issuecomment-506957237
 
 
   @debraj92 please squash the commits and address protobuf job failures - 
https://travis-ci.org/apache/drill/jobs/552040788.
 



> Add support for boost 1.68 with openSSL 1.1.0/1.1.1 support
> ---
>
> Key: DRILL-6225
> URL: https://issues.apache.org/jira/browse/DRILL-6225
> Project: Apache Drill
>  Issue Type: Task
>  Components: Client - C++
>Reporter: Rob Wu
>Assignee: Debraj Ray
>Priority: Minor
>
> Boost 1.57 is not able to compile with openSSL 1.1 
> ([https://svn.boost.org/trac10/ticket/12238]) and adding 
> add_definitions(-DOPENSSL_API_COMPAT=0x1000L) does not work. 
>  
> In order to add support for openSSL 1.1, we would need to upgrade boost to 
> 1.62+. However, it looks like boost 1.62 bcp will segfault on asio component 
> when you attempt to shade the boost libraries 
> ([https://svn.boost.org/trac10/ticket/12357]). So in that case, we should 
> upgrade to 1.64+.





[jira] [Assigned] (DRILL-7174) Expose complex to Json control in the Drill C++ Client

2019-06-29 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7174:
---

Assignee: Arina Ielchiieva

> Expose complex to Json control in the Drill C++ Client
> --
>
> Key: DRILL-7174
> URL: https://issues.apache.org/jira/browse/DRILL-7174
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Rob Wu
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.17.0
>
>
> Arjun Gupta will be supplying a patch for this
>  





[jira] [Updated] (DRILL-7174) Expose complex to Json control in the Drill C++ Client

2019-06-29 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7174:

Fix Version/s: 1.17.0

> Expose complex to Json control in the Drill C++ Client
> --
>
> Key: DRILL-7174
> URL: https://issues.apache.org/jira/browse/DRILL-7174
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Rob Wu
>Priority: Minor
> Fix For: 1.17.0
>
>
> Arjun Gupta will be supplying a patch for this
>  





[jira] [Assigned] (DRILL-7174) Expose complex to Json control in the Drill C++ Client

2019-06-29 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7174:
---

Assignee: (was: Arina Ielchiieva)

> Expose complex to Json control in the Drill C++ Client
> --
>
> Key: DRILL-7174
> URL: https://issues.apache.org/jira/browse/DRILL-7174
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Rob Wu
>Priority: Minor
> Fix For: 1.17.0
>
>
> Arjun Gupta will be supplying a patch for this
>  





[jira] [Commented] (DRILL-7174) Expose complex to Json control in the Drill C++ Client

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875510#comment-16875510
 ] 

ASF GitHub Bot commented on DRILL-7174:
---

arina-ielchiieva commented on issue #1814: DRILL-7174: Expose complex to Json 
control in the Drill C++ Client
URL: https://github.com/apache/drill/pull/1814#issuecomment-506957298
 
 
   @vvysotskyi could you please review?
 



> Expose complex to Json control in the Drill C++ Client
> --
>
> Key: DRILL-7174
> URL: https://issues.apache.org/jira/browse/DRILL-7174
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Rob Wu
>Priority: Minor
>
> Arjun Gupta will be supplying a patch for this
>  





[jira] [Updated] (DRILL-7174) Expose complex to Json control in the Drill C++ Client

2019-06-29 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7174:

Reviewer: Volodymyr Vysotskyi

> Expose complex to Json control in the Drill C++ Client
> --
>
> Key: DRILL-7174
> URL: https://issues.apache.org/jira/browse/DRILL-7174
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Rob Wu
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.17.0
>
>
> Arjun Gupta will be supplying a patch for this
>  





[jira] [Commented] (DRILL-7310) Move schema-related classes from exec module to be able to use them in metastore module

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875557#comment-16875557
 ] 

ASF GitHub Bot commented on DRILL-7310:
---

asfgit commented on pull request #1816: DRILL-7310: Move schema-related classes 
from exec module to be able to use them in metastore module
URL: https://github.com/apache/drill/pull/1816
 
 
   
 



> Move schema-related classes from exec module to be able to use them in 
> metastore module
> ---
>
> Key: DRILL-7310
> URL: https://issues.apache.org/jira/browse/DRILL-7310
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Currently, most of the schema-related classes are placed in the {{exec}} 
> module, but some of them need to be used in the {{metastore}} module, which 
> does not depend on {{exec}}.
> The solution is to move these classes from {{exec}} into another module that 
> {{metastore}} depends on, so that they are accessible to {{metastore}}.





[jira] [Commented] (DRILL-6711) Use jitpack repository for Drill Calcite project artifacts instead of repository.mapr.com

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875556#comment-16875556
 ] 

ASF GitHub Bot commented on DRILL-6711:
---

asfgit commented on pull request #1815: DRILL-6711: Use jitpack repository for 
Drill Calcite project artifacts instead of repository.mapr.com
URL: https://github.com/apache/drill/pull/1815
 
 
   
 



> Use jitpack repository for Drill Calcite project artifacts instead of 
> repository.mapr.com
> -
>
> Key: DRILL-6711
> URL: https://issues.apache.org/jira/browse/DRILL-6711
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Simplify deployment of Drill Calcite project artifacts by using 
> [https://jitpack.io/].





[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875562#comment-16875562
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on issue #1813: DRILL-7306: Disable schema-only 
batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-506972377
 
 
   @paul-rogers
   When running tests, there are unit and functional test failures. Please run 
the full unit test suite locally before making the PR; Travis does not do that.
   UNIT TESTS
   ```
   [INFO] Running org.apache.drill.exec.TestEmptyInputSql
   05:53:50.840 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: 0 B(1.0 MiB), h: 6.3 MiB(863.9 MiB), nh: 32 B(324.6 MiB)): testQueryEmptyCsv(org.apache.drill.exec.TestEmptyInputSql)
   java.lang.Exception: Expected and actual numbers of columns do not match.
       at org.apache.drill.test.DrillTestWrapper.compareSchemaOnly(DrillTestWrapper.java:486) ~[test-classes/:1.17.0-SNAPSHOT]
       at org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:163) ~[test-classes/:1.17.0-SNAPSHOT]
       at org.apache.drill.exec.TestEmptyInputSql.testQueryEmptyCsv(TestEmptyInputSql.java:222) ~[test-classes/:na]
       at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_161].
   ```
   FUNCTIONAL TESTS 
   ```  
 Data Verification Failures:
   
 Query: /root/drillAutomation/drill-test-framework/framework/resources/Functional/limit0/union_all/prq_union_all/data/emptyLHS_CSV.q
 SELECT cast(columns[0] as int) FROM `emptyFiles/empty_1.csv` UNION ALL SELECT col1 FROM notEmpty_csv_v
   
 Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Functional/limit0/union_all/prq_union_all/data/emptyLHS_CSV.e
 Expected number of rows: 10
 Actual number of rows from Drill: 10
 Number of matching rows: 0
 Number of rows missing: 10
 Number of rows unexpected: 10
   
 These rows are not expected (first 10):
 null
   
 These rows are missing (first 10):
 1 (1 occurence(s))
 2 (1 occurence(s))
 3 (1 occurence(s))
 4 (1 occurence(s))
 5 (1 occurence(s))
 6 (1 occurence(s))
 7 (1 occurence(s))
 8 (1 occurence(s))
 9 (1 occurence(s))
 10 (1 occurence(s))
   ```
   
   Please fix the failures and rebase on the latest master.
   Also, when I was cherry-picking DRILL-7306 & DRILL-6951, there were conflicts.
   Consider creating a merge branch with the commits for these Jiras and 
resolving the conflicts to ease the merge process.
 



> Disable "fast schema" batch for new scan framework
> --
>
> Key: DRILL-7306
> URL: https://issues.apache.org/jira/browse/DRILL-7306
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
>  The EVF framework is set up to return a "fast schema" empty batch with only 
> schema as its first batch because, when the code was written, it seemed 
> that's how we wanted operators to work. However, DRILL-7305 notes that many 
> operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast 
> schema" batch, this ticket asks to disable the feature in the new scan 
> framework. The feature is disabled with a config option; it can be re-enabled 
> if ever it is needed.
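
To make the behavior described above concrete, here is a minimal sketch of a scan that can optionally emit a schema-only ("fast schema") empty batch ahead of its data. This is not Drill's EVF code: `SketchBatch`, `SketchScan`, and the `enableSchemaBatch` flag are hypothetical names invented for illustration, and the real config option name is not given in the ticket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record batch: a schema plus zero or more rows.
final class SketchBatch {
    final List<String> schema;
    final List<List<Object>> rows;

    SketchBatch(List<String> schema, List<List<Object>> rows) {
        this.schema = schema;
        this.rows = rows;
    }

    // A schema-only batch carries column names but no rows.
    boolean isEmpty() {
        return rows.isEmpty();
    }
}

// Hypothetical scan that can optionally emit a schema-only first batch.
final class SketchScan {
    private final List<String> schema;
    private final List<List<Object>> data;
    // Assumed flag standing in for the config option mentioned in the ticket.
    private final boolean enableSchemaBatch;

    SketchScan(List<String> schema, List<List<Object>> data, boolean enableSchemaBatch) {
        this.schema = schema;
        this.data = data;
        this.enableSchemaBatch = enableSchemaBatch;
    }

    // Returns the batches a downstream operator would see, in order.
    List<SketchBatch> batches() {
        List<SketchBatch> out = new ArrayList<>();
        if (enableSchemaBatch) {
            // "Fast schema": an empty batch that only announces the schema.
            out.add(new SketchBatch(schema, new ArrayList<>()));
        }
        if (!data.isEmpty()) {
            out.add(new SketchBatch(schema, data));
        }
        return out;
    }
}
```

With the flag off, downstream operators in this sketch never receive an empty leading batch, which is the situation the ticket asks for while the empty-batch bugs noted in DRILL-7305 remain.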


