[jira] [Commented] (DRILL-5712) Update the pom files with dependency exclusions for commons-codec

2017-08-14 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126810#comment-16126810
 ] 

Jinfeng Ni commented on DRILL-5712:
---

Some of the unit test cases that failed:

Tests in error: 
  TestInfoSchemaOnHiveStorage>HiveTestBase.generateHive:34 » ExceptionInInitializer
  TestViewSupportOnHiveTables.generateHive:34 » ExceptionInInitializer
  TestHiveStorage.readingFromStorageHandleBasedTable2:430->BaseTestQuery.testRunAndReturn:344 » Rpc
  TestHiveStorage.testIgnoreSkipHeaderFooterForSequencefile:520->BaseTestQuery.testRunAndReturn:344 » Rpc
  TestHiveStorage.readingFromSmallTableWithSkipHeaderAndFooter:443 » org.apache...
  TestHiveStorage.testIgnoreSkipHeaderFooterForParquet:510->BaseTestQuery.testRunAndReturn:344 » Rpc
  TestHiveStorage.testNonAsciiStringLiterals:560->BaseTestQuery.testRunAndReturn:344 » Rpc
  TestHiveStorage.testIgnoreSkipHeaderFooterForRcfile:500->BaseTestQuery.testRunAndReturn:344 » Rpc
  TestHiveStorage.nativeReaderIsDisabledForAlteredPartitionedTable:408->PlanTestBase.getPlanInString:330-...


> Update the pom files with dependency exclusions for commons-codec
> -
>
> Key: DRILL-5712
> URL: https://issues.apache.org/jira/browse/DRILL-5712
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Sindhuri Ramanarayan Rayavaram
>Assignee: Sindhuri Ramanarayan Rayavaram
>  Labels: ready-to-commit
>
> In java-exec, we add a dependency on commons-codec version 1.10. Other 
> dependencies like hadoop-common, parquet-column, etc. pull in different 
> versions of commons-codec. Exclusions should be added for commons-codec in 
> these dependencies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5712) Update the pom files with dependency exclusions for commons-codec

2017-08-14 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126808#comment-16126808
 ] 

Jinfeng Ni commented on DRILL-5712:
---

I'm seeing failures in both the unit tests and the functional regression tests. 
Please resolve those failures before we can get the patch merged. Thanks.


> Update the pom files with dependency exclusions for commons-codec
> -
>
> Key: DRILL-5712
> URL: https://issues.apache.org/jira/browse/DRILL-5712
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Sindhuri Ramanarayan Rayavaram
>Assignee: Sindhuri Ramanarayan Rayavaram
>  Labels: ready-to-commit
>
> In java-exec, we add a dependency on commons-codec version 1.10. Other 
> dependencies like hadoop-common, parquet-column, etc. pull in different 
> versions of commons-codec. Exclusions should be added for commons-codec in 
> these dependencies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5601) Rollup of External Sort memory management fixes

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126806#comment-16126806
 ] 

ASF GitHub Bot commented on DRILL-5601:
---

Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/860
  
+1

LGTM.


> Rollup of External Sort memory management fixes
> ---
>
> Key: DRILL-5601
> URL: https://issues.apache.org/jira/browse/DRILL-5601
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> Rollup of a set of specific JIRA entries that all relate to the very 
> difficult problem of managing memory within Drill in order for the external 
> sort to stay within a memory budget. In general, the fixes relate to better 
> estimating memory used by the three ways that Drill allocates vector memory 
> (see DRILL-5522) and to predicting the size of vectors that the sort will 
> create, to avoid repeated realloc-copy cycles (see DRILL-5594).
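
For illustration, a small plain-Java sketch (not Drill's ValueVector code) of why repeated realloc-copy cycles are costly and how an up-front size estimate avoids them:

{code:java}
import java.util.Arrays;

public class PreallocationSketch {
  // Growth by doubling: every overflow reallocates and copies everything written so far.
  static byte[] writeWithDoubling(byte[] data) {
    byte[] buf = new byte[256];
    for (int i = 0; i < data.length; i++) {
      if (i == buf.length) {
        buf = Arrays.copyOf(buf, buf.length * 2);  // realloc-copy cycle
      }
      buf[i] = data[i];
    }
    return buf;
  }

  // With a size estimate (e.g. estimated row count * average value width), allocate once.
  static byte[] writeWithEstimate(byte[] data, int estimatedSize) {
    byte[] buf = new byte[Math.max(estimatedSize, data.length)];
    System.arraycopy(data, 0, buf, 0, data.length);
    return buf;
  }
}
{code}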



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4211) Column aliases not pushed down to JDBC stores in some cases when Drill expects aliased columns to be returned.

2017-08-14 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126660#comment-16126660
 ] 

Timothy Farkas commented on DRILL-4211:
---

Column aliases were not being pushed down to JDBC storage for two reasons:

# The DrillProjectRel computeSelfCost function always returned zero cost, while 
the JDBCProjectRel always returned a non-zero cost.
# The JdbcRules were only applied in the physical phase of planning, so the 
DrillJDBCProject rule was not able to convert a Projection node to a 
JdbcProject node in time for it to be converted to a JDBCPrel in the physical 
phase of planning.

The solution:

# Fix the cost function for DrillProjectRel.
# Apply the appropriate JDBC rules during the logical phase of planning.

Will open a PR shortly.
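
To illustrate the first fix, here is a hedged sketch (the class name and cost factor are illustrative, not the actual patch) of a project node reporting a small non-zero self cost so the planner can compare it fairly against the pushed-down JDBC alternative:

{code:java}
// Hedged sketch; class name and cost factor are illustrative, not Drill's actual code.
import org.apache.calcite.plan.RelOptCost;
import org.apache.calcite.plan.RelOptPlanner;

public class ProjectCostSketch {
  // A constant zero cost hides the work a projection does, so the planner never sees the
  // pushed-down JdbcProject as cheaper. Charging a small per-row CPU cost restores the comparison.
  public RelOptCost computeSelfCost(RelOptPlanner planner, double inputRows) {
    double cpuPerRow = 0.01;  // illustrative factor
    return planner.getCostFactory().makeCost(inputRows, inputRows * cpuPerRow, 0);
  }
}
{code}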

> Column aliases not pushed down to JDBC stores in some cases when Drill 
> expects aliased columns to be returned.
> --
>
> Key: DRILL-4211
> URL: https://issues.apache.org/jira/browse/DRILL-4211
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.3.0, 1.11.0
> Environment: Postgres db storage
>Reporter: Robert Hamilton-Smith
>Assignee: Timothy Farkas
>  Labels: newbie
> Fix For: 1.12.0
>
>
> When making an SQL statement that incorporates a join to a table and then a 
> self-join to that table to get a parent value, Drill brings back 
> inconsistent results. 
> Here is the SQL in Postgres with the correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food&Dining|
> |id1|restaurants|food&Dining|
> |id2|Coffee Shops|food&Dining|
> |id2|Coffee Shops|food&Dining|
> When run in Drill with correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = 
> w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code:sql}
> 00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) 
> categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = 
> {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
> 00-01  Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : 
> rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
> 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) 
> : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
> 00-03  Jdbc(sql=[SELECT *
> FROM "public"."transactions"
> INNER JOIN (SELECT *
> FROM "public"."categories"
> WHERE "categoryparentguid" IS NOT NULL) AS "t" ON 
> "transactions"."categoryguid" = "t"."categoryguid"
> INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" 
> = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) 
> transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) 
> transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) 
> transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, 
> VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, 
> VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) 
> transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) 
> transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) 
> transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) 
> transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, 
> VARCHAR(50) transactionorigpartyguid, VARCHAR(255) 
> transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) 
> transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, 
> 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) 
> transactionrecategorized, TIMESTA

[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-14 Thread Padma Penumarthy (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126475#comment-16126475
 ] 

Padma Penumarthy commented on DRILL-5697:
-

Yes, it is 6 times worse. The reason could be how we are creating the regex 
pattern string from the SQL LIKE pattern string.
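
For context, a minimal sketch of one way a SQL LIKE pattern can be converted to a Java regex (illustrative only; Drill's actual conversion may differ). Quoting every literal character individually, as below, yields a much longer regex string than necessary, which is one plausible source of overhead:

{code:java}
import java.util.regex.Pattern;

public class LikeToRegexSketch {
  // Minimal LIKE -> regex conversion: '%' matches any sequence, '_' matches one character,
  // everything else is treated as a literal (quoted so regex metacharacters are not interpreted).
  static Pattern compileLike(String like) {
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < like.length(); i++) {
      char c = like.charAt(i);
      if (c == '%') {
        regex.append(".*");
      } else if (c == '_') {
        regex.append(".");
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));  // quotes each literal char individually
      }
    }
    return Pattern.compile(regex.toString(), Pattern.DOTALL);
  }

  public static void main(String[] args) {
    System.out.println(compileLike("%abc%").matcher("xxabcyy").matches());  // true
  }
}
{code}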

> Improve performance of filter operator for pattern matching
> ---
>
> Key: DRILL-5697
> URL: https://issues.apache.org/jira/browse/DRILL-5697
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.11.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>
> Queries using the filter operator with the SQL LIKE operator use the Java regex 
> library for pattern matching. However, for cases like %abc (ends with abc), 
> abc% (starts with abc), and %abc% (contains abc), it is observed that 
> implementing these cases with simple code instead of the regex library provides 
> a good performance boost (4-6x). The idea is to use special-case code for 
> simple, common cases and fall back to the Java regex library for complicated 
> ones. That will provide a good performance benefit for the most common cases.
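
To make the special-casing described above concrete, a hedged sketch (illustrative, not the actual Drill change) that handles the three simple forms directly and falls back to the regex engine for anything else:

{code:java}
import java.util.regex.Pattern;

public class SimpleLikeMatcherSketch {
  // Handle %abc (ends with), abc% (starts with), %abc% (contains) and a plain literal directly;
  // fall back to the regex engine for anything else. Escape characters are ignored in this sketch.
  static boolean like(String input, String pattern) {
    if (pattern.equals("%")) {
      return true;  // matches anything
    }
    boolean leading = pattern.startsWith("%");
    boolean trailing = pattern.endsWith("%");
    String inner = pattern.substring(leading ? 1 : 0, pattern.length() - (trailing ? 1 : 0));
    if (!inner.contains("%") && !inner.contains("_")) {
      if (leading && trailing) return input.contains(inner);    // %abc%
      if (leading)             return input.endsWith(inner);    // %abc
      if (trailing)            return input.startsWith(inner);  // abc%
      return input.equals(inner);                               // no wildcard at all
    }
    // Complicated pattern: fall back to a regex built from the LIKE pattern (simplified conversion).
    String regex = pattern.replace("%", ".*").replace("_", ".");
    return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
  }
}
{code}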



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5722) Query with root and non-root fragments can potentially hang if Control Connection fails due to network issue.

2017-08-14 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-5722:


 Summary: Query with root and non-root fragments can potentially 
hang if Control Connection fails due to network issue.
 Key: DRILL-5722
 URL: https://issues.apache.org/jira/browse/DRILL-5722
 Project: Apache Drill
  Issue Type: Bug
Reporter: Sorabh Hamirwasia


Recently I found an issue (thanks to [~knguyen] for creating the test scenario) 
related to fragment status reporting, and I would like some feedback on it. 

When a client submits a query to the Foreman, the query is planned by the Foreman 
and fragments are then scheduled to root and non-root nodes. The Foreman creates a 
DrillbitStatusListener and a FragmentStatusListener to know about the health of a 
Drillbit node and of a fragment, respectively. Root and non-root fragments are set 
up by the Foreman differently: 
Root fragments are set up without any communication over the control channel 
(since they execute locally on the Foreman). 
Non-root fragments are set up by sending a control message 
(REQ_INITIALIZE_FRAGMENTS_VALUE) over the wire. If there is a failure in sending 
any such control message (for example, due to a network hiccup) during query 
setup, the query is failed and the client is notified. 

Each fragment is executed on its node with the help of a Fragment Executor, which 
has an instance of FragmentStatusReporter. FragmentStatusReporter updates the 
status of a fragment to the Foreman node over a control tunnel or connection using 
an RPC message (REQ_FRAGMENT_STATUS), both for root and non-root fragments. 

Based on the above, when a root fragment is submitted for setup, the setup is done 
locally without any RPC communication, whereas the status for that fragment is 
reported by the fragment executor over the control connection by sending an RPC 
message. For non-root fragments, both setup and status updates happen via RPC 
messages over the control connection.

Issue 2:
For a complex query that has both root and non-root fragments, suppose fragment 
setup succeeded but the control connection is lost later, during query execution. 
As part of the status update, the fragments will try to create a new control 
connection to the Foreman; suppose they fail to do so and eventually complete. In 
this case the Foreman will still think the fragments are running and the query is 
not completed, which can hang the query.

I don't see any timeout logic on the Foreman side for a query in execution, which 
might be because we don't know how long a query will take to execute. Nor did I 
see any mechanism by which the Foreman keeps a check on fragment status when it 
has not received any update from the fragments for some time interval. If there is 
none, maybe we should add the second option or look for other alternatives as 
well. It would be helpful to get some feedback on both issues and the proposed 
solutions.
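
As one possible shape for that second option, a hedged sketch (all names are hypothetical, not existing Drill code) of a Foreman-side watchdog that flags a query when a fragment has not reported status within a configured interval:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a Foreman-side status watchdog; class and method names are illustrative.
public class FragmentStatusWatchdog {
  private final Map<String, Long> lastStatusUpdate = new ConcurrentHashMap<>();  // fragmentId -> last report time
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  private final long timeoutMillis;

  public FragmentStatusWatchdog(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  // Called whenever a REQ_FRAGMENT_STATUS update arrives for a fragment.
  public void onStatusUpdate(String fragmentId) {
    lastStatusUpdate.put(fragmentId, System.currentTimeMillis());
  }

  // Periodically check whether any running fragment has gone silent past the timeout.
  public void start(Runnable failQuery) {
    scheduler.scheduleAtFixedRate(() -> {
      long now = System.currentTimeMillis();
      for (Map.Entry<String, Long> e : lastStatusUpdate.entrySet()) {
        if (now - e.getValue() > timeoutMillis) {
          failQuery.run();  // or re-probe the silent fragment before failing the query
          return;
        }
      }
    }, timeoutMillis, timeoutMillis, TimeUnit.MILLISECONDS);
  }
}
{code}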



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5721) Query with only root fragment and no non-root fragment hangs when Drillbit to Drillbit Control Connection has network issues

2017-08-14 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-5721:


 Summary: Query with only root fragment and no non-root fragment 
hangs when Drillbit to Drillbit Control Connection has network issues
 Key: DRILL-5721
 URL: https://issues.apache.org/jira/browse/DRILL-5721
 Project: Apache Drill
  Issue Type: Bug
Reporter: Sorabh Hamirwasia


Recently I found an issue (thanks to [~knguyen] for creating this scenario) 
related to fragment status reporting, and I would like some feedback on it. 

When a client submits a query to the Foreman, the query is planned by the Foreman 
and fragments are then scheduled to root and non-root nodes. The Foreman creates a 
DrillbitStatusListener and a FragmentStatusListener to know about the health of a 
Drillbit node and of a fragment, respectively. Root and non-root fragments are set 
up by the Foreman differently: 
Root fragments are set up without any communication over the control channel 
(since they execute locally on the Foreman). 
Non-root fragments are set up by sending a control message 
(REQ_INITIALIZE_FRAGMENTS_VALUE) over the wire. If there is a failure in sending 
any such control message (for example, due to a network hiccup) during query 
setup, the query is failed and the client is notified. 

Each fragment is executed on its node with the help of a Fragment Executor, which 
has an instance of FragmentStatusReporter. FragmentStatusReporter updates the 
status of a fragment to the Foreman node over a control tunnel or connection using 
an RPC message (REQ_FRAGMENT_STATUS), both for root and non-root fragments. 

Based on the above, when a root fragment is submitted for setup, the setup is done 
locally without any RPC communication, whereas the status for that fragment is 
reported by the fragment executor over the control connection by sending an RPC 
message. For non-root fragments, both setup and status updates happen via RPC 
messages over the control connection.

*Issue 1:*
What was observed is that for a simple query which has only one root fragment, 
running on the Foreman node, setup works fine. But if, as part of the status 
update, the fragment tries to create a control connection and fails to establish 
it, then the query hangs. This is because the root fragment completes execution 
but fails to update the Foreman about it, so the Foreman thinks the query is 
running forever. 

*Proposed Solution:*
For the root fragment, setup happens locally without an RPC message, so we can do 
the same for the status updates of root fragments. This will avoid RPC 
communication for status updates of fragments running locally on the Foreman and 
hence will resolve Issue 1.
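
A minimal sketch of the proposed bypass (the types here are hypothetical stand-ins for Drill's real classes): when the fragment runs on the Foreman node itself, deliver the status to the local listener directly instead of sending REQ_FRAGMENT_STATUS over a control connection:

{code:java}
// Hypothetical sketch; StatusListener and ControlTunnel stand in for Drill's real types.
public class StatusReporterSketch {
  interface StatusListener { void statusUpdate(String status); }
  interface ControlTunnel { void sendFragmentStatus(String status); }

  private final boolean runsOnForeman;        // true for the root fragment executing locally
  private final StatusListener localListener; // Foreman's in-process FragmentStatusListener
  private final ControlTunnel controlTunnel;  // RPC path used by non-root fragments

  StatusReporterSketch(boolean runsOnForeman, StatusListener localListener, ControlTunnel controlTunnel) {
    this.runsOnForeman = runsOnForeman;
    this.localListener = localListener;
    this.controlTunnel = controlTunnel;
  }

  void report(String status) {
    if (runsOnForeman) {
      localListener.statusUpdate(status);       // local call, no control connection needed
    } else {
      controlTunnel.sendFragmentStatus(status); // RPC over the control connection
    }
  }
}
{code}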



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (DRILL-5685) Provide a way to set common environment variable between sqlline and Drillbit differently.

2017-08-14 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni reopened DRILL-5685:
---

> Provide a way to set common environment variable between sqlline and Drillbit 
> differently.
> --
>
> Key: DRILL-5685
> URL: https://issues.apache.org/jira/browse/DRILL-5685
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> Drill has distrib-env.sh, which is used to set any distribution-specific 
> environment consumed by both sqlline and Drillbit. These environment 
> variables can be overridden by drill-env.sh. But there is no clean way to 
> know whether these scripts were sourced for Drillbit or for sqlline; currently 
> all the variables are set for both.
> With this JIRA we will introduce a separate environment variable 
> _DRILLBIT_CONTEXT_ which will only be set inside drillbit.sh. Based on this 
> variable, any script called later in the pipeline can decide to set/unset an 
> environment variable or to set a common environment variable differently for 
> sqlline and Drillbit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching

2017-08-14 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126158#comment-16126158
 ] 

Kunal Khatua commented on DRILL-5697:
-

It's odd that the RE2 library would perform worse than the Java regex library. 
Is it significantly worse? Perhaps RE2 makes up for it on more complex patterns, 
which these examples would probably not cover.

> Improve performance of filter operator for pattern matching
> ---
>
> Key: DRILL-5697
> URL: https://issues.apache.org/jira/browse/DRILL-5697
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.11.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>
> Queries using the filter operator with the SQL LIKE operator use the Java regex 
> library for pattern matching. However, for cases like %abc (ends with abc), 
> abc% (starts with abc), and %abc% (contains abc), it is observed that 
> implementing these cases with simple code instead of the regex library provides 
> a good performance boost (4-6x). The idea is to use special-case code for 
> simple, common cases and fall back to the Java regex library for complicated 
> ones. That will provide a good performance benefit for the most common cases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5477) String functions (lower, upper, initcap) should work for UTF-8

2017-08-14 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5477:
-
Labels: doc-impacting  (was: )

> String functions (lower, upper, initcap) should work for UTF-8
> --
>
> Key: DRILL-5477
> URL: https://issues.apache.org/jira/browse/DRILL-5477
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>  Labels: doc-impacting
>
> Drill string functions lower / upper / initcap work only for ASCII, but not 
> for UTF-8. UTF-8 is a multi-byte code that requires special encoding/decoding 
> to convert to Unicode characters. Without that encoding, these functions 
> won't work for Cyrillic, Greek or any other character set with upper/lower 
> distinctions.
> Currently, when a user applies these functions to UTF-8 data, Drill returns 
> the same value that was given.
> Example:
> {noformat}
> select upper('привет') from (values(1)) -> привет
> {noformat}
> There is a disabled unit test in 
> https://github.com/arina-ielchiieva/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java#L33
>  which should be enabled once the issue is fixed.
> Please note that, by default, Calcite does not allow the use of UTF-8. Update 
> the system property *saffron.default.charset* to *UTF-16LE* if you encounter 
> the following error:
> {noformat}
> org.apache.drill.exec.rpc.RpcException: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> CalciteException: Failed to encode 'привет' in character set 'ISO-8859-1'
> {noformat}
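
For illustration, a minimal plain-Java sketch of the expected behaviour once the UTF-8 bytes are decoded to characters (this is only an assumption of what the fixed functions should produce, not Drill's implementation):

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class Utf8CaseSketch {
  public static void main(String[] args) {
    byte[] utf8 = "привет".getBytes(StandardCharsets.UTF_8);
    // Decoding the bytes to a String first lets the JDK apply Unicode case mapping.
    String decoded = new String(utf8, StandardCharsets.UTF_8);
    System.out.println(decoded.toUpperCase(Locale.ROOT));  // ПРИВЕТ
    // Byte-at-a-time ASCII upper-casing would leave the multi-byte Cyrillic characters
    // unchanged, which is the behaviour described in this issue.
  }
}
{code}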



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5719) Join query on a non existing column in a json file runs longer than usual

2017-08-14 Thread Prasad Nagaraj Subramanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Nagaraj Subramanya updated DRILL-5719:
-
Description: 
1) Join query on two json files
Column exists
{code}
select t.p_partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN 
dfs.`testData/partsupp.json` as t1 ON t.p_partkey = t1.ps_partkey;
{code}
Column doesn't exist (the part.json file has no key named partkey)
{code}
select t.partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN 
dfs.`testData/partsupp.json` as t1 ON t.partkey = t1.ps_partkey;
{code}

part.json & partsupp.json - tpch sf1 dataset

Time taken when:
1) the column exists in the file - 20 secs
2) the column doesn't exist in the file - 15 mins

  was:
1) Join query on two json files
Column exists
{code}
select t.p_partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN 
dfs.`testData/partsupp.json` as t1 ON t.p_partkey = t1.ps_partkey;
{code}
Column doesn't exist
{code}
select t.partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN 
dfs.`testData/partsupp.json` as t1 ON t.partkey = t1.ps_partkey;
{code}
The part.json file has no key named partkey.

part.json & partsupp.json - tpch sf1 dataset

Time taken when:
1) the column exists in the file - 20 secs
2) the column doesn't exist in the file - 15 mins


> Join query on a non existing column in a json file runs longer than usual
> -
>
> Key: DRILL-5719
> URL: https://issues.apache.org/jira/browse/DRILL-5719
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>
> 1) Join query on two json files
> Column exists
> {code}
> select t.p_partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT 
> JOIN dfs.`testData/partsupp.json` as t1 ON t.p_partkey = t1.ps_partkey;
> {code}
> Column doesn't exist (the part.json file has no key named partkey)
> {code}
> select t.partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN 
> dfs.`testData/partsupp.json` as t1 ON t.partkey = t1.ps_partkey;
> {code}
> part.json & partsupp.json - tpch sf1 dataset
> Time taken when:
> 1) the column exists in the file - 20 secs
> 2) the column doesn't exist in the file - 15 mins



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5719) Join query on a non existing column in a json file runs longer than usual

2017-08-14 Thread Prasad Nagaraj Subramanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Nagaraj Subramanya updated DRILL-5719:
-
Description: 
1) Join query on two json files
Column exists
{code}
select t.p_partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN 
dfs.`testData/partsupp.json` as t1 ON t.p_partkey = t1.ps_partkey;
{code}
Column doesn't exist
{code}
select t.partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN 
dfs.`testData/partsupp.json` as t1 ON t.partkey = t1.ps_partkey;
{code}
The part.json file has no key named partkey.

part.json & partsupp.json - tpch sf1 dataset

Time taken when:
1) the column exists in the file - 20 secs
2) the column doesn't exist in the file - 15 mins

  was:
1) Join query on two json files
{code}
select t.partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN 
dfs.`testData/partsupp.json` as t1 ON t.partkey = t1.ps_partkey;
{code}
The part.json file has no key named partkey.

Attached part.json & partsupp.json files


> Join query on a non existing column in a json file runs longer than usual
> -
>
> Key: DRILL-5719
> URL: https://issues.apache.org/jira/browse/DRILL-5719
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>
> 1) Join query on two json files
> Column exists
> {code}
> select t.p_partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT 
> JOIN dfs.`testData/partsupp.json` as t1 ON t.p_partkey = t1.ps_partkey;
> {code}
> Column doesn't exist
> {code}
> select t.partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN 
> dfs.`testData/partsupp.json` as t1 ON t.partkey = t1.ps_partkey;
> {code}
> The part.json file has no key named partkey.
> part.json & partsupp.json - tpch sf1 dataset
> Time taken when:
> 1) the column exists in the file - 20 secs
> 2) the column doesn't exist in the file - 15 mins



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5719) Join query on a non existing column in a json file runs longer than usual

2017-08-14 Thread Prasad Nagaraj Subramanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Nagaraj Subramanya updated DRILL-5719:
-
Summary: Join query on a non existing column in a json file runs longer 
than usual  (was: Join query on a non existing column in a json file runs 
infinitely)

> Join query on a non existing column in a json file runs longer than usual
> -
>
> Key: DRILL-5719
> URL: https://issues.apache.org/jira/browse/DRILL-5719
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>
> 1) Join query on two json files
> {code}
> select t.partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN 
> dfs.`testData/partsupp.json` as t1 ON t.partkey = t1.ps_partkey;
> {code}
> The part.json file has no key named partkey.
> Attached part.json & partsupp.json files



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125936#comment-16125936
 ] 

ASF GitHub Bot commented on DRILL-3640:
---

Github user laurentgo commented on the issue:

https://github.com/apache/drill/pull/858
  
I think you can actually do 2 and 3 with the same approach, depending on 
how you compute the remaining time.


> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
> Fix For: 1.12.0
>
>
> It would be nice if we had this implemented. Runaway queries could be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)
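
For reference, here is how the feature would be used through standard JDBC once implemented (a hedged sketch; the connection URL and query are illustrative, and today the setQueryTimeout call throws the SQLFeatureNotSupportedException shown above):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLTimeoutException;
import java.sql.Statement;

public class QueryTimeoutSketch {
  public static void main(String[] args) throws Exception {
    // Illustrative URL; adjust to your Drillbit/ZooKeeper setup.
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=localhost:2181");
         Statement stmt = conn.createStatement()) {
      stmt.setQueryTimeout(30);  // cancel the query if it runs longer than 30 seconds
      try (ResultSet rs = stmt.executeQuery("SELECT * FROM cp.`employee.json`")) {
        while (rs.next()) {
          // process rows
        }
      } catch (SQLTimeoutException e) {
        System.err.println("Query exceeded the timeout and was cancelled: " + e.getMessage());
      }
    }
  }
}
{code}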



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5717) date time test cases is not Local independent

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125578#comment-16125578
 ] 

ASF GitHub Bot commented on DRILL-5717:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/904#discussion_r132930206
  
--- Diff: contrib/format-maprdb/pom.xml ---
@@ -75,6 +75,12 @@
 
   com.mapr.fs
   mapr-hbase
+  
+
+  commons-codec
+  commons-codec
+
+  
--- End diff --

I think it would be better to create another Jira and move these pom file 
changes there, since they are not connected with the issue described in 
DRILL-5717.


> date time test cases is not Local independent
> -
>
> Key: DRILL-5717
> URL: https://issues.apache.org/jira/browse/DRILL-5717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>
> Some date/time test cases, like JodaDateValidatorTest, are not locale 
> independent. This will cause the test phase to fail for users with other 
> locales. We should make these test cases locale independent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5717) date time test cases is not Local independent

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125579#comment-16125579
 ] 

ASF GitHub Bot commented on DRILL-5717:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/904#discussion_r132932858
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/DateIntervalFunctionTemplates/DateToCharFunctions.java
 ---
@@ -65,7 +65,7 @@ public void setup() {
 byte[] buf = new byte[right.end - right.start];
 right.buffer.getBytes(right.start, buf, 0, right.end - 
right.start);
 String input = new String(buf, 
com.google.common.base.Charsets.UTF_8);
-format = org.joda.time.format.DateTimeFormat.forPattern(input);
+format = 
org.joda.time.format.DateTimeFormat.forPattern(input).withLocale(java.util.Locale.ENGLISH);
--- End diff --

I don't think that is the right solution. A table may contain a field with 
date strings that were created with a non-ENGLISH locale, and in that case 
the query would fail. 

We need to set the locale to ENGLISH only for the required unit tests.
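
A minimal sketch of that test-only approach (an illustrative JUnit class, not the actual Drill change): pin the default locale for the affected tests and restore it afterwards, instead of hard-coding ENGLISH in the generated function:

{code:java}
import java.util.Locale;

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class LocalePinnedDateTest {
  private static Locale savedLocale;

  @BeforeClass
  public static void pinLocale() {
    savedLocale = Locale.getDefault();
    Locale.setDefault(Locale.ENGLISH);  // make month/day names deterministic for this class only
  }

  @AfterClass
  public static void restoreLocale() {
    Locale.setDefault(savedLocale);
  }

  @Test
  public void formatsEnglishMonthName() {
    String formatted = org.joda.time.format.DateTimeFormat.forPattern("MMM")
        .print(new org.joda.time.DateTime(2017, 8, 14, 0, 0));
    assertEquals("Aug", formatted);
  }
}
{code}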


> date time test cases is not Local independent
> -
>
> Key: DRILL-5717
> URL: https://issues.apache.org/jira/browse/DRILL-5717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>
> Some date/time test cases, like JodaDateValidatorTest, are not locale 
> independent. This will cause the test phase to fail for users with other 
> locales. We should make these test cases locale independent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4264) Dots in identifier are not escaped correctly

2017-08-14 Thread Volodymyr Vysotskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125499#comment-16125499
 ] 

Volodymyr Vysotskyi commented on DRILL-4264:


[~mandoskippy], the goal of this Jira is to fix the issue you have mentioned.
With the fix for this Jira, the query 
{code:sql}
select * from `test.json`;
{code}
where *test.json* is the file from the Jira description (it also has dots in the 
field names), will return the correct result:
{noformat}
+--------------------------------------------------+--------------------------------------------------+
|                      0.0.1                       |                      0.1.2                       |
+--------------------------------------------------+--------------------------------------------------+
| {"version":"0.0.1","date_created":"2014-03-15"}  | {"version":"0.1.2","date_created":"2014-05-21"}  |
+--------------------------------------------------+--------------------------------------------------+
{noformat}

> Dots in identifier are not escaped correctly
> 
>
> Key: DRILL-4264
> URL: https://issues.apache.org/jira/browse/DRILL-4264
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Reporter: Alex
>Assignee: Volodymyr Vysotskyi
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> If you have some json data like this...
> {code:javascript}
> {
>   "0.0.1":{
> "version":"0.0.1",
> "date_created":"2014-03-15"
>   },
>   "0.1.2":{
> "version":"0.1.2",
> "date_created":"2014-05-21"
>   }
> }
> {code}
> ... there is no way to select any of the rows since their identifiers contain 
> dots and when trying to select them, Drill throws the following error:
> Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference 
> "0.0.1"; a field reference identifier must not have the form of a qualified 
> name
> This must be fixed since there are many json data files containing dots in 
> some of the keys (e.g. when specifying version numbers etc)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)