date:20170920

[jira] [Created] (DRILL-5807) ambiguous error

2017-09-20 Thread XiaHang (JIRA)

XiaHang created DRILL-5807:
--

 Summary: ambiguous error
 Key: DRILL-5807
 URL: https://issues.apache.org/jira/browse/DRILL-5807
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Affects Versions: 1.11.0
 Environment: Linux
Reporter: XiaHang
Priority: Critical


If the final plan looks like the one below, a JdbcFilter sits below one 
JdbcJoin and above another JdbcJoin. 

JdbcProject(order_id=[$0], mord_id=[$6], item_id=[$2], div_pay_amt=[$5], 
item_quantity=[$4], slr_id=[$11]): rowcount = 5625.0, cumulative cost = 
{12540.0 rows, 29763.0 cpu, 0.0 io}, id = 327
JdbcJoin(condition=[=($3, $11)], joinType=[left]): rowcount = 5625.0, 
cumulative cost = {8040.0 rows, 2763.0 cpu, 0.0 io}, id = 325
  JdbcFilter(condition=[OR(AND(OR(IS NOT NULL($7), >($5, 0)), =($1, 2), 
OR(AND(=($10, '箱包皮具/热销女包/男包'), >(/($5, $4), 1000)), AND(OR(=($10, '家装主材'), 
=($10, '大家电')), >(/($5, $4), 1000)), AND(OR(=($10, '珠宝/钻石/翡翠/黄金'), =($10, 
'饰品/流行首饰/时尚饰品新')), >(/($5, $4), 2000)), AND(>(/($5, $4), 500), <>($10, 
'箱包皮具/热销女包/男包'), <>($10, '家装主材'), <>($10, '大家电'), <>($10, '珠宝/钻石/翡翠/黄金'), 
<>($10, '饰品/流行首饰/时尚饰品新'))), <>($10, '成人用品/情趣用品'), <>($10, '鲜花速递/花卉仿真/绿植园艺'), 
<>($10, '水产肉类/新鲜蔬果/熟食')), AND(<=(-(EXTRACT(FLAG(EPOCH), CURRENT_TIMESTAMP), 
EXTRACT(FLAG(EPOCH), CAST($8):TIMESTAMP(0))), *(*(*(14, 24), 60), 60)), 
OR(AND(OR(=($10, '箱包皮具/热销女包/男包'), =($10, '家装主材'), =($10, '大家电'), =($10, 
'珠宝/钻石/翡翠/黄金'), =($10, '饰品/流行首饰/时尚饰品新')), >(/($5, $4), 2000)), AND(OR(=($10, 
'男装'), =($10, '女装/女士精品'), =($10, '办公设备/耗材/相关服务')), >(/($5, $4), 1000)), 
AND(OR(=($10, '流行男鞋'), =($10, '女鞋')), >(/($5, $4), 1500))), IS NOT NULL($8)), 
AND(>=(-(EXTRACT(FLAG(EPOCH), CURRENT_TIMESTAMP), EXTRACT(FLAG(EPOCH), 
CAST($8):TIMESTAMP(0))), *(*(*(15, 24), 60), 60)), <=(-(EXTRACT(FLAG(EPOCH), 
CURRENT_TIMESTAMP), EXTRACT(FLAG(EPOCH), CAST($8):TIMESTAMP(0))), *(*(*(60, 
24), 60), 60)), OR(AND(OR(=($10, '箱包皮具/热销女包/男包'), =($10, '珠宝/钻石/翡翠/黄金'), =($10, 
'饰品/流行首饰/时尚饰品新')), >(/($5, $4), 5000)), AND(OR(=($10, '男装'), =($10, 
'女装/女士精品')), >(/($5, $4), 3000)), AND(OR(=($10, '流行男鞋'), =($10, '女鞋')), >(/($5, 
$4), 2500)), AND(=($10, '办公设备/耗材/相关服务'), >(/($5, $4), 2000))), IS NOT 
NULL($8)))]): rowcount = 375.0, cumulative cost = {2235.0 rows, 2582.0 cpu, 0.0 
io}, id = 320
JdbcJoin(condition=[=($2, $9)], joinType=[left]): rowcount = 1500.0, 
cumulative cost = {1860.0 rows, 1082.0 cpu, 0.0 io}, id = 318
  JdbcProject(order_id=[$0], pay_status=[$2], item_id=[$3], 
seller_id=[$5], item_quantity=[$7], div_pay_amt=[$20], mord_id=[$1], 
pay_time=[$19], succ_time=[$52]): rowcount = 100.0, cumulative cost = {180.0 
rows, 821.0 cpu, 0.0 io}, id = 313
JdbcTableScan(table=[[public, dws_tb_crm_u2_ord_base_df]]): 
rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 29
  JdbcProject(item_id=[$0], cate_level1_name=[$47]): rowcount = 100.0, 
cumulative cost = {180.0 rows, 261.0 cpu, 0.0 io}, id = 316
JdbcTableScan(table=[[public, dws_tb_crm_u2_itm_base_df]]): 
rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 46
  JdbcProject(slr_id=[$3]): rowcount = 100.0, cumulative cost = {180.0 
rows, 181.0 cpu, 0.0 io}, id = 323
JdbcTableScan(table=[[public, dws_tb_crm_u2_slr_base]]): rowcount = 
100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 68

The SQL is converted to:
SELECT "t1"."order_id", "t1"."mord_id", "t1"."item_id", "t1"."div_pay_amt", 
"t1"."item_quantity", "t2"."slr_id"
FROM (SELECT *
FROM (SELECT "order_id", "pay_status", "item_id", "seller_id", "item_quantity", 
"div_pay_amt", "mord_id", "pay_time", "succ_time"
FROM "dws_tb_crm_u2_ord_base_df") AS "t"
LEFT JOIN (SELECT "item_id", "cate_level1_name"
FROM "dws_tb_crm_u2_itm_base_df") AS "t0" ON "t"."item_id" = "t0"."item_id"
WHERE ("t"."pay_time" IS NOT NULL OR "t"."div_pay_amt" > 0) AND 
"t"."pay_status" = 2 AND ("t0"."cate_level1_name" = '箱包皮具/热销女包/男包' AND 
"t"."div_pay_amt" / "t"."item_quantity" > 1000 OR ("t0"."cate_level1_name" = 
'家装主材' OR "t0"."cate_level1_name" = '大家电') AND "t"."div_pay_amt" / 
"t"."item_quantity" > 1000 OR ("t0"."cate_level1_name" = '珠宝/钻石/翡翠/黄金' OR 
"t0"."cate_level1_name" = '饰品/流行首饰/时尚饰品新') AND "t"."div_pay_amt" / 
"t"."item_quantity" > 2000 OR "t"."div_pay_amt" / "t"."item_quantity" > 500 AND 
"t0"."cate_level1_name" <> '箱包皮具/热销女包/男包' AND "t0"."cate_level1_name" <> '家装主材' 
AND "t0"."cate_level1_name" <> '大家电' AND "t0"."cate_level1_name" <> 
'珠宝/钻石/翡翠/黄金' AND "t0"."cate_level1_name" <> '饰品/流行首饰/时尚饰品新') AND 
"t0"."cate_level1_name" <> '成人用品/情趣用品' AND "t0"."cate_level1_name" <> 
'鲜花速递/花卉仿真/绿植园艺' AND "t0"."cate_level1_name" <> '水产肉类/新鲜蔬果/熟食' OR EXTRACT(EPOCH 
FROM CURRENT_TIMESTAMP) - EXTRACT(EPOCH FROM CAST("t"."succ_time" AS 
TIMESTAMP(0))) <= 14 * 24 * 60 * 60 AND (("t0"."cate_level1_name" = 
'箱包皮具/热销女包/男包' OR "t0"."cate_level1_name" = '家装主材' OR

[jira] [Commented] (DRILL-5781) Fix unit test failures to use tests config even if default config is available

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173617#comment-16173617
 ] 

ASF GitHub Bot commented on DRILL-5781:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/942#discussion_r140048587
  
--- Diff: contrib/storage-hbase/src/test/resources/hbase-site.xml ---
@@ -66,15 +66,13 @@
 Default is 10.
 
   
-
> Key: DRILL-5781
> URL: https://issues.apache.org/jira/browse/DRILL-5781
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.11.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
> Fix For: 1.12.0
>
>
> Unit tests fail when they are run with the mapr profile.
> Test failures, connected with the Zookeeper configuration that differs from 
> expected:
> {noformat}
> DrillClientTest>TestWithZookeeper.setUp:32 » Runtime java.io.IOException: 
> Coul...
>   TestZookeeperClient.testPutWithMatchingVersion » IO Could not configure 
> server...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testStartingClientEnablesCacheAndEnsuresRootNodeExists 
> » IO
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathThrowsDrillRuntimeException » IO Could not 
> conf...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathFalseWithVersion » IO Could not configure 
> serve...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestEphemeralStore.testPutAndGetWorksAntagonistacally » IO Could not 
> configure...
>   TestEphemeralStore.tearDown:132 NullPointer
>   TestZookeeperClient.testGetWithVersion » IO Could not configure server 
> because...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestEphemeralStore.testStoreRegistersDispatcherAndStartsItsClient » IO 
> Could n...
>   TestEphemeralStore.tearDown:132 NullPointer
>   TestZookeeperClient.testPutWithNonMatchingVersion » IO Could not configure 
> ser...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testGetWithEventualConsistencyHitsCache » IO Could not 
> con...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testPutIfAbsentWhenPresent » IO Could not configure 
> server...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathTrueWithVersion » IO Could not configure 
> server...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testPutAndGetWorks » IO Could not configure server 
> because...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testPutIfAbsentWhenAbsent » IO Could not configure 
> server ...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathWithEventualConsistencyHitsCache » IO Could 
> not...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testCreate » IO Could not configure server because SASL 
> co...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testDelete » IO Could not configure server because SASL 
> co...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testEntriesReturnsRelativePaths » IO Could not 
> configure s...
>   TestZookeeperClient.tearDown:86 NullPointer
> TestPStoreProviders>TestWithZookeeper.setUp:32 » Runtime java.io.IOException: 
> ...
>   TestPauseInjection.pauseOnSpecificBit:151 » Runtime java.io.IOException: 
> Could...
>   TestExceptionInjection.injectionOnSpecificBit:217 » Runtime 
> java.io.IOExceptio...
> HBaseTestsSuite.initCluster:110 » IO No JAAS configuration section named 
> 'Serv...
> {noformat}
> Test failures, connected with Hadoop configuration that differs from expected:
> {noformat}
> TestInboundImpersonation.setup:58->BaseTestImpersonation.startMiniDfsCluster:80->BaseTestImpersonation.startMiniDfsCluster:111
>  » ClassCast
>   
> TestImpersonationMetadata.setup:58->BaseTestImpersonation.startMiniDfsCluster:80->BaseTestImpersonation.startMiniDfsCluster:111
>  » ClassCast
>   
> TestImpersonationDisabledWithMiniDFS.setup:37->BaseTestImpersonation.startMiniDfsCluster:106
>  » Runtime
>   
> TestImpersonationQueries.setup:46->BaseTestImpersonation.startMiniDfsCluster:80->BaseTestImpersonation.startMiniDfsCluster:111
>  » ClassCast
> TestHiveStorage>HiveTestBase.generateHive:34 » Runtime 
> java.lang.RuntimeExcept...
>   TestInfoSchemaOnHiveStorage>HiveTestBase.generateHive:34 » Runtime 
> java.lang.R...
>   TestInbuiltHiveUDFs>HiveTestBase.generateHive:35 » ExecutionSetup Failure 
> sett...
>   TestSampleHiveUDFs>HiveTestBase.generateHive:35 » ExecutionSetup Failure 
> setti...
>   
> TestStorageBasedHiveAuthorization.setup:109->BaseTestImpersonation.startMiniDfsCluster:80->BaseTestImpersonation.startMiniDfsCluster:111
>  » ClassCast
>   
>

[jira] [Commented] (DRILL-5781) Fix unit test failures to use tests config even if default config is available

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173618#comment-16173618
 ] 

ASF GitHub Bot commented on DRILL-5781:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/942#discussion_r139273842
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/ExecTest.java ---
@@ -100,6 +101,14 @@ public void run() {
 return dir.getAbsolutePath() + File.separator + dirName;
   }
 
+  /**
+   * Sets zookeeper server and client SASL test config properties.
+   */
+  public static void setZookeeperSaslTestConfigProps() {
--- End diff --

Maybe it's possible to create a separate test zk util class with this method 
and also the setup for the jaas property (so the jaas config is not repeated 
twice in the code), and keep it in the same package where we test zk?
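
A rough sketch of what such a utility class could look like follows. This is
not the code from the pull request: the class name, the second method, and the
JAAS section names are hypothetical, and the body of
setZookeeperSaslTestConfigProps() is a guess, since the diff above shows only
its signature. The three property keys are the standard JAAS login-config key
and the ZooKeeper keys that select the server and client JAAS sections.

```java
/** Hypothetical test helper that keeps all ZooKeeper SASL/JAAS test setup in one place. */
final class ZookeeperTestUtil {

  // Standard JAAS system property that points to the login configuration file.
  static final String JAAS_CONFIG_PROP = "java.security.auth.login.config";
  // ZooKeeper system properties that name the JAAS sections used by server and client.
  static final String ZK_SERVER_SECTION_PROP = "zookeeper.sasl.serverconfig";
  static final String ZK_CLIENT_SECTION_PROP = "zookeeper.sasl.clientconfig";

  private ZookeeperTestUtil() {
  }

  /** Points ZooKeeper at test-only JAAS sections (section names are assumptions). */
  static void setZookeeperSaslTestConfigProps() {
    System.setProperty(ZK_SERVER_SECTION_PROP, "TestServer");
    System.setProperty(ZK_CLIENT_SECTION_PROP, "TestClient");
  }

  /** Sets the JAAS login config file once, so tests do not repeat this line. */
  static void setJaasTestConfigFile(String loginConfPath) {
    System.setProperty(JAAS_CONFIG_PROP, loginConfPath);
  }
}
```

A test base class would then call both methods from its setup method instead of
setting the properties inline.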



[jira] [Commented] (DRILL-5781) Fix unit test failures to use tests config even if default config is available

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173619#comment-16173619
 ] 

ASF GitHub Bot commented on DRILL-5781:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/942#discussion_r140048784
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/PathUtils.java ---
@@ -70,4 +72,14 @@ public static final String normalize(final String path) {
 return builder.toString();
   }
 
+  /**
+   * Creates and returns path with the protocol at the beginning from specified {@code url}.
+   */
--- End diff --

Can you please add java doc with @param and @return?



[jira] [Assigned] (DRILL-5745) Invalid "location" information in Drill web server

2017-09-20 Thread Prasad Nagaraj Subramanya (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Nagaraj Subramanya reassigned DRILL-5745:


Assignee: Prasad Nagaraj Subramanya

> Invalid "location" information in Drill web server
> --
>
> Key: DRILL-5745
> URL: https://issues.apache.org/jira/browse/DRILL-5745
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Prasad Nagaraj Subramanya
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> The file {{ProfileResources.java}} has the following incorrect code line:
> {code}
>   this.location = "http://localhost:8047/profile/" + queryId + ".json";
> {code}
> This code makes three errors.
> 1. The "http" prefix ignores the fact that the Drillbit can have SSL enabled 
> for the web server.
> 2. In a browser, "localhost" refers to the machine running the browser. 
> This is valid only if the browser runs on the same machine as the Drillbit, 
> which is not, in general, true.
> 3. The port number is hardcoded to 8047, but it can be customized in the 
> config file.
> Therefore, most of the time, the link won't work on a production server.
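
One way to address the three points above is to build the link from runtime
values instead of constants. The sketch below is illustrative only and is not
the fix that was committed: the class name, the method, and its parameters
(SSL flag, externally visible host, configured port) are hypothetical inputs
that the web server would have to supply.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative only: builds the profile link from runtime settings. */
final class ProfileLocation {

  private ProfileLocation() {
  }

  /**
   * @param sslEnabled whether the web server serves HTTPS (hypothetical input)
   * @param host       host name by which the client reached the Drillbit (hypothetical input)
   * @param port       configured web server port (hypothetical input)
   * @param queryId    query identifier, as in the original code line
   * @return absolute URL of the JSON profile
   */
  static String build(boolean sslEnabled, String host, int port, String queryId) {
    String scheme = sslEnabled ? "https" : "http";
    try {
      // The multi-argument URI constructor quotes illegal characters in the path.
      return new URI(scheme, null, host, port, "/profile/" + queryId + ".json", null, null)
          .toString();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot build profile URL for query " + queryId, e);
    }
  }

  public static void main(String[] args) {
    // prints https://drillbit1.example.com:8048/profile/abc123.json
    System.out.println(build(true, "drillbit1.example.com", 8048, "abc123"));
  }
}
```

An alternative that avoids the host question altogether would be to emit a
relative link and let the browser resolve it against the page it is viewing.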



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Assigned] (DRILL-5166) Select with options returns NPE

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5166:
---

Assignee: Arina Ielchiieva

> Select with options returns NPE
> ---
>
> Key: DRILL-5166
> URL: https://issues.apache.org/jira/browse/DRILL-5166
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: Future
>
>
> When querying two csv files:
> First file (2 records):
> {noformat}
> key_header, value_header
> key_1,value_1
> {noformat}
> Second file (50 records):
> {noformat}
> key_header, value_header
> key_1,value_1
> ...
> key_49,value_49
> {noformat}
> Select with options returns NPE:
> {noformat}
> select * from table(dfs.root.`/home/arina/files/ver/*.csv`(type => 
> 'text',extractHeader => true, fieldDelimiter => ',')) limit 10;
> {noformat}
> Querying without options works fine:
> {noformat}
> select  * from dfs.root.`/home/arina/files/ver/*.csv` limit 10;
> {noformat}
> Error:
> {noformat}
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR: NullPointerException
> Fragment 1:0
> [Error Id: b789f5f8-f090-4097-b7ff-9f4efd3d01e8 on localhost:31013]
>   (com.fasterxml.jackson.databind.JsonMappingException) Instantiation of 
> [simple type, class org.apache.drill.exec.store.dfs.easy.EasySubScan] value 
> failed (java.lang.NullPointerException): null
>  at [Source: {
>   "pop" : "single-sender",
>   "@id" : 0,
>   "receiver-major-fragment" : 0,
>   "receiver-minor-fragment" : 0,
>   "child" : {
> "pop" : "selection-vector-remover",
> "@id" : 1,
> "child" : {
>   "pop" : "limit",
>   "@id" : 2,
>   "child" : {
> "pop" : "fs-sub-scan",
> "@id" : 3,
> "userName" : "arina",
> "files" : [ {
>   "start" : 0,
>   "length" : 11777804,
>   "path" : "file:/home/arina/files/ver/key_value_50.csv"
> } ],
> "storage" : {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "file:///",
>   "config" : null,
>   "workspaces" : {
> "root" : {
>   "location" : "/",
>   "writable" : false,
>   "defaultInputFormat" : null
> },
> "tmp" : {
>   "location" : "/tmp",
>   "writable" : false,
>   "defaultInputFormat" : null
> }
>   },
>   "formats" : {
> "psv" : {
>   "type" : "text",
>   "extensions" : [ "tbl" ],
>   "delimiter" : "|"
> },
> "csv" : {
>   "type" : "text",
>   "extensions" : [ "csv" ],
>   "delimiter" : ","
> },
> "tsv" : {
>   "type" : "text",
>   "extensions" : [ "tsv" ],
>   "delimiter" : "\t"
> },
> "httpd" : {
>   "type" : "httpd",
>   "logFormat" : "%h %t \"%r\" %>s %b \"%{Referer}i\"",
>   "timestampFormat" : null
> },
> "parquet" : {
>   "type" : "parquet"
> },
> "json" : {
>   "type" : "json",
>   "extensions" : [ "json" ]
> },
> "avro" : {
>   "type" : "avro"
> },
> "sequencefile" : {
>   "type" : "sequencefile",
>   "extensions" : [ "seq" ]
> },
> "csvh" : {
>   "type" : "text",
>   "extensions" : [ "csvh" ],
>   "extractHeader" : true,
>   "delimiter" : ","
> }
>   }
> },
> "format" : {
>   "type" : "named",
>   "name" : "text"
> },
> "columns" : [ "`*`" ],
> "selectionRoot" : "file:/home/arina/files/ver",
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 0.0
>   },
>   "first" : 0,
>   "last" : 10,
>   "initialAllocation" : 100,
>   "maxAllocation" : 100,
>   "cost" : 10.0
> },
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 10.0
>   },
>   "destination" : "Cglsb2NhbGhvc3QQpfIBGKbyASCn8gEyDzEuMTAuMC1TTkFQU0hPVA==",
>   "initialAllocation" : 100,
>   "maxAllocation" : 100,
>   "cost" : 10.0
> }; line: 90, column: 7] (through reference chain: 
> org.apache.drill.exec.physical.config.SingleSender["child"]->org.apache.drill.exec.physical.config.SelectionVectorRemover["child"]->org.apache.drill.exec.physical.config.Limit["child"])
> com.fasterxml.jackson.databind.JsonMappingException.from():223
>

[jira] [Created] (DRILL-5808) Reduce memory allocator strictness for "managed" operators

2017-09-20 Thread Paul Rogers (JIRA)

Paul Rogers created DRILL-5808:
--

 Summary: Reduce memory allocator strictness for "managed" operators
 Key: DRILL-5808
 URL: https://issues.apache.org/jira/browse/DRILL-5808
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.11.0
Reporter: Paul Rogers
Assignee: Paul Rogers
 Fix For: 1.12.0


Drill 1.11 and 1.12 introduce new "managed" versions of the sort and hash agg 
that enforce memory limits, spilling to disk when necessary.

Drill's internal memory system is very "lumpy" and unpredictable. The operators 
have no control over the incoming batch size; an overly large batch can cause 
the operator to exceed its memory limit before it has a chance to do any work.

Vector allocations grow in power-of-two sizes. Adding a single record can 
double the memory allocated to a vector.
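
The doubling can be illustrated with a small sketch. It assumes a fixed-width
vector of 4-byte values and a hypothetical helper, allocatedBytes, that rounds
the required size up to the next power of two. It models the behavior described
above and is not taken from Drill's allocator code.

```java
/** Toy model of power-of-two vector growth. */
final class VectorGrowth {

  private VectorGrowth() {
  }

  /** Returns the smallest power of two that is >= requiredBytes (for requiredBytes >= 1). */
  static long allocatedBytes(long requiredBytes) {
    long highest = Long.highestOneBit(requiredBytes);
    return highest == requiredBytes ? highest : highest << 1;
  }

  public static void main(String[] args) {
    int valueWidth = 4; // assumed width of one value, in bytes
    // 1024 records need exactly 4096 bytes, which is already a power of two.
    System.out.println(allocatedBytes(1024L * valueWidth)); // 4096
    // One more record needs 4100 bytes and pushes the allocation to 8192 bytes.
    System.out.println(allocatedBytes(1025L * valueWidth)); // 8192
  }
}
```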

Drill has no metadata, so operators cannot predict the size of VarChar columns 
nor the cardinality of arrays. The "Record Batch Sizer" tries to extract this 
information on each batch, but it works with averages, and specific column 
patterns can still throw off the memory calculations. (For example, having a 
series of very wide columns for A-M and very narrow columns for N-Z will cause 
a moderate average. But, once sorted, the A-M rows, and batches, will be much 
larger than expected, causing out-of-memory errors.)

At present, if an operator is wrong in its memory usage by a single byte, the 
entire query is killed. That is, the user pays the death penalty (of queries) 
for poor design decisions within Drill. This leads to a less-than-optimal user 
experience.

The proposal here is to make the memory allocator less strict for "managed" 
operators.

First, we recognize that the managed operators do attempt to control memory 
and, if designed well, will, on average, hit their targets.

Second, we recognize that, due to the lumpiness issues above, any single 
operator may exceed, or be under, the configured maximum memory.

Given this, the proposal here is:

1. An operator identifies itself as managed to the memory allocator.
2. In managed mode, the allocator has soft limits. It emits a warning to the 
log when the limit is exceeded.
3. For safety, in managed mode, the allocator enforces a hard limit larger than 
the configured limit.

The enforcement limit might be:

* For memory sizes < 100MB, up to 2x the configured limit.
* For larger memory sizes, no more than 100MB over the configured limit.

The exact numbers can be made configurable.
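
As a sketch of the rule above: the proposal gives upper bounds ("up to 2x", "no
more than 100MB over"), and the code below simply takes the largest value each
bound allows. The class and method names are hypothetical, and the 100MB
threshold is hardcoded here although it is meant to be configurable.

```java
/** Illustrative calculation of the hard limit for a "managed" operator. */
final class ManagedLimit {

  static final long MB = 1024L * 1024L;
  // Threshold from the proposal; a real implementation would read it from config.
  static final long THRESHOLD = 100 * MB;

  private ManagedLimit() {
  }

  /** Returns the largest hard limit the proposal allows for the configured (soft) limit. */
  static long hardLimit(long configuredLimit) {
    if (configuredLimit < THRESHOLD) {
      return 2 * configuredLimit;       // small limits: up to 2x
    }
    return configuredLimit + THRESHOLD; // large limits: at most 100MB over
  }

  public static void main(String[] args) {
    System.out.println(hardLimit(50 * MB) / MB);   // 100
    System.out.println(hardLimit(1024 * MB) / MB); // 1124
  }
}
```

The two branches agree at exactly 100MB (both give 200MB), so the limit is
continuous and could equally be written as configuredLimit +
Math.min(configuredLimit, THRESHOLD).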

Now, during testing, scripts should look for over-memory warnings. Each should 
be fixed as we fix OOM issues today. But, during production, user queries are 
far less likely to fail due to any remaining corner cases that throw off the 
memory calculations.




[jira] [Commented] (DRILL-5083) RecordIterator can sometimes restart a query on close

2017-09-20 Thread Roman Kulyk (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173529#comment-16173529
 ] 

Roman Kulyk commented on DRILL-5083:


[~khfaraaz],
Sorry for the late answer; I missed your message. 

To reproduce this issue I used the following environment:
1 node cluster (16G RAM, 2 cores)

I used query 2 of tpcds_sf100:

{code:sql}
WITH wscs AS (SELECT sold_date_sk, sales_price FROM (SELECT ws_sold_date_sk 
sold_date_sk, ws_ext_sales_price sales_price FROM web_sales) UNION ALL (SELECT 
cs_sold_date_sk sold_date_sk, cs_ext_sales_price sales_price FROM 
catalog_sales)), wswscs AS (SELECT d_week_seq, Sum(CASE WHEN ( d_day_name = 
'Sunday' ) THEN sales_price ELSE NULL END) sun_sales, Sum(CASE WHEN ( 
d_day_name = 'Monday' ) THEN sales_price ELSE NULL END) mon_sales, Sum(CASE 
WHEN ( d_day_name = 'Tuesday' ) THEN sales_price ELSE NULL END) tue_sales, 
Sum(CASE WHEN ( d_day_name = 'Wednesday' ) THEN sales_price ELSE NULL END) 
wed_sales, Sum(CASE WHEN ( d_day_name = 'Thursday' ) THEN sales_price ELSE NULL 
END) thu_sales, Sum(CASE WHEN ( d_day_name = 'Friday' ) THEN sales_price ELSE 
NULL END) fri_sales, Sum(CASE WHEN ( d_day_name = 'Saturday' ) THEN sales_price 
ELSE NULL END) sat_sales FROM wscs, date_dim WHERE d_date_sk = sold_date_sk 
GROUP BY d_week_seq) SELECT d_week_seq1, Round(sun_sales1 / sun_sales2, 2), 
Round(mon_sales1 / mon_sales2, 2), Round(tue_sales1 / tue_sales2, 2), 
Round(wed_sales1 / wed_sales2, 2), Round(thu_sales1 / thu_sales2, 2), 
Round(fri_sales1 / fri_sales2, 2), Round(sat_sales1 / sat_sales2, 2) FROM 
(SELECT wswscs.d_week_seq d_week_seq1, sun_sales sun_sales1, mon_sales 
mon_sales1, tue_sales tue_sales1, wed_sales wed_sales1, thu_sales thu_sales1, 
fri_sales fri_sales1, sat_sales sat_sales1 FROM wswscs, date_dim WHERE 
date_dim.d_week_seq = wswscs.d_week_seq AND d_year = 1998) y, (SELECT 
wswscs.d_week_seq d_week_seq2, sun_sales sun_sales2, mon_sales mon_sales2, 
tue_sales tue_sales2, wed_sales wed_sales2, thu_sales thu_sales2, fri_sales 
fri_sales2, sat_sales sat_sales2 FROM wswscs, date_dim WHERE 
date_dim.d_week_seq = wswscs.d_week_seq AND d_year = 1998 + 1) z WHERE 
d_week_seq1 = d_week_seq2 - 53 ORDER BY d_week_seq1 limit 1;
{code}

on views which were created from the following dataset (all files were in 
parquet format):
{code:xml}
hadoop fs -du -h /drill/testdata/
1.4 G    /drill/testdata/catalog_sales
22.4 M   /drill/testdata/date_dim
734.0 M  /drill/testdata/web_sales
{code}


To reproduce the issue, change the following options before you run the 
query:

{code:sql}
alter session set `planner.enable_hashjoin` = false;
alter session set `planner.enable_hashagg` = false;
alter session set `planner.enable_mergejoin` = true;
alter session set `planner.memory.max_query_memory_per_node` = 1048576;
{code}

After that, run the query (tpcds_sf100 query 2) and interrupt it with 
"Ctrl + c" at the right moment. There are 2 ways to find the right moment:

1) In my case, if I do not interrupt the query, it fails after 1 min 40 sec. 
To get a hang in the CANCELLATION_REQUESTED state I need to press "Ctrl + c" 
after ~1 min 30 sec. So I think you can scale the timing to your cluster.

2) Open a 2nd terminal window with "tail -f /opt/mapr/drill/log/drillbit.log" 
and interrupt the query before the first fragment changes state to FINISHED. 
To do this, run the query once first. The query should fail, and you will be 
able to see the number of operators that were running before some fragment 
finished (as an example, see the required number of operators in the red 
circles in the Reproduce5083.jpg attachment). When you run the query a second 
time, wait for that number to appear in the 2nd terminal and interrupt the 
query with "Ctrl + c".

You can reproduce this issue in Drill builds before commit 368bc38b1, and you 
can verify it on the latest Apache master.


> RecordIterator can sometimes restart a query on close
> -
>
> Key: DRILL-5083
> URL: https://issues.apache.org/jira/browse/DRILL-5083
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Roman Kulyk
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
> Attachments: DrillOperatorErrorHandlingRedesign.pdf, Reproduce5083.jpg
>
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something 
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a 
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
>

[jira] [Updated] (DRILL-5745) Invalid "location" information in Drill web server

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5745:

Reviewer: Arina Ielchiieva

> Invalid "location" information in Drill web server
> --
>
> Key: DRILL-5745
> URL: https://issues.apache.org/jira/browse/DRILL-5745
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Priority: Minor
> Fix For: 1.12.0
>
>
> The file {{ProfileResources.java}} has the following incorrect code line:
> {code}
>   this.location = "http://localhost:8047/profile/" + queryId + ".json";
> {code}
> This code makes three errors.
> 1. The "http" prefix ignores the fact that the Drillbit can have SSL enabled 
> for the web server.
> 2. In a browser, "localhost" refers to the machine running the browser. 
> This is valid only if the browser runs on the same machine as the Drillbit, 
> which is not, in general, true.
> 3. The port number is hardcoded to 8047, but it can be customized in the 
> config file.
> Therefore, most of the time, the link won't work on a production server.
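
A minimal sketch of one way to address the three points in the description, assuming the scheme, host, and port are supplied by the caller (for example, from the incoming HTTP request or the Drillbit configuration) instead of being hardcoded. The class and method names are hypothetical; this is not the code in ProfileResources.java and not necessarily how the actual fix works.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not Drill's ProfileResources code.
final class ProfileLocationSketch {

  // Builds an absolute profile link from caller-supplied values, so that
  // SSL, the real host name, and a customized port are all respected.
  static String profileLocation(boolean sslEnabled, String host, int port, String queryId) {
    String scheme = sslEnabled ? "https" : "http";
    try {
      return new URI(scheme, null, host, port,
          "/profile/" + queryId + ".json", null, null).toString();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Invalid profile location", e);
    }
  }

  // Alternative: a host-relative link lets the browser reuse whatever
  // scheme, host, and port it used to reach the web server.
  static String relativeProfileLocation(String queryId) {
    return "/profile/" + queryId + ".json";
  }
}
```

The relative form avoids all three problems at once, at the cost of only being usable from a page served by the same web server.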



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (DRILL-5745) Invalid "location" information in Drill web server

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5745:

Labels: ready-to-commit  (was: )

> Invalid "location" information in Drill web server
> --
>
> Key: DRILL-5745
> URL: https://issues.apache.org/jira/browse/DRILL-5745
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> The file {{ProfileResources.java}} has the following incorrect code line:
> {code}
>   this.location = "http://localhost:8047/profile/" + queryId + ".json";
> {code}
> This code makes three errors.
> 1. The "http" prefix ignores the fact that the Drillbit can have SSL enabled 
> for the web server.
> 2. In a browser, "localhost" refers to the machine running the browser. 
> This is valid only if the browser runs on the same machine as the Drillbit, 
> which is not, in general, true.
> 3. The port number is hardcoded to 8047, but it can be customized in the 
> config file.
> Therefore, most of the time, the link won't work on a production server.




[jira] [Assigned] (DRILL-5745) Invalid "location" information in Drill web server

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5745:
---

Assignee: (was: Arina Ielchiieva)

> Invalid "location" information in Drill web server
> --
>
> Key: DRILL-5745
> URL: https://issues.apache.org/jira/browse/DRILL-5745
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Priority: Minor
> Fix For: 1.12.0
>
>
> The file {{ProfileResources.java}} has the following incorrect code line:
> {code}
>   this.location = "http://localhost:8047/profile/" + queryId + ".json";
> {code}
> This code makes three errors.
> 1. The "http" prefix ignores the fact that the Drillbit can have SSL enabled 
> for the web server.
> 2. In a browser, "localhost" refers to the machine running the browser. 
> This is valid only if the browser runs on the same machine as the Drillbit, 
> which is not, in general, true.
> 3. The port number is hardcoded to 8047, but it can be customized in the 
> config file.
> Therefore, most of the time, the link won't work on a production server.




[jira] [Updated] (DRILL-5781) Fix unit test failures to use tests config even if default config is available

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5781:

Reviewer: Arina Ielchiieva

> Fix unit test failures to use tests config even if default config is available
> --
>
> Key: DRILL-5781
> URL: https://issues.apache.org/jira/browse/DRILL-5781
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.11.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
> Fix For: 1.12.0
>
>
> Unit tests fail when they are run with the mapr profile.
> Test failures related to a ZooKeeper configuration that differs from the 
> expected one:
> {noformat}
> DrillClientTest>TestWithZookeeper.setUp:32 » Runtime java.io.IOException: 
> Coul...
>   TestZookeeperClient.testPutWithMatchingVersion » IO Could not configure 
> server...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testStartingClientEnablesCacheAndEnsuresRootNodeExists 
> » IO
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathThrowsDrillRuntimeException » IO Could not 
> conf...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathFalseWithVersion » IO Could not configure 
> serve...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestEphemeralStore.testPutAndGetWorksAntagonistacally » IO Could not 
> configure...
>   TestEphemeralStore.tearDown:132 NullPointer
>   TestZookeeperClient.testGetWithVersion » IO Could not configure server 
> because...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestEphemeralStore.testStoreRegistersDispatcherAndStartsItsClient » IO 
> Could n...
>   TestEphemeralStore.tearDown:132 NullPointer
>   TestZookeeperClient.testPutWithNonMatchingVersion » IO Could not configure 
> ser...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testGetWithEventualConsistencyHitsCache » IO Could not 
> con...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testPutIfAbsentWhenPresent » IO Could not configure 
> server...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathTrueWithVersion » IO Could not configure 
> server...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testPutAndGetWorks » IO Could not configure server 
> because...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testPutIfAbsentWhenAbsent » IO Could not configure 
> server ...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathWithEventualConsistencyHitsCache » IO Could 
> not...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testCreate » IO Could not configure server because SASL 
> co...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testDelete » IO Could not configure server because SASL 
> co...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testEntriesReturnsRelativePaths » IO Could not 
> configure s...
>   TestZookeeperClient.tearDown:86 NullPointer
> TestPStoreProviders>TestWithZookeeper.setUp:32 » Runtime java.io.IOException: 
> ...
>   TestPauseInjection.pauseOnSpecificBit:151 » Runtime java.io.IOException: 
> Could...
>   TestExceptionInjection.injectionOnSpecificBit:217 » Runtime 
> java.io.IOExceptio...
> HBaseTestsSuite.initCluster:110 » IO No JAAS configuration section named 
> 'Serv...
> {noformat}
> Test failures related to a Hadoop configuration that differs from the 
> expected one:
> {noformat}
> TestInboundImpersonation.setup:58->BaseTestImpersonation.startMiniDfsCluster:80->BaseTestImpersonation.startMiniDfsCluster:111
>  » ClassCast
>   
> TestImpersonationMetadata.setup:58->BaseTestImpersonation.startMiniDfsCluster:80->BaseTestImpersonation.startMiniDfsCluster:111
>  » ClassCast
>   
> TestImpersonationDisabledWithMiniDFS.setup:37->BaseTestImpersonation.startMiniDfsCluster:106
>  » Runtime
>   
> TestImpersonationQueries.setup:46->BaseTestImpersonation.startMiniDfsCluster:80->BaseTestImpersonation.startMiniDfsCluster:111
>  » ClassCast
> TestHiveStorage>HiveTestBase.generateHive:34 » Runtime 
> java.lang.RuntimeExcept...
>   TestInfoSchemaOnHiveStorage>HiveTestBase.generateHive:34 » Runtime 
> java.lang.R...
>   TestInbuiltHiveUDFs>HiveTestBase.generateHive:35 » ExecutionSetup Failure 
> sett...
>   TestSampleHiveUDFs>HiveTestBase.generateHive:35 » ExecutionSetup Failure 
> setti...
>   
> TestStorageBasedHiveAuthorization.setup:109->BaseTestImpersonation.startMiniDfsCluster:80->BaseTestImpersonation.startMiniDfsCluster:111
>  » ClassCast
>   
> TestSqlStdBasedAuthorization.setup:72->BaseTestImpersonation.startMiniDfsCluster:80->BaseTestImpersonation.startMiniDfsCluster:111
>  » ClassCast
>

[jira] [Commented] (DRILL-5694) hash agg spill to disk, second phase OOM

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173685#comment-16173685
 ] 

ASF GitHub Bot commented on DRILL-5694:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r140062742
  
--- Diff: 
common/src/main/java/org/apache/drill/common/exceptions/UserException.java ---
@@ -536,6 +542,33 @@ public Builder pushContext(final String name, final 
double value) {
  * @return user exception
  */
 public UserException build(final Logger logger) {
+
+  // To allow for debugging:
+  // A spinner code to make the execution stop here while the file 
'/tmp/drillspin' exists
+  // Can be used to attach a debugger, use jstack, etc
+  // The processID of the spinning thread should be in a file like 
/tmp/spin4148663301172491613.tmp
+  // along with the error message.
+  File spinFile = new File("/tmp/drillspin");
--- End diff --

 Using a "flag file" instead of a config setting gives more flexibility: 
there is no need to restart in order to turn this feature on or off, errors 
can be caught on only a few selected nodes, and the looping thread can be 
freed by deleting the "flag file". 
  I also plan to post an announcement on the dev list about this new 
"feature" and see if there's any feedback. 

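The flag-file idea in the comment above can be sketched as follows. This is an illustration only, not the code from the pull request: the class name, method name, and poll interval are assumptions, and only the `/tmp/drillspin` path is taken from the diff.

```java
import java.io.File;

// Hypothetical sketch of a "flag file" spinner; not the PR's actual code.
final class FlagFileSpinner {

  // Blocks the calling thread for as long as the flag file exists, checking
  // once per poll interval. Returns the number of polls that saw the file.
  static int spinWhileExists(File flagFile, long pollMillis) {
    int polls = 0;
    while (flagFile.exists()) {
      polls++;
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller and stop spinning.
        Thread.currentThread().interrupt();
        break;
      }
    }
    return polls;
  }
}
```

Under these assumptions, a call such as `FlagFileSpinner.spinWhileExists(new File("/tmp/drillspin"), 1000L)` would return immediately on nodes where the file is absent, and on a node where someone has created the file it would hold the thread until the file is deleted.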


> hash agg spill to disk, second phase OOM
> 
>
> Key: DRILL-5694
> URL: https://issues.apache.org/jira/browse/DRILL-5694
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
>Reporter: Chun Chang
>Assignee: Boaz Ben-Zvi
>
> | 1.11.0-SNAPSHOT  | d622f76ee6336d97c9189fc589befa7b0f4189d6  | DRILL-5165: 
> For limit all case, no need to push down limit to scan  | 21.07.2017 @ 
> 10:36:29 PDT
> Second phase agg ran out of memory, which is not supposed to happen. Test 
> data is currently only accessible locally.
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg15.q
> Query:
> select row_count, sum(row_count), avg(double_field), max(double_rand), 
> count(float_rand) from parquet_500m_v1 group by row_count order by row_count 
> limit 30
> Failed with exception
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 
> 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 
> 536870912 so far allocated: 534773760.
> Fragment 1:6
> [Error Id: a193babd-f783-43da-a476-bb8dd4382420 on 10.10.30.168:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) HT was: 534773760 
> OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned 
> batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far 
> allocated: 534773760.
> 
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1175
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
>

[jira] [Assigned] (DRILL-5745) Invalid "location" information in Drill web server

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5745:
---

Assignee: Arina Ielchiieva

> Invalid "location" information in Drill web server
> --
>
> Key: DRILL-5745
> URL: https://issues.apache.org/jira/browse/DRILL-5745
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>Priority: Minor
>
> The file {{ProfileResources.java}} has the following incorrect code line:
> {code}
>   this.location = "http://localhost:8047/profile/" + queryId + ".json";
> {code}
> This code makes three errors.
> 1. The "http" prefix ignores the fact that the Drillbit can have SSL enabled 
> for the web server.
> 2. In a browser, "localhost" refers to the machine running the browser. 
> This is valid only if the browser runs on the same machine as the Drillbit, 
> which is not, in general, true.
> 3. The port number is hardcoded to 8047, but it can be customized in the 
> config file.
> Therefore, most of the time, the link won't work on a production server.




[jira] [Updated] (DRILL-5745) Invalid "location" information in Drill web server

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5745:

Fix Version/s: 1.12.0

> Invalid "location" information in Drill web server
> --
>
> Key: DRILL-5745
> URL: https://issues.apache.org/jira/browse/DRILL-5745
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Priority: Minor
> Fix For: 1.12.0
>
>
> The file {{ProfileResources.java}} has the following incorrect code line:
> {code}
>   this.location = "http://localhost:8047/profile/" + queryId + ".json";
> {code}
> This code makes three errors.
> 1. The "http" prefix ignores the fact that the Drillbit can have SSL enabled 
> for the web server.
> 2. In a browser, "localhost" refers to the machine running the browser. 
> This is valid only if the browser runs on the same machine as the Drillbit, 
> which is not, in general, true.
> 3. The port number is hardcoded to 8047, but it can be customized in the 
> config file.
> Therefore, most of the time, the link won't work on a production server.




[jira] [Commented] (DRILL-5795) Filter pushdown for parquet handles multi rowgroup file

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173534#comment-16173534
 ] 

ASF GitHub Bot commented on DRILL-5795:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/949#discussion_r140033471
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -819,63 +827,64 @@ private void init() throws IOException {
   }
 }
 rowGroupInfo.setEndpointByteMap(endpointByteMap);
+rowGroupInfo.setColumns(rg.getColumns());
 rgIndex++;
 rowGroupInfos.add(rowGroupInfo);
   }
 }
 
 this.endpointAffinities = 
AffinityCreator.getAffinityMap(rowGroupInfos);
+updatePartitionColTypeMap();
+  }
 
+  private void updatePartitionColTypeMap() {
 columnValueCounts = Maps.newHashMap();
 this.rowCount = 0;
 boolean first = true;
-for (ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
-  for (RowGroupMetadata rowGroup : file.getRowGroups()) {
-long rowCount = rowGroup.getRowCount();
-for (ColumnMetadata column : rowGroup.getColumns()) {
-  SchemaPath schemaPath = 
SchemaPath.getCompoundPath(column.getName());
-  Long previousCount = columnValueCounts.get(schemaPath);
-  if (previousCount != null) {
-if (previousCount != GroupScan.NO_COLUMN_STATS) {
-  if (column.getNulls() != null) {
-Long newCount = rowCount - column.getNulls();
-columnValueCounts.put(schemaPath, 
columnValueCounts.get(schemaPath) + newCount);
-  }
-}
-  } else {
+for (RowGroupInfo rowGroup : this.rowGroupInfos) {
--- End diff --

Isn't this doing the same thing as the original code? RowGroupInfos is 
built from the RowGroupMetadata in the files?


> Filter pushdown for parquet handles multi rowgroup file
> ---
>
> Key: DRILL-5795
> URL: https://issues.apache.org/jira/browse/DRILL-5795
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Damien Profeta
>  Labels: doc-impacting
>
> DRILL-1950 implemented filter pushdown for parquet files, but only for the 
> case of one rowgroup per parquet file. With multiple rowgroups per file, 
> it detects that a rowgroup can be pruned but then tells the drillbit to 
> read the whole file, which leads to a performance issue.
> Having multiple rowgroups per file helps to handle partitioned datasets and 
> still read only the relevant subset of data without ending up with more 
> files than really needed.
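
The difference between pruning per file and pruning per rowgroup can be illustrated with a small sketch. The types below are a simplified, hypothetical model (a single numeric column with min/max statistics and an equality filter); they are not Drill's metadata classes and not the code in the pull request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; these are not Drill's metadata classes.
final class RowGroupPruningSketch {

  static final class RowGroup {
    final String file;
    final int index;
    final long min;
    final long max;

    RowGroup(String file, int index, long min, long max) {
      this.file = file;
      this.index = index;
      this.min = min;
      this.max = max;
    }
  }

  // Keeps only the rowgroups whose [min, max] range may contain the value.
  // Pruning per rowgroup, rather than per file, means a file with one
  // matching rowgroup does not force a scan of its other rowgroups.
  static List<RowGroup> qualifying(List<RowGroup> rowGroups, long value) {
    List<RowGroup> result = new ArrayList<>();
    for (RowGroup rg : rowGroups) {
      if (value >= rg.min && value <= rg.max) {
        result.add(rg);
      }
    }
    return result;
  }
}
```

With two rowgroups in one file covering the ranges 0 to 9 and 10 to 19, a filter on the value 12 keeps only the second rowgroup; a per-file approach would have to read both.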




[jira] [Commented] (DRILL-5795) Filter pushdown for parquet handles multi rowgroup file

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173533#comment-16173533
 ] 

ASF GitHub Bot commented on DRILL-5795:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/949#discussion_r140036046
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -1095,7 +1104,7 @@ public GroupScan applyFilter(LogicalExpression 
filterExpr, UdfUtilities udfUtili
 
 final Set schemaPathsInExpr = filterExpr.accept(new 
ParquetRGFilterEvaluator.FieldReferenceFinder(), null);
 
-final List qualifiedRGs = new 
ArrayList<>(parquetTableMetadata.getFiles().size());
+final List qualifiedRGs = new 
ArrayList<>(rowGroupInfos.size());
--- End diff --

Never mind the previous comment. It's probably better to use RowGroupInfos 
throughout the code. 


> Filter pushdown for parquet handles multi rowgroup file
> ---
>
> Key: DRILL-5795
> URL: https://issues.apache.org/jira/browse/DRILL-5795
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Damien Profeta
>  Labels: doc-impacting
>
> DRILL-1950 implemented filter pushdown for parquet files, but only for the 
> case of one rowgroup per parquet file. With multiple rowgroups per file, 
> it detects that a rowgroup can be pruned but then tells the drillbit to 
> read the whole file, which leads to a performance issue.
> Having multiple rowgroups per file helps to handle partitioned datasets and 
> still read only the relevant subset of data without ending up with more 
> files than really needed.




[jira] [Commented] (DRILL-5745) Invalid "location" information in Drill web server

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173544#comment-16173544
 ] 

ASF GitHub Bot commented on DRILL-5745:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/948
  
+1, LGTM.


> Invalid "location" information in Drill web server
> --
>
> Key: DRILL-5745
> URL: https://issues.apache.org/jira/browse/DRILL-5745
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> The file {{ProfileResources.java}} has the following incorrect code line:
> {code}
>   this.location = "http://localhost:8047/profile/" + queryId + ".json";
> {code}
> This code makes three errors.
> 1. The "http" prefix ignores the fact that the Drillbit can have SSL enabled 
> for the web server.
> 2. In a browser, "localhost" refers to the machine running the browser. 
> This is valid only if the browser runs on the same machine as the Drillbit, 
> which is not, in general, true.
> 3. The port number is hardcoded to 8047, but it can be customized in the 
> config file.
> Therefore, most of the time, the link won't work on a production server.




[jira] [Commented] (DRILL-5806) DrillRuntimeException: Interrupted but context.shouldContinue() is true

2017-09-20 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173643#comment-16173643
 ] 

Arina Ielchiieva commented on DRILL-5806:
-

[~khfaraaz] is it a regression?

> DrillRuntimeException: Interrupted but context.shouldContinue() is true
> ---
>
> Key: DRILL-5806
> URL: https://issues.apache.org/jira/browse/DRILL-5806
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.12.0
> Environment: Drill 1.12.0 commit : 
> aaff1b35b7339fb4e6ab480dd517994ff9f0a5c5
>Reporter: Khurram Faraaz
>
> On a three node cluster
> 1. run concurrent queries (TPC-DS query 11) from a Java program.
> 2. stop the drillbit (foreman drillbit) this way, 
> /opt/mapr/drill/drill-1.12.0/bin/drillbit.sh stop
> 3. InterruptedException: null, is written to the drillbit.log
> Stack trace from drillbit.log
> {noformat}
> 2017-09-19 21:49:20,867 [263e6f48-0ace-0c0d-4f90-55ae2f0d778b:frag:5:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: InterruptedException
> Fragment 5:0
> [Error Id: 63ce8c18-040a-47f9-9643-e826de9a1a27 on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> InterruptedException
> Fragment 5:0
> [Error Id: 63ce8c18-040a-47f9-9643-e826de9a1a27 on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
>  ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:298)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_91]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: 
> Interrupted but context.shouldContinue() is true
> at 
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:178)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getNextBatch(UnorderedReceiverBatch.java:141)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.next(UnorderedReceiverBatch.java:164)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen498.doWork(HashAggTemplate.java:581)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:168)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
>

[jira] [Commented] (DRILL-5799) native-client: Support alternative build directories

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173642#comment-16173642
 ] 

ASF GitHub Bot commented on DRILL-5799:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/946
  
Build issue has been corrected via another PR.


> native-client: Support alternative build directories
> 
>
> Key: DRILL-5799
> URL: https://issues.apache.org/jira/browse/DRILL-5799
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Reporter: Uwe L. Korn
>
> At the moment the native client only supports {{build}} as its build 
> directory. This should be freely choosable by the user.




[jira] [Commented] (DRILL-5721) Query with only root fragment and no non-root fragment hangs when Drillbit to Drillbit Control Connection has network issues

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173640#comment-16173640
 ] 

ASF GitHub Bot commented on DRILL-5721:
---

Github user sohami commented on the issue:

https://github.com/apache/drill/pull/919
  
Rebased on latest master and squashed the initial 3 commits. But I have 
kept the commit to resolve conflict separate as there are some changes made 
w.r.t DRILL-3449 behavior, and added some new unit tests.


> Query with only root fragment and no non-root fragment hangs when Drillbit to 
> Drillbit Control Connection has network issues
> 
>
> Key: DRILL-5721
> URL: https://issues.apache.org/jira/browse/DRILL-5721
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> Recently I found an issue (Thanks to [~knguyen] to create this scenario) 
> related to Fragment Status reporting and would like some feedback on it. 
> When a client submits a query to Foreman, then it is planned by Foreman and 
> later fragments are scheduled to root and non-root nodes. Foreman creates a 
> DriilbitStatusListener and FragmentStatusListener to know about the health of 
> Drillbit node and a fragment respectively. The way root and non-root 
> fragments are setup by Foreman are different: 
> Root fragments are setup without any communication over control channel 
> (since it is executed locally on Foreman)
> Non-root fragments are setup by sending control message 
> (REQ_INITIALIZE_FRAGMENTS_VALUE) over wire. If there is failure in sending 
> any such control message (like due to network hiccup's) during query setup 
> then the query is failed and client is notified. 
> Each fragment is executed on it's node with the help Fragment Executor which 
> has an instance for FragmentStatusReporter. FragmentStatusReporter helps to 
> update the status of a fragment to Foreman node over a control tunnel or 
> connection using RPC message (REQ_FRAGMENT_STATUS) both for root and non-root 
> fragments. 
> Based on above when root fragment is submitted for setup then it is done 
> locally without any RPC communication whereas when status for that fragment 
> is reported by fragment executor that happens over control connection by 
> sending a RPC message. But for non-root fragment setup and status update both 
> happens using RPC message over control connection.
> *Issue 1:*
> What was observed is if for a simple query which has only 1 root fragment 
> running on Foreman node then setup will work fine. But as part of status 
> update when the fragment tries to create a control connection and fails to 
> establish that, then the query hangs. This is because the root fragment will 
> complete execution but will fail to update Foreman about it and Foreman think 
> that the query is running for ever. 
> *Proposed Solution:*
> Root fragments are already set up locally, without an RPC message, so status 
> updates for root fragments can be delivered the same way. This avoids RPC 
> communication for status updates of fragments running locally on the 
> Foreman and hence resolves Issue 1.
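
The proposed solution can be sketched roughly as follows. This is only an illustration under stated assumptions: StatusListener, ControlTunnel and StatusReporter are hypothetical stand-ins invented for this sketch, not Drill's actual classes or signatures. The idea is that a reporter for a fragment running on the Foreman hands the status straight to the local listener and never touches the control connection.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types are Drill's real classes. */
final class StatusRoutingSketch {

  /** Receives fragment status updates on the Foreman (assumed interface). */
  interface StatusListener {
    void statusUpdate(String fragmentId, String state);
  }

  /** Stand-in for the control connection; sending may fail. */
  interface ControlTunnel {
    void sendStatus(String fragmentId, String state) throws IOException;
  }

  /** Reports locally for fragments on the Foreman, over the tunnel otherwise. */
  static final class StatusReporter {
    private final boolean runsOnForeman;
    private final StatusListener localListener;
    private final ControlTunnel tunnel;

    StatusReporter(boolean runsOnForeman, StatusListener localListener, ControlTunnel tunnel) {
      this.runsOnForeman = runsOnForeman;
      this.localListener = localListener;
      this.tunnel = tunnel;
    }

    void report(String fragmentId, String state) throws IOException {
      if (runsOnForeman) {
        // Local delivery: no control connection is needed, so none can fail.
        localListener.statusUpdate(fragmentId, state);
      } else {
        tunnel.sendStatus(fragmentId, state);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    List<String> seen = new ArrayList<>();
    StatusListener foreman = (id, state) -> seen.add(id + ":" + state);
    // A tunnel that always fails, imitating the broken control connection.
    ControlTunnel broken = (id, state) -> {
      throw new IOException("control connection unavailable");
    };
    new StatusReporter(true, foreman, broken).report("0:0", "FINISHED");
    System.out.println(seen);
  }
}
```

With local delivery, the root fragment's final state reaches the listener even though the tunnel in this example always fails, which is the situation described under Issue 1.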



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (DRILL-5083) RecordIterator can sometimes restart a query on close

2017-09-20 Thread Roman Kulyk (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Kulyk updated DRILL-5083:
---
Attachment: Reproduce5083.jpg

> RecordIterator can sometimes restart a query on close
> -
>
> Key: DRILL-5083
> URL: https://issues.apache.org/jira/browse/DRILL-5083
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Roman Kulyk
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
> Attachments: DrillOperatorErrorHandlingRedesign.pdf, Reproduce5083.jpg
>
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something 
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a 
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
>   // Check whether next() should even have been called in current state.
>   if (null != exceptionState) {
>     throw new IllegalStateException(
> Here the exceptionState is already set, so we end up throwing an 
> IllegalStateException during cleanup.
> The code should be consistent: if {{next( )}} can be called during cleanup, 
> then {{next( )}} should handle that case gracefully.
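
One way to make the two code paths agree is sketched below. It is a minimal illustration, not the fix that was committed for this ticket: ValidatingIterator, Outcome, recordFailure and beginClose are hypothetical names, and the sketch assumes the cleanup path can signal that it is closing before it drains in-flight batches.

```java
/** Hypothetical sketch of a validator whose next() tolerates the cleanup path. */
final class CleanupSafeIteratorSketch {

  enum Outcome { OK, NONE }

  static final class ValidatingIterator {
    private RuntimeException exceptionState; // set once the upstream has failed
    private boolean closing;                 // set when cleanup starts

    void recordFailure(RuntimeException e) {
      exceptionState = e;
    }

    void beginClose() {
      closing = true;
    }

    Outcome next() {
      if (exceptionState != null) {
        if (closing) {
          // Cleanup is draining batches after a failure: report "no more data"
          // instead of raising a second exception on top of the first.
          return Outcome.NONE;
        }
        throw new IllegalStateException("next() called after failure", exceptionState);
      }
      return Outcome.OK;
    }
  }

  public static void main(String[] args) {
    ValidatingIterator it = new ValidatingIterator();
    it.recordFailure(new RuntimeException("injected OOM in spill"));
    it.beginClose();
    System.out.println(it.next());
  }
}
```

The strict check still fires for a stray next() during normal execution; only calls made after cleanup has begun are treated as harmless.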




[jira] [Comment Edited] (DRILL-5706) Select * on hbase table having multiple regions(one or more empty) returns wrong result intermittently

2017-09-20 Thread Vitalii Diravka (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172202#comment-16172202
 ] 

Vitalii Diravka edited comment on DRILL-5706 at 9/20/17 5:40 PM:
-

I have reproduced this on the latest Drill master version:
{code}
0: jdbc:drill:> select myhbase.cf1 from hbase. myhbase;
+--------------------------+
|           cf1            |
+--------------------------+
| {"col1":"c29tZWRhdGE="}  |
| {"col1":"c29tZWRhdGE="}  |
| {"col1":"c29tZWRhdGE="}  |
+--------------------------+
3 rows selected (0.169 seconds)
0: jdbc:drill:> select myhbase.cf1 from hbase. myhbase;
+------+
| cf1  |
+------+
+------+
No rows selected (0.188 seconds)
{code}

Without commit fde0a1df1734e0 (the fix for DRILL-5546), the issue does not occur.


was (Author: vitalii):
[~prasadns14] Looks like the result is correct. Please verify:
{code}
0: jdbc:drill:> select * from hbase.myhbase;
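+--------------+--------------------------+
|   row_key    |           cf1            |
+--------------+--------------------------+
| [B@3486f312  | {"col1":"c29tZWRhdGE="}  |
| [B@12e0f043  | {"col1":"c29tZWRhdGE="}  |
| [B@6dbdc863  | {"col1":"c29tZWRhdGE="}  |
+--------------+--------------------------+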
3 rows selected (0.322 seconds)
0: jdbc:drill:> select convert_from(row_key, 'UTF8') as id, 
convert_from(myhbase.cf1.col1, 'UTF8') col1 from hbase. myhbase;
+-----+-----------+
| id  |   col1    |
+-----+-----------+
| c   | somedata  |
| a   | somedata  |
| b   | somedata  |
+-----+-----------+
3 rows selected (0.262 seconds)
{code}

Note: the result above is the same on Drill 1.10 and on the Drill 1.12 master version.

> Select * on hbase table having multiple regions(one or more empty) returns 
> wrong result intermittently
> --
>
> Key: DRILL-5706
> URL: https://issues.apache.org/jira/browse/DRILL-5706
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Vitalii Diravka
>
> 1) Create an HBase table with 4 regions
> {code}
> create 'myhbase', 'cf1', {SPLITS => ['a', 'b', 'c']}
> put 'myhbase','a','cf1:col1','somedata'
> put 'myhbase','b','cf1:col1','somedata'
> put 'myhbase','c','cf1:col1','somedata'
> {code}
> 2) Run select * on the hbase table
> {code}
> select * from hbase.myhbase;
> {code}
> The query intermittently returns a wrong result.




[jira] [Commented] (DRILL-5808) Reduce memory allocator strictness for "managed" operators

2017-09-20 Thread Boaz Ben-Zvi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173659#comment-16173659
 ] 

Boaz Ben-Zvi commented on DRILL-5808:
-

   Implementing a dynamic system of memory "quotas" would work better and 
address the above memory limitations as well. One simple implementation of this 
system: divide the available memory in two, and keep one half as a "reserve". The 
other half is divided equally among all the buffered operators as their 
"quotas". Then enhance the allocator so that, when it hits its limit, it 
requests more memory from the "reserve" (and fails with OOM if none is available). 
Also, when an operator needs no more memory (e.g., Hash Join has finished the 
build phase), it can return its leftover quota to the "reserve". 


> Reduce memory allocator strictness for "managed" operators
> --
>
> Key: DRILL-5808
> URL: https://issues.apache.org/jira/browse/DRILL-5808
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Drill 1.11 and 1.12 introduce new "managed" versions of the sort and hash agg 
> that enforce memory limits, spilling to disk when necessary.
> Drill's internal memory system is very "lumpy" and unpredictable. The 
> operators have no control over the incoming batch size; an overly large batch 
> can cause the operator to exceed its memory limit before it has a chance to 
> do any work.
> Vector allocations grow in power-of-two sizes. Adding a single record can 
> double the memory allocated to a vector.
> Drill has no metadata, so operators cannot predict the size of VarChar 
> columns nor the cardinality of arrays. The "Record Batch Sizer" tries to 
> extract this information on each batch, but it works with averages, and 
> specific column patterns can still throw off the memory calculations. (For 
> example, having a series of very wide columns for A-M and very narrow columns 
> for N-Z will cause a moderate average. But, once sorted, the A-M rows, and 
> batches, will be much larger than expected, causing out-of-memory errors.)
> At present, if an operator is wrong in its memory usage by a single byte, the 
> entire query is killed. That is, the user pays the death penalty (of queries) 
> for poor design decisions within Drill. This leads to a less-than-optimal 
> user experience.
> The proposal here is to make the memory allocator less strict for "managed" 
> operators.
> First, we recognize that the managed operators do attempt to control memory 
> and, if designed well, will, on average, hit their targets.
> Second, we recognize that, due to the lumpiness issues above, any single 
> operator may exceed, or be under, the configured maximum memory.
> Given this, the proposal here is:
> 1. An operator identifies itself as managed to the memory allocator.
> 2. In managed mode, the allocator has soft limits. It emits a warning to the 
> log when the limit is exceeded.
> 3. For safety, in managed mode, the allocator enforces a hard limit larger 
> than the configured limit.
> The enforcement limit might be:
> * For memory sizes < 100MB, up to 2x the configured limit.
> * For larger memory sizes, no more than 100MB over the configured limit.
> The exact numbers can be made configurable.
> Now, during testing, scripts should look for over-memory warnings. Each 
> should be fixed as we fix OOM issues today. But, during production, user 
> queries are far less likely to fail due to any remaining corner cases that 
> throw off the memory calculations.
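
The enforcement rule proposed above can be written down as a small function. This is a sketch only: the class, method and enum names are hypothetical, the 100MB threshold and 2x factor are the example numbers from the description (which says they could be made configurable), and MB is taken to mean 1024 * 1024 bytes.

```java
/** Hypothetical sketch of the soft/hard limit rule for "managed" operators. */
final class SoftLimitSketch {

  static final long MB = 1024L * 1024L;
  static final long THRESHOLD = 100 * MB;

  enum Decision { OK, WARN, FAIL }

  /** Hard limit: 2x the configured limit below 100MB, else configured + 100MB. */
  static long hardLimit(long configuredBytes) {
    return configuredBytes < THRESHOLD
        ? 2 * configuredBytes
        : configuredBytes + THRESHOLD;
  }

  /** Classifies an operator's total allocation against its limits. */
  static Decision check(long configuredBytes, long totalAllocatedBytes) {
    if (totalAllocatedBytes <= configuredBytes) {
      return Decision.OK;
    }
    if (totalAllocatedBytes <= hardLimit(configuredBytes)) {
      return Decision.WARN; // soft limit exceeded: log a warning, keep running
    }
    return Decision.FAIL;   // hard limit exceeded: fail as today
  }

  public static void main(String[] args) {
    System.out.println(hardLimit(50 * MB) / MB);     // 100
    System.out.println(hardLimit(512 * MB) / MB);    // 612
    System.out.println(check(512 * MB, 512 * MB + 1)); // WARN
  }
}
```

Under this rule, exceeding the configured limit by a single byte produces a warning for test scripts to pick up rather than a failed query.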




[jira] [Commented] (DRILL-5694) hash agg spill to disk, second phase OOM

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173687#comment-16173687
 ] 

ASF GitHub Bot commented on DRILL-5694:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r140062933
  
--- Diff: 
common/src/main/java/org/apache/drill/common/exceptions/UserException.java ---
@@ -536,6 +542,33 @@ public Builder pushContext(final String name, final 
double value) {
  * @return user exception
  */
 public UserException build(final Logger logger) {
+
+  // To allow for debugging:
+  // A spinner code to make the execution stop here while the file 
'/tmp/drillspin' exists
+  // Can be used to attach a debugger, use jstack, etc
+  // The processID of the spinning thread should be in a file like 
/tmp/spin4148663301172491613.tmp
+  // along with the error message.
+  File spinFile = new File("/tmp/drillspin");
+  if ( spinFile.exists() ) {
+File tmpDir = new File("/tmp");
+File outErr = null;
+try {
+  outErr = File.createTempFile("spin", ".tmp", tmpDir);
+  BufferedWriter bw = new BufferedWriter(new FileWriter(outErr));
+  bw.write("Spinning process: " + 
ManagementFactory.getRuntimeMXBean().getName()
+  /* After upgrading to JDK 9 - replace with: 
ProcessHandle.current().getPid() */);
+  bw.write("\nError cause: " +
+(errorType == DrillPBError.ErrorType.SYSTEM ? ("SYSTEM ERROR: 
" + ErrorHelper.getRootMessage(cause)) : message));
+  bw.close();
+} catch (Exception ex) {
+  logger.warn("Failed creating a spinner tmp message file: {}", 
ex);
+}
+while (spinFile.exists()) {
+  try { sleep(1_000); } catch (Exception ex) { /* ignore 
interruptions */ }
--- End diff --

 Does killing a query cause a user exception?



> hash agg spill to disk, second phase OOM
> 
>
> Key: DRILL-5694
> URL: https://issues.apache.org/jira/browse/DRILL-5694
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
>Reporter: Chun Chang
>Assignee: Boaz Ben-Zvi
>
> | 1.11.0-SNAPSHOT  | d622f76ee6336d97c9189fc589befa7b0f4189d6  | DRILL-5165: 
> For limit all case, no need to push down limit to scan  | 21.07.2017 @ 
> 10:36:29 PDT
> Second phase agg ran out of memory. It is not supposed to. The test data is 
> currently accessible only locally.
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg15.q
> Query:
> select row_count, sum(row_count), avg(double_field), max(double_rand), 
> count(float_rand) from parquet_500m_v1 group by row_count order by row_count 
> limit 30
> Failed with exception
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 
> 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 
> 536870912 so far allocated: 534773760.
> Fragment 1:6
> [Error Id: a193babd-f783-43da-a476-bb8dd4382420 on 10.10.30.168:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) HT was: 534773760 
> OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned 
> batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far 
> allocated: 534773760.
> 
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1175
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>

[jira] [Commented] (DRILL-5781) Fix unit test failures to use tests config even if default config is available

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173620#comment-16173620
 ] 

ASF GitHub Bot commented on DRILL-5781:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/942#discussion_r139247294
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/ExecTest.java 
---
@@ -100,6 +101,14 @@ public void run() {
 return dir.getAbsolutePath() + File.separator + dirName;
   }
 
+  /**
+   * Sets zookeeper server and client SASL test config properties.
+   */
+  public static void setZookeeperSaslTestConfigProps() {
+System.setProperty(ZooKeeperSaslServer.LOGIN_CONTEXT_NAME_KEY, 
"Test_server");
--- End diff --

Maybe something like `DrillTestServerForUnitTests`, 
`DrillTestClientForUnitTests`.


> Fix unit test failures to use tests config even if default config is available
> --
>
> Key: DRILL-5781
> URL: https://issues.apache.org/jira/browse/DRILL-5781
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.11.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
> Fix For: 1.12.0
>
>
> Unit tests fail when they are run with the mapr profile.
> Test failures related to a ZooKeeper configuration that differs from the 
> expected one:
> {noformat}
> DrillClientTest>TestWithZookeeper.setUp:32 » Runtime java.io.IOException: 
> Coul...
>   TestZookeeperClient.testPutWithMatchingVersion » IO Could not configure 
> server...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testStartingClientEnablesCacheAndEnsuresRootNodeExists 
> » IO
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathThrowsDrillRuntimeException » IO Could not 
> conf...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathFalseWithVersion » IO Could not configure 
> serve...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestEphemeralStore.testPutAndGetWorksAntagonistacally » IO Could not 
> configure...
>   TestEphemeralStore.tearDown:132 NullPointer
>   TestZookeeperClient.testGetWithVersion » IO Could not configure server 
> because...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestEphemeralStore.testStoreRegistersDispatcherAndStartsItsClient » IO 
> Could n...
>   TestEphemeralStore.tearDown:132 NullPointer
>   TestZookeeperClient.testPutWithNonMatchingVersion » IO Could not configure 
> ser...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testGetWithEventualConsistencyHitsCache » IO Could not 
> con...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testPutIfAbsentWhenPresent » IO Could not configure 
> server...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathTrueWithVersion » IO Could not configure 
> server...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testPutAndGetWorks » IO Could not configure server 
> because...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testPutIfAbsentWhenAbsent » IO Could not configure 
> server ...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testHasPathWithEventualConsistencyHitsCache » IO Could 
> not...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testCreate » IO Could not configure server because SASL 
> co...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testDelete » IO Could not configure server because SASL 
> co...
>   TestZookeeperClient.tearDown:86 NullPointer
>   TestZookeeperClient.testEntriesReturnsRelativePaths » IO Could not 
> configure s...
>   TestZookeeperClient.tearDown:86 NullPointer
> TestPStoreProviders>TestWithZookeeper.setUp:32 » Runtime java.io.IOException: 
> ...
>   TestPauseInjection.pauseOnSpecificBit:151 » Runtime java.io.IOException: 
> Could...
>   TestExceptionInjection.injectionOnSpecificBit:217 » Runtime 
> java.io.IOExceptio...
> HBaseTestsSuite.initCluster:110 » IO No JAAS configuration section named 
> 'Serv...
> {noformat}
> Test failures related to a Hadoop configuration that differs from the expected one:
> {noformat}
> TestInboundImpersonation.setup:58->BaseTestImpersonation.startMiniDfsCluster:80->BaseTestImpersonation.startMiniDfsCluster:111
>  » ClassCast
>   
> TestImpersonationMetadata.setup:58->BaseTestImpersonation.startMiniDfsCluster:80->BaseTestImpersonation.startMiniDfsCluster:111
>  » ClassCast
>   
> TestImpersonationDisabledWithMiniDFS.setup:37->BaseTestImpersonation.startMiniDfsCluster:106
>  » Runtime
>   
>

[jira] [Commented] (DRILL-5425) Support HTTP Kerberos auth using SPNEGO

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173645#comment-16173645
 ] 

ASF GitHub Bot commented on DRILL-5425:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/944
  
@sohami, can you review this one? 


> Support HTTP Kerberos auth using SPNEGO
> ---
>
> Key: DRILL-5425
> URL: https://issues.apache.org/jira/browse/DRILL-5425
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Web Server
>Reporter: Sudheesh Katkam
>Assignee: Sindhuri Ramanarayan Rayavaram
>
> DRILL-4280 supports Kerberos through the JDBC and ODBC APIs. This ticket requests 
> adding Kerberos support (using [SPNEGO|https://en.wikipedia.org/wiki/SPNEGO]) for 
> HTTP connections.
> This requires creating "direct" web sessions; currently web sessions are 
> sessions over Java client sessions.




[jira] [Commented] (DRILL-1162) 25 way join ended up with OOM

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173654#comment-16173654
 ] 

ASF GitHub Bot commented on DRILL-1162:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/905
  
@amansinha100, can you give this one a review? 


> 25 way join ended up with OOM
> -
>
> Key: DRILL-1162
> URL: https://issues.apache.org/jira/browse/DRILL-1162
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow, Query Planning & Optimization
>Reporter: Rahul Challapalli
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Fix For: Future
>
> Attachments: error.log, oom_error.log
>
>
> git.commit.id.abbrev=e5c2da0
> The query below returns 0 results:
> {code:sql}
> select count(*) from `lineitem1.parquet` a 
> inner join `part.parquet` j on a.l_partkey = j.p_partkey 
> inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey 
> inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey 
> inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey 
> = m.ps_suppkey 
> inner join `customer.parquet` n on k.o_custkey = n.c_custkey 
> inner join `lineitem2.parquet` b on a.l_orderkey = b.l_orderkey 
> inner join `lineitem2.parquet` c on a.l_partkey = c.l_partkey 
> inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey 
> inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice 
> inner join `lineitem2.parquet` f on a.l_comment = f.l_comment 
> inner join `lineitem2.parquet` g on a.l_shipdate = g.l_shipdate 
> inner join `lineitem2.parquet` h on a.l_commitdate = h.l_commitdate 
> inner join `lineitem2.parquet` i on a.l_receiptdate = i.l_receiptdate 
> inner join `lineitem2.parquet` o on a.l_receiptdate = o.l_receiptdate 
> inner join `lineitem2.parquet` p on a.l_receiptdate = p.l_receiptdate 
> inner join `lineitem2.parquet` q on a.l_receiptdate = q.l_receiptdate 
> inner join `lineitem2.parquet` r on a.l_receiptdate = r.l_receiptdate 
> inner join `lineitem2.parquet` s on a.l_receiptdate = s.l_receiptdate 
> inner join `lineitem2.parquet` t on a.l_receiptdate = t.l_receiptdate 
> inner join `lineitem2.parquet` u on a.l_receiptdate = u.l_receiptdate 
> inner join `lineitem2.parquet` v on a.l_receiptdate = v.l_receiptdate 
> inner join `lineitem2.parquet` w on a.l_receiptdate = w.l_receiptdate 
> inner join `lineitem2.parquet` x on a.l_receiptdate = x.l_receiptdate;
> {code}
> However, when we remove the last 'inner join' and run the query, it returns 
> '716372534'. Since the last inner join is similar to the ones before it, it 
> should match some records and return the data appropriately.
> The logs indicate that the query actually returned 0 results. The log file is attached.




[jira] [Commented] (DRILL-5431) Support SSL

2017-09-20 Thread Parth Chandra (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173681#comment-16173681
 ] 

Parth Chandra commented on DRILL-5431:
--

Submitted PR: https://github.com/apache/drill/pull/950


> Support SSL
> ---
>
> Key: DRILL-5431
> URL: https://issues.apache.org/jira/browse/DRILL-5431
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - Java, Client - ODBC
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> Support SSL between Drillbit and JDBC/ODBC drivers. Drill already supports 
> HTTPS for web traffic.




[jira] [Closed] (DRILL-5002) Using hive's date functions on top of date column gives wrong results for local time-zone

2017-09-20 Thread Chun Chang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang closed DRILL-5002.
-

automation added.

> Using hive's date functions on top of date column gives wrong results for 
> local time-zone
> -
>
> Key: DRILL-5002
> URL: https://issues.apache.org/jira/browse/DRILL-5002
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive, Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
> Attachments: 0_0_0.parquet
>
>
> git.commit.id.abbrev=190d5d4
> Wrong Result 1 :
> {code}
> select l_shipdate, `month`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1994-02-01' limit 2;
> +-------------+---------+
> | l_shipdate  | EXPR$1  |
> +-------------+---------+
> | 1994-02-01  | 1       |
> | 1994-02-01  | 1       |
> +-------------+---------+
> {code}
> Wrong Result 2 : 
> {code}
> select l_shipdate, `day`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1998-06-02' limit 2;
> +-------------+---------+
> | l_shipdate  | EXPR$1  |
> +-------------+---------+
> | 1998-06-02  | 1       |
> | 1998-06-02  | 1       |
> +-------------+---------+
> {code}
> Correct Result :
> {code}
> select l_shipdate, `month`(l_shipdate) from cp.`tpch/lineitem.parquet` where 
> l_shipdate = date '1998-06-02' limit 2;
> +-------------+---------+
> | l_shipdate  | EXPR$1  |
> +-------------+---------+
> | 1998-06-02  | 6       |
> | 1998-06-02  | 6       |
> +-------------+---------+
> {code}
> It looks like we are getting wrong results when the 'day' is '01'. I only 
> tried the month and day Hive functions, but I wouldn't be surprised if the 
> other date functions have similar issues too.
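
The three results above are consistent with a one-day shift caused by reading a UTC-midnight date in a local time zone west of UTC. The ticket does not confirm this mechanism; the sketch below only demonstrates that it would reproduce the reported values. The zone America/Los_Angeles and the helper name misread are assumptions made for the illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical illustration of a date shifting when read in a local zone. */
final class DateShiftSketch {

  /** Treats the date as midnight UTC, then reads it back in the local zone. */
  static LocalDate misread(LocalDate date, ZoneId localZone) {
    Instant utcMidnight = date.atStartOfDay(ZoneOffset.UTC).toInstant();
    return utcMidnight.atZone(localZone).toLocalDate();
  }

  public static void main(String[] args) {
    ZoneId local = ZoneId.of("America/Los_Angeles"); // assumed local zone

    LocalDate a = misread(LocalDate.of(1994, 2, 1), local);
    System.out.println(a + " month=" + a.getMonthValue()); // 1994-01-31 month=1

    LocalDate b = misread(LocalDate.of(1998, 6, 2), local);
    System.out.println(b + " day=" + b.getDayOfMonth()
        + " month=" + b.getMonthValue());                  // 1998-06-01 day=1 month=6
  }
}
```

If this is the cause, the month is wrong only when the shift crosses a month boundary, while the day would be off by one for every date.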




[jira] [Commented] (DRILL-5694) hash agg spill to disk, second phase OOM

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173868#comment-16173868
 ] 

ASF GitHub Bot commented on DRILL-5694:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r140098546
  
--- Diff: 
common/src/main/java/org/apache/drill/common/exceptions/UserException.java ---
@@ -536,6 +542,33 @@ public Builder pushContext(final String name, final 
double value) {
  * @return user exception
  */
 public UserException build(final Logger logger) {
+
+  // To allow for debugging:
+  // A spinner code to make the execution stop here while the file 
'/tmp/drillspin' exists
+  // Can be used to attach a debugger, use jstack, etc
+  // The processID of the spinning thread should be in a file like 
/tmp/spin4148663301172491613.tmp
--- End diff --

Done 


> hash agg spill to disk, second phase OOM
> 
>
> Key: DRILL-5694
> URL: https://issues.apache.org/jira/browse/DRILL-5694
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
>Reporter: Chun Chang
>Assignee: Boaz Ben-Zvi
>

[jira] [Commented] (DRILL-5694) hash agg spill to disk, second phase OOM

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173867#comment-16173867
 ] 

ASF GitHub Bot commented on DRILL-5694:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r140098512
  
--- Diff: 
common/src/main/java/org/apache/drill/common/exceptions/UserException.java ---
@@ -536,6 +542,33 @@ public Builder pushContext(final String name, final 
double value) {
  * @return user exception
  */
 public UserException build(final Logger logger) {
+
+  // To allow for debugging:
+  // A spinner code to make the execution stop here while the file 
'/tmp/drillspin' exists
--- End diff --

Done


> hash agg spill to disk, second phase OOM
> 
>
> Key: DRILL-5694
> URL: https://issues.apache.org/jira/browse/DRILL-5694
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
>Reporter: Chun Chang
>Assignee: Boaz Ben-Zvi
>
> | 1.11.0-SNAPSHOT  | d622f76ee6336d97c9189fc589befa7b0f4189d6  | DRILL-5165: 
> For limit all case, no need to push down limit to scan  | 21.07.2017 @ 
> 10:36:29 PDT
> Second phase agg ran out of memory. It is not supposed to. The test data is 
> currently accessible only locally.
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg15.q
> Query:
> select row_count, sum(row_count), avg(double_field), max(double_rand), 
> count(float_rand) from parquet_500m_v1 group by row_count order by row_count 
> limit 30
> Failed with exception
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 
> 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 
> 536870912 so far allocated: 534773760.
> Fragment 1:6
> [Error Id: a193babd-f783-43da-a476-bb8dd4382420 on 10.10.30.168:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) HT was: 534773760 
> OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned 
> batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far 
> allocated: 534773760.
> 
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1175
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (org.apache.drill.exec.exception.OutOfMemoryException) Unable to 
> allocate buffer of size 4194304 due to memory limit. Current allocation: 
> 534773760
> org.apache.drill.exec.memory.BaseAllocator.buffer():238
> org.apache.drill.exec.memory.BaseAllocator.buffer():213
> org.apache.drill.exec.vector.IntVector.allocateBytes():231
>
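
The numbers in the error above can be checked by hand: the limit is 536870912 bytes (512 MiB) and 534773760 bytes are already allocated, which leaves 2097152 bytes (2 MiB) of headroom, while the failing request is for 4194304 bytes (4 MiB). A minimal sketch of that check follows, assuming a simple limit-checked accountant. The class `LimitedAccountant` and its methods are hypothetical and are not Drill's `BaseAllocator`.

```java
// Hypothetical sketch of limit-checked memory accounting; not Drill's BaseAllocator.
final class LimitedAccountant {
    private final long limit;
    private long allocated;

    LimitedAccountant(long limit, long alreadyAllocated) {
        this.limit = limit;
        this.allocated = alreadyAllocated;
    }

    /** Bytes that can still be reserved before the limit is reached. */
    long headroom() {
        return limit - allocated;
    }

    /** Reserves size bytes, or returns false (reserving nothing) if that would exceed the limit. */
    boolean tryReserve(long size) {
        if (size > headroom()) {
            return false;
        }
        allocated += size;
        return true;
    }

    public static void main(String[] args) {
        // Values copied from the error message above.
        LimitedAccountant acct = new LimitedAccountant(536_870_912L, 534_773_760L);
        System.out.println("headroom = " + acct.headroom());
        System.out.println("4 MiB request fits? " + acct.tryReserve(4_194_304L));
    }
}
```

With these inputs the 4 MiB request is refused because only 2 MiB remain, which matches the "Unable to allocate buffer of size 4194304" cause in the trace.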

[jira] [Assigned] (DRILL-5645) negation of expression causes null pointer exception

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5645:
---

Assignee: (was: Arina Ielchiieva)

> negation of expression causes null pointer exception
> 
>
> Key: DRILL-5645
> URL: https://issues.apache.org/jira/browse/DRILL-5645
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.10.0
> Environment: Drill 1.10
>Reporter: N Campbell
>
> Following statement will fail when the expression is negated
> select -(2 * 2) from ( values ( 1 ) ) T ( C1 )
> Error: SYSTEM ERROR: NullPointerException



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-1051) Casting timestamp as date gives wrong result for dates earlier than 1883

2017-09-20 Thread Chun Chang (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173837#comment-16173837
 ] 

Chun Chang commented on DRILL-1051:
---

Automation added.

> Casting timestamp as date gives wrong result for dates earlier than 1883
> 
>
> Key: DRILL-1051
> URL: https://issues.apache.org/jira/browse/DRILL-1051
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> #Wed Jun 18 10:27:23 PDT 2014
> git.commit.id.abbrev=894037a
> It appears casting dates earlier than year 1797 gives wrong result:
> 0: jdbc:drill:schema=dfs> select cast(c_timestamp as varchar(20)), 
> cast(c_timestamp as date) from data where c_row <> 12;
> +---------------------+------------+
> |       EXPR$0        |   EXPR$1   |
> +---------------------+------------+
> | 1997-01-02 03:04:05 | 1997-01-02 |
> | 1997-01-02 00:00:00 | 1997-01-02 |
> | 2001-09-22 18:19:20 | 2001-09-22 |
> | 1997-02-10 17:32:01 | 1997-02-10 |
> | 1997-02-10 17:32:00 | 1997-02-10 |
> | 1997-02-11 17:32:01 | 1997-02-11 |
> | 1997-02-12 17:32:01 | 1997-02-12 |
> | 1997-02-13 17:32:01 | 1997-02-13 |
> | 1997-02-14 17:32:01 | 1997-02-14 |
> | 1997-02-15 17:32:01 | 1997-02-15 |
> | 1997-02-16 17:32:01 | 1997-02-16 |
> | 0097-02-16 17:32:01 | 0097-02-17 |
> | 0597-02-16 17:32:01 | 0597-02-13 |
> | 1097-02-16 17:32:01 | 1097-02-09 |
> | 1697-02-16 17:32:01 | 1697-02-15 |
> | 1797-02-16 17:32:01 | 1797-02-15 |
> | 1897-02-16 17:32:01 | 1897-02-16 |
> | 1997-02-16 17:32:01 | 1997-02-16 |
> | 2097-02-16 17:32:01 | 2097-02-16 |
> | 1996-02-28 17:32:01 | 1996-02-28 |
> | 1996-02-29 17:32:01 | 1996-02-29 |
> | 1996-03-01 17:32:01 | 1996-03-01 |
> +---------------------+------------+
> 22 rows selected (0.201 seconds)
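
The issue text does not say what causes the shift. One thing worth ruling out when only old dates move by a few days is a mix of calendar systems: `java.util.GregorianCalendar` uses a hybrid Julian/Gregorian calendar by default, while `java.time.LocalDate` uses the proleptic ISO calendar. The sketch below only measures the gap between the two for a given date label. It is an illustration under that assumption, not Drill's cast code, it does not model time zones, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Illustrative sketch only; class and method names are hypothetical, not Drill code.
final class CalendarGapSketch {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    /**
     * Returns how many days the hybrid Julian/Gregorian calendar of
     * java.util.GregorianCalendar is ahead of the proleptic ISO calendar
     * of java.time for the same year-month-day label.
     */
    static long hybridMinusProlepticDays(int year, int month, int day) {
        GregorianCalendar hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        hybrid.clear();                    // zero every field, including time of day
        hybrid.set(year, month - 1, day);  // Calendar months are 0-based
        long hybridEpochDay = Math.floorDiv(hybrid.getTimeInMillis(), MILLIS_PER_DAY);
        long prolepticEpochDay = LocalDate.of(year, month, day).toEpochDay();
        return hybridEpochDay - prolepticEpochDay;
    }

    public static void main(String[] args) {
        // Years taken from the c_timestamp values in the table above.
        int[] years = {97, 597, 1097, 1697, 1797, 1897, 1997};
        for (int year : years) {
            System.out.println(year + "-02-16: "
                + hybridMinusProlepticDays(year, 2, 16) + " day(s)");
        }
    }
}
```

Running `main` prints the per-year offset, which can then be compared against the differences between EXPR$0 and EXPR$1 in the report.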




[jira] [Updated] (DRILL-1051) Casting timestamp as date gives wrong result for dates earlier than 1883

2017-09-20 Thread Chun Chang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang updated DRILL-1051:
--
Reviewer: Chun Chang  (was: Paul Rogers)

> Casting timestamp as date gives wrong result for dates earlier than 1883
> 
>
> Key: DRILL-1051
> URL: https://issues.apache.org/jira/browse/DRILL-1051
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>




[jira] [Closed] (DRILL-1051) Casting timestamp as date gives wrong result for dates earlier than 1883

2017-09-20 Thread Chun Chang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang closed DRILL-1051.
-

> Casting timestamp as date gives wrong result for dates earlier than 1883
> 
>
> Key: DRILL-1051
> URL: https://issues.apache.org/jira/browse/DRILL-1051
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>




[jira] [Updated] (DRILL-5711) Incorrect operator profiles for queries on json files

2017-09-20 Thread Prasad Nagaraj Subramanya (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Nagaraj Subramanya updated DRILL-5711:
-
Description: 
1) Join query on two json files
{code}
select ps.ps_suppkey from dfs.`testData/json/part.json` as p, 
dfs.`testData/json/partsupp.json` as ps where p.p_partkey = ps.ps_partkey;
{code}

2) Check the query profile. It has the following issues -
a) JSON_SUB_SCAN type incorrectly ordered
b) Missing SCREEN type

Attached
1) Two json files
2) Snapshot of query profile and operator profile

Commit id - 9d1d815737528251a7500621cc976b57e7f3be59

  was:
1) Join query on two json files
{code}
select ps.ps_suppkey from dfs.`testData/json/part.josn` as p, 
dfs.`testData/json/partsupp.json` as ps where p.p_partkey = ps.ps_partkey;
{code}

2) Check the query profile
a) JSON_SUB_SCAN type incorrectly ordered
b) Missing SCREEN type

Attached
1) Two json files
2) Snapshot of query profile and operator profile

Commit id - 9d1d815737528251a7500621cc976b57e7f3be59


> Incorrect operator profiles for queries on json files
> -
>
> Key: DRILL-5711
> URL: https://issues.apache.org/jira/browse/DRILL-5711
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
> Attachments: OperatorProfiles.png, part.json, partsupp.json, 
> QueryProfile.png
>
>
> 1) Join query on two json files
> {code}
> select ps.ps_suppkey from dfs.`testData/json/part.json` as p, 
> dfs.`testData/json/partsupp.json` as ps where p.p_partkey = ps.ps_partkey;
> {code}
> 2) Check the query profile. It has the following issues -
> a) JSON_SUB_SCAN type incorrectly ordered
> b) Missing SCREEN type
> Attached
> 1) Two json files
> 2) Snapshot of query profile and operator profile
> Commit id - 9d1d815737528251a7500621cc976b57e7f3be59




[jira] [Updated] (DRILL-5645) negation of expression causes null pointer exception

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5645:

Reviewer: Arina Ielchiieva

> negation of expression causes null pointer exception
> 
>
> Key: DRILL-5645
> URL: https://issues.apache.org/jira/browse/DRILL-5645
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.10.0
> Environment: Drill 1.10
>Reporter: N Campbell
>
> Following statement will fail when the expression is negated
> select -(2 * 2) from ( values ( 1 ) ) T ( C1 )
> Error: SYSTEM ERROR: NullPointerException




[jira] [Assigned] (DRILL-5645) negation of expression causes null pointer exception

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5645:
---

Assignee: Arina Ielchiieva

> negation of expression causes null pointer exception
> 
>
> Key: DRILL-5645
> URL: https://issues.apache.org/jira/browse/DRILL-5645
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.10.0
> Environment: Drill 1.10
>Reporter: N Campbell
>Assignee: Arina Ielchiieva
>
> Following statement will fail when the expression is negated
> select -(2 * 2) from ( values ( 1 ) ) T ( C1 )
> Error: SYSTEM ERROR: NullPointerException




[jira] [Commented] (DRILL-5795) Filter pushdown for parquet handles multi rowgroup file

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173820#comment-16173820
 ] 

ASF GitHub Bot commented on DRILL-5795:
---

Github user dprofeta commented on the issue:

https://github.com/apache/drill/pull/949
  
I will add a unit test that checks the number of rowgroups scanned by 
the groupscan, to verify that the filter is able to prune rowgroups.


> Filter pushdown for parquet handles multi rowgroup file
> ---
>
> Key: DRILL-5795
> URL: https://issues.apache.org/jira/browse/DRILL-5795
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Damien Profeta
>  Labels: doc-impacting
>
> DRILL-1950 implemented filter pushdown for parquet files, but only for the 
> case of one rowgroup per parquet file. With multiple rowgroups per file, it 
> detects that a rowgroup can be pruned but then tells the drillbit to read 
> the whole file, which leads to a performance issue.
> Having multiple rowgroups per file helps to handle partitioned datasets and 
> still read only the relevant subset of data, without ending up with more 
> files than really needed.
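
The idea of pruning at rowgroup granularity, rather than falling back to the whole file, can be sketched as follows. This is a simplified illustration that assumes each rowgroup carries min/max statistics for one numeric column and that the filter is an equality predicate. `RowGroupPruningSketch`, `RowGroupStats` and `rowGroupsToRead` are hypothetical names, not classes from Drill or from the pull request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of min/max-based rowgroup pruning; not Drill's group scan code.
final class RowGroupPruningSketch {

    /** Min/max statistics of one column for a single rowgroup (illustrative type). */
    static final class RowGroupStats {
        final int index;
        final long min;
        final long max;

        RowGroupStats(int index, long min, long max) {
            this.index = index;
            this.min = min;
            this.max = max;
        }
    }

    /** Returns the indexes of the rowgroups whose [min, max] range can contain value. */
    static List<Integer> rowGroupsToRead(List<RowGroupStats> groups, long value) {
        List<Integer> keep = new ArrayList<>();
        for (RowGroupStats g : groups) {
            if (value >= g.min && value <= g.max) {
                keep.add(g.index);  // this rowgroup cannot be ruled out
            }
        }
        return keep;
    }
}
```

A reader driven by the returned list would open only the surviving rowgroups of the file, which is the behaviour the issue asks for when a file holds several rowgroups.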




[jira] [Commented] (DRILL-5694) hash agg spill to disk, second phase OOM

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173831#comment-16173831
 ] 

ASF GitHub Bot commented on DRILL-5694:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r140093627
  
--- Diff: common/src/main/java/org/apache/drill/common/exceptions/UserException.java ---
@@ -536,6 +542,33 @@ public Builder pushContext(final String name, final double value) {
  * @return user exception
  */
 public UserException build(final Logger logger) {
+
+  // To allow for debugging:
+  // A spinner code to make the execution stop here while the file '/tmp/drillspin' exists
+  // Can be used to attach a debugger, use jstack, etc
+  // The processID of the spinning thread should be in a file like /tmp/spin4148663301172491613.tmp
+  // along with the error message.
+  File spinFile = new File("/tmp/drillspin");
+  if ( spinFile.exists() ) {
+    File tmpDir = new File("/tmp");
+    File outErr = null;
+    try {
+      outErr = File.createTempFile("spin", ".tmp", tmpDir);
+      BufferedWriter bw = new BufferedWriter(new FileWriter(outErr));
+      bw.write("Spinning process: " + ManagementFactory.getRuntimeMXBean().getName()
+      /* After upgrading to JDK 9 - replace with: ProcessHandle.current().getPid() */);
+      bw.write("\nError cause: " +
+        (errorType == DrillPBError.ErrorType.SYSTEM ? ("SYSTEM ERROR: " + ErrorHelper.getRootMessage(cause)) : message));
+      bw.close();
+    } catch (Exception ex) {
+      logger.warn("Failed creating a spinner tmp message file: {}", ex);
+    }
+    while (spinFile.exists()) {
+      try { sleep(1_000); } catch (Exception ex) { /* ignore interruptions */ }
--- End diff --

Yes - if some non-blocked part tries to kill the query, the spinning parts 
would still be blocked. That may be by design, as debugging still goes on 
(until a user issues "clush -a rm /tmp/drillspin").
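
The spin-while-a-file-exists technique discussed in this review can be tried in isolation with a sketch like the one below. It is not the code from the pull request: `SpinFileSketch`, `spinWhileExists` and `newMarker` are hypothetical names, and unlike the diff above this version bounds the number of polls and stops when the thread is interrupted, which is one possible answer to the concern about queries that cannot be killed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a marker-file "spinner" for debugging; not the PR's code.
final class SpinFileSketch {

    /**
     * Polls while the marker file exists, sleeping pollMillis between checks.
     * Gives up after maxPolls checks or when the thread is interrupted.
     * Returns the number of polls that found the marker still present.
     */
    static int spinWhileExists(Path marker, long pollMillis, int maxPolls) {
        int polls = 0;
        while (polls < maxPolls && Files.exists(marker)) {
            polls++;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // keep the interrupt visible to callers
                break;
            }
        }
        return polls;
    }

    /** Creates a temporary marker file in the default temp directory (for trying the sketch). */
    static Path newMarker() {
        try {
            return Files.createTempFile("drillspin-sketch", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Deleting the marker file from another shell ends the loop on the next poll, in the same way that removing the spin file releases the spinning fragments in the change under review.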



> hash agg spill to disk, second phase OOM
> 
>
> Key: DRILL-5694
> URL: https://issues.apache.org/jira/browse/DRILL-5694
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
>Reporter: Chun Chang
>Assignee: Boaz Ben-Zvi
>
> | 1.11.0-SNAPSHOT  | d622f76ee6336d97c9189fc589befa7b0f4189d6  | DRILL-5165: 
> For limit all case, no need to push down limit to scan  | 21.07.2017 @ 
> 10:36:29 PDT
> Second phase agg ran out of memory. It is not supposed to. Test data is 
> currently only accessible locally.
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg15.q
> Query:
> select row_count, sum(row_count), avg(double_field), max(double_rand), 
> count(float_rand) from parquet_500m_v1 group by row_count order by row_count 
> limit 30
> Failed with exception
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 
> 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 
> 536870912 so far allocated: 534773760.
> Fragment 1:6
> [Error Id: a193babd-f783-43da-a476-bb8dd4382420 on 10.10.30.168:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) HT was: 534773760 
> OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned 
> batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far 
> allocated: 534773760.
> 
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1175
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
>

[jira] [Updated] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-09-20 Thread Khurram Faraaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-4735:
--
Reviewer: Jinfeng Ni  (was: Khurram Faraaz)

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0
>Reporter: Krystal
>Assignee: Arina Ielchiieva
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 
> rows.
> select count(dir0) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> select count(dir1) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +---------+---------+
> | EXPR$0  | EXPR$1  |
> +---------+---------+
> | 600     | 600     |
> +---------+---------+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, 
> cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 1346
> 00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1345
> 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1344
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns
>  = null, isStarQuery = false, isSkipQuery = false]]) : rowType = 
> RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the 
> same select:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): 
> rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 1623
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT 
> EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 
> 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : 
> rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, 
> cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 1621
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],...,
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]],
>  selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, 
> usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = 
> RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 
> rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, 
> "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.
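
For readers unfamiliar with `dir0` and `dir1`: Drill exposes the directory levels beneath the queried root as implicit columns, so every row read from a file carries the directory names of that file's path. The sketch below shows that derivation for the paths in the plan above. It is an illustration only, `DirColumnsSketch` and `dirColumns` are hypothetical names, and it does not reproduce the planner logic that produced the 0 count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of deriving dirN values from a file path; not Drill's implementation.
final class DirColumnsSketch {

    /**
     * Returns the directory names between selectionRoot and the file itself,
     * in order: element 0 corresponds to dir0, element 1 to dir1, and so on.
     */
    static List<String> dirColumns(String selectionRoot, String filePath) {
        String prefix = selectionRoot + "/";
        if (!filePath.startsWith(prefix)) {
            throw new IllegalArgumentException("file is not under the selection root");
        }
        String[] parts = filePath.substring(prefix.length()).split("/");
        List<String> dirs = new ArrayList<>();
        for (int i = 0; i < parts.length - 1; i++) {  // last part is the file name
            dirs.add(parts[i]);
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Root and path copied from the plan above.
        System.out.println(dirColumns(
            "maprfs:/drill/testdata/min_max_dir",
            "maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet"));
    }
}
```

Since every row of a file under `1999/Apr` has non-null values at the first two positions, a count over `dir0` would be expected to match the row count, as it does in the two-column query (600).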




[jira] [Closed] (DRILL-5083) RecordIterator can sometimes restart a query on close

2017-09-20 Thread Khurram Faraaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-5083.
-

> RecordIterator can sometimes restart a query on close
> -
>
> Key: DRILL-5083
> URL: https://issues.apache.org/jira/browse/DRILL-5083
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Roman Kulyk
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
> Attachments: DrillOperatorErrorHandlingRedesign.pdf, Reproduce5083.jpg
>
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something 
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a 
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
>   // Check whether next() should even have been called in current state.
>   if (null != exceptionState) {
> throw new IllegalStateException(
> {code}
> But, the exceptionState is set, so we end up throwing an 
> IllegalStateException during cleanup.
> Seems the code should agree: if {{next( )}} will be called during cleanup, 
> then {{next( )}} should gracefully handle that case.




[jira] [Commented] (DRILL-5083) RecordIterator can sometimes restart a query on close

2017-09-20 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174055#comment-16174055
 ] 

Khurram Faraaz commented on DRILL-5083:
---

Verified on Apache master (Drill 1.12.0, commit 
aaff1b35b7339fb4e6ab480dd517994ff9f0a5c5); the issue is no longer seen with the fix.

> RecordIterator can sometimes restart a query on close
> -
>
> Key: DRILL-5083
> URL: https://issues.apache.org/jira/browse/DRILL-5083
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Roman Kulyk
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
> Attachments: DrillOperatorErrorHandlingRedesign.pdf, Reproduce5083.jpg
>
>




[jira] [Commented] (DRILL-5564) IllegalStateException: allocator[op:21:1:5:HashJoinPOP]: buffer space (16674816) + prealloc space (0) + child space (0) != allocated (16740352)

2017-09-20 Thread Roman Kulyk (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173886#comment-16173886
 ] 

Roman Kulyk commented on DRILL-5564:


[~khfaraaz], I can't reproduce these errors. Could you please provide more 
information about your environment and dataset size?

Also, as far as I can see, there should not be any hangs. In this case we got 
only errors, am I right? And what is the expected result: should the query 
finish correctly without one drillbit, or should it fail without an assertion error?

> IllegalStateException: allocator[op:21:1:5:HashJoinPOP]: buffer space 
> (16674816) + prealloc space (0) + child space (0) != allocated (16740352)
> ---
>
> Key: DRILL-5564
> URL: https://issues.apache.org/jira/browse/DRILL-5564
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.11.0
> Environment: 3 node CentOS cluster
>Reporter: Khurram Faraaz
>
> Run a concurrent Java program that executes TPCDS query11.
> While the concurrent Java program is running,
> stop the foreman Drillbit (from another shell, using the command below):
> ./bin/drillbit.sh stop
> You will then see the IllegalStateException: allocator[op:21:1:5:HashJoinPOP]: 
>  and another assertion error in the drillbit.log:
> AssertionError: Failure while stopping processing for operator id 10. 
> Currently have states of processing:false, setup:false, waiting:true.   
> Drill 1.11.0 git commit ID: d11aba2 (with assertions enabled)
>  
> details from drillbit.log from the foreman Drillbit node.
> {noformat}
> 2017-06-05 18:38:33,838 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 26ca5afa-7f6d-991b-1fdf-6196faddc229:23:1: State change requested RUNNING --> 
> FAILED
> 2017-06-05 18:38:33,849 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 26ca5afa-7f6d-991b-1fdf-6196faddc229:23:1: State change requested FAILED --> 
> FINISHED
> 2017-06-05 18:38:33,852 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: AssertionError: 
> Failure while stopping processing for operator id 10. Currently have states 
> of processing:false, setup:false, waiting:true.
> Fragment 23:1
> [Error Id: a116b326-43ed-4569-a20e-a10ba03d215e on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> AssertionError: Failure while stopping processing for operator id 10. 
> Currently have states of processing:false, setup:false, waiting:true.
> Fragment 23:1
> [Error Id: a116b326-43ed-4569-a20e-a10ba03d215e on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>  ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_91]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: java.lang.RuntimeException: java.lang.AssertionError: Failure 
> while stopping processing for operator id 10. Currently have states of 
> processing:false, setup:false, waiting:true.
> at 
> org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:101)
>  ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:409)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> ... 4 common frames omitted
> Caused by: java.lang.AssertionError: Failure while stopping processing for 
> operator id 10. Currently have states of processing:false, setup:false, 
> waiting:true.
> at 
>

[jira] [Updated] (DRILL-5357) Partition pruning information not available in query plan for COUNT aggregate query

2017-09-20 Thread Chun Chang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang updated DRILL-5357:
--
Reviewer: Khurram Faraaz

> Partition pruning information not available in query plan for COUNT aggregate 
> query
> ---
>
> Key: DRILL-5357
> URL: https://issues.apache.org/jira/browse/DRILL-5357
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
> Environment: 3 node CentOS cluster
>Reporter: Khurram Faraaz
>Assignee: Arina Ielchiieva
> Fix For: 1.12.0
>
>
> We are not seeing partition pruning information in the query plan for the 
> COUNT(*) and COUNT(column) queries below.
> Drill 1.10.0-SNAPSHOT
> git commit id: b657d44f
> parquet table has 6 columns
> total number of rows = 1638640
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_prtn_prune_01 PARTITION BY 
> (col_state) 
> AS 
> SELECT CAST(columns[0] AS DATE) col_date, 
> CAST(columns[1] AS CHAR(3)) col_state, 
> CAST(columns[2] AS INTEGER) col_prime, 
> CAST(columns[3] AS VARCHAR(256)) col_varstr, 
> CAST(columns[4] AS INTEGER) col_id, 
> CAST(columns[5] AS VARCHAR(50)) col_name 
> from `partition_prune_data.csv`;
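> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1638640                    |
> +-----------+----------------------------+
> 1 row selected (17.675 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select COUNT(*) from tbl_prtn_prune_01 where 
> col_state = 'CA';
> +---------+
> | EXPR$0  |
> +---------+
> | 35653   |
> +---------+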
> 1 row selected (0.471 seconds)
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(*) from 
> tbl_prtn_prune_01 where col_state = 'CA';
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0])
> 00-02Project(EXPR$0=[$0])
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@1d4bb67d[columns
>  = null, isStarQuery = false, isSkipQuery = false]])
> {noformat}
> And then I did a REFRESH TABLE METADATA on the parquet table
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> refresh table metadata tbl_prtn_prune_01;
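> +-------+-------------------------------------------------------------+
> |  ok   |                           summary                           |
> +-------+-------------------------------------------------------------+
> | true  | Successfully updated metadata for table tbl_prtn_prune_01.  |
> +-------+-------------------------------------------------------------+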
> 1 row selected (0.321 seconds)
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(col_state) from 
> tbl_prtn_prune_01 where col_state = 'CA';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@2e0f4be9[columns = null, isStarQuery = false, isSkipQuery = false]])
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(*) from 
> tbl_prtn_prune_01 where col_state = 'CA';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3fc1f8e7[columns = null, isStarQuery = false, isSkipQuery = false]])
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(col_date) from 
> tbl_prtn_prune_01 where col_state = 'CA';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@7afc851e[columns = null, isStarQuery = false, isSkipQuery = false]])
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Resolved] (DRILL-5740) hash agg fail to read spill file

2017-09-20 Thread Boaz Ben-Zvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi resolved DRILL-5740.
-
   Resolution: Fixed
Fix Version/s: 1.12.0

The commit for DRILL-5694 (PR #938) also fixes this bug; it removed an 
unneeded close of the SpillSet.
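
For readers who want to see the failure mode in isolation, here is a minimal 
sketch of why a premature close can surface as the FileNotFoundException 
quoted below. It assumes that closing a spill set deletes its spill files; 
that is an assumption of this sketch, not something the message above states. 
The class and method names are hypothetical and this is not Drill's SpillSet 
or HashAgg code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical sketch; not Drill's SpillSet or HashAgg code.
final class SpillLifecycleSketch {

    // Writes a spill file, optionally removes it too early, then reads it back.
    // Returns true when the read succeeds, false when the file is already gone.
    static boolean readBack(boolean closeBeforeRead) {
        Path dir = null;
        Path spill = null;
        try {
            dir = Files.createTempDirectory("spill-sketch");
            spill = dir.resolve("spill3");
            Files.write(spill, "spilled rows".getBytes(StandardCharsets.UTF_8));
            if (closeBeforeRead) {
                // Assumption: stands in for a premature close that deletes spill files.
                Files.delete(spill);
            }
            Files.readAllBytes(spill);
            return true;
        } catch (NoSuchFileException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                if (spill != null) Files.deleteIfExists(spill);
                if (dir != null) Files.deleteIfExists(dir);
            } catch (IOException ignored) {
                // Best-effort cleanup of the temporary directory.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("read after early close succeeds: " + readBack(true));
        System.out.println("read with file kept succeeds: " + readBack(false));
    }
}
```

The point of the sketch is only the ordering: the read has to happen before 
the cleanup, so removing the extra close keeps the file available to the 
reader.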


> hash agg fail to read spill file
> 
>
> Key: DRILL-5740
> URL: https://issues.apache.org/jira/browse/DRILL-5740
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.12.0
>Reporter: Chun Chang
>Assignee: Boaz Ben-Zvi
>Priority: Blocker
> Fix For: 1.12.0
>
>
> - Build: 1.12.0-SNAPSHOT | 11008d029bafa36279e3045c4ed1a64366080620
> - Multi-node Drill cluster
> Running a query that causes the hash agg to spill fails with the following 
> error. This appears to be a regression.
> {noformat}
> Execution Failures:
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg5.q
> Query:
> select gby_date, gby_int32_rand, sum(int32_field), avg(float_field), 
> min(boolean_field), count(double_rand) from 
> dfs.`/drill/testdata/hagg/PARQUET-500M.parquet` group by gby_date, 
> gby_int32_rand order by gby_date, gby_int32_rand limit 30
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: FileNotFoundException: File 
> /tmp/drill/spill/10.10.30.168-31010/265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34/spill3
>  does not exist
> Fragment 1:34
> [Error Id: 291a79f8-9b7a-485d-9404-e7b7fe1d8f1e on 10.10.30.168:31010]
>   (java.lang.RuntimeException) java.io.FileNotFoundException: File 
> /tmp/drill/spill/10.10.30.168-31010/265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34/spill3
>  does not exist
> 
> org.apache.drill.exec.physical.impl.aggregate.SpilledRecordbatch.<init>():67
> 
> org.apache.drill.exec.test.generated.HashAggregatorGen1891.outputCurrentBatch():980
> org.apache.drill.exec.test.generated.HashAggregatorGen1891.doWork():617
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
> {noformat}




[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-09-20 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174037#comment-16174037
 ] 

Khurram Faraaz commented on DRILL-4735:
---

Verified on Drill 1.12.0, commit aaff1b35b7339fb4e6ab480dd517994ff9f0a5c5, 
on a three-node (drillbit) cluster.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select count(dir0) from `DRILL_4589`;
+-----------+
|  EXPR$0   |
+-----------+
| 30148545  |
+-----------+
1 row selected (74.548 seconds)
0: jdbc:drill:schema=dfs.tmp> select count(dir1) from `DRILL_4589`;
+-----------+
|  EXPR$0   |
+-----------+
| 30144920  |
+-----------+
1 row selected (72.543 seconds)
0: jdbc:drill:schema=dfs.tmp> select count(dir0), count(dir1) from `DRILL_4589`;
+-----------+-----------+
|  EXPR$0   |  EXPR$1   |
+-----------+-----------+
| 30148545  | 30144920  |
+-----------+-----------+
1 row selected (89.369 seconds)
{noformat}
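
For context, dir0 and dir1 are the implicit columns Drill exposes for the 
first and second directory levels below the queried path. The sketch below 
shows that mapping for one of the file paths that appears in the plan quoted 
in the issue description. It is only an illustration: the class and method 
names are hypothetical, the prefix check is deliberately naive, and this is 
not how Drill itself computes the columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Drill's implementation of implicit dirN columns.
final class DirColumnsSketch {

    // Returns the directory levels between selectionRoot and the file name,
    // so that index 0 corresponds to dir0, index 1 to dir1, and so on.
    static List<String> dirColumns(String selectionRoot, String filePath) {
        if (!filePath.startsWith(selectionRoot)) {
            throw new IllegalArgumentException("file is not under the selection root");
        }
        String relative = filePath.substring(selectionRoot.length());
        String[] parts = relative.split("/");
        List<String> dirs = new ArrayList<>();
        // The last component is the file name, so it is not a directory level.
        for (int i = 0; i < parts.length - 1; i++) {
            if (!parts[i].isEmpty()) {
                dirs.add(parts[i]);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<String> dirs = dirColumns(
            "maprfs:/drill/testdata/min_max_dir",
            "maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet");
        for (int i = 0; i < dirs.size(); i++) {
            System.out.println("dir" + i + " = " + dirs.get(i));
        }
    }
}
```

Because every file under the root has a value at each directory level it 
sits below, a count over dir0 is expected to be non-zero whenever the table 
has rows, which is what the verification output above shows.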

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0
>Reporter: Krystal
>Assignee: Arina Ielchiieva
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> Selecting a count of dir0, dir1, etc. against a parquet directory returns a 
> count of 0.
> select count(dir0) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> select count(dir1) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> If I put both dir0 and dir1 in the same select, it returns the expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +---------+---------+
> | EXPR$0  | EXPR$1  |
> +---------+---------+
> | 600     | 600     |
> +---------+---------+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1346
> 00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1345
> 00-02        Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1344
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns = null, isStarQuery = false, isSkipQuery = false]]) : rowType = RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the 
> same select:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1623
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1621
> 00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],..., ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]], selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, 
> "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.




[jira] [Updated] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-09-20 Thread Khurram Faraaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-4735:
--
Reviewer: Khurram Faraaz  (was: Jinfeng Ni)





[jira] [Closed] (DRILL-3407) CTAS Auto Partition : The plan for count(*) should show the list of files scanned

2017-09-20 Thread Chun Chang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang closed DRILL-3407.
-

Part of DRILL-4735.

> CTAS Auto Partition : The plan for count(*) should show the list of files 
> scanned
> -
>
> Key: DRILL-3407
> URL: https://issues.apache.org/jira/browse/DRILL-3407
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.12.0
>
>
> #Generated by Git-Commit-Id-Plugin
> #Fri Jun 26 19:46:34 UTC 2015
> git.commit.id.abbrev=60bc945
> The below plan does not give information about the list of files scanned
> {code}
> 0: jdbc:drill:schema=dfs_eea> explain plan for select count(*) from 
> `existing_partition_pruning/lineitempart` where dir0=1991;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@17153c76])
>  | {
>   "head" : {
>     "version" : 1,
>     "generator" : {
>       "type" : "ExplainHandler",
>       "info" : ""
>     },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [ {
>       "name" : "drill.exec.storage.file.partition.column.label",
>       "kind" : "STRING",
>       "type" : "SESSION",
>       "string_val" : "partition_string1"
>     } ],
>     "queue" : 0,
>     "resultMode" : "EXEC"
>   },
>   "graph" : [ {
>     "pop" : "DirectGroupScan",
>     "@id" : 2,
>     "cost" : 20.0
>   }, {
>     "pop" : "project",
>     "@id" : 1,
>     "exprs" : [ {
>       "ref" : "`EXPR$0`",
>       "expr" : "`count`"
>     } ],
>     "child" : 2,
>     "initialAllocation" : 100,
>     "maxAllocation" : 100,
>     "cost" : 20.0
>   }, {
>     "pop" : "screen",
>     "@id" : 0,
>     "child" : 1,
>     "initialAllocation" : 100,
>     "maxAllocation" : 100,
>     "cost" : 20.0
>   } ]
> } |
> {code}




[jira] [Commented] (DRILL-5727) Update release profile to generate SHA-512 checksum.

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174027#comment-16174027
 ] 

ASF GitHub Bot commented on DRILL-5727:
---

GitHub user parthchandra opened a pull request:

https://github.com/apache/drill/pull/951

DRILL-5727: Update release profile to generate SHA-512 checksum.

New Apache release guidelines require a sha-512 checksum 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/parthchandra/drill DRILL-5727

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/951.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #951


commit ed1d5508dbe70a7b58bbf36628325462644ed19e
Author: Parth Chandra 
Date:   2017-09-20T20:42:54Z

DRILL-5727: Update release profile to generate SHA-512 checksum.




> Update release profile to generate SHA-512 checksum.
> 
>
> Key: DRILL-5727
> URL: https://issues.apache.org/jira/browse/DRILL-5727
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>
> Per latest release guidelines, we should generate a sha-512 checksum with the 
> release artifacts.
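
As a small illustration of what such a checksum is, the sketch below computes 
a SHA-512 digest with the JDK's MessageDigest and prints it as lowercase hex. 
It is not the Maven release profile from the pull request; the class and 
method names are hypothetical, and it assumes a JDK that ships the SHA-512 
algorithm, as standard JDKs do.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; the release build itself uses a Maven plugin instead.
final class Sha512Sketch {

    // Returns the SHA-512 digest of the input as 128 lowercase hex characters.
    static String sha512Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Assumption: SHA-512 is available in the running JDK.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha512Hex("abc".getBytes(StandardCharsets.UTF_8)));
    }
}
```

For a real artifact one would feed the file's bytes to the digest instead of 
a short string and compare the result with the published checksum.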




[jira] [Commented] (DRILL-5357) Partition pruning information not available in query plan for COUNT aggregate query

2017-09-20 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174052#comment-16174052
 ] 

Khurram Faraaz commented on DRILL-5357:
---

Verified fix on Drill 1.12.0 commit: aaff1b35b7339fb4e6ab480dd517994ff9f0a5c5

{noformat}
0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_prtn_prune_01 PARTITION BY 
(col_state) 
. . . . . . . . . . . . . . > AS 
. . . . . . . . . . . . . . > SELECT CAST(columns[0] AS DATE) col_date, 
. . . . . . . . . . . . . . > CAST(columns[1] AS CHAR(3)) col_state, 
. . . . . . . . . . . . . . > CAST(columns[2] AS INTEGER) col_prime, 
. . . . . . . . . . . . . . > CAST(columns[3] AS VARCHAR(256)) col_varstr, 
. . . . . . . . . . . . . . > CAST(columns[4] AS INTEGER) col_id, 
. . . . . . . . . . . . . . > CAST(columns[5] AS VARCHAR(50)) col_name 
. . . . . . . . . . . . . . > from `partition_prune_data.csv`;
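+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1638640                    |
+-----------+----------------------------+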
1 row selected (70.986 seconds)

0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(*) from 
tbl_prtn_prune_01 where col_state = 'CA';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        Scan(groupscan=[files = [/tmp/tbl_prtn_prune_01/0_0_5.parquet], numFiles = 1, DynamicPojoRecordReader{records = [[35653]]}])

Another test

0: jdbc:drill:schema=dfs.tmp> explain plan for  select c1 from 
`DRILL_4589/1998/Q3` where c1 > 1000 limit 1;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(c1=[$0])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[1])
00-04            Limit(fetch=[1])
00-05              Filter(condition=[>($0, 1000)])
00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q3/f459.parquet]], selectionRoot=/tmp/DRILL_4589/1998/Q3, numFiles=1, usedMetadataFile=true, cacheFileRoot=/tmp/DRILL_4589/1998/Q3, columns=[`c1`]]])
{noformat}
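
The first plan above lists a single file, 0_0_5.parquet, for the filter 
col_state = 'CA'. A rough sketch of the idea behind that pruning follows. It 
assumes each file written by the partitioned CTAS holds one value of the 
partition column; that assumption, the class and method names, and every file 
name and state value other than 0_0_5.parquet and 'CA' are hypothetical, and 
this is not Drill's planner code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of partition pruning; not Drill's planner code.
final class PartitionPruneSketch {

    // Keeps only the files whose recorded partition value matches the filter.
    static List<String> prune(Map<String, String> fileToState, String wanted) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, String> e : fileToState.entrySet()) {
            if (e.getValue().equals(wanted)) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed layout: one col_state value per file (values are made up).
        Map<String, String> files = new LinkedHashMap<>();
        files.put("/tmp/tbl_prtn_prune_01/0_0_4.parquet", "AZ");
        files.put("/tmp/tbl_prtn_prune_01/0_0_5.parquet", "CA");
        files.put("/tmp/tbl_prtn_prune_01/0_0_6.parquet", "CO");
        System.out.println("files to scan: " + prune(files, "CA"));
    }
}
```

What the fix makes visible is the outcome of this step: the plan now names 
the surviving files and their count instead of only the record reader.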

> Partition pruning information not available in query plan for COUNT aggregate 
> query
> ---
>
> Key: DRILL-5357
> URL: https://issues.apache.org/jira/browse/DRILL-5357
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
> Environment: 3 node CentOS cluster
>Reporter: Khurram Faraaz
>Assignee: Arina Ielchiieva
> Fix For: 1.12.0
>
>
> We are not seeing partition pruning information in the query plans for the 
> COUNT(*) and COUNT(column) queries below.
> Drill 1.10.0-SNAPSHOT
> git commit id: b657d44f
> parquet table has 6 columns
> total number of rows = 1638640
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_prtn_prune_01 PARTITION BY 
> (col_state) 
> AS 
> SELECT CAST(columns[0] AS DATE) col_date, 
> CAST(columns[1] AS CHAR(3)) col_state, 
> CAST(columns[2] AS INTEGER) col_prime, 
> CAST(columns[3] AS VARCHAR(256)) col_varstr, 
> CAST(columns[4] AS INTEGER) col_id, 
> CAST(columns[5] AS VARCHAR(50)) col_name 
> from `partition_prune_data.csv`;
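> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1638640                    |
> +-----------+----------------------------+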
> 1 row selected (17.675 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select COUNT(*) from tbl_prtn_prune_01 where 
> col_state = 'CA';
> +---------+
> | EXPR$0  |
> +---------+
> | 35653   |
> +---------+
> 1 row selected (0.471 seconds)
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(*) from 
> tbl_prtn_prune_01 where col_state = 'CA';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@1d4bb67d[columns = null, isStarQuery = false, isSkipQuery = false]])
> {noformat}
> And then I did a REFRESH TABLE METADATA on the parquet table
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> refresh table metadata tbl_prtn_prune_01;
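> +-------+-------------------------------------------------------------+
> |  ok   |                           summary                           |
> +-------+-------------------------------------------------------------+
> | true  | Successfully updated metadata for table tbl_prtn_prune_01.  |
> +-------+-------------------------------------------------------------+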
> 1 row selected (0.321 seconds)
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(col_state) from 
> tbl_prtn_prune_01 where col_state = 'CA';
> +--+--+
> | text |

[jira] [Updated] (DRILL-5694) hash agg spill to disk, second phase OOM

2017-09-20 Thread Boaz Ben-Zvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-5694:

Labels: ready-to-commit  (was: )

> hash agg spill to disk, second phase OOM
> 
>
> Key: DRILL-5694
> URL: https://issues.apache.org/jira/browse/DRILL-5694
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
>Reporter: Chun Chang
>Assignee: Boaz Ben-Zvi
>  Labels: ready-to-commit
>
> | 1.11.0-SNAPSHOT  | d622f76ee6336d97c9189fc589befa7b0f4189d6  | DRILL-5165: 
> For limit all case, no need to push down limit to scan  | 21.07.2017 @ 
> 10:36:29 PDT
> The second-phase agg ran out of memory, which is not supposed to happen. Test 
> data is currently only accessible locally.
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg15.q
> Query:
> select row_count, sum(row_count), avg(double_field), max(double_rand), 
> count(float_rand) from parquet_500m_v1 group by row_count order by row_count 
> limit 30
> Failed with exception
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 
> 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 
> 536870912 so far allocated: 534773760.
> Fragment 1:6
> [Error Id: a193babd-f783-43da-a476-bb8dd4382420 on 10.10.30.168:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) HT was: 534773760 
> OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned 
> batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far 
> allocated: 534773760.
> 
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1175
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (org.apache.drill.exec.exception.OutOfMemoryException) Unable to 
> allocate buffer of size 4194304 due to memory limit. Current allocation: 
> 534773760
> org.apache.drill.exec.memory.BaseAllocator.buffer():238
> org.apache.drill.exec.memory.BaseAllocator.buffer():213
> org.apache.drill.exec.vector.IntVector.allocateBytes():231
> org.apache.drill.exec.vector.IntVector.allocateNew():211
> 
> org.apache.drill.exec.test.generated.HashTableGen2141.allocMetadataVector():778
> 
> org.apache.drill.exec.test.generated.HashTableGen2141.resizeAndRehashIfNeeded():717
> org.apache.drill.exec.test.generated.HashTableGen2141.insertEntry():643
> org.apache.drill.exec.test.generated.HashTableGen2141.put():618
> 
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1173
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
>
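
The numbers in the error message can be checked by hand: the memory limit is 
536870912 bytes and 534773760 bytes are already allocated, so 2097152 bytes 
remain, which is less than the 4194304-byte buffer being requested. The 
sketch below replays that arithmetic with a simple limit check. The class and 
its methods are hypothetical and much simpler than Drill's BaseAllocator.

```java
// Hypothetical sketch of a limit-checked allocator; not Drill's BaseAllocator.
final class LimitedAllocatorSketch {
    private final long limit;
    private long allocated;

    LimitedAllocatorSketch(long limit, long allocated) {
        this.limit = limit;
        this.allocated = allocated;
    }

    // Bytes that can still be handed out before the limit is reached.
    long remaining() {
        return limit - allocated;
    }

    // Grants the request only when it fits under the limit.
    boolean tryAllocate(long size) {
        if (size > remaining()) {
            return false;
        }
        allocated += size;
        return true;
    }

    public static void main(String[] args) {
        // Values taken from the error message quoted above.
        LimitedAllocatorSketch a = new LimitedAllocatorSketch(536870912L, 534773760L);
        System.out.println("remaining bytes: " + a.remaining());
        System.out.println("4194304-byte request fits: " + a.tryAllocate(4194304L));
    }
}
```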

[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-09-20 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173998#comment-16173998
 ] 

Khurram Faraaz commented on DRILL-4735:
---

[~cch...@maprtech.com] I have replaced Jinfeng as the reviewer. I will own 
this from the test side, add tests, and mark the issue as closed once the 
tests are reviewed and merged. Thanks.





[jira] [Resolved] (DRILL-5715) Performance of refactored HashAgg operator regressed

2017-09-20 Thread Boaz Ben-Zvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi resolved DRILL-5715.
-
Resolution: Fixed
  Reviewer: Paul Rogers

The commit for DRILL-5694 (PR #938) also fixes this performance regression: 
it removes the calls to Setup before every hash computation and makes a few 
small changes, such as replacing setSafe with set.
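
To make the setSafe versus set remark concrete, here is a minimal sketch of 
the trade-off: a checked write that grows the backing storage when needed, 
next to an unchecked write that relies on the caller having sized the vector 
already. The class is hypothetical and far simpler than Drill's value 
vectors; it only illustrates why the unchecked form does less work per call.

```java
import java.util.Arrays;

// Hypothetical sketch; not one of Drill's value vector classes.
final class FixedIntVectorSketch {
    private int[] data;

    FixedIntVectorSketch(int capacity) {
        data = new int[capacity];
    }

    // Unchecked write: the caller guarantees that index is within capacity.
    void set(int index, int value) {
        data[index] = value;
    }

    // Checked write: grows the backing array until index fits, then writes.
    void setSafe(int index, int value) {
        while (index >= data.length) {
            data = Arrays.copyOf(data, Math.max(1, data.length * 2));
        }
        data[index] = value;
    }

    int get(int index) {
        return data[index];
    }

    int capacity() {
        return data.length;
    }

    public static void main(String[] args) {
        FixedIntVectorSketch v = new FixedIntVectorSketch(2);
        v.set(1, 3);      // fits in the initial capacity
        v.setSafe(5, 7);  // forces the backing array to grow
        System.out.println("capacity: " + v.capacity() + ", v[5] = " + v.get(5));
    }
}
```

When the capacity is known up front, as it can be for a pre-sized batch, the 
unchecked write avoids a bounds test and possible reallocation on every call.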

> Performance of refactored HashAgg operator regressed
> 
>
> Key: DRILL-5715
> URL: https://issues.apache.org/jira/browse/DRILL-5715
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.11.0
> Environment: 10-node RHEL 6.4 (32 Core, 256GB RAM)
>Reporter: Kunal Khatua
>Assignee: Boaz Ben-Zvi
>  Labels: performance, regression
> Fix For: 1.12.0
>
> Attachments: 26736242-d084-6604-aac9-927e729da755.sys.drill, 
> 26736615-9e86-dac9-ad77-b022fd791f67.sys.drill, 
> 2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill, 
> 2675de42-3789-47b8-29e8-c5077af136db.sys.drill, drill-1.10.0_callTree.png, 
> drill-1.10.0_hotspot.png, drill-1.11.0_callTree.png, drill-1.11.0_hotspot.png
>
>
> When running the following simple HashAgg-based query on the TPC-H Lineitem 
> table (6 billion rows) on a 10-node setup (with a single partition to 
> disable any possible spilling to disk)
> {code:sql}
> select count(*) 
> from (
>   select l_quantity
> , count(l_orderkey) 
>   from lineitem 
>   group by l_quantity 
> )  {code}
> the runtime increased from {{7.378 sec}} to {{11.323 sec}} [reported by the 
> JDBC client].
> To disable spill-to-disk in Drill-1.11.0, the {{drill-override.conf}} was 
> modified to 
> {code}drill.exec.hashagg.num_partitions : 1{code}
> Attached are two profiles
> Drill 1.10.0 : [^2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill] 
> Drill 1.11.0 : [^2675de42-3789-47b8-29e8-c5077af136db.sys.drill]
> A separate run was done for both scenarios with the 
> {{planner.width.max_per_node=10}} and profiled with YourKit.
> Image snippets are attached, indicating the hotspots in both builds:
> *Drill 1.10.0* : 
>  Profile: [^26736242-d084-6604-aac9-927e729da755.sys.drill]
>  CallTree: [^drill-1.10.0_callTree.png]
>  HotSpot: [^drill-1.10.0_hotspot.png]
> !drill-1.10.0_hotspot.png|drill-1.10.0_hotspot!
> *Drill 1.11.0* : 
>  Profile: [^26736615-9e86-dac9-ad77-b022fd791f67.sys.drill]
>  CallTree: [^drill-1.11.0_callTree.png]
>  HotSpot: [^drill-1.11.0_hotspot.png] 
> !drill-1.11.0_hotspot.png|drill-1.11.0_hotspot!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-5727) Update release profile to generate SHA-512 checksum.

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174090#comment-16174090
 ] 

ASF GitHub Bot commented on DRILL-5727:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/951#discussion_r140129986
  
--- Diff: pom.xml ---
@@ -977,6 +977,7 @@
   
 MD5
 SHA-1
+SHA-512
--- End diff --

Maybe we can remove SHA-1 usage?
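
For readers who want to check a SHA-512 checksum independently of the build, here is a minimal sketch using the JDK's `java.security.MessageDigest`. It is not part of the pull request; the class name `Sha512Sketch` and the helper `sha512Hex` are hypothetical, and a real verification would read the release artifact's bytes instead of the sample string used here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha512Sketch {
    // Returns the SHA-512 digest of the given bytes as a lowercase hex string.
    static String sha512Hex(byte[] data) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK, so rethrow as an unchecked exception.
            throw new IllegalStateException("SHA-512 not available", e);
        }
        byte[] hash = digest.digest(data);
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Sample input only; substitute the contents of the file to be checked.
        byte[] sample = "example artifact contents".getBytes(StandardCharsets.UTF_8);
        System.out.println(sha512Hex(sample));
    }
}
```

The digest is 64 bytes, so the hex string is always 128 characters long.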


> Update release profile to generate SHA-512 checksum.
> 
>
> Key: DRILL-5727
> URL: https://issues.apache.org/jira/browse/DRILL-5727
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>
> Per the latest release guidelines, we should generate a SHA-512 checksum 
> for the release artifacts.




[jira] [Commented] (DRILL-5431) Support SSL

2017-09-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174087#comment-16174087
 ] 

ASF GitHub Bot commented on DRILL-5431:
---

Github user superbstreak commented on a diff in the pull request:

https://github.com/apache/drill/pull/950#discussion_r140129825
  
--- Diff: contrib/native/client/src/include/drill/common.hpp ---
@@ -163,9 +170,13 @@ typedef enum{
 #define USERPROP_USERNAME "userName"
 #define USERPROP_PASSWORD "password"
 #define USERPROP_SCHEMA   "schema"
-#define USERPROP_USESSL   "useSSL"// Not implemented yet
-#define USERPROP_FILEPATH "pemLocation"   // Not implemented yet
-#define USERPROP_FILENAME "pemFile"   // Not implemented yet
+#define USERPROP_USESSL   "enableTLS"
+#define USERPROP_TLSPROTOCOL "TLSProtocol" //TLS version
+#define USERPROP_CERTFILEPATH "certFilePath" // pem file path and name
+#define USERPROP_CERTPASSWORD "certPassword" // Password for certificate file
--- End diff --

I think we can remove this to avoid confusion :)


> Support SSL
> ---
>
> Key: DRILL-5431
> URL: https://issues.apache.org/jira/browse/DRILL-5431
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Client - Java, Client - ODBC
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> Support SSL between Drillbit and JDBC/ODBC drivers. Drill already supports 
> HTTPS for web traffic.




[jira] [Updated] (DRILL-4139) Fix parquet partition pruning for BIT, INTERVAL and DECIMAL types

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4139:

Reviewer: salim achouche  (was: Jinfeng Ni)

> Fix parquet partition pruning for BIT, INTERVAL and DECIMAL types
> -
>
> Key: DRILL-4139
> URL: https://issues.apache.org/jira/browse/DRILL-4139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
> Environment: 4 node cluster on CentOS
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>  Labels: ready-to-commit
> Attachments: metadata file v3, metadata file with changes
>
>
> The following exception is seen in drillbit.log after a Functional run on a 
> 4-node cluster:
> Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> Drill 1.3.0 sys.version => d61bb83a8
> {code}
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning class: org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze filter tree: 0 ms
> 2015-11-27 03:12:19,810 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] WARN  o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
>     at org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:479) ~[drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96) ~[drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:235) ~[drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
>     at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
>     at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
>     at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:184) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) [drill-java-exec-1.3.0.jar:1.3.0]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
>     at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}
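
An exception of this shape is the typical symptom of a type dispatch with no branch for the column's type. The sketch below is a simplified illustration of that pattern and of one way to extend it; it is not Drill's `populatePruningVector`, and the enum `ColumnType`, the class `PruningTypeSketch`, and both methods are hypothetical.

```java
class PruningTypeSketch {
    // Hypothetical, reduced set of column types for illustration only.
    enum ColumnType { INT, VARCHAR, BIT }

    // Dispatch without a BIT branch: BIT falls through to the default and throws.
    static String describeBefore(ColumnType type, Object value) {
        switch (type) {
            case INT:
                return "int:" + value;
            case VARCHAR:
                return "varchar:" + value;
            default:
                throw new UnsupportedOperationException("Unsupported type: " + type);
        }
    }

    // Same dispatch with an explicit BIT branch added.
    static String describeAfter(ColumnType type, Object value) {
        switch (type) {
            case INT:
                return "int:" + value;
            case VARCHAR:
                return "varchar:" + value;
            case BIT:
                // Represent the boolean partition value as 1 or 0.
                return "bit:" + (Boolean.TRUE.equals(value) ? 1 : 0);
            default:
                throw new UnsupportedOperationException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeAfter(ColumnType.BIT, true));
        try {
            describeBefore(ColumnType.BIT, true);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```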




[jira] [Updated] (DRILL-4139) Fix parquet partition pruning for BIT, INTERVAL and DECIMAL types

2017-09-20 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4139:

Labels: ready-to-commit  (was: )



