[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588396#comment-16588396
 ] 

ASF GitHub Bot commented on DRILL-6385:
---

weijietong commented on issue #1334: DRILL-6385: Support JPPD feature
URL: https://github.com/apache/drill/pull/1334#issuecomment-414913778
 
 
   @arina-ielchiieva done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support JPPD (Join Predicate Push Down)
> ---
>
> Key: DRILL-6385
> URL: https://issues.apache.org/jira/browse/DRILL-6385
> Project: Apache Drill
>  Issue Type: New Feature
>  Components:  Server, Execution - Flow
>Affects Versions: 1.15.0
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.15.0
>
>
> This feature adds support for JPPD (Join Predicate Push Down). It will 
> benefit HashJoin and Broadcast HashJoin performance by reducing both the 
> number of rows sent across the network and the memory consumed. The feature 
> is already supported by Impala, which calls it RuntimeFilter 
> ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]).
>  The first PR will push down a bloom filter from the HashJoin node to 
> Parquet’s scan node. The proposed basic procedure is as follows:
>  # The HashJoin build side accumulates the equi-join condition rows to 
> construct a bloom filter, then sends the bloom filter to the foreman node.
>  # The foreman node passively accepts the bloom filters from all the 
> fragments that have the HashJoin operator. It then aggregates the bloom 
> filters to form a global bloom filter.
>  # The foreman node broadcasts the global bloom filter to all the probe side 
> scan nodes, which may already have sent out partial data to the hash join 
> nodes (currently the hash join node prefetches one batch from both sides).
>  # The scan node accepts the global bloom filter from the foreman node and 
> uses it to filter the remaining rows.
>  
> To implement the above execution flow, the main new notions are described 
> below:
>  1. RuntimeFilter
> A filter container which may hold a BloomFilter or a MinMaxFilter.
>  2. RuntimeFilterReporter
> It wraps the logic to send the hash join’s bloom filter to the foreman. The 
> serialized bloom filter is sent out through the data tunnel. This object is 
> instantiated by the FragmentExecutor and passed to the FragmentContext, so 
> the HashJoin operator can obtain it through the FragmentContext.
>  3. RuntimeFilterRequestHandler
> It is responsible for accepting a SendRuntimeFilterRequest RPC and stripping 
> the actual BloomFilter from the network. It then passes the filter to the 
> WorkerBee’s new registerRuntimeFilter interface.
> Another RPC type is BroadcastRuntimeFilterRequest. It registers the accepted 
> global bloom filter with the WorkerBee via the registerRuntimeFilter method 
> and then propagates it to the FragmentContext, through which the probe side 
> scan node can fetch the aggregated bloom filter.
>  4. RuntimeFilterManager
> The foreman will instantiate a RuntimeFilterManager, which indirectly obtains 
> every RuntimeFilter via the WorkerBee. Once all the BloomFilters have been 
> accepted and aggregated, it broadcasts the aggregated bloom filter to all the 
> probe side scan nodes through the data tunnel via a 
> BroadcastRuntimeFilterRequest RPC.
>  5. RuntimeFilterEnableOption
> A global option will be added to decide whether to enable this new feature.
>  
> Suggestions and advice are welcome. The related PR will be presented as soon 
> as possible.
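
The aggregation step above is the heart of the design. Below is a minimal 
editorial sketch of that step in Java; the class and method names are 
illustrative assumptions, not Drill's actual API:

import java.util.BitSet;

public class BloomFilterAggregator {
  private final BitSet global;  // global bloom filter under construction
  private final int expected;   // fragments expected to report a filter
  private int received;

  public BloomFilterAggregator(int numFragments, int numBits) {
    this.expected = numFragments;
    this.global = new BitSet(numBits);
  }

  // Called once per SendRuntimeFilterRequest from a HashJoin build side.
  public synchronized boolean accept(BitSet fragmentFilter) {
    global.or(fragmentFilter);    // union keeps every possibly-matching key
    received++;
    return received == expected;  // true -> ready to broadcast
  }

  public synchronized BitSet globalFilter() {
    return global;
  }
}

Because aggregation is just a bitwise OR, the global filter can only gain set 
bits, so probe-side filtering never produces a false negative: a row whose key 
exists on the build side always passes the filter.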



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6676) Add Union, List and Repeated List types to Result Set Loader

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588327#comment-16588327
 ] 

ASF GitHub Bot commented on DRILL-6676:
---

ppadma commented on issue #1429: DRILL-6676: Add Union, List and Repeated List 
types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#issuecomment-414895921
 
 
   @paul-rogers Thanks for the changes. Just a quick update.  I am still going 
through the code. I hope to finish the review soon. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Union, List and Repeated List types to Result Set Loader
> 
>
> Key: DRILL-6676
> URL: https://issues.apache.org/jira/browse/DRILL-6676
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.15.0
>
>
> Add support for the "obscure" vector types to the {{ResultSetLoader}}:
> * Union
> * List
> * Repeated List



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588284#comment-16588284
 ] 

ASF GitHub Bot commented on DRILL-6566:
---

Ben-Zvi opened a new pull request #1438: DRILL-6566: Reduce Hash Agg Batch size 
and estimate when low available memory
URL: https://github.com/apache/drill/pull/1438
 
 
   (1) The first commit just renames MAX_BATCH_SIZE to MAX_BATCH_ROW_COUNT in 
order to avoid confusion over "size".
   (2) The second commit addresses two issues: the configured batch size 
(default 16M) is taken as-is by the memory manager, and the (outgoing) batch 
size estimates are created early (while the outgoing batch is still empty), 
based on 64K rows per batch.
   The change: taking the Hash-Agg memory limit into account and planning for 
multiple batches, the configured size (e.g. 16M) may be reduced to allow for 
the needed number of batches; this new size is given to the memory manager. 
Later, when the estimates are made, that (possibly reduced) size is used to 
reduce the estimates, if needed.
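
A hypothetical sketch of the sizing rule described above; the names and the 
exact formula are illustrative assumptions, not the PR's actual code:

public class HashAggBatchSizing {
  // Cap the configured outgoing batch size so the planned number of batches
  // still fits inside the Hash-Agg memory limit; the capped size is what
  // would be handed to the memory manager.
  static long plannedBatchSize(long configuredSize, long memoryLimit, int plannedBatches) {
    long perBatchBudget = memoryLimit / plannedBatches;
    // Keep the configured size (e.g. 16M) only when the budget allows it.
    return Math.min(configuredSize, perBatchBudget);
  }
}

For example, with a 32M Hash-Agg memory limit and 4 planned batches, a 16M 
configured size would be reduced to the 8M per-batch budget.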
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.  AGGR OOM at First Phase.
> --
>
> Key: DRILL-6566
> URL: https://issues.apache.org/jira/browse/DRILL-6566
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Critical
> Fix For: 1.15.0
>
> Attachments: drillbit.log.6566
>
>
> This is TPCDS Query 66.
> Query: tpcds/tpcds_sf1/hive-generated-parquet/hive1_native/query66.sql
> SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> ship_carriers,
> year1,
> Sum(jan_sales) AS jan_sales,
> Sum(feb_sales) AS feb_sales,
> Sum(mar_sales) AS mar_sales,
> Sum(apr_sales) AS apr_sales,
> Sum(may_sales) AS may_sales,
> Sum(jun_sales) AS jun_sales,
> Sum(jul_sales) AS jul_sales,
> Sum(aug_sales) AS aug_sales,
> Sum(sep_sales) AS sep_sales,
> Sum(oct_sales) AS oct_sales,
> Sum(nov_sales) AS nov_sales,
> Sum(dec_sales) AS dec_sales,
> Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
> Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
> Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
> Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
> Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
> Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
> Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
> Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
> Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
> Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
> Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
> Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
> Sum(jan_net)   AS jan_net,
> Sum(feb_net)   AS feb_net,
> Sum(mar_net)   AS mar_net,
> Sum(apr_net)   AS apr_net,
> Sum(may_net)   AS may_net,
> Sum(jun_net)   AS jun_net,
> Sum(jul_net)   AS jul_net,
> Sum(aug_net)   AS aug_net,
> Sum(sep_net)   AS sep_net,
> Sum(oct_net)   AS oct_net,
> Sum(nov_net)   AS nov_net,
> Sum(dec_net)   AS dec_net
> FROM   (SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> 'ZOUROS'
> || ','
> || 'ZHOU' AS ship_carriers,
> d_year AS year1,
> Sum(CASE
> WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jan_sales,
> Sum(CASE
> WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS feb_sales,
> Sum(CASE
> WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS mar_sales,
> Sum(CASE
> WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS apr_sales,
> Sum(CASE
> WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
> ELSE 

[jira] [Commented] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/

2018-08-21 Thread Robert Hou (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588032#comment-16588032
 ] 

Robert Hou commented on DRILL-6569:
---

This test last passed in January. I re-tested with two commits from January 
(b4ffa40127c040d2f8d9ebe2fd4623dfac8c7724 from January 5 and 
27aff35b54df0adfd951c7b7afc47b36a6de5e0a from January 12). The test passed on 
both of those commits in January, but re-running on the same commits now, it 
fails on both. So I do not think this issue is directly caused by Drill code; 
it could be in Hive, or in the interface between Hive and Drill.

> Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not 
> read value at 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> --
>
> Key: DRILL-6569
> URL: https://issues.apache.org/jira/browse/DRILL-6569
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 19.
> I am able to scan the parquet file using:
> select * from 
> dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet`
> and I get 3,349,279 rows selected.
> There are roughly 15 similar failures in the Advanced nightly run, out of 37 
> failures.  So this issue accounts for about half the failures.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql
> SELECT i_brand_id  brand_id,
> i_brand brand,
> i_manufact_id,
> i_manufact,
> Sum(ss_ext_sales_price) ext_price
> FROM   date_dim,
> store_sales,
> item,
> customer,
> customer_address,
> store
> WHERE  d_date_sk = ss_sold_date_sk
> AND ss_item_sk = i_item_sk
> AND i_manager_id = 38
> AND d_moy = 12
> AND d_year = 1998
> AND ss_customer_sk = c_customer_sk
> AND c_current_addr_sk = ca_address_sk
> AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5)
> AND ss_store_sk = s_store_sk
> GROUP  BY i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> ORDER  BY ext_price DESC,
> i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 
> 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> Fragment 4:26
> [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010]
>   (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 
> 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> 
> hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243
> hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57
> 
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417
> org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
> org.apache.drill.exec.physical.impl.ScanBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> 

[jira] [Updated] (DRILL-6702) OperatingSystemMXBean class cast exception when loaded under IBM JVM

2018-08-21 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6702:

Fix Version/s: 1.15.0

> OperatingSystemMXBean class cast exception when loaded under IBM JVM
> 
>
> Key: DRILL-6702
> URL: https://issues.apache.org/jira/browse/DRILL-6702
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Rob Wu
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.15.0
>
>
> Related to: https://issues.apache.org/jira/browse/DRILL-6289
>  
> [https://github.com/apache/drill/blob/1.14.0/common/src/main/java/org/apache/drill/exec/metrics/CpuGaugeSet.java#L28]
>  
> Exception in thread "main" java.lang.ExceptionInInitializerError
>     at java.lang.J9VMInternals.ensureError(J9VMInternals.java:141)
>     at 
> java.lang.J9VMInternals.recordInitializationFailure(J9VMInternals.java:130)
>     at 
> org.apache.drill.exec.metrics.DrillMetrics.getRegistry(DrillMetrics.java:111)
>     at 
> org.apache.drill.exec.memory.AllocationManager.(AllocationManager.java:64)
>     at 
> org.apache.drill.exec.memory.BaseAllocator.(BaseAllocator.java:48)
>     at 
> org.apache.drill.exec.memory.RootAllocatorFactory.newRoot(RootAllocatorFactory.java:45)
>     at 
> org.apache.drill.exec.memory.RootAllocatorFactory.newRoot(RootAllocatorFactory.java:40)
>     ...
> Caused by: java.lang.ClassCastException: 
> com.ibm.lang.management.ExtendedOperatingSystem incompatible with 
> com.sun.management.OperatingSystemMXBean
>     at org.apache.drill.exec.metrics.CpuGaugeSet.(CpuGaugeSet.java:40)
>     at 
> org.apache.drill.exec.metrics.DrillMetrics$RegistryHolder.registerSystemMetrics(DrillMetrics.java:63)
>     at 
> org.apache.drill.exec.metrics.DrillMetrics$RegistryHolder.(DrillMetrics.java:53)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6461) Add Basic Data Correctness Unit Tests

2018-08-21 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6461:
-
Labels: ready-to-commit  (was: )

> Add Basic Data Correctness Unit Tests
> -
>
> Key: DRILL-6461
> URL: https://issues.apache.org/jira/browse/DRILL-6461
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.14.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> There are no data correctness unit tests for HashAgg. We need to add some.
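
As an editorial illustration of what such a test could look like, a minimal 
sketch assuming Drill's BaseTestQuery/TestBuilder harness (the option name 
`planner.enable_hashagg` and the `cp.`employee.json`` test resource are 
existing Drill names; the test itself is hypothetical). It baselines HashAgg 
output against the StreamingAgg path for the same query:

import org.junit.Test;

public class TestHashAggCorrectness extends BaseTestQuery {
  @Test
  public void hashAggMatchesStreamingAgg() throws Exception {
    String query = "select employee_id % 10 as k, count(*) as cnt "
        + "from cp.`employee.json` group by employee_id % 10";
    testBuilder()
        .sqlQuery(query)
        .optionSettingQueriesForTestQuery(
            "alter session set `planner.enable_hashagg` = true")
        .unOrdered()
        // Same query with HashAgg disabled runs through StreamingAgg,
        // giving an independent baseline for the aggregated values.
        .sqlBaselineQuery(query)
        .optionSettingQueriesForBaseline(
            "alter session set `planner.enable_hashagg` = false")
        .go();
  }
}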



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6461) Add Basic Data Correctness Unit Tests

2018-08-21 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6461:
-
Fix Version/s: 1.15.0

> Add Basic Data Correctness Unit Tests
> -
>
> Key: DRILL-6461
> URL: https://issues.apache.org/jira/browse/DRILL-6461
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.14.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> There are no data correctness unit tests for HashAgg. We need to add some.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6461) Add Basic Data Correctness Unit Tests

2018-08-21 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6461:
-
Component/s: Tools, Build & Test

> Add Basic Data Correctness Unit Tests
> -
>
> Key: DRILL-6461
> URL: https://issues.apache.org/jira/browse/DRILL-6461
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.14.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> There are no data correctness unit tests for HashAgg. We need to add some.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6461) Add Basic Data Correctness Unit Tests

2018-08-21 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6461:
-
Affects Version/s: 1.14.0

> Add Basic Data Correctness Unit Tests
> -
>
> Key: DRILL-6461
> URL: https://issues.apache.org/jira/browse/DRILL-6461
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.14.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> There are no data correctness unit tests for HashAgg. We need to add some.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6702) OperatingSystemMXBean class cast exception when loaded under IBM JVM

2018-08-21 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587868#comment-16587868
 ] 

Kunal Khatua commented on DRILL-6702:
-

[~robertw] do you want to review this PR and try it out?

> OperatingSystemMXBean class cast exception when loaded under IBM JVM
> 
>
> Key: DRILL-6702
> URL: https://issues.apache.org/jira/browse/DRILL-6702
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Rob Wu
>Assignee: Kunal Khatua
>Priority: Minor
>
> Related to: https://issues.apache.org/jira/browse/DRILL-6289
>  
> [https://github.com/apache/drill/blob/1.14.0/common/src/main/java/org/apache/drill/exec/metrics/CpuGaugeSet.java#L28|https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_drill_blob_1.14.0_common_src_main_java_org_apache_drill_exec_metrics_CpuGaugeSet.java-23L28=DwMFAg=cskdkSMqhcnjZxdQVpwTXg=-cT6otg6lpT_XkmYy7yg3A=f8a5MyR85-7Ns3KmymU7PI8Sk6qW8vRa9HJIa0-npNA=mpztPtwrTzNkgLcUORZdl5LQ6gyP5iAf3umFzgdOMeI=]
>  
> Exception in thread "main" java.lang.ExceptionInInitializerError
>     at java.lang.J9VMInternals.ensureError(J9VMInternals.java:141)
>     at 
> java.lang.J9VMInternals.recordInitializationFailure(J9VMInternals.java:130)
>     at 
> org.apache.drill.exec.metrics.DrillMetrics.getRegistry(DrillMetrics.java:111)
>     at 
> org.apache.drill.exec.memory.AllocationManager.(AllocationManager.java:64)
>     at 
> org.apache.drill.exec.memory.BaseAllocator.(BaseAllocator.java:48)
>     at 
> org.apache.drill.exec.memory.RootAllocatorFactory.newRoot(RootAllocatorFactory.java:45)
>     at 
> org.apache.drill.exec.memory.RootAllocatorFactory.newRoot(RootAllocatorFactory.java:40)
>     ...
> Caused by: java.lang.ClassCastException: 
> com.ibm.lang.management.ExtendedOperatingSystem incompatible with 
> com.sun.management.OperatingSystemMXBean
>     at org.apache.drill.exec.metrics.CpuGaugeSet.(CpuGaugeSet.java:40)
>     at 
> org.apache.drill.exec.metrics.DrillMetrics$RegistryHolder.registerSystemMetrics(DrillMetrics.java:63)
>     at 
> org.apache.drill.exec.metrics.DrillMetrics$RegistryHolder.(DrillMetrics.java:53)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6702) OperatingSystemMXBean class cast exception when loaded under IBM JVM

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587860#comment-16587860
 ] 

ASF GitHub Bot commented on DRILL-6702:
---

kkhatua opened a new pull request #1437: DRILL-6702: Disable CPU Reporting for 
non-HotSpot JDKs
URL: https://github.com/apache/drill/pull/1437
 
 
   When running Drill on the IBM JDK (J9), the web UI throws a 
ClassCastException:
   Caused by: java.lang.ClassCastException: 
com.ibm.lang.management.ExtendedOperatingSystem incompatible with 
com.sun.management.OperatingSystemMXBean
   
   This PR simply disables CPU reporting in that case, since Drill would 
ideally be recompiled against such alternative JDKs.
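
For context, a hedged sketch of the kind of guard such a change implies 
(illustrative code, not the actual diff in the PR): test the bean's type 
before casting, instead of casting blindly to the HotSpot-only interface.

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class CpuGaugeGuard {
  // Returns the com.sun.management bean on HotSpot/OpenJDK, or null on JVMs
  // (such as IBM J9) whose bean does not implement that interface.
  public static com.sun.management.OperatingSystemMXBean sunOsBeanOrNull() {
    OperatingSystemMXBean bean = ManagementFactory.getOperatingSystemMXBean();
    if (bean instanceof com.sun.management.OperatingSystemMXBean) {
      return (com.sun.management.OperatingSystemMXBean) bean;
    }
    return null; // CPU metrics simply unavailable on this JDK
  }
}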


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> OperatingSystemMXBean class cast exception when loaded under IBM JVM
> 
>
> Key: DRILL-6702
> URL: https://issues.apache.org/jira/browse/DRILL-6702
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Rob Wu
>Assignee: Kunal Khatua
>Priority: Minor
>
> Related to: https://issues.apache.org/jira/browse/DRILL-6289
>  
> [https://github.com/apache/drill/blob/1.14.0/common/src/main/java/org/apache/drill/exec/metrics/CpuGaugeSet.java#L28]
>  
> Exception in thread "main" java.lang.ExceptionInInitializerError
>     at java.lang.J9VMInternals.ensureError(J9VMInternals.java:141)
>     at 
> java.lang.J9VMInternals.recordInitializationFailure(J9VMInternals.java:130)
>     at 
> org.apache.drill.exec.metrics.DrillMetrics.getRegistry(DrillMetrics.java:111)
>     at 
> org.apache.drill.exec.memory.AllocationManager.(AllocationManager.java:64)
>     at 
> org.apache.drill.exec.memory.BaseAllocator.(BaseAllocator.java:48)
>     at 
> org.apache.drill.exec.memory.RootAllocatorFactory.newRoot(RootAllocatorFactory.java:45)
>     at 
> org.apache.drill.exec.memory.RootAllocatorFactory.newRoot(RootAllocatorFactory.java:40)
>     ...
> Caused by: java.lang.ClassCastException: 
> com.ibm.lang.management.ExtendedOperatingSystem incompatible with 
> com.sun.management.OperatingSystemMXBean
>     at org.apache.drill.exec.metrics.CpuGaugeSet.(CpuGaugeSet.java:40)
>     at 
> org.apache.drill.exec.metrics.DrillMetrics$RegistryHolder.registerSystemMetrics(DrillMetrics.java:63)
>     at 
> org.apache.drill.exec.metrics.DrillMetrics$RegistryHolder.(DrillMetrics.java:53)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1

2018-08-21 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6569:


Assignee: Vitalii Diravka  (was: Pritesh Maker)

> Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not 
> read value at 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> --
>
> Key: DRILL-6569
> URL: https://issues.apache.org/jira/browse/DRILL-6569
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 19.
> I am able to scan the parquet file using:
> select * from 
> dfs.`/drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet`
> and I get 3,349,279 rows selected.
> There are roughly 15 similar failures in the Advanced nightly run, out of 37 
> failures.  So this issue accounts for about half the failures.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql
> SELECT i_brand_id  brand_id,
> i_brand brand,
> i_manufact_id,
> i_manufact,
> Sum(ss_ext_sales_price) ext_price
> FROM   date_dim,
> store_sales,
> item,
> customer,
> customer_address,
> store
> WHERE  d_date_sk = ss_sold_date_sk
> AND ss_item_sk = i_item_sk
> AND i_manager_id = 38
> AND d_moy = 12
> AND d_year = 1998
> AND ss_customer_sk = c_customer_sk
> AND c_current_addr_sk = ca_address_sk
> AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5)
> AND ss_store_sk = s_store_sk
> GROUP  BY i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> ORDER  BY ext_price DESC,
> i_brand,
> i_brand_id,
> i_manufact_id,
> i_manufact
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 
> 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> Fragment 4:26
> [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010]
>   (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 
> 2 in block 0 in file 
> maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
> 
> hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243
> hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199
> 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57
> 
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417
> org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
> org.apache.drill.exec.physical.impl.ScanBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 

[jira] [Updated] (DRILL-6688) Data batches for Project operator exceed the maximum specified

2018-08-21 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6688:
-
Reviewer: Boaz Ben-Zvi

> Data batches for Project operator exceed the maximum specified
> --
>
> Key: DRILL-6688
> URL: https://issues.apache.org/jira/browse/DRILL-6688
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.15.0
>
>
> I ran this query:
> alter session set `drill.exec.memory.operator.project.output_batch_size` = 
> 131072;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.width.max_per_query` = 1;
> select
> chr(101) CharacterValuea,
> chr(102) CharacterValueb,
> chr(103) CharacterValuec,
> chr(104) CharacterValued,
> chr(105) CharacterValuee
> from dfs.`/drill/testdata/batch_memory/character5_1MB.parquet`;
> The output has 1024 identical lines:
> e f g h i
> There is one incoming batch:
> 2018-08-09 15:50:14,794 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG 
> o.a.d.e.p.i.p.ProjectMemoryManager - BATCH_STATS, incoming: Batch size:
> { Records: 6, Total size: 0, Data size: 30, Gross row width: 0, Net 
> row width: 5, Density: 0% }
> Batch schema & sizes:
> { `_DEFAULT_COL_TO_READ_`(type: OPTIONAL INT, count: 6, Per entry: std 
> data size: 4, std net size: 5, actual data size: 4, actual net size: 5 
> Totals: data size: 24, net size: 30) }
> }
> There are four outgoing batches. All are too large. The first three look like 
> this:
> 2018-08-09 15:50:14,799 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG 
> o.a.d.e.p.i.p.ProjectRecordBatch - BATCH_STATS, outgoing: Batch size:
> { Records: 16383, Total size: 0, Data size: 409575, Gross row width: 0, Net 
> row width: 25, Density: 0% }
> Batch schema & sizes:
> { CharacterValuea(type: REQUIRED VARCHAR, count: 16383, Per entry: std data 
> size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: 
> data size: 16383, net size: 81915) }
> CharacterValueb(type: REQUIRED VARCHAR, count: 16383, Per entry: std data 
> size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: 
> data size: 16383, net size: 81915) }
> CharacterValuec(type: REQUIRED VARCHAR, count: 16383, Per entry: std data 
> size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: 
> data size: 16383, net size: 81915) }
> CharacterValued(type: REQUIRED VARCHAR, count: 16383, Per entry: std data 
> size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: 
> data size: 16383, net size: 81915) }
> CharacterValuee(type: REQUIRED VARCHAR, count: 16383, Per entry: std data 
> size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: 
> data size: 16383, net size: 81915) }
> }
> The last batch is smaller because it has the remaining records.
> The data size (409575) exceeds the maximum batch size (131072).
> character415.q
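
A quick editorial check of the numbers above (the arithmetic follows from the 
quoted log; the closing inference is an assumption): each of the first three 
outgoing batches carries 16383 rows at a net row width of 25 bytes, i.e. 
16383 × 25 = 409,575 bytes, roughly three times the 131,072-byte limit, while 
a limit-respecting batch would hold at most floor(131072 / 25) = 5,242 rows. 
The 16383-row count is, however, consistent with dividing the limit by the 
incoming net row width of 5 (131072 / 5 = 26,214, rounded down to the power 
of two 16,384), which suggests the row cap was derived from the narrow 
incoming rows rather than the 25-byte outgoing rows.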



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6422) Update guava to 23.0 and shade it

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587821#comment-16587821
 ] 

ASF GitHub Bot commented on DRILL-6422:
---

vrozov commented on issue #1264:  DRILL-6422: Update guava to 23.0 and shade it
URL: https://github.com/apache/drill/pull/1264#issuecomment-414769879
 
 
   @vvysotskyi Correct, but note that with a single PR almost all of the same 
actions must be completed by a single committer in a short period of time (due 
to merge conflicts), while two PRs avoid that dependency: one person can merge 
the first PR, another can publish the artifacts to Maven, and a third can wait 
for the artifacts to become available before resolving conflicts and merging 
the second PR.
   
   To manually publish artifacts, see https://repository.apache.org or use 
`mvn deploy`. I don't remember who (all committers or only PMC members) is 
allowed to publish to the Apache Maven repository.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Update guava to 23.0 and shade it
> -
>
> Key: DRILL-6422
> URL: https://issues.apache.org/jira/browse/DRILL-6422
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.15.0
>
>
> Some Hadoop libraries use old versions of guava, and most of them are 
> incompatible with guava 23.0.
> To allow use of the new guava version, it should be shaded, and the shaded 
> version should be used in the project.
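
For illustration, a hypothetical maven-shade-plugin relocation of the kind 
this implies (the shaded package prefix here is an assumption, not the 
project's final choice):

<!-- Rewrites guava's packages inside the shaded jar so Drill's guava 23.0
     cannot clash with the guava versions pulled in by Hadoop libraries. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.drill.shaded.guava.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>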



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587761#comment-16587761
 ] 

ASF GitHub Bot commented on DRILL-6385:
---

arina-ielchiieva edited a comment on issue #1334: DRILL-6385: Support JPPD 
feature
URL: https://github.com/apache/drill/pull/1334#issuecomment-414750379
 
 
   @weijietong please rebase and resolve the conflicts (you would need to 
regenerate the protobufs).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support JPPD (Join Predicate Push Down)
> ---
>
> Key: DRILL-6385
> URL: https://issues.apache.org/jira/browse/DRILL-6385
> Project: Apache Drill
>  Issue Type: New Feature
>  Components:  Server, Execution - Flow
>Affects Versions: 1.15.0
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.15.0
>
>
> This feature adds support for JPPD (Join Predicate Push Down). It will 
> benefit HashJoin and Broadcast HashJoin performance by reducing both the 
> number of rows sent across the network and the memory consumed. The feature 
> is already supported by Impala, which calls it RuntimeFilter 
> ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]).
>  The first PR will push down a bloom filter from the HashJoin node to 
> Parquet’s scan node. The proposed basic procedure is as follows:
>  # The HashJoin build side accumulates the equi-join condition rows to 
> construct a bloom filter, then sends the bloom filter to the foreman node.
>  # The foreman node passively accepts the bloom filters from all the 
> fragments that have the HashJoin operator. It then aggregates the bloom 
> filters to form a global bloom filter.
>  # The foreman node broadcasts the global bloom filter to all the probe side 
> scan nodes, which may already have sent out partial data to the hash join 
> nodes (currently the hash join node prefetches one batch from both sides).
>  # The scan node accepts the global bloom filter from the foreman node and 
> uses it to filter the remaining rows.
>  
> To implement the above execution flow, the main new notions are described 
> below:
>  1. RuntimeFilter
> A filter container which may hold a BloomFilter or a MinMaxFilter.
>  2. RuntimeFilterReporter
> It wraps the logic to send the hash join’s bloom filter to the foreman. The 
> serialized bloom filter is sent out through the data tunnel. This object is 
> instantiated by the FragmentExecutor and passed to the FragmentContext, so 
> the HashJoin operator can obtain it through the FragmentContext.
>  3. RuntimeFilterRequestHandler
> It is responsible for accepting a SendRuntimeFilterRequest RPC and stripping 
> the actual BloomFilter from the network. It then passes the filter to the 
> WorkerBee’s new registerRuntimeFilter interface.
> Another RPC type is BroadcastRuntimeFilterRequest. It registers the accepted 
> global bloom filter with the WorkerBee via the registerRuntimeFilter method 
> and then propagates it to the FragmentContext, through which the probe side 
> scan node can fetch the aggregated bloom filter.
>  4. RuntimeFilterManager
> The foreman will instantiate a RuntimeFilterManager, which indirectly obtains 
> every RuntimeFilter via the WorkerBee. Once all the BloomFilters have been 
> accepted and aggregated, it broadcasts the aggregated bloom filter to all the 
> probe side scan nodes through the data tunnel via a 
> BroadcastRuntimeFilterRequest RPC.
>  5. RuntimeFilterEnableOption
> A global option will be added to decide whether to enable this new feature.
>  
> Suggestions and advice are welcome. The related PR will be presented as soon 
> as possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587759#comment-16587759
 ] 

ASF GitHub Bot commented on DRILL-6385:
---

arina-ielchiieva commented on issue #1334: DRILL-6385: Support JPPD feature
URL: https://github.com/apache/drill/pull/1334#issuecomment-414750379
 
 
   @weijietong please rebase and resolve the conflicts (you would need to 
regenerate the protobufs).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support JPPD (Join Predicate Push Down)
> ---
>
> Key: DRILL-6385
> URL: https://issues.apache.org/jira/browse/DRILL-6385
> Project: Apache Drill
>  Issue Type: New Feature
>  Components:  Server, Execution - Flow
>Affects Versions: 1.15.0
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.15.0
>
>
> This feature adds support for JPPD (Join Predicate Push Down). It will 
> benefit HashJoin and Broadcast HashJoin performance by reducing both the 
> number of rows sent across the network and the memory consumed. The feature 
> is already supported by Impala, which calls it RuntimeFilter 
> ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]).
>  The first PR will push down a bloom filter from the HashJoin node to 
> Parquet’s scan node. The proposed basic procedure is as follows:
>  # The HashJoin build side accumulates the equi-join condition rows to 
> construct a bloom filter, then sends the bloom filter to the foreman node.
>  # The foreman node passively accepts the bloom filters from all the 
> fragments that have the HashJoin operator. It then aggregates the bloom 
> filters to form a global bloom filter.
>  # The foreman node broadcasts the global bloom filter to all the probe side 
> scan nodes, which may already have sent out partial data to the hash join 
> nodes (currently the hash join node prefetches one batch from both sides).
>  # The scan node accepts the global bloom filter from the foreman node and 
> uses it to filter the remaining rows.
>  
> To implement the above execution flow, the main new notions are described 
> below:
>  1. RuntimeFilter
> A filter container which may hold a BloomFilter or a MinMaxFilter.
>  2. RuntimeFilterReporter
> It wraps the logic to send the hash join’s bloom filter to the foreman. The 
> serialized bloom filter is sent out through the data tunnel. This object is 
> instantiated by the FragmentExecutor and passed to the FragmentContext, so 
> the HashJoin operator can obtain it through the FragmentContext.
>  3. RuntimeFilterRequestHandler
> It is responsible for accepting a SendRuntimeFilterRequest RPC and stripping 
> the actual BloomFilter from the network. It then passes the filter to the 
> WorkerBee’s new registerRuntimeFilter interface.
> Another RPC type is BroadcastRuntimeFilterRequest. It registers the accepted 
> global bloom filter with the WorkerBee via the registerRuntimeFilter method 
> and then propagates it to the FragmentContext, through which the probe side 
> scan node can fetch the aggregated bloom filter.
>  4. RuntimeFilterManager
> The foreman will instantiate a RuntimeFilterManager, which indirectly obtains 
> every RuntimeFilter via the WorkerBee. Once all the BloomFilters have been 
> accepted and aggregated, it broadcasts the aggregated bloom filter to all the 
> probe side scan nodes through the data tunnel via a 
> BroadcastRuntimeFilterRequest RPC.
>  5. RuntimeFilterEnableOption
> A global option will be added to decide whether to enable this new feature.
>  
> Suggestions and advice are welcome. The related PR will be presented as soon 
> as possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6179) Added pcapng-format support

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587756#comment-16587756
 ] 

ASF GitHub Bot commented on DRILL-6179:
---

arina-ielchiieva closed pull request #1126: DRILL-6179: Added pcapng-format 
support
URL: https://github.com/apache/drill/pull/1126
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/contrib/native/client/src/protobuf/UserBitShared.pb.cc 
b/contrib/native/client/src/protobuf/UserBitShared.pb.cc
index 739804844bf..8f98e06a0a2 100644
--- a/contrib/native/client/src/protobuf/UserBitShared.pb.cc
+++ b/contrib/native/client/src/protobuf/UserBitShared.pb.cc
@@ -750,7 +750,7 @@ void protobuf_AddDesc_UserBitShared_2eproto() {
 "TATEMENT\020\005*\207\001\n\rFragmentState\022\013\n\007SENDING\020"
 
"\000\022\027\n\023AWAITING_ALLOCATION\020\001\022\013\n\007RUNNING\020\002\022"
 
"\014\n\010FINISHED\020\003\022\r\n\tCANCELLED\020\004\022\n\n\006FAILED\020\005"
-"\022\032\n\026CANCELLATION_REQUESTED\020\006*\316\010\n\020CoreOpe"
+"\022\032\n\026CANCELLATION_REQUESTED\020\006*\343\010\n\020CoreOpe"
 "ratorType\022\021\n\rSINGLE_SENDER\020\000\022\024\n\020BROADCAS"
 "T_SENDER\020\001\022\n\n\006FILTER\020\002\022\022\n\016HASH_AGGREGATE"
 
"\020\003\022\r\n\tHASH_JOIN\020\004\022\016\n\nMERGE_JOIN\020\005\022\031\n\025HAS"
@@ -778,11 +778,11 @@ void protobuf_AddDesc_UserBitShared_2eproto() {
 "ER\0200\022\026\n\022OPEN_TSDB_SUB_SCAN\0201\022\017\n\013JSON_WRI"
 "TER\0202\022\026\n\022HTPPD_LOG_SUB_SCAN\0203\022\022\n\016IMAGE_S"
 "UB_SCAN\0204\022\025\n\021SEQUENCE_SUB_SCAN\0205\022\023\n\017PART"
-"ITION_LIMIT\0206*g\n\nSaslStatus\022\020\n\014SASL_UNKN"
-"OWN\020\000\022\016\n\nSASL_START\020\001\022\024\n\020SASL_IN_PROGRES"
-
"S\020\002\022\020\n\014SASL_SUCCESS\020\003\022\017\n\013SASL_FAILED\020\004B."
-"\n\033org.apache.drill.exec.protoB\rUserBitSh"
-"aredH\001", 5406);
+"ITION_LIMIT\0206\022\023\n\017PCAPNG_SUB_SCAN\0207*g\n\nSa"
+"slStatus\022\020\n\014SASL_UNKNOWN\020\000\022\016\n\nSASL_START"
+"\020\001\022\024\n\020SASL_IN_PROGRESS\020\002\022\020\n\014SASL_SUCCESS"
+"\020\003\022\017\n\013SASL_FAILED\020\004B.\n\033org.apache.drill."
+"exec.protoB\rUserBitSharedH\001", 5427);
   ::google::protobuf::MessageFactory::InternalRegisterGeneratedFile(
 "UserBitShared.proto", _RegisterTypes);
   UserCredentials::default_instance_ = new UserCredentials();
@@ -958,6 +958,7 @@ bool CoreOperatorType_IsValid(int value) {
 case 52:
 case 53:
 case 54:
+case 55:
   return true;
 default:
   return false;
diff --git a/contrib/native/client/src/protobuf/UserBitShared.pb.h 
b/contrib/native/client/src/protobuf/UserBitShared.pb.h
index 4599abb23aa..a07cbfa67e8 100644
--- a/contrib/native/client/src/protobuf/UserBitShared.pb.h
+++ b/contrib/native/client/src/protobuf/UserBitShared.pb.h
@@ -258,11 +258,12 @@ enum CoreOperatorType {
   HTPPD_LOG_SUB_SCAN = 51,
   IMAGE_SUB_SCAN = 52,
   SEQUENCE_SUB_SCAN = 53,
-  PARTITION_LIMIT = 54
+  PARTITION_LIMIT = 54,
+  PCAPNG_SUB_SCAN = 55
 };
 bool CoreOperatorType_IsValid(int value);
 const CoreOperatorType CoreOperatorType_MIN = SINGLE_SENDER;
-const CoreOperatorType CoreOperatorType_MAX = PARTITION_LIMIT;
+const CoreOperatorType CoreOperatorType_MAX = PCAPNG_SUB_SCAN;
 const int CoreOperatorType_ARRAYSIZE = CoreOperatorType_MAX + 1;
 
 const ::google::protobuf::EnumDescriptor* CoreOperatorType_descriptor();
diff --git a/exec/java-exec/pom.xml b/exec/java-exec/pom.xml
index f175c654c01..f4068952ee5 100644
--- a/exec/java-exec/pom.xml
+++ b/exec/java-exec/pom.xml
@@ -534,6 +534,11 @@
       <artifactId>metadata-extractor</artifactId>
       <version>2.11.0</version>
     </dependency>
+    <dependency>
+      <groupId>fr.bmartel</groupId>
+      <artifactId>pcapngdecoder</artifactId>
+      <version>1.2</version>
+    </dependency>
   </dependencies>
 
   <profiles>
diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/Packet.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/Packet.java
index 9cc98de9c44..a0a07a99d11 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/Packet.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/Packet.java
@@ -42,18 +42,18 @@
   private long timestamp;
   private int originalLength;
 
-  private byte[] raw;
+  protected byte[] raw;
 
   // index into the raw data where the current ethernet packet starts
   private int etherOffset;
   // index into the raw data where the current IP packet starts. Should be 
just after etherOffset
-  private int ipOffset;
+  protected int ipOffset;
 
   private int packetLength;
-  private int etherProtocol;
-  private int protocol;
+  protected int etherProtocol;
+  protected 

[jira] [Commented] (DRILL-6640) Drill takes long time in planning when there are large number of files in views/tables DFS parent directory

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587659#comment-16587659
 ] 

ASF GitHub Bot commented on DRILL-6640:
---

arjun-rajan removed a comment on issue #1405: DRILL-6640: Modifying 
DotDrillUtil implementation to avoid using globStatus calls
URL: https://github.com/apache/drill/pull/1405#issuecomment-414729323
 
 
   @ilooner I have made the changes as per the review comment. Could you 
please review? Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill takes long time in planning when there are large number of files in  
> views/tables DFS parent directory
> 
>
> Key: DRILL-6640
> URL: https://issues.apache.org/jira/browse/DRILL-6640
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Arjun
>Assignee: Arjun
>Priority: Major
> Fix For: 1.15.0
>
>
> When Drill is used to query views/tables, the query planning time increases 
> as the number of files in the view's/table's parent directory increases. 
> This becomes unacceptably long with complex queries.
> This is caused by the globStatus operation on view files, which uses a GLOB 
> pattern to retrieve view file status. It can be improved by avoiding GLOB 
> patterns for Drill metadata files such as view files.
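
As an editorial sketch of the proposed direction (names are illustrative; the 
`.view.drill` suffix is Drill's existing view-file naming): when the metadata 
file name is known and contains no wildcard, an exact-path lookup can replace 
the directory-wide globStatus call.

import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DotDrillLookup {
  // Returns the status of <viewName>.view.drill under parent, or an empty
  // array if it does not exist: one targeted RPC instead of listing and
  // pattern-matching every file in the parent directory.
  public static FileStatus[] viewFileStatus(FileSystem fs, Path parent, String viewName)
      throws IOException {
    Path viewFile = new Path(parent, viewName + ".view.drill");
    return fs.exists(viewFile)
        ? new FileStatus[] { fs.getFileStatus(viewFile) }
        : new FileStatus[0];
  }
}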



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6640) Drill takes long time in planning when there are large number of files in views/tables DFS parent directory

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587660#comment-16587660
 ] 

ASF GitHub Bot commented on DRILL-6640:
---

kr-arjun commented on issue #1405: DRILL-6640: Modifying DotDrillUtil 
implementation to avoid using globStatus calls
URL: https://github.com/apache/drill/pull/1405#issuecomment-414729647
 
 
   @ilooner I have made the changes as per the review comment. Could you 
please review? Thanks!
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill takes long time in planning when there are large number of files in  
> views/tables DFS parent directory
> 
>
> Key: DRILL-6640
> URL: https://issues.apache.org/jira/browse/DRILL-6640
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Arjun
>Assignee: Arjun
>Priority: Major
> Fix For: 1.15.0
>
>
> When Drill is used to query views/tables, the query planning time increases 
> as the number of files in the view's/table's parent directory increases. 
> This becomes unacceptably long with complex queries.
> This is caused by the globStatus operation on view files, which uses a GLOB 
> pattern to retrieve view file status. It can be improved by avoiding GLOB 
> patterns for Drill metadata files such as view files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6640) Drill takes long time in planning when there are large number of files in views/tables DFS parent directory

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587657#comment-16587657
 ] 

ASF GitHub Bot commented on DRILL-6640:
---

arjun-rajan commented on issue #1405: DRILL-6640: Modifying DotDrillUtil 
implementation to avoid using globStatus calls
URL: https://github.com/apache/drill/pull/1405#issuecomment-414729323
 
 
   @ilooner I have made the changes as per the review comment. Could you 
please review? Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill takes long time in planning when there are large number of files in  
> views/tables DFS parent directory
> 
>
> Key: DRILL-6640
> URL: https://issues.apache.org/jira/browse/DRILL-6640
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Arjun
>Assignee: Arjun
>Priority: Major
> Fix For: 1.15.0
>
>
> When Drill is used to query views/tables, the query planning time increases 
> as the number of files in the view's/table's parent directory increases. 
> This becomes unacceptably long with complex queries.
> This is caused by the globStatus operation on view files, which uses a GLOB 
> pattern to retrieve view file status. It can be improved by avoiding GLOB 
> patterns for Drill metadata files such as view files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6179) Added pcapng-format support

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587635#comment-16587635
 ] 

ASF GitHub Bot commented on DRILL-6179:
---

arina-ielchiieva commented on issue #1126: DRILL-6179: Added pcapng-format 
support
URL: https://github.com/apache/drill/pull/1126#issuecomment-414727675
 
 
   +1


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Added pcapng-format support
> ---
>
> Key: DRILL-6179
> URL: https://issues.apache.org/jira/browse/DRILL-6179
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.13.0
>Reporter: Vlad
>Assignee: Vlad
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> The _PCAP Next Generation Dump File Format_ (or pcapng for short) [1] is an 
> attempt to overcome the limitations of the currently widely used (but 
> limited) libpcap format.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port, and src/dest MAC addresses, or by protocol. Beyond 
> that, however, it would be very useful to be able to group packets by TCP 
> session and eventually to look at packet contents.
> Initial work is available at  
> https://github.com/mapr-demos/drill/tree/pcapng_dev
> [1] https://pcapng.github.io/pcapng/
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6179) Added pcapng-format support

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587611#comment-16587611
 ] 

ASF GitHub Bot commented on DRILL-6179:
---

Vlad-Storona commented on issue #1126: DRILL-6179: Added pcapng-format support
URL: https://github.com/apache/drill/pull/1126#issuecomment-414721363
 
 
   @arina-ielchiieva thanks for comments, I fixed issues and rebased to the 
latest master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Added pcapng-format support
> ---
>
> Key: DRILL-6179
> URL: https://issues.apache.org/jira/browse/DRILL-6179
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.13.0
>Reporter: Vlad
>Assignee: Vlad
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> The _PCAP Next Generation Dump File Format_ (or pcapng for short) [1] is an 
> attempt to overcome the limitations of the currently widely used (but 
> limited) libpcap format.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port, and src/dest MAC addresses, or by protocol. Beyond 
> that, however, it would be very useful to be able to group packets by TCP 
> session and eventually to look at packet contents.
> Initial work is available at  
> https://github.com/mapr-demos/drill/tree/pcapng_dev
> [1] https://pcapng.github.io/pcapng/
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6179) Added pcapng-format support

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587607#comment-16587607
 ] 

ASF GitHub Bot commented on DRILL-6179:
---

Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added 
pcapng-format support
URL: https://github.com/apache/drill/pull/1126#discussion_r211656129
 
 

 ##
 File path: protocol/src/main/protobuf/UserBitShared.proto
 ##
 @@ -343,6 +343,7 @@ enum CoreOperatorType {
   IMAGE_SUB_SCAN = 52;
   SEQUENCE_SUB_SCAN = 53;
   PARTITION_LIMIT = 54;
+  PCAPNG_SUB_SCAN = 55;
 
 Review comment:
   Sure.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Added pcapng-format support
> ---
>
> Key: DRILL-6179
> URL: https://issues.apache.org/jira/browse/DRILL-6179
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.13.0
>Reporter: Vlad
>Assignee: Vlad
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> The _PCAP Next Generation Dump File Format_ (or pcapng for short) [1] is an 
> attempt to overcome the limitations of the currently widely used (but 
> limited) libpcap format.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port, and by source/destination MAC addresses or by protocol. Beyond 
> that, however, it would be very useful to be able to group packets by TCP 
> session and eventually to look at packet contents.
> Initial work is available at  
> https://github.com/mapr-demos/drill/tree/pcapng_dev
> [1] https://pcapng.github.io/pcapng/
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6179) Added pcapng-format support

2018-08-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587605#comment-16587605
 ] 

ASF GitHub Bot commented on DRILL-6179:
---

Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added 
pcapng-format support
URL: https://github.com/apache/drill/pull/1126#discussion_r211655692
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcapng/PcapngFormatPlugin.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcapng;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.RecordReader;
+import org.apache.drill.exec.store.RecordWriter;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasyWriter;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.hadoop.conf.Configuration;
+
+import java.util.List;
+
+public class PcapngFormatPlugin extends EasyFormatPlugin<PcapngFormatConfig> {
+
+  public static final String DEFAULT_NAME = "pcapng";
+
+  public PcapngFormatPlugin(String name, DrillbitContext context,
+      Configuration fsConf, StoragePluginConfig storagePluginConfig) {
+    this(name, context, fsConf, storagePluginConfig, new PcapngFormatConfig());
+  }
+
+  public PcapngFormatPlugin(String name, DrillbitContext context,
+      Configuration fsConf, StoragePluginConfig config,
+      PcapngFormatConfig formatPluginConfig) {
+    // The boolean flags appear to follow EasyFormatPlugin's
+    // (readable, writable, blockSplittable, compressible) parameter order,
+    // i.e. a read-only, block-splittable, non-compressible format.
+    super(name, context, fsConf, config, formatPluginConfig, true,
+        false, true, false,
+        formatPluginConfig.getExtensions(), DEFAULT_NAME);
+  }
+
+  @Override
+  public boolean supportsPushDown() {
+    return true;
+  }
+
+  @Override
+  public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs,
+      FileWork fileWork, List<SchemaPath> columns, String userName) {
+    return new PcapngRecordReader(fileWork.getPath(), dfs, columns);
+  }
+
+  @Override
+  public RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer) {
+    throw new UnsupportedOperationException("unimplemented");
+  }
+
+  @Override
+  public int getReaderOperatorType() {
+    return UserBitShared.CoreOperatorType.PCAPNG_SUB_SCAN_VALUE;
+  }
+
+  @Override
+  public int getWriterOperatorType() {
+    return 0;
 Review comment:
   Okay, changed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Added pcapng-format support
> ---
>
> Key: DRILL-6179
> URL: https://issues.apache.org/jira/browse/DRILL-6179
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.13.0
>Reporter: Vlad
>Assignee: Vlad
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> The _PCAP Next Generation Dump File Format_ (or pcapng for short) [1] is an 
> attempt to overcome the limitations of the currently widely used (but 
> limited) libpcap format.
> At a first level, it is desirable to query and filter by source and 
> destination IP and port, and by source/destination MAC addresses or by protocol. Beyond 
> that, however, it would be very useful to be able to group packets by TCP 
> session and eventually to look at packet contents.
> Initial work is available at  
> https://github.com/mapr-demos/drill/tree/pcapng_dev
> [1] https://pcapng.github.io/pcapng/
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6639) Exception happens while displaying operator profiles for some queries

2018-08-21 Thread Anton Gozhiy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy closed DRILL-6639.
---

Verified with Drill 1.15.0-SNAPSHOT (commit 
8ddc9d79e0d298e74f2256328dd9ddee06a20066)
The issue is no longer reproducible with the same steps. Also checked:
- Different filters and combinations
- Limit
- Union
- Aggregate functions such as count
- Explain plan
- Rerunning every query through the Web UI

> Exception happens while displaying operator profiles for some queries 
> --
>
> Key: DRILL-6639
> URL: https://issues.apache.org/jira/browse/DRILL-6639
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Anton Gozhiy
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> *Prerequisites:*
> *1.* Create a MapR-DB JSON table:
> {noformat}
> hadoop fs -mkdir /tmp/mdb_table
> mapr dbshell
> create /tmp/mdb_table/json
> insert /tmp/mdb_table/json --value '{"_id":"movie002" , 
> "title":"Developers on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/mdb_table/json --id movie003 --value '{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {noformat}
> *2.* Create a Hive external table:
> {noformat}
> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> movie_id string, title string, studio string) 
> STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> TBLPROPERTIES("maprdb.table.name" = "/tmp/mdb_table/json","maprdb.column.id" 
> = "movie_id");
> {noformat}
> *3.* Enable Hive storage plugin in Drill:
> {code:json}
> {
>   "type": "hive",
>   "enabled": true,
>   "configProps": {
>   "hive.metastore.uris": "thrift://localhost:9083",
>   "fs.default.name": "maprfs:///",
>   "hive.metastore.sasl.enabled": "false"
>   }
> }
> {code}
> *Steps:*
> *1.* Run the following query:
> {noformat}
> select * from hive.`mapr_db_json_hive_tbl`
> {noformat}
> *2.* Open the query profile in the Drill UI, look at the Operator Profiles
> *Expected result:*
> Operator Profiles should be displayed
> *Actual result:*
> Exception displayed:
> {code}
> FreeMarker template error (DEBUG mode; use RETHROW in production!): Java method
> "org.apache.drill.exec.server.rest.profile.ProfileWrapper.getOperatorsOverview()"
> threw an exception when invoked on org.apache.drill.exec.server.rest.profile.ProfileWrapper
> object "org.apache.drill.exec.server.rest.profile.ProfileWrapper@36c94e5";
> see cause exception in the Java stack trace.
>
> FTL stack trace ("~" means nesting-related):
> - Failed at: ${model.getOperatorsOverview()?no_esc}
>   [in template "rest/profile/profile.ftl" in macro "page_body" at line 338, column 11]
> - Reached through: @page_body
>   [in template "rest/generic.ftl" in macro "page_html" at line 99, column 9]
> - Reached through: @page_html
>   [in template "rest/profile/profile.ftl" at line 474, column 1]
>
> Java stack trace (for programmers):
> freemarker.core._TemplateModelException: [... Exception message was already printed; see it above ...]
> at freemarker.ext.beans._MethodUtil.newInvocationTemplateModelException(_MethodUtil.java:289)
> at freemarker.ext.beans._MethodUtil.newInvocationTemplateModelException(_MethodUtil.java:252)
> at freemarker.ext.beans.SimpleMethodModel.exec(SimpleMethodModel.java:74)
> at freemarker.core.MethodCall._eval(MethodCall.java:65)
> at freemarker.core.Expression.eval(Expression.java:81)
> at freemarker.core.BuiltInsForOutputFormatRelated$AbstractConverterBI.calculateResult(BuiltInsForOutputFormatRelated.java:50)
> at freemarker.core.MarkupOutputFormatBoundBuiltIn._eval(MarkupOutputFormatBoundBuiltIn.java:40)
> at freemarker.core.Expression.eval(Expression.java:81)
> at freemarker.core.DollarVariable.calculateInterpolatedStringOrMarkup(DollarVariable.java:96)
> at freemarker.core.DollarVariable.accept(DollarVariable.java:59)
> at freemarker.core.Environment.visit(Environment.java:362)
> at freemarker.core.Environment.invoke(Environment.java:714)
> at freemarker.core.UnifiedCall.accept(UnifiedCall.java:83)
> at freemarker.core.Environment.visit(Environment.java:362)
> at freemarker.core.Environment.invoke(Environment.java:714)
> at freemarker.core.UnifiedCall.accept(UnifiedCall.java:83)
> at freemarker.core.Environment.visit(Environment.java:326)
> at freemarker.core.Environment.visit(Environment.java:332)
> at freemarker.core.Environment.process(Environment.java:305)
> at freemarker.template.Template.process(Template.java:378)
> at org.glassfish.jersey.server.mvc.freemarker.FreemarkerViewProcessor.writeTo(FreemarkerViewProcessor.java:143)
> at org.glassfish.jersey.server.mvc.freemarker.FreemarkerViewProcessor.writeTo(FreemarkerViewProcessor.java:85)
> at 

[jira] [Closed] (DRILL-6696) IOBE in Operator Metric Registry

2018-08-21 Thread Anton Gozhiy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy closed DRILL-6696.
---

Verified with Drill 1.15.0-SNAPSHOT (commit 
b4ad0f811df2f729849aa42c184abe7886c8b9c6)

> IOBE in Operator Metric Registry
> 
>
> Key: DRILL-6696
> URL: https://issues.apache.org/jira/browse/DRILL-6696
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> There is an issue in _{{OperatorMetricRegistry}}_: the {{OPERATOR_METRICS}} 
> two-dimensional array is indexed without a bounds check. If an operator type 
> value is not a consecutive number, an _{{ArrayIndexOutOfBoundsException}}_ 
> will occur.
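> A simplified sketch of the failure mode and a guarded lookup (hypothetical 
> names; the real logic lives in {{OperatorMetricRegistry}}):
> {code:java}
> public class MetricRegistrySketch {
>   // One row of metric names per operator type, indexed by the type's number.
>   private static final String[][] OPERATOR_METRICS = new String[55][];
> 
>   public static String[] getMetricNames(int operatorType) {
>     // Without this guard, an operator type whose number falls outside the
>     // array bounds (e.g. one assigned out of order) triggers an
>     // ArrayIndexOutOfBoundsException on lookup.
>     if (operatorType < 0 || operatorType >= OPERATOR_METRICS.length) {
>       return null;
>     }
>     return OPERATOR_METRICS[operatorType];
>   }
> }
> {code}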



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6552) Drill Metadata management "Drill MetaStore"

2018-08-21 Thread Vitalii Diravka (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587337#comment-16587337
 ] 

Vitalii Diravka commented on DRILL-6552:


[~rhou] Drill will have an API with several Metastore implementations, one of 
which will be HMS. Indeed, it isn't a scalable solution, but HMS is used by 
Hive, Spark, Presto and so on, so Drill should offer it as one of the 
implementations too. It could be the first implementation, though possibly not 
the main one.
There is also a task in progress - HIVE-9452: Use HBase to store Hive metadata 
- which could solve the scaling issue. 

> Drill Metadata management "Drill MetaStore"
> ---
>
> Key: DRILL-6552
> URL: https://issues.apache.org/jira/browse/DRILL-6552
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 2.0.0
>
>
> It would be useful for Drill to have some sort of metastore which would 
> enable Drill to remember previously defined schemata so Drill doesn’t have to 
> do the same work over and over again.
> It would allow storing schema and statistics, which would accelerate query 
> validation, planning and execution. It would also increase the stability of 
> Drill and help avoid several kinds of issues: "schema change" exceptions, the 
> "limit 0" optimization and so on. 
> One of the main candidates is Hive Metastore.
> Starting from version 3.0, Hive Metastore can run as a separate service from 
> the Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> Optional enhancement is storing Drill's profiles, UDFs, plugins configs in 
> some kind of metastore as well.
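> To make the HMS candidate concrete, a hedged sketch using the standard Hive 
> Metastore client API (this is not Drill's planned Metastore API, and the 
> table name is illustrative):
> {code:java}
> import org.apache.hadoop.hive.conf.HiveConf;
> import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
> import org.apache.hadoop.hive.metastore.api.FieldSchema;
> import org.apache.hadoop.hive.metastore.api.Table;
> 
> public class HmsSchemaLookup {
>   public static void main(String[] args) throws Exception {
>     HiveConf conf = new HiveConf();
>     conf.setVar(HiveConf.ConfVars.METASTOREURIS, "thrift://localhost:9083");
>     HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
>     try {
>       // The remembered schema: no need to re-infer it from the data files.
>       Table table = client.getTable("default", "some_table");
>       for (FieldSchema col : table.getSd().getCols()) {
>         System.out.println(col.getName() + " : " + col.getType());
>       }
>     } finally {
>       client.close();
>     }
>   }
> }
> {code}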



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)