[jira] [Commented] (DRILL-6463) ProfileParser cannot parse costs when using MockScanBatch

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501105#comment-16501105
 ] 

ASF GitHub Bot commented on DRILL-6463:
---

gparai commented on a change in pull request #1303: DRILL-6463 : Fix integer 
overflow in MockGroupScanPOP
URL: https://github.com/apache/drill/pull/1303#discussion_r192921366
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java
 ##
 @@ -121,9 +121,9 @@ public void onMatch(RelOptRuleCall call) {
 
 final RelDataType scanRowType = constructDataType(agg, result.keySet());
 
-final DynamicPojoRecordReader reader = new DynamicPojoRecordReader<>(
+final DynamicPojoRecordReader reader = new DynamicPojoRecordReader<>(
 
 Review comment:
   Yes, agree. Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ProfileParser cannot parse costs when using MockScanBatch
> -
>
> Key: DRILL-6463
> URL: https://issues.apache.org/jira/browse/DRILL-6463
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.14.0
>
>
> One of the unit tests, testHashAggrSecondaryTertiarySpill(), runs into this 
> issue, although the issue is generic. It happens because the cost is stored in 
> an int, which overflows and becomes negative when the row count or data size 
> is large enough. This causes the ProfileParser to error out on seeing negative costs.
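
The overflow described above can be sketched in a few lines of Java. This is only an illustration under assumptions: the class and method names are hypothetical and not taken from the Drill source, and the data size is assumed to be computed as a row count multiplied by a row width.

```java
// Hypothetical illustration; these names do not come from the Drill code base.
final class MockCostOverflow {

    // Buggy variant: the product is computed in 32-bit arithmetic and wraps around.
    static int dataSizeAsInt(int rowCount, int rowWidth) {
        return rowCount * rowWidth;
    }

    // Safer variant: widen one operand to long before multiplying.
    static long dataSizeAsLong(int rowCount, int rowWidth) {
        return (long) rowCount * rowWidth;
    }

    public static void main(String[] args) {
        int rowCount = 30_000_000;
        int rowWidth = 100;
        // 3,000,000,000 does not fit in an int, so the int result is negative.
        System.out.println(dataSizeAsInt(rowCount, rowWidth));
        // The long result keeps the true value.
        System.out.println(dataSizeAsLong(rowCount, rowWidth));
    }
}
```

Any consumer that rejects negative costs would fail on the first value while accepting the second.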



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6463) ProfileParser cannot parse costs when using MockScanBatch

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501104#comment-16501104
 ] 

ASF GitHub Bot commented on DRILL-6463:
---

vrozov commented on issue #1303: DRILL-6463 : Fix integer overflow in 
MockGroupScanPOP
URL: https://github.com/apache/drill/pull/1303#issuecomment-394547385
 
 
   LGTM




[jira] [Commented] (DRILL-6463) ProfileParser cannot parse costs when using MockScanBatch

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501097#comment-16501097
 ] 

ASF GitHub Bot commented on DRILL-6463:
---

vrozov commented on a change in pull request #1303: DRILL-6463 : Fix integer 
overflow in MockGroupScanPOP
URL: https://github.com/apache/drill/pull/1303#discussion_r192920347
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java
 ##
 @@ -121,9 +121,9 @@ public void onMatch(RelOptRuleCall call) {
 
 final RelDataType scanRowType = constructDataType(agg, result.keySet());
 
-final DynamicPojoRecordReader reader = new DynamicPojoRecordReader<>(
+final DynamicPojoRecordReader reader = new DynamicPojoRecordReader<>(
 
 Review comment:
   This will change the return type for "select count(*) ..."; let's keep it as 
`Long`.




[jira] [Commented] (DRILL-6463) ProfileParser cannot parse costs when using MockScanBatch

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501084#comment-16501084
 ] 

ASF GitHub Bot commented on DRILL-6463:
---

gparai commented on issue #1303: DRILL-6463 : Fix integer overflow in 
MockGroupScanPOP
URL: https://github.com/apache/drill/pull/1303#issuecomment-394543655
 
 
   @Ben-Zvi the Calcite interface RelOptCost uses doubles for rows, io, and cpu, 
so DrillCostBase ends up using doubles as well. ScanStats feeds into 
DrillCostBase, hence the rationale for using doubles there too. One reason for 
using doubles is that row counts are often estimated and end up not being 
integers. I guess doubles were used to keep the Calcite cost model as accurate 
as possible.
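
To illustrate the point about estimated row counts, here is a minimal Java sketch. It is an assumption-laden example, not Calcite or Drill code: the class, the method, and the selectivity value are all hypothetical.

```java
// Hypothetical illustration of why estimated row counts are kept as doubles.
final class EstimatedRowCount {

    // Estimated output rows of a filter: input rows times an assumed selectivity.
    static double filterRows(double inputRows, double selectivity) {
        return inputRows * selectivity;
    }

    public static void main(String[] args) {
        double estimate = filterRows(1001, 0.25);
        // The estimate is fractional; casting to int would discard the fraction.
        System.out.println(estimate);
        System.out.println((int) estimate);
    }
}
```

Keeping the estimate as a double avoids rounding at every operator and, as a side effect, leaves far more headroom than an int for large mock scans.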




[jira] [Updated] (DRILL-6463) ProfileParser cannot parse costs when using MockScanBatch

2018-06-04 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6463:
-
Reviewer: Boaz Ben-Zvi



[jira] [Updated] (DRILL-6455) JDBC Scan Operator does not appear in profile

2018-06-04 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6455:
-
Labels: ready-to-commit  (was: )

> JDBC Scan Operator does not appear in profile
> -
>
> Key: DRILL-6455
> URL: https://issues.apache.org/jira/browse/DRILL-6455
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.13.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> It seems that the Operator is not defined, though it appears in the text plan.





[jira] [Assigned] (DRILL-4020) The not-equal operator returns incorrect results when used on the HBase row key

2018-06-04 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-4020:


Assignee: Akihiko Kusanagi

> The not-equal operator returns incorrect results when used on the HBase row 
> key
> ---
>
> Key: DRILL-4020
> URL: https://issues.apache.org/jira/browse/DRILL-4020
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
> Environment: Drill Sandbox
>Reporter: Akihiko Kusanagi
>Assignee: Akihiko Kusanagi
>Priority: Critical
>
> Create a test HBase table:
> {noformat}
> hbase> create 'table', 'f'
> hbase> put 'table', 'row1', 'f:c', 'value1'
> hbase> put 'table', 'row2', 'f:c', 'value2'
> hbase> put 'table', 'row3', 'f:c', 'value3'
> {noformat}
> The table looks like this:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table`;
> +---------+
> | EXPR$0  |
> +---------+
> | row1    |
> | row2    |
> | row3    |
> +---------+
> 1 row selected (4.596 seconds)
> {noformat}
> However, this query returns incorrect results when a not-equal operator is 
> used on the row key:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table` WHERE row_key <> 'row1';
> +---------+
> | EXPR$0  |
> +---------+
> | row1    |
> | row2    |
> | row3    |
> +---------+
> 1 row selected (0.573 seconds)
> {noformat}
> In the query plan, there is no RowFilter:
> {noformat}
> 00-00    Screen
> 00-01      Project(EXPR$0=[CONVERT_FROMUTF8($0)])
> 00-02        Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=table, startRow=, stopRow=, filter=null], columns=[`row_key`]]])
> {noformat}
> When the query has multiple not-equal operators, it works fine:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table` WHERE row_key <> 'row1' AND row_key <> 'row2';
> +---------+
> | EXPR$0  |
> +---------+
> | row3    |
> +---------+
> 1 row selected (0.255 seconds)
> {noformat}
> In the query plan, a FilterList has two RowFilters with NOT_EQUAL operators:
> {noformat}
> 00-00    Screen
> 00-01      Project(EXPR$0=[CONVERT_FROMUTF8($0)])
> 00-02        Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=table, startRow=, stopRow=, filter=FilterList AND (2/2): [RowFilter (NOT_EQUAL, row1), RowFilter (NOT_EQUAL, row2)]], columns=[`row_key`]]])
> {noformat}
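
For reference, the result that the single not-equal query would be expected to return can be sketched in plain Java. This only simulates the filter semantics on an in-memory list of row keys; it does not use the HBase or Drill APIs, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory simulation of "row_key <> 'row1'"; not HBase or Drill code.
final class RowKeyNotEqual {

    // Keep every row key that differs from the excluded one.
    static List<String> notEqual(List<String> rowKeys, String excluded) {
        return rowKeys.stream()
                .filter(key -> !key.equals(excluded))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rowKeys = Arrays.asList("row1", "row2", "row3");
        // Expected semantics: row1 is dropped, row2 and row3 remain.
        System.out.println(notEqual(rowKeys, "row1"));
    }
}
```

The reported behavior differs from this: with a single not-equal predicate no RowFilter appears in the plan, so all three rows come back.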





[jira] [Commented] (DRILL-4364) Image Metadata Format Plugin

2018-06-04 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501067#comment-16501067
 ] 

Pritesh Maker commented on DRILL-4364:
--

Committed to Apache Drill as 04a532d2d8790d69214adbb4a8247f8a382cfd08 by [~parthc]

> Image Metadata Format Plugin
> 
>
> Key: DRILL-4364
> URL: https://issues.apache.org/jira/browse/DRILL-4364
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Akihiko Kusanagi
>Assignee: Akihiko Kusanagi
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.14.0
>
>
> Support querying of metadata in various image formats. This plugin leverages 
> [metadata-extractor|https://github.com/drewnoakes/metadata-extractor]. It is 
> especially useful for querying a large number of image files stored in a 
> distributed file system without building a metadata repository in advance.
> This plugin supports the following file formats:
>  * JPEG, TIFF, PSD, PNG, BMP, GIF, ICO, PCX, WAV, AVI, WebP, MOV, MP4, EPS
>  * Camera Raw: ARW (Sony), CRW/CR2 (Canon), NEF (Nikon), ORF (Olympus), RAF 
> (FujiFilm), RW2 (Panasonic), RWL (Leica), SRW (Samsung), X3F (Foveon)
> This plugin can read the following metadata:
>  * Exif, IPTC, XMP, JFIF / JFXX, ICC Profiles, Photoshop fields, PNG 
> properties, BMP properties, GIF properties, ICO properties, PCX properties, 
> WAV properties, AVI properties, WebP properties, QuickTime properties, MP4 
> properties, EPS properties
> Since each type of metadata has a different set of fields, the plugin returns 
> a set of commonly used fields, such as the image width, height, and bits per 
> pixel, for ease of use.
> *Examples:*
> Querying on a JPEG file with the property descriptive: true
> {noformat}
> 0: jdbc:drill:zk=local> select FileName, * from 
> dfs.`4349313028_f69ffa0257_o.jpg`;
> +--+--+--+++-+--+--+---++---+--+--++---++-+-+--+--+--++--+-+---+---+--+-+--+
> | FileName | FileSize | FileDateTime | Format | PixelWidth | PixelHeight | 
> BitsPerPixel | DPIWidth | DPIHeight | Orientaion | ColorMode | HasAlpha | 
> Duration | VideoCodec | FrameRate | AudioCodec | AudioSampleSize | 
> AudioSampleRate | JPEG | JFIF | ExifIFD0 | ExifSubIFD | Interoperability | 
> GPS | ExifThumbnail | Photoshop | IPTC | Huffman | FileType |
> +--+--+--+++-+--+--+---++---+--+--++---++-+-+--+--+--++--+-+---+---+--+-+--+
> | 4349313028_f69ffa0257_o.jpg | 257213 bytes | Fri Mar 09 12:09:34 +08:00 
> 2018 | JPEG | 1199 | 800 | 24 | 96 | 96 | Unknown (0) | RGB | false | 
> 00:00:00 | Unknown | 0 | Unknown | 0 | 0 | 
> {"CompressionType":"Baseline","DataPrecision":"8 bits","ImageHeight":"800 
> pixels","ImageWidth":"1199 pixels","NumberOfComponents":"3","Component1":"Y 
> component: Quantization table 0, Sampling factors 2 horiz/2 
> vert","Component2":"Cb component: Quantization table 1, Sampling factors 1 
> horiz/1 vert","Component3":"Cr component: Quantization table 1, Sampling 
> factors 1 horiz/1 vert"} | 
> {"Version":"1.1","ResolutionUnits":"inch","XResolution":"96 
> dots","YResolution":"96 
> dots","ThumbnailWidthPixels":"0","ThumbnailHeightPixels":"0"} | 
> {"Software":"Picasa 3.0"} | 
> {"ExifVersion":"2.10","UniqueImageID":"d65e93b836d15a0c5e041e6b7258c76e"} | 
> {"InteroperabilityIndex":"Unknown ()","InteroperabilityVersion":"1.00"} | 
> {"GPSVersionID":".022","GPSLatitudeRef":"N","GPSLatitude":"47° 32' 
> 15.98\"","GPSLongitudeRef":"W","GPSLongitude":"-122° 2' 
> 6.37\"","GPSAltitudeRef":"Sea level","GPSAltitude":"0 metres"} | 
> {"Compression":"JPEG (old-style)","XResolution":"72 dots per 
> inch","YResolution":"72 dots per 
> inch","ResolutionUnit":"Inch","ThumbnailOffset":"414 
> bytes","ThumbnailLength":"7213 bytes"} | {} | 
> {"Keywords":"135;2002;issaquah;police car;wa;washington"} | 
> {"NumberOfTables":"4 Huffman tables"} | 
> {"DetectedFileTypeName":"JPEG","DetectedFileTypeLongName":"Joint Photographic 
> Experts 
> Group","DetectedMIMEType":"image/jpeg","ExpectedFileNameExtension":"jpg"} |
> 

[jira] [Closed] (DRILL-6056) Mock datasize could overflow to negative

2018-06-04 Thread Gautam Kumar Parai (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai closed DRILL-6056.
-
Resolution: Duplicate

> Mock datasize could overflow to negative
> 
>
> Key: DRILL-6056
> URL: https://issues.apache.org/jira/browse/DRILL-6056
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Chunhui Shi
>Priority: Major
>
> In some cases, mock datasize (rowCount * rowWidth) could be too large, 
> especially when we test spilling or memory OOB exception.





[jira] [Commented] (DRILL-6463) ProfileParser cannot parse costs when using MockScanBatch

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501062#comment-16501062
 ] 

ASF GitHub Bot commented on DRILL-6463:
---

gparai opened a new pull request #1303: DRILL-6463 : Fix integer overflow in 
MockGroupScanPOP
URL: https://github.com/apache/drill/pull/1303
 
 
   @vrozov @Ben-Zvi please review the PR. Thanks!




[jira] [Assigned] (DRILL-6293) Unable to read hive(2.1.1) tables using Drill 1.13.0

2018-06-04 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6293:


Assignee: Vitalii Diravka

> Unable to read hive(2.1.1) tables using Drill 1.13.0 
> -
>
> Key: DRILL-6293
> URL: https://issues.apache.org/jira/browse/DRILL-6293
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anup Tiwari
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: hive-metastore-2.1.1.jar
>
>
> Hi,
> I am not able to read my Hive tables in Drill 1.13.0; with the same plugin 
> configuration it was working in Drill 1.12.0 and 1.10.0.
>   
>  *Hive Plugin :-*
> {code:java}
> {
>   "type": "hive",
>   "enabled": true,
>   "configProps": {
>     "hive.metastore.uris": "thrift://prod-hadoop-1xx.com:9083",
>     "hive.metastore.sasl.enabled": "false",
>     "fs.default.name": "hdfs://prod-hadoop-1xx.com:9000"
>   }
> }
> {code}
> *Query :-* 
>  select id from hive.cad where log_date = '2018-03-18' limit 3;
>   
>  *Error :-* 
> {code}
> 2018-03-20 14:25:27,351 [254f337f-9ac3-b66f-ed17-1de459da3283:foreman] INFO 
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 254f337f-9ac3-b66f-ed17-1de459da3283: select id from hive.cad where log_date 
> = '2018-03-18' limit 3
>  2018-03-20 14:25:27,354 [254f337f-9ac3-b66f-ed17-1de459da3283:foreman] WARN 
> o.a.d.e.s.h.DrillHiveMetaStoreClient - Failure while attempting to get hive 
> table. Retries once.
>  org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
>   at 
> org.apache.thrift.TApplicationException.read(TApplicationException.java:111) 
> ~[drill-hive-exec-shaded-1.13.0.jar:1.13.0]
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) 
> ~[drill-hive-exec-shaded-1.13.0.jar:1.13.0]
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
>  ~[drill-hive-exec-shaded-1.13.0.jar:1.13.0]
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
>  ~[drill-hive-exec-shaded-1.13.0.jar:1.13.0]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1344)
>  ~[drill-hive-exec-shaded-1.13.0.jar:1.13.0]
>   at 
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient.getHiveReadEntryHelper(DrillHiveMetaStoreClient.java:285)
>  ~[drill-storage-hive-core-1.13.0.jar:1.13.0]
>   at 
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$TableLoader.load(DrillHiveMetaStoreClient.java:535)
>  [drill-storage-hive-core-1.13.0.jar:1.13.0]
>   at 
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$TableLoader.load(DrillHiveMetaStoreClient.java:531)
>  [drill-storage-hive-core-1.13.0.jar:1.13.0]
>   at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
>  [guava-18.0.jar:na]
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319) 
> [guava-18.0.jar:na]
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
>  [guava-18.0.jar:na]
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197) 
> [guava-18.0.jar:na]
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3937) 
> [guava-18.0.jar:na]
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) 
> [guava-18.0.jar:na]
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
>  [guava-18.0.jar:na]
>   at 
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:495)
>  [drill-storage-hive-core-1.13.0.jar:1.13.0]
>   at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233)
>  [drill-storage-hive-core-1.13.0.jar:1.13.0]
>   at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:213)
>  [drill-storage-hive-core-1.13.0.jar:1.13.0]
>   at 
> org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:62)
>  [drill-storage-hive-core-1.13.0.jar:1.13.0]
>   at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getTable(HiveSchemaFactory.java:201)
>  [drill-storage-hive-core-1.13.0.jar:1.13.0]
>   at 
> org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:82)
>  [calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0]
>   at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:257) 
> [calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0]
>   at 
> 

[jira] [Updated] (DRILL-6245) Clicking on anything redirects to main login page

2018-06-04 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6245:
-
Fix Version/s: (was: 1.14.0)
   1.15.0

> Clicking on anything redirects to main login page
> -
>
> Key: DRILL-6245
> URL: https://issues.apache.org/jira/browse/DRILL-6245
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
> Fix For: 1.15.0
>
>
> When the Drill Web UI is accessed over https and then over http, clicking 
> anything on the index page always redirects to the main login page. However, 
> this works fine once the cookies are cleared.





[jira] [Commented] (DRILL-6340) Output Batch Control in Project using the RecordBatchSizer

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501059#comment-16501059
 ] 

ASF GitHub Bot commented on DRILL-6340:
---

bitblender opened a new pull request #1302: DRILL-6340: Output Batch Control in 
Project using the RecordBatchSizer
URL: https://github.com/apache/drill/pull/1302
 
 
   Changes required to implement Output Batch Sizing in Project using the 
RecordBatchSizer.




> Output Batch Control in Project using the RecordBatchSizer
> --
>
> Key: DRILL-6340
> URL: https://issues.apache.org/jira/browse/DRILL-6340
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.14.0
>
>
> This bug is for tracking the changes required to implement Output Batch 
> Sizing in Project using the RecordBatchSizer. The challenge in doing this 
> mainly lies in dealing with expressions that produce variable-length columns. 
> The following doc talks about some of the design approaches for dealing with 
> such variable-length columns.
> [https://docs.google.com/document/d/1h0WsQsen6xqqAyyYSrtiAniQpVZGmQNQqC1I2DJaxAA/edit?usp=sharing]





[jira] [Updated] (DRILL-6076) Reduce the default memory from a total of 13GB to 5GB

2018-06-04 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6076:

Fix Version/s: (was: 1.14.0)
   1.15.0

> Reduce the default memory from a total of 13GB to 5GB
> -
>
> Key: DRILL-6076
> URL: https://issues.apache.org/jira/browse/DRILL-6076
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
> Fix For: 1.15.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the default memory requirements for Drill are about 13GB, with the 
> following allocations:
> * 4GB Heap
> * 8GB Direct Memory
> * 1GB CodeCache
> * 512MB MaxPermSize
> Also, with Drill 1.12.0, the recommendation is to move to JDK 8, which makes 
> MaxPermSize irrelevant.
> With that, the default requirements total 13GB, which is rather high. This 
> is especially a problem for people who are trying out Drill in a development 
> environment, where 13GB is too much.
> When using the public [test 
> framework|https://github.com/mapr/drill-test-framework/] for Apache Drill, it 
> was observed that the framework's functional and unit tests passed 
> successfully with as little as 5GB of memory, based on the following allocation:
> * 1GB Heap
> * 3GB Direct Memory
> * 512MB CodeCache
> * 512MB MaxPermSize
> Based on this finding, the proposal is to reduce the defaults from the 
> current settings to the values just mentioned above. The drill-env.sh file 
> already has details in the comments, along with the recommended values that 
> reflect the original 13GB defaults.
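
The totals quoted above can be checked with a short Java sketch. The class and method names are hypothetical; the allocation figures are the ones listed in the issue, expressed in megabytes.

```java
// Hypothetical helper that adds up the memory allocations listed in the issue.
final class MemoryDefaults {

    // Sum a list of allocations given in megabytes.
    static long totalMb(long... allocationsMb) {
        long total = 0;
        for (long mb : allocationsMb) {
            total += mb;
        }
        return total;
    }

    public static void main(String[] args) {
        // Current defaults: heap, direct memory, code cache, MaxPermSize.
        long current = totalMb(4096, 8192, 1024, 512);
        // Same defaults without MaxPermSize, which no longer applies on JDK 8.
        long currentWithoutPermSize = totalMb(4096, 8192, 1024);
        // Proposed defaults: heap, direct memory, code cache, MaxPermSize.
        long proposed = totalMb(1024, 3072, 512, 512);

        System.out.println(current / 1024.0);                // 13.5
        System.out.println(currentWithoutPermSize / 1024.0); // 13.0
        System.out.println(proposed / 1024.0);               // 5.0
    }
}
```

The listed current allocations add up to 13.5GB with MaxPermSize and to 13GB without it, which matches the "about 13GB" figure; the proposed allocations add up to exactly 5GB.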





[jira] [Updated] (DRILL-6147) Limit batch size for Flat Parquet Reader

2018-06-04 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6147:
-
Reviewer: Boaz Ben-Zvi  (was: Parth Chandra)

> Limit batch size for Flat Parquet Reader
> 
>
> Key: DRILL-6147
> URL: https://issues.apache.org/jira/browse/DRILL-6147
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows) 
> when creating scan batches; there is no parameter nor any logic for 
> controlling the amount of memory used. This enhancement will allow Drill to 
> take an extra input parameter to control direct memory usage.
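
One way to picture a memory-based limit is the Java sketch below. It is an assumption, not the design from the pull request: the class, method, and parameter names are hypothetical, and only the 32k row cap comes from the description above.

```java
// Hypothetical sketch of deriving a batch row limit from a memory budget.
final class BatchSizeLimit {

    // The hard-coded row limit mentioned in the issue (32k rows).
    static final int MAX_ROWS = 32 * 1024;

    // Rows per batch: as many rows as fit in the budget, capped at MAX_ROWS,
    // and never fewer than one row so the reader can always make progress.
    static int rowsPerBatch(long memoryBudgetBytes, int estimatedRowWidthBytes) {
        if (memoryBudgetBytes <= 0 || estimatedRowWidthBytes <= 0) {
            throw new IllegalArgumentException("budget and row width must be positive");
        }
        long rowsThatFit = memoryBudgetBytes / estimatedRowWidthBytes;
        return (int) Math.max(1, Math.min(MAX_ROWS, rowsThatFit));
    }

    public static void main(String[] args) {
        long budget = 16L * 1024 * 1024; // assumed 16 MB budget per batch
        System.out.println(rowsPerBatch(budget, 100));  // narrow rows: capped at 32768
        System.out.println(rowsPerBatch(budget, 4096)); // wide rows: 4096 rows fit
    }
}
```

With narrow rows the existing row cap still applies, while wide rows reduce the batch size so that memory use stays within the budget.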





[jira] [Resolved] (DRILL-5584) When Compiling Apache Drill C++ Client, versioning information are not present in the binary

2018-06-04 Thread Parth Chandra (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra resolved DRILL-5584.
--
   Resolution: Fixed
Fix Version/s: 1.14.0

> When Compiling Apache Drill C++ Client, versioning information are not 
> present in the binary
> 
>
> Key: DRILL-5584
> URL: https://issues.apache.org/jira/browse/DRILL-5584
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Affects Versions: 1.10.0
>Reporter: Rob Wu
>Priority: Minor
> Fix For: 1.14.0
>
>
> We should add support for generating an RC file containing the versioning 
> information so this manual task can be automated.
> Current workaround:
> Compile the C++ Client DLL.
> Open the DLL and manually add a Version Resource with the following 
> information:
> FILEVERSION       1,10,0,0
> PRODUCTVERSION    1,10,0,0
> CompanyName
> FileDescription   Apache Drill C++ Client
> FileVersion       1.10.0.0
> InternalName      drillClient.dll
> LegalCopyright    Copyright (c) 2013-2017 The Apache Software Foundation
> OriginalFilename  drillClient.dll
> ProductName       Apache Drill C++ Client
> ProductVersion    1.10.0.0





[jira] [Commented] (DRILL-5850) Problem Querying Directory with Drill

2018-06-04 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501012#comment-16501012
 ] 

Kunal Khatua commented on DRILL-5850:
-

[~jbringley] do you see the issue even when querying through the SQLLine shell?

> Problem Querying Directory with Drill
> -
>
> Key: DRILL-5850
> URL: https://issues.apache.org/jira/browse/DRILL-5850
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
> Environment: Linux, R (3.4)
>Reporter: Joe Bringley
>Priority: Major
>
> I am connecting to Drill through R. When I query a directory of CSV 
> files, everything runs fine. However, if the directory contains JSON 
> files, Drill returns "file not found". Below you'll see that even if only one 
> JSON file exists in the directory, Drill cannot find it unless the file is 
> named explicitly. For example:
> ```
> x <- dbGetQuery(conn, "select * from dfs.`/dir/with/json/test.json` LIMIT 5") 
> # works
> y <- dbGetQuery(conn, "select * from dfs.`/dir/with/json/` LIMIT 5") # doesnt 
> work
> Error in .jcall(rp, "I", "fetch", stride, block) : 
>   java.sql.SQLException: DATA_READ ERROR: Failure reading JSON file - File 
> file:/dir/with/json/test.json does not exist
> ```





[jira] [Resolved] (DRILL-5924) native-client: Support user-specified CXX_FLAGS

2018-06-04 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker resolved DRILL-5924.
--
Resolution: Fixed

> native-client: Support user-specified CXX_FLAGS
> ---
>
> Key: DRILL-5924
> URL: https://issues.apache.org/jira/browse/DRILL-5924
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Trivial
> Fix For: 1.14.0
>
>
> Currently the build process for the native client overrides the CXX_FLAGS 
> supplied by the user. In some cases we need to pass additional flags, e.g. 
> {{-fpermissive}}, to the build to have it succeed. Thus instead of overriding 
> these flags, they should only be expanded.





[jira] [Assigned] (DRILL-5924) native-client: Support user-specified CXX_FLAGS

2018-06-04 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-5924:


Assignee: Uwe L. Korn

> native-client: Support user-specified CXX_FLAGS
> ---
>
> Key: DRILL-5924
> URL: https://issues.apache.org/jira/browse/DRILL-5924
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Trivial
> Fix For: 1.14.0
>
>
> Currently the build process for the native client overrides the CXX_FLAGS 
> supplied by the user. In some cases we need to pass additional flags, e.g. 
> {{-fpermissive}}, to the build to have it succeed. Thus instead of overriding 
> these flags, they should only be expanded.





[jira] [Updated] (DRILL-5924) native-client: Support user-specified CXX_FLAGS

2018-06-04 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5924:
-
Fix Version/s: 1.14.0

> native-client: Support user-specified CXX_FLAGS
> ---
>
> Key: DRILL-5924
> URL: https://issues.apache.org/jira/browse/DRILL-5924
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Reporter: Uwe L. Korn
>Priority: Trivial
> Fix For: 1.14.0
>
>
> Currently the build process for the native client overrides the CXX_FLAGS 
> supplied by the user. In some cases we need to pass additional flags, e.g. 
> {{-fpermissive}}, to the build to have it succeed. Thus instead of overriding 
> these flags, they should only be expanded.





[jira] [Commented] (DRILL-6455) JDBC Scan Operator does not appear in profile

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500982#comment-16500982
 ] 

ASF GitHub Bot commented on DRILL-6455:
---

amansinha100 commented on a change in pull request #1297: DRILL-6455: Add 
missing JDBC Scan Operator for profiles
URL: https://github.com/apache/drill/pull/1297#discussion_r192901896
 
 

 ##
 File path: 
protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java
 ##
 @@ -24327,11 +24336,11 @@ public Builder clearStatus() {
   "$\022\021\n\rPCAP_SUB_SCAN\020%\022\022\n\016KAFKA_SUB_SCAN\020&" +
   
"\022\021\n\rKUDU_SUB_SCAN\020\'\022\013\n\007FLATTEN\020(\022\020\n\014LATE" +
   "RAL_JOIN\020)\022\n\n\006UNNEST\020*\022,\n(HIVE_DRILL_NAT" +
-  "IVE_PARQUET_ROW_GROUP_SCAN\020+*g\n\nSaslStat" +
-  
"us\022\020\n\014SASL_UNKNOWN\020\000\022\016\n\nSASL_START\020\001\022\024\n\020"
 +
-  
"SASL_IN_PROGRESS\020\002\022\020\n\014SASL_SUCCESS\020\003\022\017\n\013" +
-  "SASL_FAILED\020\004B.\n\033org.apache.drill.exec.p" +
-  "rotoB\rUserBitSharedH\001"
+  "IVE_PARQUET_ROW_GROUP_SCAN\020+\022\r\n\tJDBC_SCA" +
 
 Review comment:
   Got it.   Changes lgtm.  +1




> JDBC Scan Operator does not appear in profile
> -
>
> Key: DRILL-6455
> URL: https://issues.apache.org/jira/browse/DRILL-6455
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.13.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
> Fix For: 1.14.0
>
>
> It seems that the Operator is not defined, though it appears in the text plan.





[jira] [Assigned] (DRILL-5700) nohup support for sqlline

2018-06-04 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-5700:


Assignee: Arjun

> nohup support for sqlline 
> --
>
> Key: DRILL-5700
> URL: https://issues.apache.org/jira/browse/DRILL-5700
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - CLI
>Reporter: Arjun
>Assignee: Arjun
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Sqlline script does not support nohup mode for execution. On execution, it 
> remains stopped until it is brought to the foreground.
> {code:java}
> [mapr@node1 ~]$ cat test.sql
> select * from sys.drillbits
> [mapr@node1 ~]$
> [mapr@node1 ~]$ nohup sqlline -u "jdbc:drill:" -n mapr -p mapr -f test.sql  &
> [1] 24019
> [mapr@node1 ~]$ nohup: ignoring input and appending output to `nohup.out'
> [1]+  Stopped nohup sqlline -u "jdbc:drill:zk=node1:5181" -n 
> mapr -p mapr -f test.sql
> [mapr@node1 ~]$
> [mapr@node1 ~]$ fg
> nohup sqlline -u "jdbc:drill:zk=node1:5181" -n mapr -p mapr -f test.sql
> [mapr@node1 ~]$
> [mapr@node1 ~]$ cat nohup.out
> 0: jdbc:drill:zk=node1:5181> Closing: 
> org.apache.drill.jdbc.impl.DrillConnectionImpl
> output of ps: S
> 1/1  select * from sys.drillbits;
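> +-----------+------------+---------------+------------+----------+----------+
> | hostname  | user_port  | control_port  | data_port  | current  | version  |
> +-----------+------------+---------------+------------+----------+----------+
> | node1     | 31010      | 31011         | 31012      | true     | 1.10.0   |
> +-----------+------------+---------------+------------+----------+----------+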
> 1 row selected (0.354 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.10.0
> "drill baby drill"
> [mapr@node1 ~]$
> {code}





[jira] [Commented] (DRILL-5584) When Compiling Apache Drill C++ Client, versioning information are not present in the binary

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500979#comment-16500979
 ] 

ASF GitHub Bot commented on DRILL-5584:
---

parthchandra commented on issue #1039: DRILL-5584: Add branding and versioning 
information for windows C++ C…
URL: https://github.com/apache/drill/pull/1039#issuecomment-394524107
 
 
   Committed as 9908ea0




> When Compiling Apache Drill C++ Client, versioning information are not 
> present in the binary
> 
>
> Key: DRILL-5584
> URL: https://issues.apache.org/jira/browse/DRILL-5584
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Affects Versions: 1.10.0
>Reporter: Rob Wu
>Priority: Minor
>
> We should add support for generating an RC file containing the versioning 
> information so this manual task can be automated.
> Current workaround:
> Compile the C++ Client DLL.
> Open the DLL and manually add a Version Resource with the following 
> information:
> FILEVERSION       1,10,0,0
> PRODUCTVERSION    1,10,0,0
> CompanyName
> FileDescription   Apache Drill C++ Client
> FileVersion       1.10.0.0
> InternalName      drillClient.dll
> LegalCopyright    Copyright (c) 2013-2017 The Apache Software Foundation
> OriginalFilename  drillClient.dll
> ProductName       Apache Drill C++ Client
> ProductVersion    1.10.0.0





[jira] [Commented] (DRILL-5584) When Compiling Apache Drill C++ Client, versioning information are not present in the binary

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500974#comment-16500974
 ] 

ASF GitHub Bot commented on DRILL-5584:
---

parthchandra closed pull request #1039: DRILL-5584: Add branding and versioning 
information for windows C++ C…
URL: https://github.com/apache/drill/pull/1039
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/contrib/native/client/src/clientlib/CMakeLists.txt 
b/contrib/native/client/src/clientlib/CMakeLists.txt
index 7b9ecc3c0f..c2bb0b738c 100644
--- a/contrib/native/client/src/clientlib/CMakeLists.txt
+++ b/contrib/native/client/src/clientlib/CMakeLists.txt
@@ -16,6 +16,13 @@
 # limitations under the License.
 #
 
+if(MSVC)
+configure_file(
+  ${CMAKE_CURRENT_SOURCE_DIR}/version.rc.in
+  ${CMAKE_CURRENT_BINARY_DIR}/version.rc
+  @ONLY)
+endif()
+
 # Drill Client library
 
 set (CLIENTLIB_SRC_FILES
@@ -52,7 +59,9 @@ set_property(
 if(MSVC)
 set(CMAKE_CXX_FLAGS "/EHsc")
 add_definitions(-DDRILL_CLIENT_EXPORTS -D_SCL_SECURE_NO_WARNINGS)
+add_library(drillClient SHARED ${CLIENTLIB_SRC_FILES} ${CMAKE_CURRENT_BINARY_DIR}/version.rc)
+else()
+add_library(drillClient SHARED ${CLIENTLIB_SRC_FILES})
 endif()
 
-add_library(drillClient SHARED ${CLIENTLIB_SRC_FILES} )
 target_link_libraries(drillClient ${Boost_LIBRARIES} ${PROTOBUF_LIBRARY} 
${Zookeeper_LIBRARIES} ${SASL_LIBRARIES} ${OPENSSL_LIBRARIES} protomsgs y2038)
diff --git a/contrib/native/client/src/clientlib/env.h.in 
b/contrib/native/client/src/clientlib/env.h.in
index 746a500a42..85cc778420 100644
--- a/contrib/native/client/src/clientlib/env.h.in
+++ b/contrib/native/client/src/clientlib/env.h.in
@@ -20,7 +20,7 @@
 #define ENV_H
 
 #define DRILL_NAME  "Apache Drill"
-#define DRILL_CONNECTOR_NAME "Apache Drill C++ client"
+#define DRILL_CONNECTOR_NAME "Apache Drill C++ Client"
 #define DRILL_VERSION_STRING "@PROJECT_VERSION@"
 
 #define DRILL_VERSION_MAJOR @PROJECT_VERSION_MAJOR@
@@ -30,6 +30,11 @@
 #define GIT_SHA_PROP  @GIT_SHA_PROP@
 #define GIT_COMMIT_PROP @GIT_COMMIT_PROP@
 
+#define DRILL_LEGALCOPYRIGHT_STR "Copyright (c) 2013-2017 The Apache Software Foundation\0"
+#define DRILL_PRODUCTNAME_STR   DRILL_CONNECTOR_NAME "\0"
+#define DRILL_PRODUCTVERSION_STR DRILL_VERSION_STRING ".0\0"
+#define DRILL_INTERNALNAME_STR  "drillClient.dll\0"
+
 #endif
 
 
diff --git a/contrib/native/client/src/clientlib/version.rc.in 
b/contrib/native/client/src/clientlib/version.rc.in
new file mode 100644
index 0000000000..c013261cc6
--- /dev/null
+++ b/contrib/native/client/src/clientlib/version.rc.in
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+ #include "env.h"
+ 
+#define APSTUDIO_READONLY_SYMBOLS
+/
+//
+// Generated resource.
+//
+#include "afxres.h"
+
+/
+#undef APSTUDIO_READONLY_SYMBOLS
+
+/
+// English (US) resources.
+LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
+#pragma code_page(1252)
+
+VS_VERSION_INFO VERSIONINFO
+ FILEVERSION @PROJECT_VERSION_MAJOR@,@PROJECT_VERSION_MINOR@,@PROJECT_VERSION_PATCH@,0
+ PRODUCTVERSION @PROJECT_VERSION_MAJOR@,@PROJECT_VERSION_MINOR@,@PROJECT_VERSION_PATCH@,0
+ FILEFLAGSMASK 0x3fL 
+#ifdef _DEBUG
+ FILEFLAGS 0x1L
+#else
+ FILEFLAGS 0x0L
+#endif
+ FILEOS 0x4L
+ FILETYPE 0x0L
+ FILESUBTYPE 0x0L
+BEGIN
+BLOCK "StringFileInfo"
+BEGIN
+BLOCK "040904b0"
+BEGIN
+VALUE "CompanyName", "\0"
+VALUE "FileDescription", DRILL_PRODUCTNAME_STR
+VALUE "FileVersion", DRILL_PRODUCTVERSION_STR
+VALUE "LegalCopyright", DRILL_LEGALCOPYRIGHT_STR
+VALUE "ProductName", DRILL_PRODUCTNAME_STR
+VALUE "ProductVersion", 

[jira] [Commented] (DRILL-6457) Sqlline - infer Kerberos principal dynamically to be able to use individual keytabs across Drill nodes and still use ZooKeeper connection string for High Availability

2018-06-04 Thread Arjun (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500972#comment-16500972
 ] 

Arjun commented on DRILL-6457:
--

[~harisekhon] Can you try with service_name=mapr instead of providing the 
principal, as shown below?
sqlline -u "jdbc:drill:zk=host1:5181,host2:5181,host3:5181;auth=kerberos;service_name=mapr"

> Sqlline - infer Kerberos principal dynamically to be able to use individual 
> keytabs across Drill nodes and still use ZooKeeper connection string for High 
> Availability
> --
>
> Key: DRILL-6457
> URL: https://issues.apache.org/jira/browse/DRILL-6457
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - CLI, Client - JDBC, Security, Tools, Build & Test
>Affects Versions: 1.13.0
> Environment: MapR 6
>Reporter: Hari Sekhon
>Priority: Major
>
> Sqlline requires an explicit Kerberos 'principal=' parameter in its JDBC 
> connection string, e.g.: 
> {code:java}
> zk=;auth=kerberos;principal=mapr/@REALM{code}
> When Drill nodes are configured with individual keytabs containing the node's 
> FQDN and configured like so:
> {code:java}
> security: { auth.principal: mapr/_HOST@REALM }{code}
> then the ZooKeeper connection string from sqlline does not work and results 
> in a GSS Kerberos error:
> {code:java}
> Caused by: KrbException: Identifier doesn't match expected value{code}
> due to the mismatch between the explicit sqlline Kerberos principal and 
> the principal of whichever drillbit ZooKeeper selects.
> Making the connection work in this case requires something more like:
> {code:java}
> drillbits=$(hostname -f);auth=kerberos;principal=mapr/$(hostname -f)@REALM{code}
> but this lacks the high availability of using the ZooKeeper connection string 
> to connect to any available node.
> Hence it would be good if sqlline could either infer the correct Kerberos 
> principal to match the host that ZooKeeper tells it to connect to, or else 
> accept a more generic parameter such as:
> {code:java}
> zk=;auth=kerberos;principal=mapr/_HOST@REALM{code}
> I've tested the above, but it doesn't work, which shows that sqlline is not 
> using a dynamic Kerberos principal to match the host it is connecting to.
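
The behaviour requested above amounts to expanding the _HOST token only after
ZooKeeper has chosen a drillbit. A minimal sketch in Java, assuming the client
already knows the chosen host name; PrincipalResolver and resolve are
hypothetical names for illustration and are not existing Drill or sqlline API:

```java
public final class PrincipalResolver {
    // Placeholder token taken from the configuration example in the issue.
    private static final String HOST_TOKEN = "_HOST";

    private PrincipalResolver() {}

    /**
     * Expands a principal pattern such as "mapr/_HOST@REALM" for the drillbit
     * host the client was told to connect to. A pattern without the token is
     * returned unchanged, so explicit principals keep working.
     */
    public static String resolve(String pattern, String host) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        return pattern.replace(HOST_TOKEN, host);
    }
}
```

With this shape the connection string could stay generic (zk=... plus a _HOST
pattern) while each connection attempt still presents a host-specific
principal.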





[jira] [Assigned] (DRILL-5700) nohup support for sqlline

2018-06-04 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua reassigned DRILL-5700:
---

Assignee: (was: Kunal Khatua)

> nohup support for sqlline 
> --
>
> Key: DRILL-5700
> URL: https://issues.apache.org/jira/browse/DRILL-5700
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - CLI
>Reporter: Arjun
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Sqlline script does not support nohup mode for execution. On execution, it 
> remains stopped until it is brought to the foreground.
> {code:java}
> [mapr@node1 ~]$ cat test.sql
> select * from sys.drillbits
> [mapr@node1 ~]$
> [mapr@node1 ~]$ nohup sqlline -u "jdbc:drill:" -n mapr -p mapr -f test.sql  &
> [1] 24019
> [mapr@node1 ~]$ nohup: ignoring input and appending output to `nohup.out'
> [1]+  Stopped nohup sqlline -u "jdbc:drill:zk=node1:5181" -n 
> mapr -p mapr -f test.sql
> [mapr@node1 ~]$
> [mapr@node1 ~]$ fg
> nohup sqlline -u "jdbc:drill:zk=node1:5181" -n mapr -p mapr -f test.sql
> [mapr@node1 ~]$
> [mapr@node1 ~]$ cat nohup.out
> 0: jdbc:drill:zk=node1:5181> Closing: 
> org.apache.drill.jdbc.impl.DrillConnectionImpl
> output of ps: S
> 1/1  select * from sys.drillbits;
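> +-----------+------------+---------------+------------+----------+----------+
> | hostname  | user_port  | control_port  | data_port  | current  | version  |
> +-----------+------------+---------------+------------+----------+----------+
> | node1     | 31010      | 31011         | 31012      | true     | 1.10.0   |
> +-----------+------------+---------------+------------+----------+----------+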
> 1 row selected (0.354 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.10.0
> "drill baby drill"
> [mapr@node1 ~]$
> {code}





[jira] [Resolved] (DRILL-4276) Need a way to check on status of drillbits

2018-06-04 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua resolved DRILL-4276.
-
   Resolution: Resolved
Fix Version/s: 1.14.0

Resolved by DRILL-6289

> Need a way to check on status of drillbits
> --
>
> Key: DRILL-4276
> URL: https://issues.apache.org/jira/browse/DRILL-4276
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Monitoring
>Reporter: Victoria Markman
>Priority: Major
> Fix For: 1.14.0
>
>
> I had a situation where a cluster started with 8 nodes and 2 went down for 
> some reason. 
> As a user, my only ways to detect this situation are:
> * a query fails because something started to execute on a node and failed 
> because the node went down (and for that I have to comb through the logs to 
> find a warning)
> * my queries are extremely slow, because they started to execute after a 
> node went down and was deregistered from ZooKeeper
> * somebody just stopped the drillbit on a particular node
> Since there is no central place (apart from ZooKeeper) where information on 
> participating nodes is kept, when I queried sys.drillbits I got 6 nodes, as 
> if the 2 others never existed. There is beauty in flexibility, but in a 
> real-life situation with more than 20 nodes, things can get out of control 
> quickly.
> Since ZooKeeper has this information in the first place, can we enhance the 
> sys.drillbits table to show drillbit status as ZooKeeper sees it?
> This would also help with testing and automating test cases that check 
> failure conditions like that.
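
The request above can be read as a comparison between the nodes an operator
expects and the nodes currently registered. A minimal sketch in Java, assuming
the expected host list comes from some external source (the issue notes that
Drill itself keeps no such list) and using hypothetical ONLINE/OFFLINE labels.
This is an illustration, not the implementation delivered by DRILL-6289:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class DrillbitStatusReport {
    private DrillbitStatusReport() {}

    /**
     * Returns one status per expected host, in the order given.
     * A host is ONLINE when it appears in the registered set, else OFFLINE.
     */
    public static Map<String, String> build(List<String> expected, Set<String> registered) {
        Map<String, String> report = new LinkedHashMap<>();
        for (String host : expected) {
            report.put(host, registered.contains(host) ? "ONLINE" : "OFFLINE");
        }
        return report;
    }
}
```

Reporting missing nodes explicitly, rather than omitting them, is what lets a
test assert on a failure condition instead of inferring it from a row count.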





[jira] [Commented] (DRILL-5735) UI options grouping and filtering & Metrics hints

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500709#comment-16500709
 ] 

ASF GitHub Bot commented on DRILL-5735:
---

kkhatua commented on issue #1279: DRILL-5735: Allow search/sort in the Options 
webUI
URL: https://github.com/apache/drill/pull/1279#issuecomment-394458315
 
 
   The choice of providing the description via JavaScript is primarily to 
reduce the load on the web server. The description is populated in the table by 
the client browser because the alternative is to have a single web-server 
thread inject the description for every field every time the `/options` page is 
accessed.
   IMHO, the solution for DRILL-4699 (description of Sys.Options) and 
DRILL-3988 (description of Sys.functions) would be similar. At the moment, I 
don't have that solution, as I primarily wanted the `/options` page to be 
user-friendly. 




> UI options grouping and filtering & Metrics hints
> -
>
> Key: DRILL-5735
> URL: https://issues.apache.org/jira/browse/DRILL-5735
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0, 1.10.0, 1.11.0
>Reporter: Muhammad Gelbana
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> I'm thinking of some UI improvements that could make all the difference for 
> users trying to optimize low-performing queries.
> h2. Options
> h3. Grouping
> We can group the options by their scope of effect; this will help users 
> easily locate the options they may need to tune.
> h3. Filtering
> Since there are many options, we can add a filtering mechanism (i.e. string 
> search or group/scope filtering) so the user can filter out the options he's 
> not interested in. To provide more benefit than the grouping idea mentioned 
> above, filtering may match keywords as well as the option name, since the 
> user may not know the name of the option he's looking for.
> h2. Metrics
> I'm referring here to the metrics page and the query execution plan page that 
> displays the overview section and major/minor fragment metrics. We can show 
> hints for each metric, such as:
> # What it represents, in more detail.
> # Which option or scope of options to tune (increase? decrease?) to improve 
> the performance reported by this metric.
> # Maybe even a small dialog that allows quick modification of the option(s) 
> related to that metric.
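
The keyword filtering proposed above can be sketched as a case-insensitive
match on either the option name or its description. This is a minimal Java
illustration under assumptions: OptionFilter is a hypothetical class, and the
descriptions passed in are supplied by the caller rather than taken from
Drill:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public final class OptionFilter {
    private OptionFilter() {}

    /**
     * Returns the names of options whose name or description contains the
     * query, ignoring case. Iteration order follows the supplied map.
     */
    public static List<String> matching(Map<String, String> descriptionsByName, String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : descriptionsByName.entrySet()) {
            String name = entry.getKey().toLowerCase(Locale.ROOT);
            String description = entry.getValue().toLowerCase(Locale.ROOT);
            if (name.contains(needle) || description.contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

Matching on the description is what helps a user who does not know the option
name, which is the case the issue calls out.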





[jira] [Commented] (DRILL-6438) Remove excess logging from tests

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500707#comment-16500707
 ] 

ASF GitHub Bot commented on DRILL-6438:
---

ilooner commented on issue #1284: DRILL-6438: Remove excess logging form tests.
URL: https://github.com/apache/drill/pull/1284#issuecomment-394458172
 
 
   @arina-ielchiieva I made this Jira to track banning those functions. 
https://issues.apache.org/jira/browse/DRILL-6464
   
   @vvysotskyi I converted the print statements to log statements in VectorUtil




> Remove excess logging from tests
> 
>
> Key: DRILL-6438
> URL: https://issues.apache.org/jira/browse/DRILL-6438
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> TestLocalExchange and TestLoad have this issue.
> See example
> {code}
> Running 
> org.apache.drill.exec.physical.impl.TestLocalExchange#testGroupByMultiFields
> Plan: {
>   "head" : {
> "version" : 1,
> "generator" : {
>   "type" : "ExplainHandler",
>   "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ {
>   "kind" : "LONG",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.width.max_per_node",
>   "num_val" : 2,
>   "scope" : "SESSION"
> }, {
>   "kind" : "BOOLEAN",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.enable_mux_exchange",
>   "bool_val" : true,
>   "scope" : "SESSION"
> }, {
>   "kind" : "BOOLEAN",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.enable_demux_exchange",
>   "bool_val" : false,
>   "scope" : "SESSION"
> }, {
>   "kind" : "LONG",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.slice_target",
>   "num_val" : 1,
>   "scope" : "SESSION"
> } ],
> "queue" : 0,
> "hasResourcePlan" : false,
> "resultMode" : "EXEC"
>   },
>   "graph" : [ {
> "pop" : "fs-scan",
> "@id" : 196611,
> "userName" : "travis",
> "files" : [ 
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/6.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/9.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/3.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/1.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/2.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/7.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/0.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/5.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/4.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/8.json"
>  ],
> "storage" : {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "file:///",
>   "config" : null,
>   "workspaces" : {
> "root" : {
>   "location" : 
> "/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/root",
>   "writable" : true,
>   "defaultInputFormat" : null,
>   "allowAccessOutsideWorkspace" : false
> },
> "tmp" : {
>   "location" : 
> "/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/dfsTestTmp/1527026062606-0",
>   "writable" : true,
>   "defaultInputFormat" : null,
>   "allowAccessOutsideWorkspace" : false
> },
> "default" : {
>   "location" : 
> "/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/root",
>   "writable" : true,
>   

[jira] [Commented] (DRILL-6389) Fix building javadocs

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500711#comment-16500711
 ] 

ASF GitHub Bot commented on DRILL-6389:
---

ilooner commented on issue #1276: DRILL-6389: Generate Javadoc for all classes 
and fix some warnings.
URL: https://github.com/apache/drill/pull/1276#issuecomment-394458592
 
 
   @arina-ielchiieva Could you take another look at this?




> Fix building javadocs
> -
>
> Key: DRILL-6389
> URL: https://issues.apache.org/jira/browse/DRILL-6389
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> Javadocs don't build when running
> {code}
> mvn javadoc:aggregate
> {code}
> Get the javadocs for all the classes to build. Fix some warnings.





[jira] [Assigned] (DRILL-5984) Support for Symlinked Table Paths to be used in Drill Queries.

2018-06-04 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua reassigned DRILL-5984:
---

Assignee: Pritesh Maker

> Support for Symlinked Table Paths to be used in Drill Queries.
> --
>
> Key: DRILL-5984
> URL: https://issues.apache.org/jira/browse/DRILL-5984
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.11.0
> Environment: OS : CentOS 7.1
> MapR-DB Version: 5.2.2
>Reporter: Saravanabavagugan Vengadasundaram
>Assignee: Pritesh Maker
>Priority: Major
>
> MapR-FS supports symlinks and hence MapR-DB table paths support symlinks as 
> well. As part of the project I work on, we use symlinks as a means of 
> communication to talk to the physical file. An employee table in MapR-DB will 
> be represented as  "/tables/Employee/Entity_1233232" and there will be a 
> symlink called "/tables/Employee/Entity" pointing to the actual physical 
> table. Currently, Drill does not understand queries that use the symlink path 
> and only executes queries that use the actual physical table path. So every 
> time, I need to find out the actual physical path of the table and frame my 
> query. It would be nice to have this feature in the next version of Drill. 
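
The manual step described above, finding the physical path behind a symlink
before framing the query, can be sketched with the standard java.nio API. This
is only an illustration on a local filesystem: whether the same call resolves
links on MapR-FS or through the Hadoop FS API is not established here, and
SymlinkResolver is a hypothetical helper name:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;

public final class SymlinkResolver {
    private SymlinkResolver() {}

    /**
     * Resolves a path such as /tables/Employee/Entity to the physical path it
     * points at, following symbolic links. Fails if the path does not exist.
     */
    public static Path resolve(Path link) {
        try {
            return link.toRealPath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resolved path is what would then be placed in the query text in place of
the symlink path.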





[jira] [Commented] (DRILL-5984) Support for Symlinked Table Paths to be used in Drill Queries.

2018-06-04 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500700#comment-16500700
 ] 

Kunal Khatua commented on DRILL-5984:
-

[~vsbgugan]

I tested this with a symlink to CSV data on MapR-FS, and a query against the 
symlinked file returned only the headers. However, querying the same symlink 
via the NFS mount yields the correct result. I suspect that the Hadoop FS APIs 
might be causing this. 

I know that a standalone MapR-DB table can be queried via a symlink, but that 
was done using a Java application.

What error do you get when you attempt the query with Drill?

> Support for Symlinked Table Paths to be used in Drill Queries.
> --
>
> Key: DRILL-5984
> URL: https://issues.apache.org/jira/browse/DRILL-5984
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.11.0
> Environment: OS : CentOS 7.1
> MapR-DB Version: 5.2.2
>Reporter: Saravanabavagugan Vengadasundaram
>Priority: Major
>
> MapR-FS supports symlinks and hence MapR-DB table paths support symlinks as 
> well. As part of the project I work on, we use symlinks as a means of 
> communication to talk to the physical file. An employee table in MapR-DB will 
> be represented as  "/tables/Employee/Entity_1233232" and there will be a 
> symlink called "/tables/Employee/Entity" pointing to the actual physical 
> table. Currently, Drill does not understand queries that use the symlink path 
> and only executes queries that use the actual physical table path. So every 
> time, I need to find out the actual physical path of the table and frame my 
> query. It would be nice to have this feature in the next version of Drill. 





[jira] [Updated] (DRILL-6464) Disallow System.out, System.err, and Exception.printStackTrace

2018-06-04 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6464:

Issue Type: Task  (was: Bug)

> Disallow System.out, System.err, and Exception.printStackTrace
> --
>
> Key: DRILL-6464
> URL: https://issues.apache.org/jira/browse/DRILL-6464
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Timothy Farkas
>Priority: Major
>
> Add checkstyle rules to disallow using these print methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6464) Disallow System.out, System.err, and Exception.printStackTrace

2018-06-04 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6464:
-

 Summary: Disallow System.out, System.err, and 
Exception.printStackTrace
 Key: DRILL-6464
 URL: https://issues.apache.org/jira/browse/DRILL-6464
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas


Add checkstyle rules to disallow using these print methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5735) UI options grouping and filtering & Metrics hints

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500594#comment-16500594
 ] 

ASF GitHub Bot commented on DRILL-5735:
---

arina-ielchiieva commented on issue #1279: DRILL-5735: Allow search/sort in the 
Options webUI
URL: https://github.com/apache/drill/pull/1279#issuecomment-394438495
 
 
   @kkhatua the idea of the improvement is nice, but keeping the option 
descriptions in a js file is not good; it looks like a hack. :) We should find 
a way to add the descriptions in the drill-module.conf file, or some other 
approach you can think of. That way the descriptions will be loaded along with 
the option defaults at start up, and they can easily be added to the options 
page as well as to the sys.options table as part of a description column.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UI options grouping and filtering & Metrics hints
> -
>
> Key: DRILL-5735
> URL: https://issues.apache.org/jira/browse/DRILL-5735
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0, 1.10.0, 1.11.0
>Reporter: Muhammad Gelbana
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> I'm thinking of some UI improvements that could make all the difference for 
> users trying to optimize low-performing queries.
> h2. Options
> h3. Grouping
> We can group the options by their scope of effect; this will help users 
> easily locate the options they may need to tune.
> h3. Filtering
> Since there are many options, we can add a filtering mechanism (e.g. string 
> search or group\scope filtering) so users can filter out the options they 
> are not interested in. To provide more benefit than the grouping idea 
> mentioned above, filtering may also match keywords and not just the option 
> name, since users may not know the name of the option they are looking for.
> h2. Metrics
> I'm referring here to the metrics page and the query execution plan page that 
> displays the overview section and major\minor fragments metrics. We can show 
> hints for each metric such as:
> # What it represents, in more detail.
> # What option\scope-of-options to tune (increase? decrease?) to improve the 
> performance reported by this metric.
> # Maybe even provide a small dialog to quickly modify the option(s) related 
> to that metric.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5735) UI options grouping and filtering & Metrics hints

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500563#comment-16500563
 ] 

ASF GitHub Bot commented on DRILL-5735:
---

JohnOmernik commented on issue #1279: DRILL-5735: Allow search/sort in the 
Options webUI
URL: https://github.com/apache/drill/pull/1279#issuecomment-394434022
 
 
   So on the idea of UDFs, I don't think UDFs that are NOT shipped with the
   product should be a concern of ours.  Any UDF that ships with Drill needs
   to be in the sys.functions.
   
   Perhaps it would be cool to have a way to register UDFs in sys.functions so
   that folks who have UDFs and want to use sys.functions as a documentation
   store can.  Basically, a way in the UDF to add "company only" UDFs to
   sys.functions?
   
   John
   
   On Mon, May 21, 2018 at 2:04 PM, Kunal Khatua 
   wrote:
   
   > @JohnOmernik  I agree. Single-source of
   > truth for descriptions will help a long way.
   > A similar ask for sys.functions is something that @arina-ielchiieva
   >  had pointed to when we introduced
   > syntax highlighting. That, of course, is trickier due to dynamic UDFs.
   > We can try to address that separately as a combined commit. For now, does
   > this look sufficient within the context of the JIRA ?
   >
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5735) UI options grouping and filtering & Metrics hints

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500557#comment-16500557
 ] 

ASF GitHub Bot commented on DRILL-5735:
---

dvjyothsna commented on issue #1279: DRILL-5735: Allow search/sort in the 
Options webUI
URL: https://github.com/apache/drill/pull/1279#issuecomment-394432923
 
 
   Very good enhancement Kunal. LGTM +1


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6459) Unable to view profile of a running query

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500547#comment-16500547
 ] 

ASF GitHub Bot commented on DRILL-6459:
---

vrozov commented on a change in pull request #1301: DRILL-6459: Unable to view 
profile of a running query
URL: https://github.com/apache/drill/pull/1301#discussion_r192817753
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/proto/helper/QueryIdHelper.java
 ##
 @@ -33,7 +33,7 @@ public static String getQueryId(final QueryId queryId) {
 
   public static QueryId getQueryIdFromString(final String queryId) {
 final UUID uuid = UUID.fromString(queryId);
-return QueryId.newBuilder().setPart1(uuid.getMostSignificantBits()).setPart2(uuid.getLeastSignificantBits()).build();
+return QueryId.newBuilder().setPart1(uuid.getMostSignificantBits()).setPart2(uuid.getLeastSignificantBits()).setText(queryId).build();
 
 Review comment:
   @kkhatua I don't see why it was necessary to introduce a text ID field into 
the query ID. How could the text field of a query ID object ever differ from 
part1 and part2? At best, this field belongs in the query profile, not the 
query ID.
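
The point about redundancy can be illustrated with a minimal sketch. It 
assumes only that the textual query ID is a canonical UUID string, as 
getQueryIdFromString implies. QueryIdParts is a hypothetical stand-in, not 
Drill's generated QueryId class.

```java
import java.util.UUID;

// Hypothetical stand-in for the two 64-bit halves carried by a query ID.
// This is NOT Drill's generated QueryId class.
final class QueryIdParts {
  final long part1;
  final long part2;

  QueryIdParts(long part1, long part2) {
    this.part1 = part1;
    this.part2 = part2;
  }

  // Split the textual form into two halves, mirroring getQueryIdFromString.
  static QueryIdParts fromString(String queryId) {
    UUID uuid = UUID.fromString(queryId);
    return new QueryIdParts(uuid.getMostSignificantBits(), uuid.getLeastSignificantBits());
  }

  // Rebuild the textual form from the two halves alone.
  String toText() {
    return new UUID(part1, part2).toString();
  }

  public static void main(String[] args) {
    String text = "24ee72cd-893d-e359-4811-ad79905410a1";
    QueryIdParts id = fromString(text);
    // The text is fully recoverable from part1 and part2.
    System.out.println(text.equals(id.toText()));
  }
}
```

Under that assumption, a stored text field carries no information beyond 
part1 and part2, which is the concern raised in the comment.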


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Unable to view profile of a running query
> -
>
> Key: DRILL-6459
> URL: https://issues.apache.org/jira/browse/DRILL-6459
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> When running a query on the current master (), the query lists in the 
> _Running Queries_ table. But when trying to view the profile, the following 
> error appears:
> {code:java}
> {
>   "errorMessage" : "VALIDATION ERROR: No profile with given query id 
> '24ee72cd-893d-e359-4811-ad79905410a1' exists. Please verify the query 
> id.\n\n\n[Error Id: 59ef7486-889e-4bc9-a96a-b47c3421cfaf ]"
> }
> {code}
> I suspect this might have to do with version-related changes to the profile's 
> LocalPersistentStore or changes to the registering of running queries.
>  
> The query, however, is eventually available on completion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500515#comment-16500515
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192810692
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetFilterPushDown.java
 ##
 @@ -454,6 +456,30 @@ public void testMultiRowGroup() throws Exception {
 PlanTestBase.testPlanMatchingPatterns(sql, expectedPlan);
   }
 
+  @Test
+  public void testFilterPruning() throws Exception {
+// multirowgroup2 is a parquet file with 3 rowgroups inside. One with a=0, another with a=1 and a=2, and the last with a=3;
+// FilterPushDown should be able to prune the filter from the scan operator according to the rowgroup statistics.
+final String sql = "select * from dfs.`parquet/multirowgroup2.parquet` where ";
+PlanTestBase.testPlanMatchingPatterns(sql + "a > 1", new String[]{"numRowGroups=2"}, new String[]{}); //No filter pruning
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same 
> partition. With Parquet, you can prune the filter if the rowgroups partition 
> your dataset, because the unit of work is the rowgroup, not the file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500517#comment-16500517
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192810720
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetFilterPredicate.java
 ##
 @@ -18,5 +18,16 @@
 package org.apache.drill.exec.expr.stat;
 
 public interface ParquetFilterPredicate {
-  boolean canDrop(RangeExprEvaluator evaluator);
+  /**
+   * Define the validity of a row group against a filter
+   * 
+   *   ALL : all rows match the filter (canDrop the row group = false and filter pruning = true)
+   *   NONE : no row matches the filter (canDrop the row group = true)
+   *   SOME : some rows only match the filter (canDrop the row group = false and filter pruning = false)
+   *   UNAPPLICABLE : filter can not be applied
+   * 
+   */
+  enum ROWS_MATCH {ALL, NONE, SOME, UNAPPLICABLE}
 
 Review comment:
   Done
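
The Javadoc above maps each match state to two decisions. A minimal sketch of 
that mapping follows. RowsMatchSketch and its method names are hypothetical. 
Treating UNAPPLICABLE as "keep the row group, keep the filter" is an 
assumption, since the Javadoc only says the filter cannot be applied.

```java
// Illustrative mapping of the four match states to the two decisions
// named in the Javadoc. Not Drill code.
final class RowsMatchSketch {
  enum RowsMatch { ALL, NONE, SOME, UNAPPLICABLE }

  // A row group can be skipped only when no row matches the filter.
  static boolean canDropRowGroup(RowsMatch match) {
    return match == RowsMatch.NONE;
  }

  // The filter is redundant for a row group only when every row matches.
  // Assumption: UNAPPLICABLE is treated conservatively (filter is kept).
  static boolean canPruneFilter(RowsMatch match) {
    return match == RowsMatch.ALL;
  }

  public static void main(String[] args) {
    for (RowsMatch m : RowsMatch.values()) {
      System.out.println(m + ": drop=" + canDropRowGroup(m) + ", pruneFilter=" + canPruneFilter(m));
    }
  }
}
```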


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500518#comment-16500518
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192810866
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetFilterPredicate.java
 ##
 @@ -18,5 +18,16 @@
 package org.apache.drill.exec.expr.stat;
 
 public interface ParquetFilterPredicate {
-  boolean canDrop(RangeExprEvaluator evaluator);
+  /**
+   * Define the validity of a row group against a filter
+   * 
+   *   ALL : all rows match the filter (canDrop the row group = false and filter pruning = true)
+   *   NONE : no row matches the filter (canDrop the row group = true)
+   *   SOME : some rows only match the filter (canDrop the row group = false and filter pruning = false)
+   *   UNAPPLICABLE : filter can not be applied
+   * 
+   */
+  enum ROWS_MATCH {ALL, NONE, SOME, UNAPPLICABLE}
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500514#comment-16500514
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192810649
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/RowGroupInfo.java
 ##
 @@ -91,8 +93,9 @@ public long getRowCount() {
 return columns;
   }
 
-  public void setColumns(List columns) {
-this.columns = columns;
-  }
+  public void setColumns(List columns) { this.columns = columns; }
+
+  public ParquetFilterPredicate.ROWS_MATCH getRowValid() { return rowValid; }
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500519#comment-16500519
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192810866
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetFilterPredicate.java
 ##
 @@ -18,5 +18,16 @@
 package org.apache.drill.exec.expr.stat;
 
 public interface ParquetFilterPredicate {
-  boolean canDrop(RangeExprEvaluator evaluator);
+  /**
+   * Define the validity of a row group against a filter
+   * 
+   *   ALL : all rows match the filter (canDrop the row group = false and filter pruning = true)
+   *   NONE : no row matches the filter (canDrop the row group = true)
+   *   SOME : some rows only match the filter (canDrop the row group = false and filter pruning = false)
+   *   UNAPPLICABLE : filter can not be applied
+   * 
+   */
+  enum ROWS_MATCH {ALL, NONE, SOME, UNAPPLICABLE}
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500512#comment-16500512
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192810573
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicates.java
 ##
 @@ -228,29 +189,15 @@ public LEPredicate(LogicalExpression left, LogicalExpression right) {
 }
 
 @Override
-public boolean canDrop(RangeExprEvaluator evaluator) {
-  Statistics leftStat = left.accept(evaluator, null);
-  Statistics rightStat = right.accept(evaluator, null);
-
-  if (leftStat == null ||
-  rightStat == null ||
-  leftStat.isEmpty() ||
-  rightStat.isEmpty()) {
-return false;
-  }
-
-  // if either side is ALL null, = is evaluated to UNKNOW -> canDrop
-  if (ParquetPredicatesHelper.isAllNulls(leftStat, evaluator.getRowCount()) ||
-  ParquetPredicatesHelper.isAllNulls(rightStat, evaluator.getRowCount())) {
-return true;
-  }
-
+protected ROWS_MATCH matchesCond() {
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500513#comment-16500513
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192810611
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetFilterPredicate.java
 ##
 @@ -18,5 +18,16 @@
 package org.apache.drill.exec.expr.stat;
 
 public interface ParquetFilterPredicate {
-  boolean canDrop(RangeExprEvaluator evaluator);
+  /**
+   * Define the validity of a row group against a filter
+   * 
+   *   ALL : all rows match the filter (canDrop the row group = false and filter pruning = true)
+   *   NONE : no row matches the filter (canDrop the row group = true)
+   *   SOME : some rows only match the filter (canDrop the row group = false and filter pruning = false)
+   *   MISLEAD : filter can not be applied
+   * 
+   */
+  enum ROWS_MATCH {ALL, NONE, SOME, MISLEAD}
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500510#comment-16500510
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192810485
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetBooleanPredicates.java
 ##
 @@ -44,15 +44,31 @@ public AndPredicate(String name, List 
args, ExpressionPositio
   super(name, args, pos);
 }
 
+/**
+ * Evaluates a compound "AND" filter on the statistics of a RowGroup (the filter reads "filterA and filterB").
+ * Return value :
+ *   ALL : only if all filters return ALL
+ *   NONE : if one filter at least returns NONE
+ *   MISLEAD : all other cases
+ * 
+ */
 @Override
-public boolean canDrop(RangeExprEvaluator evaluator) {
-  // "and" : as long as one branch is OK to drop, we can drop it.
+public ROWS_MATCH matches(RangeExprEvaluator evaluator) {
+  ROWS_MATCH m, temp = ROWS_MATCH.NONE;
   for (LogicalExpression child : this) {
-if (child instanceof ParquetFilterPredicate && ((ParquetFilterPredicate) child).canDrop(evaluator)) {
-  return true;
+m = ((ParquetFilterPredicate) child).matches(evaluator);
 
 Review comment:
   Done
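
The combination rule stated in the Javadoc of this revision (ALL only if every 
child returns ALL, NONE if at least one child returns NONE, MISLEAD 
otherwise) can be sketched as below. AndCombineSketch is a hypothetical name, 
and the enum mirrors the values quoted in this diff rather than Drill's final 
code. Returning ALL for an empty child list is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: combines per-child match states for an AND filter,
// following the rule in the Javadoc quoted above.
final class AndCombineSketch {
  enum RowsMatch { ALL, NONE, SOME, MISLEAD }

  static RowsMatch combineAnd(List<RowsMatch> children) {
    boolean everyChildMatchesAll = true;
    for (RowsMatch m : children) {
      if (m == RowsMatch.NONE) {
        return RowsMatch.NONE; // one branch excludes every row
      }
      if (m != RowsMatch.ALL) {
        everyChildMatchesAll = false;
      }
    }
    // Assumption: an empty child list counts as ALL (vacuously true).
    return everyChildMatchesAll ? RowsMatch.ALL : RowsMatch.MISLEAD;
  }

  public static void main(String[] args) {
    System.out.println(combineAnd(Arrays.asList(RowsMatch.ALL, RowsMatch.ALL)));
    System.out.println(combineAnd(Arrays.asList(RowsMatch.ALL, RowsMatch.NONE)));
    System.out.println(combineAnd(Arrays.asList(RowsMatch.ALL, RowsMatch.SOME)));
  }
}
```

Returning early on NONE keeps the behaviour of the old canDrop loop, where one 
droppable branch was enough to drop the row group.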


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500511#comment-16500511
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192810524
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicates.java
 ##
 @@ -30,10 +30,13 @@
  * Comparison predicates for parquet filter pushdown.
  */
 public class ParquetComparisonPredicates {
-  public static abstract  class ParquetCompPredicate extends LogicalExpressionBase implements ParquetFilterPredicate {
+  public static abstract class ParquetCompPredicate extends LogicalExpressionBase implements ParquetFilterPredicate {
 protected final LogicalExpression left;
+
 protected final LogicalExpression right;
 
+protected Statistics leftStat, rightStat;
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500509#comment-16500509
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192810441
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -231,43 +231,39 @@ public GroupScan applyFilter(LogicalExpression 
filterExpr, UdfUtilities udfUtili
 ParquetFilterPredicate filterPredicate = null;
 
 for (RowGroupInfo rowGroup : rowGroupInfos) {
-  final ColumnExplorer columnExplorer = new ColumnExplorer(optionManager, 
columns);
-  List partitionValues = getPartitionValues(rowGroup);
-  Map implicitColValues = 
columnExplorer.populateImplicitColumns(rowGroup.getPath(), partitionValues, 
supportsFileImplicitColumns());
-
-  ParquetMetaStatCollector statCollector = new ParquetMetaStatCollector(
-  parquetTableMetadata,
-  rowGroup.getColumns(),
-  implicitColValues);
-
-  Map columnStatisticsMap = 
statCollector.collectColStat(schemaPathsInExpr);
-
-  if (filterPredicate == null) {
-ErrorCollector errorCollector = new ErrorCollectorImpl();
-LogicalExpression materializedFilter = 
ExpressionTreeMaterializer.materializeFilterExpr(
-filterExpr, columnStatisticsMap, errorCollector, 
functionImplementationRegistry);
-
-if (errorCollector.hasErrors()) {
-  logger.error("{} error(s) encountered when materialize filter 
expression : {}",
-  errorCollector.getErrorCount(), errorCollector.toErrorString());
-  return null;
-}
-logger.debug("materializedFilter : {}", 
ExpressionStringBuilder.toString(materializedFilter));
+final ColumnExplorer columnExplorer = new 
ColumnExplorer(optionManager, columns);
 
 Review comment:
   Done




[jira] [Commented] (DRILL-6438) Remove excess logging from tests

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500499#comment-16500499
 ] 

ASF GitHub Bot commented on DRILL-6438:
---

vvysotskyi commented on issue #1284: DRILL-6438: Remove excess logging from 
tests.
URL: https://github.com/apache/drill/pull/1284#issuecomment-394422831
 
 
   Thanks for so many changes!
   The output for some tests (`TestSimpleProjection`, `TestConvertFunctions` 
etc.) is printed using methods from `VectorUtil`. It would be fine to change 
the output for these methods.




> Remove excess logging from tests
> 
>
> Key: DRILL-6438
> URL: https://issues.apache.org/jira/browse/DRILL-6438
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> TestLocalExchange and TestLoad have this issue.
> See example
> {code}
> Running 
> org.apache.drill.exec.physical.impl.TestLocalExchange#testGroupByMultiFields
> Plan: {
>   "head" : {
> "version" : 1,
> "generator" : {
>   "type" : "ExplainHandler",
>   "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ {
>   "kind" : "LONG",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.width.max_per_node",
>   "num_val" : 2,
>   "scope" : "SESSION"
> }, {
>   "kind" : "BOOLEAN",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.enable_mux_exchange",
>   "bool_val" : true,
>   "scope" : "SESSION"
> }, {
>   "kind" : "BOOLEAN",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.enable_demux_exchange",
>   "bool_val" : false,
>   "scope" : "SESSION"
> }, {
>   "kind" : "LONG",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.slice_target",
>   "num_val" : 1,
>   "scope" : "SESSION"
> } ],
> "queue" : 0,
> "hasResourcePlan" : false,
> "resultMode" : "EXEC"
>   },
>   "graph" : [ {
> "pop" : "fs-scan",
> "@id" : 196611,
> "userName" : "travis",
> "files" : [ 
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/6.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/9.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/3.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/1.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/2.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/7.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/0.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/5.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/4.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/8.json"
>  ],
> "storage" : {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "file:///",
>   "config" : null,
>   "workspaces" : {
> "root" : {
>   "location" : 
> "/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/root",
>   "writable" : true,
>   "defaultInputFormat" : null,
>   "allowAccessOutsideWorkspace" : false
> },
> "tmp" : {
>   "location" : 
> "/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/dfsTestTmp/1527026062606-0",
>   "writable" : true,
>   "defaultInputFormat" : null,
>   "allowAccessOutsideWorkspace" : false
> },
> "default" : {
>   "location" : 
> "/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/root",
>   "writable" : true,
>   

[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500456#comment-16500456
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on a change in pull request #1298: DRILL-5796: 
Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192801581
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetFilterPredicate.java
 ##
 @@ -18,5 +18,16 @@
 package org.apache.drill.exec.expr.stat;
 
 public interface ParquetFilterPredicate {
-  boolean canDrop(RangeExprEvaluator evaluator);
+  /**
+   * Define the validity of a row group against a filter
+   * 
+   *   ALL : all rows match the filter (canDrop the row group = false and 
filter pruning = true)
+   *   NONE : no row matches the filter (canDrop the row group = true)
+   *   SOME : some rows only match the filter (canDrop the row group = 
false and filter pruning = false)
+   *   UNAPPLICABLE : filter can not be applied
+   * 
+   */
+  enum ROWS_MATCH {ALL, NONE, SOME, UNAPPLICABLE}
 
 Review comment:
   inapplicable :)
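
   As a rough sketch of the semantics listed in the Javadoc above (not the
   code in the pull request), the mapping from a match result to the two
   planner decisions could look like this. The class and method names are
   hypothetical, the fourth constant uses the spelling suggested in this
   review, and treating it as "neither drop nor prune" is an assumption.

```java
// Hypothetical sketch, not Drill code: what each match result allows the planner to do.
class RowsMatchSketch {
  enum RowsMatch { ALL, NONE, SOME, INAPPLICABLE }

  // The rowgroup can be dropped only when no row matches the filter.
  static boolean canDropRowGroup(RowsMatch match) {
    return match == RowsMatch.NONE;
  }

  // The filter can be pruned only when every row is known to match.
  static boolean canPruneFilter(RowsMatch match) {
    return match == RowsMatch.ALL;
  }
}
```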




[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500268#comment-16500268
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on a change in pull request #1298: DRILL-5796: 
Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192748717
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicates.java
 ##
 @@ -228,29 +189,15 @@ public LEPredicate(LogicalExpression left, 
LogicalExpression right) {
 }
 
 @Override
-public boolean canDrop(RangeExprEvaluator evaluator) {
-  Statistics leftStat = left.accept(evaluator, null);
-  Statistics rightStat = right.accept(evaluator, null);
-
-  if (leftStat == null ||
-  rightStat == null ||
-  leftStat.isEmpty() ||
-  rightStat.isEmpty()) {
-return false;
-  }
-
-  // if either side is ALL null, = is evaluated to UNKNOW -> canDrop
-  if (ParquetPredicatesHelper.isAllNulls(leftStat, 
evaluator.getRowCount()) ||
-  ParquetPredicatesHelper.isAllNulls(rightStat, 
evaluator.getRowCount())) {
-return true;
-  }
-
+protected ROWS_MATCH matchesCond() {
 
 Review comment:
   Please use full name for condition.




[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500265#comment-16500265
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on a change in pull request #1298: DRILL-5796: 
Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192746227
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetBooleanPredicates.java
 ##
 @@ -44,15 +44,31 @@ public AndPredicate(String name, List 
args, ExpressionPositio
   super(name, args, pos);
 }
 
+/**
+ * Evaluates a compound "AND" filter on the statistics of a RowGroup (the 
filter reads "filterA and filterB").
+ * Return value :
+ *   ALL : only if all filters return ALL
+ *   NONE : if one filter at least returns NONE
+ *   MISLEAD : all other cases
+ * 
+ */
 @Override
-public boolean canDrop(RangeExprEvaluator evaluator) {
-  // "and" : as long as one branch is OK to drop, we can drop it.
+public ROWS_MATCH matches(RangeExprEvaluator evaluator) {
+  ROWS_MATCH m, temp = ROWS_MATCH.NONE;
   for (LogicalExpression child : this) {
-if (child instanceof ParquetFilterPredicate && 
((ParquetFilterPredicate) child).canDrop(evaluator)) {
-  return true;
+m = ((ParquetFilterPredicate) child).matches(evaluator);
 
 Review comment:
   1. It would be better if the variables `m` and `temp` had meaningful names; 
please consider the same in other places.
   2. I believe it's better to keep the `instanceof` check.
   3. Please consider rewriting the if statements below to make them simpler.
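
   One possible shape for these three suggestions, as a self-contained sketch
   rather than the actual change: the types below are simplified stand-ins
   for Drill's classes, all names are hypothetical, and `INAPPLICABLE` stands
   in for the "all other cases" result from the Javadoc. How a child that is
   not a predicate should be handled is an assumption.

```java
import java.util.List;

// Hypothetical sketch, not Drill code: combining child results for an "AND" filter.
class AndPredicateSketch {
  enum Match { ALL, NONE, SOME, INAPPLICABLE }

  // Simplified stand-in for a predicate evaluated against rowgroup statistics.
  interface FilterPredicate {
    Match matches();
  }

  static Match and(List<?> children) {
    boolean allChildrenMatchAll = true;
    for (Object child : children) {
      if (!(child instanceof FilterPredicate)) {
        // Assumption: a child that cannot be evaluated rules out ALL.
        allChildrenMatchAll = false;
        continue;
      }
      Match childMatch = ((FilterPredicate) child).matches();
      if (childMatch == Match.NONE) {
        return Match.NONE; // one branch matching no rows is enough
      }
      if (childMatch != Match.ALL) {
        allChildrenMatchAll = false;
      }
    }
    return allChildrenMatchAll ? Match.ALL : Match.INAPPLICABLE;
  }
}
```

   Returning early on `NONE` keeps the loop body to two simple checks and
   avoids a second temporary variable.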




[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500262#comment-16500262
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on a change in pull request #1298: DRILL-5796: 
Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192748428
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicates.java
 ##
 @@ -30,10 +30,13 @@
  * Comparison predicates for parquet filter pushdown.
  */
 public class ParquetComparisonPredicates {
-  public static abstract  class ParquetCompPredicate extends 
LogicalExpressionBase implements ParquetFilterPredicate {
+  public static abstract class ParquetCompPredicate extends 
LogicalExpressionBase implements ParquetFilterPredicate {
 protected final LogicalExpression left;
+
 protected final LogicalExpression right;
 
+protected Statistics leftStat, rightStat;
 
 Review comment:
   It looks like in Drill we prefer to declare each variable on its own line. 
Please consider applying the same in your code.




[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500263#comment-16500263
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on a change in pull request #1298: DRILL-5796: 
Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192745741
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -231,43 +231,39 @@ public GroupScan applyFilter(LogicalExpression 
filterExpr, UdfUtilities udfUtili
 ParquetFilterPredicate filterPredicate = null;
 
 for (RowGroupInfo rowGroup : rowGroupInfos) {
-  final ColumnExplorer columnExplorer = new ColumnExplorer(optionManager, 
columns);
-  List partitionValues = getPartitionValues(rowGroup);
-  Map implicitColValues = 
columnExplorer.populateImplicitColumns(rowGroup.getPath(), partitionValues, 
supportsFileImplicitColumns());
-
-  ParquetMetaStatCollector statCollector = new ParquetMetaStatCollector(
-  parquetTableMetadata,
-  rowGroup.getColumns(),
-  implicitColValues);
-
-  Map columnStatisticsMap = 
statCollector.collectColStat(schemaPathsInExpr);
-
-  if (filterPredicate == null) {
-ErrorCollector errorCollector = new ErrorCollectorImpl();
-LogicalExpression materializedFilter = 
ExpressionTreeMaterializer.materializeFilterExpr(
-filterExpr, columnStatisticsMap, errorCollector, 
functionImplementationRegistry);
-
-if (errorCollector.hasErrors()) {
-  logger.error("{} error(s) encountered when materialize filter 
expression : {}",
-  errorCollector.getErrorCount(), errorCollector.toErrorString());
-  return null;
-}
-logger.debug("materializedFilter : {}", 
ExpressionStringBuilder.toString(materializedFilter));
+final ColumnExplorer columnExplorer = new 
ColumnExplorer(optionManager, columns);
 
 Review comment:
   Please revert the indentation change. It should be 2.




[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500264#comment-16500264
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on a change in pull request #1298: DRILL-5796: 
Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192748956
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetFilterPredicate.java
 ##
 @@ -18,5 +18,16 @@
 package org.apache.drill.exec.expr.stat;
 
 public interface ParquetFilterPredicate {
-  boolean canDrop(RangeExprEvaluator evaluator);
+  /**
+   * Define the validity of a row group against a filter
+   * 
+   *   ALL : all rows match the filter (canDrop the row group = false and 
filter pruning = true)
+   *   NONE : no row matches the filter (canDrop the row group = true)
+   *   SOME : some rows only match the filter (canDrop the row group = 
false and filter pruning = false)
+   *   MISLEAD : filter can not be applied
+   * 
+   */
+  enum ROWS_MATCH {ALL, NONE, SOME, MISLEAD}
 
 Review comment:
   Consider renaming mislead to not applicable or something similar.




[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500261#comment-16500261
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on a change in pull request #1298: DRILL-5796: 
Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192749799
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/RowGroupInfo.java
 ##
 @@ -91,8 +93,9 @@ public long getRowCount() {
 return columns;
   }
 
-  public void setColumns(List columns) {
-this.columns = columns;
-  }
+  public void setColumns(List columns) { 
this.columns = columns; }
+
+  public ParquetFilterPredicate.ROWS_MATCH getRowValid() { return rowValid; }
 
 Review comment:
   Please consider renaming the getter and setter methods to something more obvious.




[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500266#comment-16500266
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on a change in pull request #1298: DRILL-5796: 
Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192750997
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java
 ##
 @@ -165,13 +166,26 @@ protected void doOnMatch(RelOptRuleCall call, FilterPrel 
filter, ProjectPrel pro
   return;
 }
 
-
-RelNode newScan = ScanPrel.create(scan, scan.getTraitSet(), newGroupScan, 
scan.getRowType());;
+RelNode newScan = ScanPrel.create(scan, scan.getTraitSet(), newGroupScan, 
scan.getRowType());
 
 if (project != null) {
   newScan = project.copy(project.getTraitSet(), ImmutableList.of(newScan));
 }
-final RelNode newFilter = filter.copy(filter.getTraitSet(), 
ImmutableList.of(newScan));
-call.transformTo(newFilter);
+
+List rowGroupInfos = 
((AbstractParquetGroupScan)newGroupScan).rowGroupInfos;
 
 Review comment:
   1. Please add space after `(AbstractParquetGroupScan)`.
   2. Please try to get rid of explicit casting.




[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500267#comment-16500267
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on a change in pull request #1298: DRILL-5796: 
Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192750736
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetFilterPushDown.java
 ##
 @@ -454,6 +456,30 @@ public void testMultiRowGroup() throws Exception {
 PlanTestBase.testPlanMatchingPatterns(sql, expectedPlan);
   }
 
+  @Test
+  public void testFilterPruning() throws Exception {
+// multirowgroup2 is a parquet file with 3 rowgroups inside. One with a=0, 
another with a=1 and a=2, and the last with a=3;
+// FilterPushDown should be able to prune the filter from the scan 
operator according to the rowgroup statistics.
+final String sql = "select * from dfs.`parquet/multirowgroup2.parquet` 
where ";
+PlanTestBase.testPlanMatchingPatterns(sql + "a > 1", new 
String[]{"numRowGroups=2"}, new String[]{}); //No filter pruning
 
 Review comment:
   You can pass `null` instead of `new String[]{}`, or consider using the 
overloaded method without the excluded patterns.
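
   To illustrate the overloading pattern this comment refers to, here is a
   minimal self-contained sketch. It is not `PlanTestBase` itself: the class
   and method names are hypothetical, and treating a `null` array as "no
   patterns" is an assumption of the sketch.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Drill code: a plan checker with an optional excluded-pattern list.
class PlanPatternSketch {
  // Full form: every expected pattern must be found and no excluded pattern may be found.
  static boolean planMatches(String plan, String[] expected, String[] excluded) {
    if (expected != null) {
      for (String pattern : expected) {
        if (!Pattern.compile(pattern).matcher(plan).find()) {
          return false;
        }
      }
    }
    if (excluded != null) {
      for (String pattern : excluded) {
        if (Pattern.compile(pattern).matcher(plan).find()) {
          return false;
        }
      }
    }
    return true;
  }

  // Convenience overload: callers with nothing to exclude do not build an empty array.
  static boolean planMatches(String plan, String[] expected) {
    return planMatches(plan, expected, null);
  }
}
```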




[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500202#comment-16500202
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192737154
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetFilterPushDownForComplexTypes.java
 ##
 @@ -80,9 +80,9 @@ public static void copyData() {
   public void testPushDownArray() throws Exception {
 testParquetFilterPushDown("t.`user`.hobby_ids[0] = 1", 3, 2);
 testParquetFilterPushDown("t.`user`.hobby_ids[0] = 100", 0, 1);
-testParquetFilterPushDown("t.`user`.hobby_ids[0] <> 1", 8, 6);
-testParquetFilterPushDown("t.`user`.hobby_ids[2] > 20", 5, 3);
-testParquetFilterPushDown("t.`user`.hobby_ids[0] between 10 and 20", 5, 4);
+testParquetFilterPushDown("t.`user`.hobby_ids[0] <> 1", 8, 7);
 
 Review comment:
   Hello Arina,
   after the last refactoring, the outputs no longer change. Thank you for 
noticing.
   Best regards




[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500195#comment-16500195
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192736578
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicates.java
 ##
 @@ -30,8 +30,9 @@
  * Comparison predicates for parquet filter pushdown.
  */
 public class ParquetComparisonPredicates {
 
 Review comment:
   Hello VRozov,
   you're right. I thought fewer code modifications were better (my previous team 
was afraid of code refactoring...). I have refactored the class.
   Best regards




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, you can prune the filter if the rowgroup makes up a partition of 
> your dataset, because the unit of work is the rowgroup, not the file.





[jira] [Commented] (DRILL-4650) Excel file (.xsl) and Microsoft Access file (.accdb) problem

2018-06-04 Thread Charles Givre (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500122#comment-16500122
 ] 

Charles Givre commented on DRILL-4650:
--

Hi [~kkhatua], a long time ago I wrote this format plugin, which can read 
Excel files:
https://github.com/cgivre/drill-excel-plugin
It uses Apache POI, so theoretically it could also be used for Open Office 
spreadsheets.  In any event, was this a hint to submit this as a PR?  If so, 
I'm happy to do that.  I'm currently working on a PR for a regex reader for 
Drill (DRILL-6104), and once that is committed, I'll get this ready to go.

>  Excel file (.xsl) and Microsoft Access file (.accdb) problem
> -
>
> Key: DRILL-4650
> URL: https://issues.apache.org/jira/browse/DRILL-4650
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.6.0
>Reporter: Sanjiv Kumar
>Assignee: Charles Givre
>Priority: Major
>
> I am trying to query an Excel file (.xsl file) and an MS Access file (.accdb), 
> but I am unable to query these files in Drill. Is there any way to query 
> these files, or is there a storage plugin for querying Excel and MS Access files?





[jira] [Commented] (DRILL-6438) Remove excess logging from tests

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500058#comment-16500058
 ] 

ASF GitHub Bot commented on DRILL-6438:
---

arina-ielchiieva commented on issue #1284: DRILL-6438: Remove excess logging 
form tests.
URL: https://github.com/apache/drill/pull/1284#issuecomment-394314429
 
 
   @ilooner what about banning `System.out` and `e.printStackTrace()`? Will 
this be addressed in a different PR? If so, could you please point me to the 
Jira?
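
For illustration, the kind of replacement being asked about might look like the sketch below. It is not taken from the pull request: the class and method names are hypothetical, and it uses `java.util.logging` only to stay self-contained (Drill itself logs through a different logging facade).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietTestLogging {
    private static final Logger logger =
        Logger.getLogger(QuietTestLogging.class.getName());

    /** Parses an int, falling back to a default instead of printing a stack trace. */
    static int parseOrDefault(String text, int fallback) {
        try {
            int value = Integer.parseInt(text);
            // Instead of System.out.println(...): a log call that is filtered by level.
            logger.log(Level.FINE, "Parsed value {0}", value);
            return value;
        } catch (NumberFormatException e) {
            // Instead of e.printStackTrace(): attach the exception to a log record.
            logger.log(Level.FINE, "Could not parse '" + text + "', using fallback", e);
            return fallback;
        }
    }

    public static void main(String[] args) {
        // With the default java.util.logging configuration, FINE records are not
        // shown, so the helper itself adds nothing to the test output.
        int a = parseOrDefault("42", 0);
        int b = parseOrDefault("not a number", 7);
        if (a != 42 || b != 7) {
            throw new IllegalStateException("unexpected result");
        }
    }
}
```

Because the messages go through a logger, their verbosity can be controlled by configuration rather than by editing each test.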




> Remove excess logging from tests
> 
>
> Key: DRILL-6438
> URL: https://issues.apache.org/jira/browse/DRILL-6438
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> TestLocalExchange and TestLoad have this issue.
> See example
> {code}
> Running 
> org.apache.drill.exec.physical.impl.TestLocalExchange#testGroupByMultiFields
> Plan: {
>   "head" : {
> "version" : 1,
> "generator" : {
>   "type" : "ExplainHandler",
>   "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ {
>   "kind" : "LONG",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.width.max_per_node",
>   "num_val" : 2,
>   "scope" : "SESSION"
> }, {
>   "kind" : "BOOLEAN",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.enable_mux_exchange",
>   "bool_val" : true,
>   "scope" : "SESSION"
> }, {
>   "kind" : "BOOLEAN",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.enable_demux_exchange",
>   "bool_val" : false,
>   "scope" : "SESSION"
> }, {
>   "kind" : "LONG",
>   "accessibleScopes" : "ALL",
>   "name" : "planner.slice_target",
>   "num_val" : 1,
>   "scope" : "SESSION"
> } ],
> "queue" : 0,
> "hasResourcePlan" : false,
> "resultMode" : "EXEC"
>   },
>   "graph" : [ {
> "pop" : "fs-scan",
> "@id" : 196611,
> "userName" : "travis",
> "files" : [ 
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/6.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/9.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/3.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/1.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/2.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/7.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/0.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/5.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/4.json",
>  
> "file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/8.json"
>  ],
> "storage" : {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "file:///",
>   "config" : null,
>   "workspaces" : {
> "root" : {
>   "location" : 
> "/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/root",
>   "writable" : true,
>   "defaultInputFormat" : null,
>   "allowAccessOutsideWorkspace" : false
> },
> "tmp" : {
>   "location" : 
> "/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/dfsTestTmp/1527026062606-0",
>   "writable" : true,
>   "defaultInputFormat" : null,
>   "allowAccessOutsideWorkspace" : false
> },
> "default" : {
>   "location" : 
> "/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/root",
>   "writable" : true,
>   "defaultInputFormat" : null,
>   

[jira] [Updated] (DRILL-6432) Allow to print the visualized query plan only

2018-06-04 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6432:

Labels: ready-to-commit  (was: )

> Allow to print the visualized query plan only
> -
>
> Key: DRILL-6432
> URL: https://issues.apache.org/jira/browse/DRILL-6432
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Web Server
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Provide a convenient way to print only the Visual Query Plan, instead of 
> the entire profile page.
> This makes it possible to specify the zoom level when printing large, 
> complex plans that might span multiple pages.





[jira] [Updated] (DRILL-6432) Allow to print the visualized query plan only

2018-06-04 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6432:

Reviewer: Arina Ielchiieva  (was: Sorabh Hamirwasia)

> Allow to print the visualized query plan only
> -
>
> Key: DRILL-6432
> URL: https://issues.apache.org/jira/browse/DRILL-6432
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Web Server
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Provide a convenient way to print only the Visual Query Plan, instead of 
> the entire profile page.
> This makes it possible to specify the zoom level when printing large, 
> complex plans that might span multiple pages.





[jira] [Commented] (DRILL-6432) Allow to print the visualized query plan only

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500047#comment-16500047
 ] 

ASF GitHub Bot commented on DRILL-6432:
---

arina-ielchiieva commented on issue #1278: DRILL-6432: Show Button to print 
visualized query plan
URL: https://github.com/apache/drill/pull/1278#issuecomment-394313834
 
 
   +1




> Allow to print the visualized query plan only
> -
>
> Key: DRILL-6432
> URL: https://issues.apache.org/jira/browse/DRILL-6432
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Web Server
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.14.0
>
>
> Provide a convenient way to print only the Visual Query Plan, instead of 
> the entire profile page.
> This makes it possible to specify the zoom level when printing large, 
> complex plans that might span multiple pages.





[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-06-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500031#comment-16500031
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

arina-ielchiieva commented on a change in pull request #1298: DRILL-5796: 
Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r192698496
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetFilterPushDownForComplexTypes.java
 ##
 @@ -80,9 +80,9 @@ public static void copyData() {
   public void testPushDownArray() throws Exception {
 testParquetFilterPushDown("t.`user`.hobby_ids[0] = 1", 3, 2);
 testParquetFilterPushDown("t.`user`.hobby_ids[0] = 100", 0, 1);
-testParquetFilterPushDown("t.`user`.hobby_ids[0] <> 1", 8, 6);
-testParquetFilterPushDown("t.`user`.hobby_ids[2] > 20", 5, 3);
-testParquetFilterPushDown("t.`user`.hobby_ids[0] between 10 and 20", 5, 4);
+testParquetFilterPushDown("t.`user`.hobby_ids[0] <> 1", 8, 7);
 
 Review comment:
   Why was the expected output changed?




> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can remove a partition only if the whole file belongs to the same partition. 
> With Parquet, you can prune the filter if the rowgroup makes up a partition of 
> your dataset, because the unit of work is the rowgroup, not the file.





[jira] [Assigned] (DRILL-6460) Different hive-metastore jar files in accordance with the version of Hive

2018-06-04 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6460:
---

Assignee: Bohdan Kazydub

> Different hive-metastore jar files in accordance with the version of Hive
> -
>
> Key: DRILL-6460
> URL: https://issues.apache.org/jira/browse/DRILL-6460
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Bohdan Kazydub
>Priority: Major
>
> The Hive metastore client is not guaranteed to work properly with other 
> versions of Hive, so the Hive metastore client and server should be exactly 
> the same version.
> In Spark, the user can specify the path where the hive-metastore jar is located. 
> For example, if Spark works with Hive 2.1.1, the path to 
> hive-metastore-2.1.1.jar must be set in the property 
> _*spark.sql.hive.metastore.jars*_ 
>  
> [http://spark.apache.org/docs/latest/sql-programming-guide.html#interacting-with-different-versions-of-hive-metastore]
> A similar approach should be implemented in Drill.
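
One way such a user-specified jar path could be consumed is sketched below. This is an assumption-laden illustration, not a proposed Drill design: the class name, the `toUrls` and `loaderFor` helpers, and the idea of passing a path-separator-delimited list of jars are all hypothetical. It only shows the general technique of loading client jars from a configured location through a dedicated class loader.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

final class MetastoreJarLoaderSketch {
    /** Converts a path-separator-delimited list of jar paths into URLs. */
    static URL[] toUrls(String jarPaths) {
        List<URL> urls = new ArrayList<>();
        for (String path : jarPaths.split(File.pathSeparator)) {
            if (path.isEmpty()) {
                continue; // ignore empty entries such as a trailing separator
            }
            try {
                urls.add(new File(path).toURI().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar path: " + path, e);
            }
        }
        return urls.toArray(new URL[0]);
    }

    /** Builds a class loader that can see the configured jars (hypothetical usage). */
    static URLClassLoader loaderFor(String jarPaths) {
        return new URLClassLoader(toUrls(jarPaths),
            MetastoreJarLoaderSketch.class.getClassLoader());
    }

    public static void main(String[] args) {
        // The jar name comes from the example in the issue; the directory is made up.
        String configured = "/opt/hive/lib/hive-metastore-2.1.1.jar";
        System.out.println(toUrls(configured).length + " jar(s) configured");
    }
}
```

A real implementation would also need to decide how classes shared between Drill and the metastore client are resolved, which this sketch does not address.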





[jira] [Comment Edited] (DRILL-6453) TPC-DS query 72 has regressed

2018-06-04 Thread Volodymyr Vysotskyi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499858#comment-16499858
 ] 

Volodymyr Vysotskyi edited comment on DRILL-6453 at 6/4/18 8:12 AM:


[~khfaraaz], in the Jira description you have specified a large range of commits.

Could you please narrow it down and specify the concrete commit that caused the 
regression?


was (Author: vvysotskyi):
[~khfaraaz], if this is a regression, could you please specify the commit that 
caused it?

> TPC-DS query 72 has regressed
> -
>
> Key: DRILL-6453
> URL: https://issues.apache.org/jira/browse/DRILL-6453
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Attachments: 24f75b18-014a-fb58-21d2-baeab5c3352c.sys.drill
>
>
> TPC-DS query 72 seems to have regressed; the query profile for the case where it 
> was canceled after 2 hours on Drill 1.14.0 is attached here.
> {noformat}
> On, Drill 1.14.0-SNAPSHOT 
> commit : 931b43e (TPC-DS query 72 executed successfully on this commit, took 
> around 55 seconds to execute)
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> TPC-DS query 72 executed successfully & took 47 seconds to complete execution.
> {noformat}
> {noformat}
> TPC-DS data in the below run has date values stored as DATE datatype and not 
> VARCHAR type
> On, Drill 1.14.0-SNAPSHOT
> commit : 82e1a12
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> and
> alter system set `exec.hashjoin.num_partitions` = 1;
> TPC-DS query 72 executed for 2 hrs and 11 mins and did not complete, I had to 
> Cancel it by stopping the Foreman drillbit.
> As a result several minor fragments are reported to be in 
> CANCELLATION_REQUESTED state on UI.
> {noformat}





[jira] [Commented] (DRILL-6453) TPC-DS query 72 has regressed

2018-06-04 Thread Volodymyr Vysotskyi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499858#comment-16499858
 ] 

Volodymyr Vysotskyi commented on DRILL-6453:


[~khfaraaz], if this is a regression, could you please specify the commit that 
caused it?

> TPC-DS query 72 has regressed
> -
>
> Key: DRILL-6453
> URL: https://issues.apache.org/jira/browse/DRILL-6453
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Attachments: 24f75b18-014a-fb58-21d2-baeab5c3352c.sys.drill
>
>
> TPC-DS query 72 seems to have regressed; the query profile for the case where it 
> was canceled after 2 hours on Drill 1.14.0 is attached here.
> {noformat}
> On, Drill 1.14.0-SNAPSHOT 
> commit : 931b43e (TPC-DS query 72 executed successfully on this commit, took 
> around 55 seconds to execute)
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> TPC-DS query 72 executed successfully & took 47 seconds to complete execution.
> {noformat}
> {noformat}
> TPC-DS data in the below run has date values stored as DATE datatype and not 
> VARCHAR type
> On, Drill 1.14.0-SNAPSHOT
> commit : 82e1a12
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> and
> alter system set `exec.hashjoin.num_partitions` = 1;
> TPC-DS query 72 executed for 2 hrs and 11 mins and did not complete, I had to 
> Cancel it by stopping the Foreman drillbit.
> As a result several minor fragments are reported to be in 
> CANCELLATION_REQUESTED state on UI.
> {noformat}





[jira] [Assigned] (DRILL-4650) Excel file (.xsl) and Microsoft Access file (.accdb) problem

2018-06-04 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua reassigned DRILL-4650:
---

Assignee: Charles Givre

>  Excel file (.xsl) and Microsoft Access file (.accdb) problem
> -
>
> Key: DRILL-4650
> URL: https://issues.apache.org/jira/browse/DRILL-4650
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.6.0
>Reporter: Sanjiv Kumar
>Assignee: Charles Givre
>Priority: Major
>
> I am trying to query an Excel file (.xsl file) and an MS Access file (.accdb), 
> but I am unable to query these files in Drill. Is there any way to query 
> these files, or is there a storage plugin for querying Excel and MS Access files?


