[jira] [Updated] (DRILL-6452) document steps to execute SQL queries from Postman (chrome extension) on Drill

2018-05-30 Thread Khurram Faraaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-6452:
--
Description: 
We need documentation listing the steps, with screenshots, for executing SQL 
queries against Drill from Postman (Chrome extension).

Here are the steps to execute SQL queries from Postman:

{noformat}
1. Install the Postman extension for the Chrome browser.
 To install Postman, open
https://chrome.google.com/webstore/detail/postman/fhbjgbiflinjbdggehcddcbncdddomop?hl=en
 and click the ADD TO CHROME button.

2. On the top right of your Chrome browser window, click on the Postman icon.

In Postman:
3. Set the request type to "POST" and enter the request URL as "http://<drillbit ip>:8047/query.json"

4. In the Headers tab, add an entry with "Content-Type" as key and 
"application/json" as value.
 Add another entry with "User-Name" as key and "mapr" as value.

5. In the Body tab, select "raw"; a new dropdown list should appear next to 
"raw", and on that dropdown select "JSON".

6. In the Body box, enter your request body in JSON format. The file 
test.csv is expected to reside under the /tmp folder (i.e. in the dfs.tmp schema):
{
"queryType": "SQL",
"query": "select * from `dfs.tmp`.`test.csv`"
}

7. Press Send!

{noformat}
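
For reference, the same request can be issued programmatically. The sketch below 
is a minimal Java equivalent of the Postman steps; the endpoint, headers, and 
body mirror the steps above, while the host name (localhost) and the response 
handling are illustrative assumptions.

{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class DrillRestQuery {
  public static void main(String[] args) throws Exception {
    // POST to the Drill REST endpoint; replace localhost with your drillbit host.
    URL url = new URL("http://localhost:8047/query.json");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setRequestProperty("User-Name", "mapr");
    conn.setDoOutput(true);

    // Same JSON body as entered in the Postman Body tab.
    String body = "{\"queryType\":\"SQL\",\"query\":\"select * from `dfs.tmp`.`test.csv`\"}";
    try (OutputStream os = conn.getOutputStream()) {
      os.write(body.getBytes(StandardCharsets.UTF_8));
    }

    // Print the JSON response returned by the drillbit.
    try (Scanner s = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) {
      while (s.hasNextLine()) {
        System.out.println(s.nextLine());
      }
    }
  }
}
{code}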

  was:We need documentation listing the steps, with screenshots, for executing 
SQL queries against Drill from Postman (Chrome extension).


> document steps to execute SQL queries from Postman (chrome extension) on Drill
> --
>
> Key: DRILL-6452
> URL: https://issues.apache.org/jira/browse/DRILL-6452
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
>Priority: Minor
>
> We need documentation listing the steps, with screenshots, for executing SQL 
> queries against Drill from Postman (Chrome extension).
> Here are the steps to execute SQL queries from Postman:
> {noformat}
> 1. Install the Postman extension for the Chrome browser.
>  To install Postman, open
> https://chrome.google.com/webstore/detail/postman/fhbjgbiflinjbdggehcddcbncdddomop?hl=en
>  and click the ADD TO CHROME button.
> 2. On the top right of your Chrome browser window, click on the Postman icon.
> In Postman:
> 3. Set the request type to "POST" and enter the request URL as "http://<drillbit ip>:8047/query.json"
> 4. In the Headers tab, add an entry with "Content-Type" as key and 
> "application/json" as value.
>  Add another entry with "User-Name" as key and "mapr" as value.
> 5. In the Body tab, select "raw"; a new dropdown list should appear next 
> to "raw", and on that dropdown select "JSON".
> 6. In the Body box, enter your request body in JSON format. The file 
> test.csv is expected to reside under the /tmp folder (i.e. in the dfs.tmp schema):
> {
> "queryType": "SQL",
> "query": "select * from `dfs.tmp`.`test.csv`"
> }
> 7. Press Send!
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6454) Native MapR DB plugin support for Hive MapR-DB json table

2018-05-30 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6454:
--

 Summary: Native MapR DB plugin support for Hive MapR-DB json table
 Key: DRILL-6454
 URL: https://issues.apache.org/jira/browse/DRILL-6454
 Project: Apache Drill
  Issue Type: New Feature
  Components: Storage - Hive, Storage - MapRDB
Affects Versions: 1.13.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: 1.14.0


Hive can create and query MapR-DB tables via maprdb-json-handler.
The aim of this Jira is to implement a Drill native reader for Hive MapR-DB 
tables (similar to the native reader for parquet).

The design proposal is:
- to implement new GroupScan operators for interpreting HiveScan as 
MapRDBGroupScan;
- to add a storage planning rule to convert HiveScan to MapRDBGroupScan;
- to add a system/session option to enable this native reader;
- to create a new module for the rule and scan operators, which will be 
compiled and built only for the mapr profile (there is no reason to include 
it in the default profile);



 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6454) Native MapR DB plugin support for Hive MapR-DB json table

2018-05-30 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6454:
---
Description: 
Hive can create and query MapR-DB tables via maprdb-json-handler:
https://maprdocs.mapr.com/home/Hive/ConnectingToMapR-DB.html

The aim of this Jira is to implement a Drill native reader for Hive MapR-DB 
tables (similar to the native reader for parquet).

The design proposal is:
- to implement new GroupScan operators for interpreting HiveScan as 
MapRDBGroupScan;
- to add a storage planning rule to convert HiveScan to MapRDBGroupScan 
(sketched below);
- to add a system/session option to enable this native reader;
- to create a new module for the rule and scan operators, which will be 
compiled and built only for the mapr profile (there is no reason to include 
it in the default profile);
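
As a structural illustration of the planning-rule bullet: Drill storage 
planning rules typically extend Calcite's RelOptRule and replace the matched 
scan when they fire. The sketch below shows only the shape of such a rule; the 
Calcite API calls (operand, matches, onMatch, transformTo) are real, but every 
Drill-side name here (DrillScanRel, HiveScan, the MapRDBGroupScan factory) is 
an assumption based on this proposal, not the final implementation.

{code}
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;

// Illustrative sketch only; Drill class names and constructors are assumptions.
public class ConvertHiveScanToMapRDBScanRule extends RelOptRule {
  public static final ConvertHiveScanToMapRDBScanRule INSTANCE =
      new ConvertHiveScanToMapRDBScanRule();

  private ConvertHiveScanToMapRDBScanRule() {
    // Match any logical scan node in the plan.
    super(operand(DrillScanRel.class, any()), "ConvertHiveScanToMapRDBScanRule");
  }

  @Override
  public boolean matches(RelOptRuleCall call) {
    DrillScanRel scan = call.rel(0);
    // Fire only when the scan wraps a HiveScan over a MapR-DB JSON table
    // and the (proposed) enabling system/session option is set.
    return scan.getGroupScan() instanceof HiveScan;
  }

  @Override
  public void onMatch(RelOptRuleCall call) {
    DrillScanRel hiveScanRel = call.rel(0);
    // Re-plan the same table with a native MapRDBGroupScan (factory assumed);
    // the details depend on the MapR-DB plugin internals.
    call.transformTo(createNativeScanRel(hiveScanRel));
  }

  private DrillScanRel createNativeScanRel(DrillScanRel hiveScanRel) {
    throw new UnsupportedOperationException("sketch only");
  }
}
{code}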



 

  was:
Hive can create and query MapR-DB tables via maprdb-json-handler.
The aim of this Jira is to implement a Drill native reader for Hive MapR-DB 
tables (similar to the native reader for parquet).

The design proposal is:
- to implement new GroupScan operators for interpreting HiveScan as 
MapRDBGroupScan;
- to add a storage planning rule to convert HiveScan to MapRDBGroupScan;
- to add a system/session option to enable this native reader;
- to create a new module for the rule and scan operators, which will be 
compiled and built only for the mapr profile (there is no reason to include 
it in the default profile);



 


> Native MapR DB plugin support for Hive MapR-DB json table
> -
>
> Key: DRILL-6454
> URL: https://issues.apache.org/jira/browse/DRILL-6454
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.14.0
>
>
> Hive can create and query MapR-DB tables via maprdb-json-handler:
> https://maprdocs.mapr.com/home/Hive/ConnectingToMapR-DB.html
> The aim of this Jira is to implement a Drill native reader for Hive MapR-DB 
> tables (similar to the native reader for parquet).
> The design proposal is:
> - to implement new GroupScan operators for interpreting HiveScan as 
> MapRDBGroupScan;
> - to add a storage planning rule to convert HiveScan to MapRDBGroupScan;
> - to add a system/session option to enable this native reader;
> - to create a new module for the rule and scan operators, which will be 
> compiled and built only for the mapr profile (there is no reason to include 
> it in the default profile);
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6145) Implement using of Hive MapR-DB JSON handler.

2018-05-30 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6145:
---
Summary: Implement using of Hive MapR-DB JSON handler.   (was: Implement 
Hive MapR-DB JSON handler. )

> Implement using of Hive MapR-DB JSON handler. 
> --
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> Similar to the "hive-hbase-storage-handler", to support querying Hive's 
> MapR-DB external tables it is necessary to add a "hive-maprdb-json-handler".
> Use case:
>  # Create a MapR-DB JSON table:
> {code}
> _> mapr dbshell_
> _maprdb root:> create /tmp/table/json_  (make sure /tmp/table exists)
> {code}
> -- insert data
> {code}
> insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers 
> on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/table/json --id movie003 --value '\{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {code} 
>  #  Create a Hive external table:
> {code}
> hive> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> > movie_id string, title string, studio string) 
> > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> > TBLPROPERTIES("maprdb.table.name" = 
> "/tmp/table/json","maprdb.column.id" = "movie_id");
> {code}
>  
>  #  Use the hive schema to query this table via Drill:
> {code}
> 0: jdbc:drill:> select * from hive.mapr_db_json_hive_tbl;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6454) Native MapR DB plugin support for Hive MapR-DB json table

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6454:
-
Reviewer: Gautam Kumar Parai

> Native MapR DB plugin support for Hive MapR-DB json table
> -
>
> Key: DRILL-6454
> URL: https://issues.apache.org/jira/browse/DRILL-6454
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.14.0
>
>
> Hive can create and query MapR-DB tables via maprdb-json-handler:
> https://maprdocs.mapr.com/home/Hive/ConnectingToMapR-DB.html
> The aim of this Jira is to implement a Drill native reader for Hive MapR-DB 
> tables (similar to the native reader for parquet).
> The design proposal is:
> - to implement new GroupScan operators for interpreting HiveScan as 
> MapRDBGroupScan;
> - to add a storage planning rule to convert HiveScan to MapRDBGroupScan;
> - to add a system/session option to enable this native reader;
> - to create a new module for the rule and scan operators, which will be 
> compiled and built only for the mapr profile (there is no reason to include 
> it in the default profile);
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6454) Native MapR DB plugin support for Hive MapR-DB json table

2018-05-30 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495163#comment-16495163
 ] 

Pritesh Maker commented on DRILL-6454:
--

[~amansinha100] , [~gparai] and [~HanumathRao] can you comment on the design 
proposal? Any concerns? 

[~vitalii] could you publish the Work In Progress branch to make the design 
proposal clearer?

> Native MapR DB plugin support for Hive MapR-DB json table
> -
>
> Key: DRILL-6454
> URL: https://issues.apache.org/jira/browse/DRILL-6454
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.14.0
>
>
> Hive can create and query MapR-DB tables via maprdb-json-handler:
> https://maprdocs.mapr.com/home/Hive/ConnectingToMapR-DB.html
> The aim of this Jira is to implement a Drill native reader for Hive MapR-DB 
> tables (similar to the native reader for parquet).
> The design proposal is:
> - to implement new GroupScan operators for interpreting HiveScan as 
> MapRDBGroupScan;
> - to add a storage planning rule to convert HiveScan to MapRDBGroupScan;
> - to add a system/session option to enable this native reader;
> - to create a new module for the rule and scan operators, which will be 
> compiled and built only for the mapr profile (there is no reason to include 
> it in the default profile);
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4364) Image Metadata Format Plugin

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495211#comment-16495211
 ] 

ASF GitHub Bot commented on DRILL-4364:
---

cgivre commented on issue #367: DRILL-4364: Image Metadata Format Plugin
URL: https://github.com/apache/drill/pull/367#issuecomment-393177462
 
 
   +1 LGTM.  Built successfully and ran queries in embedded mode.  Thank you 
for your contribution to Drill!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Image Metadata Format Plugin
> 
>
> Key: DRILL-4364
> URL: https://issues.apache.org/jira/browse/DRILL-4364
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Akihiko Kusanagi
>Assignee: Akihiko Kusanagi
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> Support querying of metadata in various image formats. This plugin leverages 
> [metadata-extractor|https://github.com/drewnoakes/metadata-extractor]. The 
> plugin is especially useful when querying a large number of image files 
> stored in a distributed file system without building a metadata repository 
> in advance.
> This plugin supports the following file formats.
>  * JPEG, TIFF, PSD, PNG, BMP, GIF, ICO, PCX, WAV, AVI, WebP, MOV, MP4, EPS
>  * Camera Raw: ARW (Sony), CRW/CR2 (Canon), NEF (Nikon), ORF (Olympus), RAF 
> (FujiFilm), RW2 (Panasonic), RWL (Leica), SRW (Samsung), X3F (Foveon)
> This plugin enables reading the following metadata.
>  * Exif, IPTC, XMP, JFIF / JFXX, ICC Profiles, Photoshop fields, PNG 
> properties, BMP properties, GIF properties, ICO properties, PCX properties, 
> WAV properties, AVI properties, WebP properties, QuickTime properties, MP4 
> properties, EPS properties
> Since each type of metadata has a different set of fields, the plugin returns 
> a set of commonly-used fields, such as the image width, height, and bits per 
> pixel, for ease of use.
> *Examples:*
> Querying a JPEG file with the property descriptive: true
> {noformat}
> 0: jdbc:drill:zk=local> select FileName, * from 
> dfs.`4349313028_f69ffa0257_o.jpg`;
> +--+--+--+++-+--+--+---++---+--+--++---++-+-+--+--+--++--+-+---+---+--+-+--+
> | FileName | FileSize | FileDateTime | Format | PixelWidth | PixelHeight | 
> BitsPerPixel | DPIWidth | DPIHeight | Orientaion | ColorMode | HasAlpha | 
> Duration | VideoCodec | FrameRate | AudioCodec | AudioSampleSize | 
> AudioSampleRate | JPEG | JFIF | ExifIFD0 | ExifSubIFD | Interoperability | 
> GPS | ExifThumbnail | Photoshop | IPTC | Huffman | FileType |
> +--+--+--+++-+--+--+---++---+--+--++---++-+-+--+--+--++--+-+---+---+--+-+--+
> | 4349313028_f69ffa0257_o.jpg | 257213 bytes | Fri Mar 09 12:09:34 +08:00 
> 2018 | JPEG | 1199 | 800 | 24 | 96 | 96 | Unknown (0) | RGB | false | 
> 00:00:00 | Unknown | 0 | Unknown | 0 | 0 | 
> {"CompressionType":"Baseline","DataPrecision":"8 bits","ImageHeight":"800 
> pixels","ImageWidth":"1199 pixels","NumberOfComponents":"3","Component1":"Y 
> component: Quantization table 0, Sampling factors 2 horiz/2 
> vert","Component2":"Cb component: Quantization table 1, Sampling factors 1 
> horiz/1 vert","Component3":"Cr component: Quantization table 1, Sampling 
> factors 1 horiz/1 vert"} | 
> {"Version":"1.1","ResolutionUnits":"inch","XResolution":"96 
> dots","YResolution":"96 
> dots","ThumbnailWidthPixels":"0","ThumbnailHeightPixels":"0"} | 
> {"Software":"Picasa 3.0"} | 
> {"ExifVersion":"2.10","UniqueImageID":"d65e93b836d15a0c5e041e6b7258c76e"} | 
> {"InteroperabilityIndex":"Unknown ()","InteroperabilityVersion":"1.00"} | 
> {"GPSVersionID":".022","GPSLatitudeRef":"N","GPSLatitude":"47° 32' 
> 15.98\"","GPSLongitudeRef":"W","GPSLongitude":"-122° 2' 
> 6.37\"","GPSAltitudeRef":"Sea level","GPSAltitude":"0 metres"} | 
> {"Compression":"JPEG (old-style)","XResolution":"72 dots per 
> inch","YResolution":"72 dots per 
> inch","ResolutionUnit":"Inch","ThumbnailOffset":"414 
> bytes","ThumbnailLength":"7213 bytes"} | {} | 
> 

[jira] [Updated] (DRILL-4364) Image Metadata Format Plugin

2018-05-30 Thread Charles Givre (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre updated DRILL-4364:
-
Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Image Metadata Format Plugin
> 
>
> Key: DRILL-4364
> URL: https://issues.apache.org/jira/browse/DRILL-4364
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Akihiko Kusanagi
>Assignee: Akihiko Kusanagi
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.14.0
>
>
> Support querying of metadata in various image formats. This plugin leverages 
> [metadata-extractor|https://github.com/drewnoakes/metadata-extractor]. The 
> plugin is especially useful when querying a large number of image files 
> stored in a distributed file system without building a metadata repository 
> in advance.
> This plugin supports the following file formats.
>  * JPEG, TIFF, PSD, PNG, BMP, GIF, ICO, PCX, WAV, AVI, WebP, MOV, MP4, EPS
>  * Camera Raw: ARW (Sony), CRW/CR2 (Canon), NEF (Nikon), ORF (Olympus), RAF 
> (FujiFilm), RW2 (Panasonic), RWL (Leica), SRW (Samsung), X3F (Foveon)
> This plugin enables reading the following metadata.
>  * Exif, IPTC, XMP, JFIF / JFXX, ICC Profiles, Photoshop fields, PNG 
> properties, BMP properties, GIF properties, ICO properties, PCX properties, 
> WAV properties, AVI properties, WebP properties, QuickTime properties, MP4 
> properties, EPS properties
> Since each type of metadata has a different set of fields, the plugin returns 
> a set of commonly-used fields, such as the image width, height, and bits per 
> pixel, for ease of use.
> *Examples:*
> Querying a JPEG file with the property descriptive: true
> {noformat}
> 0: jdbc:drill:zk=local> select FileName, * from 
> dfs.`4349313028_f69ffa0257_o.jpg`;
> +--+--+--+++-+--+--+---++---+--+--++---++-+-+--+--+--++--+-+---+---+--+-+--+
> | FileName | FileSize | FileDateTime | Format | PixelWidth | PixelHeight | 
> BitsPerPixel | DPIWidth | DPIHeight | Orientaion | ColorMode | HasAlpha | 
> Duration | VideoCodec | FrameRate | AudioCodec | AudioSampleSize | 
> AudioSampleRate | JPEG | JFIF | ExifIFD0 | ExifSubIFD | Interoperability | 
> GPS | ExifThumbnail | Photoshop | IPTC | Huffman | FileType |
> +--+--+--+++-+--+--+---++---+--+--++---++-+-+--+--+--++--+-+---+---+--+-+--+
> | 4349313028_f69ffa0257_o.jpg | 257213 bytes | Fri Mar 09 12:09:34 +08:00 
> 2018 | JPEG | 1199 | 800 | 24 | 96 | 96 | Unknown (0) | RGB | false | 
> 00:00:00 | Unknown | 0 | Unknown | 0 | 0 | 
> {"CompressionType":"Baseline","DataPrecision":"8 bits","ImageHeight":"800 
> pixels","ImageWidth":"1199 pixels","NumberOfComponents":"3","Component1":"Y 
> component: Quantization table 0, Sampling factors 2 horiz/2 
> vert","Component2":"Cb component: Quantization table 1, Sampling factors 1 
> horiz/1 vert","Component3":"Cr component: Quantization table 1, Sampling 
> factors 1 horiz/1 vert"} | 
> {"Version":"1.1","ResolutionUnits":"inch","XResolution":"96 
> dots","YResolution":"96 
> dots","ThumbnailWidthPixels":"0","ThumbnailHeightPixels":"0"} | 
> {"Software":"Picasa 3.0"} | 
> {"ExifVersion":"2.10","UniqueImageID":"d65e93b836d15a0c5e041e6b7258c76e"} | 
> {"InteroperabilityIndex":"Unknown ()","InteroperabilityVersion":"1.00"} | 
> {"GPSVersionID":".022","GPSLatitudeRef":"N","GPSLatitude":"47° 32' 
> 15.98\"","GPSLongitudeRef":"W","GPSLongitude":"-122° 2' 
> 6.37\"","GPSAltitudeRef":"Sea level","GPSAltitude":"0 metres"} | 
> {"Compression":"JPEG (old-style)","XResolution":"72 dots per 
> inch","YResolution":"72 dots per 
> inch","ResolutionUnit":"Inch","ThumbnailOffset":"414 
> bytes","ThumbnailLength":"7213 bytes"} | {} | 
> {"Keywords":"135;2002;issaquah;police car;wa;washington"} | 
> {"NumberOfTables":"4 Huffman tables"} | 
> {"DetectedFileTypeName":"JPEG","DetectedFileTypeLongName":"Joint Photographic 
> Experts 
> Group","DetectedMIMEType":"image/jpeg","ExpectedFileNameExtension":"jpg"} |
> +--+--+--+++-+--+--+---++---+--+--++---++-+-+--+--+--++-

[jira] [Updated] (DRILL-6145) Enable usage of Hive MapR-DB JSON handler

2018-05-30 Thread Vlad Rozov (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated DRILL-6145:
--
Summary: Enable usage of Hive MapR-DB JSON handler  (was: Implement using 
of Hive MapR-DB JSON handler. )

> Enable usage of Hive MapR-DB JSON handler
> -
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> Similar to the "hive-hbase-storage-handler", to support querying Hive's 
> MapR-DB external tables it is necessary to add a "hive-maprdb-json-handler".
> Use case:
>  # Create a MapR-DB JSON table:
> {code}
> _> mapr dbshell_
> _maprdb root:> create /tmp/table/json_  (make sure /tmp/table exists)
> {code}
> -- insert data
> {code}
> insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers 
> on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/table/json --id movie003 --value '\{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {code} 
>  #  Create a Hive external table:
> {code}
> hive> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> > movie_id string, title string, studio string) 
> > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> > TBLPROPERTIES("maprdb.table.name" = 
> "/tmp/table/json","maprdb.column.id" = "movie_id");
> {code}
>  
>  #  Use the hive schema to query this table via Drill:
> {code}
> 0: jdbc:drill:> select * from hive.mapr_db_json_hive_tbl;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6416) Unit test TestTpchDistributedConcurrent.testConcurrentQueries fails with AssertionError

2018-05-30 Thread Vlad Rozov (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov resolved DRILL-6416.
---
   Resolution: Fixed
Fix Version/s: 1.14.0

> Unit test TestTpchDistributedConcurrent.testConcurrentQueries fails with 
> AssertionError
> ---
>
> Key: DRILL-6416
> URL: https://issues.apache.org/jira/browse/DRILL-6416
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Abhishek Girish
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.14.0
>
>
> {code}
> Running org.apache.drill.TestTpchDistributedConcurrent#testConcurrentQueries
> 16:38:21.784 [2505e212-b165-7812-5c91-0a407a213964:frag:3:1] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: AssertionError
> Fragment 3:1
> [Error Id: 436120b6-5255-437e-af53-313e1c3207e0 on drillu1.qa.lab:31064]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError
> Fragment 3:1
> [Error Id: 436120b6-5255-437e-af53-313e1c3207e0 on drillu1.qa.lab:31064]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:359)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:214)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:325)
>  [classes/:na]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.RuntimeException: java.lang.AssertionError
>   at 
> org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:101)
>  ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:471)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:313)
>  [classes/:na]
>   ... 4 common frames omitted
> Caused by: java.lang.AssertionError: null
>   at 
> org.apache.drill.exec.compile.sig.MappingSet.enterConstant(MappingSet.java:85)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitBooleanConstant(EvaluationVisitor.java:1376)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanConstant(EvaluationVisitor.java:1043)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanConstant(EvaluationVisitor.java:843)
>  ~[classes/:na]
>   at 
> org.apache.drill.common.expression.ValueExpressions$BooleanExpression.accept(ValueExpressions.java:186)
>  ~[drill-logical-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitReturnValueExpression(EvaluationVisitor.java:579)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown(EvaluationVisitor.java:342)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitUnknown(EvaluationVisitor.java:1399)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown(EvaluationVisitor.java:1084)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown(EvaluationVisitor.java:843)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.filter.ReturnValueExpression.accept(ReturnValueExpression.java:56)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.expr.EvaluationVisitor.addExpr(EvaluationVisitor.java:100)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.expr.ClassGenerator.addExpr(ClassGenerator.java:334) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.join.NestedLoopJoinBatch.setupWorker(NestedLoopJoinBatch.java:266)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.join.NestedLoopJoinBatch.buildSchema(NestedLoopJoinBatch.java:384)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:144)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:229)
>  ~[classe

[jira] [Commented] (DRILL-5977) predicate pushdown support kafkaMsgOffset

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495298#comment-16495298
 ] 

ASF GitHub Bot commented on DRILL-5977:
---

akumarb2010 commented on issue #1272: DRILL-5977: Filter Pushdown in 
Drill-Kafka plugin
URL: https://github.com/apache/drill/pull/1272#issuecomment-393205670
 
 
   @aravi5  & @kkhatua  
   
   In production environments, in most scenarios I have observed non-uniform 
offset ranges (cases involving repartitioning, etc.). But if handling this 
scenario needs complex logic, then we can go ahead with the current design. 
   
   I just started going through the code and will post my review comments by 
this weekend, if there are any.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> predicate pushdown support kafkaMsgOffset
> -
>
> Key: DRILL-5977
> URL: https://issues.apache.org/jira/browse/DRILL-5977
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: B Anil Kumar
>Assignee: Abhishek Ravi
>Priority: Major
> Fix For: 1.14.0
>
>
> As part of the Kafka storage plugin review, below is the suggestion from Paul.
> {noformat}
> Does it make sense to provide a way to select a range of messages: a starting 
> point or a count? Perhaps I want to run my query every five minutes, scanning 
> only those messages since the previous scan. Or, I want to limit my take to, 
> say, the next 1000 messages. Could we use a pseudo-column such as 
> "kafkaMsgOffset" for that purpose? Maybe
> SELECT * FROM  WHERE kafkaMsgOffset > 12345
> {noformat}
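> For illustration only: such a pushdown would ultimately translate the offset 
> predicate into Kafka consumer seeks before the scan starts reading. A minimal 
> sketch using the public Kafka consumer API (the broker address, topic name, 
> and single-partition handling are assumptions):
> {code}
> import java.util.Collections;
> import java.util.Properties;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.common.TopicPartition;
> 
> public class OffsetPushdownSketch {
>   public static void main(String[] args) {
>     Properties props = new Properties();
>     props.put("bootstrap.servers", "localhost:9092");
>     props.put("key.deserializer",
>         "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>     props.put("value.deserializer",
>         "org.apache.kafka.common.serialization.ByteArrayDeserializer");
> 
>     try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
>       // Apply "kafkaMsgOffset > 12345": seek the assigned partition to the
>       // first offset that satisfies the predicate, then start polling.
>       TopicPartition tp = new TopicPartition("my-topic", 0);
>       consumer.assign(Collections.singletonList(tp));
>       consumer.seek(tp, 12346);
>     }
>   }
> }
> {code}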



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6455) JDBC Scan Operator does not appear in profile

2018-05-30 Thread Kunal Khatua (JIRA)
Kunal Khatua created DRILL-6455:
---

 Summary: JDBC Scan Operator does not appear in profile
 Key: DRILL-6455
 URL: https://issues.apache.org/jira/browse/DRILL-6455
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JDBC
Affects Versions: 1.13.0
Reporter: Kunal Khatua
Assignee: Kunal Khatua
 Fix For: 1.14.0


It seems that the Operator is not defined, though it appears in the text plan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6415) Unit test TestGracefulShutdown.testRestApiShutdown times out

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495505#comment-16495505
 ] 

ASF GitHub Bot commented on DRILL-6415:
---

parthchandra closed pull request #1281: DRILL-6415: Fixed 
TestGracefulShutdown.TestRestApi test from timing out
URL: https://github.com/apache/drill/pull/1281
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java 
b/exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java
index ccd65e41fe..bec1691078 100644
--- 
a/exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java
+++ 
b/exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java
@@ -20,7 +20,6 @@
 import org.apache.drill.exec.ExecConstants;
 import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
 import org.apache.drill.exec.server.Drillbit;
-import org.apache.drill.exec.work.WorkManager;
 import org.junit.Assert;
 import org.junit.BeforeClass;
 import org.junit.Rule;
@@ -28,15 +27,14 @@
 import org.junit.experimental.categories.Category;
 import org.junit.rules.TestRule;
 
+import java.io.BufferedWriter;
 import java.io.FileWriter;
 import java.io.IOException;
 import java.io.PrintWriter;
 import java.net.HttpURLConnection;
 import java.net.URL;
-import java.util.Collection;
-import java.util.Properties;
 import java.nio.file.Path;
-import java.io.BufferedWriter;
+import java.util.Collection;
 
 import static org.junit.Assert.fail;
 
@@ -44,9 +42,7 @@
 public class TestGracefulShutdown extends BaseTestQuery {
 
   @Rule
-  public final TestRule TIMEOUT = TestTools.getTimeoutRule(180_000);
-
-  public static final int WAIT_TIMEOUT_MS = WorkManager.EXIT_TIMEOUT_MS + 
30_000;
+  public final TestRule TIMEOUT = TestTools.getTimeoutRule(120_000);
 
   @BeforeClass
   public static void setUpTestData() throws Exception {
@@ -55,37 +51,17 @@ public static void setUpTestData() throws Exception {
 }
   }
 
-  public static final Properties WEBSERVER_CONFIGURATION = new Properties() {
-{
-  put(ExecConstants.HTTP_ENABLE, true);
-  put(ExecConstants.HTTP_PORT_HUNT, true);
-  put(ExecConstants.DRILL_PORT_HUNT, true);
-  put(ExecConstants.GRACE_PERIOD, 1);
-  put(ExecConstants.ALLOW_LOOPBACK_ADDRESS_BINDING, true);
-}
-  };
-
-  public static final Properties DRILL_PORT_CONFIGURATION = new Properties() {
-{
-  put(ExecConstants.DRILL_PORT_HUNT, true);
-  put(ExecConstants.GRACE_PERIOD, 1);
-  put(ExecConstants.ALLOW_LOOPBACK_ADDRESS_BINDING, true);
-}
-  };
-
-  public ClusterFixtureBuilder enableWebServer(ClusterFixtureBuilder builder) {
-Properties props = new Properties();
-props.putAll(WEBSERVER_CONFIGURATION);
-builder.configBuilder.configProps(props);
+  private static void enableWebServer(ClusterFixtureBuilder builder) {
+enableDrillPortHunting(builder);
+builder.configBuilder.put(ExecConstants.HTTP_ENABLE, true);
+builder.configBuilder.put(ExecConstants.HTTP_PORT_HUNT, true);
 builder.sessionOption(ExecConstants.SLICE_TARGET, 10);
-return builder;
   }
 
-  public ClusterFixtureBuilder enableDrillPortHunting(ClusterFixtureBuilder 
builder) {
-Properties props = new Properties();
-props.putAll(DRILL_PORT_CONFIGURATION);
-builder.configBuilder.configProps(props);
-return builder;
+  private static void enableDrillPortHunting(ClusterFixtureBuilder builder) {
+builder.configBuilder.put(ExecConstants.DRILL_PORT_HUNT, true);
+builder.configBuilder.put(ExecConstants.GRACE_PERIOD, 500);
+builder.configBuilder.put(ExecConstants.ALLOW_LOOPBACK_ADDRESS_BINDING, 
true);
   }
 
   /*
@@ -95,98 +71,30 @@ public ClusterFixtureBuilder 
enableDrillPortHunting(ClusterFixtureBuilder builde
   @Test
   public void testOnlineEndPoints() throws  Exception {
 
-String[] drillbits = {"db1" ,"db2","db3", "db4", "db5", "db6"};
+String[] drillbits = {"db1" ,"db2","db3"};
 ClusterFixtureBuilder builder = 
ClusterFixture.bareBuilder(dirTestWatcher).withLocalZk().withBits(drillbits);
 enableDrillPortHunting(builder);
 
 try ( ClusterFixture cluster = builder.build()) {
 
   Drillbit drillbit = cluster.drillbit("db2");
+  int zkRefresh = 
drillbit.getContext().getConfig().getInt(ExecConstants.ZK_REFRESH);
   DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
-  int grace_period = 
drillbit.getContext().getConfig().getInt(ExecConstants.GRACE_PERIOD);
-
-  new Thread(new Runnable() {
-public void run() {
-  try {
-cluster.closeDrillb

[jira] [Commented] (DRILL-6455) JDBC Scan Operator does not appear in profile

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495522#comment-16495522
 ] 

ASF GitHub Bot commented on DRILL-6455:
---

kkhatua opened a new pull request #1297: DRILL-6455: Add missing JDBC Scan 
Operator for profiles
URL: https://github.com/apache/drill/pull/1297
 
 
   The operator is missing from the profile protobuf. This commit introduces it.
   1. Added protobuf files
   2. Updated JdbcSubScan's getOperatorType API
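   
   A sketch of what item 2 amounts to (the exact enum constant name is an 
assumption; it must match the value added to the protobuf):
{code}
// In JdbcSubScan: return the protobuf-generated CoreOperatorType value for
// the JDBC scan, following the pattern used by the other sub-scans.
@Override
public int getOperatorType() {
  return CoreOperatorType.JDBC_SCAN_VALUE;  // enum constant name assumed
}
{code}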


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> JDBC Scan Operator does not appear in profile
> -
>
> Key: DRILL-6455
> URL: https://issues.apache.org/jira/browse/DRILL-6455
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.13.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
> Fix For: 1.14.0
>
>
> It seems that the Operator is not defined, though it appears in the text plan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6455) JDBC Scan Operator does not appear in profile

2018-05-30 Thread Kunal Khatua (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6455:

Reviewer: Aman Sinha

> JDBC Scan Operator does not appear in profile
> -
>
> Key: DRILL-6455
> URL: https://issues.apache.org/jira/browse/DRILL-6455
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.13.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
> Fix For: 1.14.0
>
>
> It seems that the Operator is not defined, though it appears in the text plan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6445) Fix existing test cases in TestScripts.java and add new test case for DRILLBIT_CONTEXT variable

2018-05-30 Thread Timothy Farkas (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6445:
--
Labels: ready-to-commit  (was: )

> Fix existing test cases in TestScripts.java and add new test case for 
> DRILLBIT_CONTEXT variable
> ---
>
> Key: DRILL-6445
> URL: https://issues.apache.org/jira/browse/DRILL-6445
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Under the drill-yarn module there is a [TestScripts.java 
> file|https://github.com/apache/drill/blob/master/drill-yarn/src/test/java/org/apache/drill/yarn/scripts/TestScripts.java]
>  created for testing the scripts provided by Drill to set up the environment. 
> Currently those tests are failing. This Jira is to make sure all the tests 
> pass and a few new tests are added for the DRILLBIT_CONTEXT variable inside 
> the script.
> Also, these tests are currently ignored and meant to be run in a developer 
> environment. Maybe we can investigate in the future whether they can be run 
> as regular tests by using DirTestWatcher.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6455) JDBC Scan Operator does not appear in profile

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495537#comment-16495537
 ] 

ASF GitHub Bot commented on DRILL-6455:
---

kkhatua commented on issue #1297: DRILL-6455: Add missing JDBC Scan Operator 
for profiles
URL: https://github.com/apache/drill/pull/1297#issuecomment-393267860
 
 
   @amansinha100  can you please review these?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> JDBC Scan Operator does not appear in profile
> -
>
> Key: DRILL-6455
> URL: https://issues.apache.org/jira/browse/DRILL-6455
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.13.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
> Fix For: 1.14.0
>
>
> It seems that the Operator is not defined, though it appears in the text plan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6356) batch sizing for union all

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495591#comment-16495591
 ] 

ASF GitHub Bot commented on DRILL-6356:
---

Ben-Zvi closed pull request #1255: DRILL-6356: batch sizing for union all
URL: https://github.com/apache/drill/pull/1255
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/ops/OperatorMetricRegistry.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/ops/OperatorMetricRegistry.java
index d731ca4c24..dcb944512c 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/ops/OperatorMetricRegistry.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/ops/OperatorMetricRegistry.java
@@ -29,7 +29,7 @@
 import 
org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch;
 import org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch;
 import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
-import org.apache.drill.exec.record.JoinBatchMemoryManager;
+import org.apache.drill.exec.record.AbstractBinaryRecordBatch;
 import org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader;
 
 /**
@@ -53,9 +53,10 @@
 register(CoreOperatorType.EXTERNAL_SORT_VALUE, 
ExternalSortBatch.Metric.class);
 register(CoreOperatorType.PARQUET_ROW_GROUP_SCAN_VALUE, 
ParquetRecordReader.Metric.class);
 register(CoreOperatorType.FLATTEN_VALUE, FlattenRecordBatch.Metric.class);
-register(CoreOperatorType.MERGE_JOIN_VALUE, 
JoinBatchMemoryManager.Metric.class);
-register(CoreOperatorType.LATERAL_JOIN_VALUE, 
JoinBatchMemoryManager.Metric.class);
+register(CoreOperatorType.MERGE_JOIN_VALUE, 
AbstractBinaryRecordBatch.Metric.class);
+register(CoreOperatorType.LATERAL_JOIN_VALUE, 
AbstractBinaryRecordBatch.Metric.class);
 register(CoreOperatorType.UNNEST_VALUE, UnnestRecordBatch.Metric.class);
+register(CoreOperatorType.UNION_VALUE, 
AbstractBinaryRecordBatch.Metric.class);
   }
 
   private static void register(final int operatorType, final Class metricDef) {
diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/union/UnionAllRecordBatch.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/union/UnionAllRecordBatch.java
index b4d0e7726d..f4c1900a5f 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/union/UnionAllRecordBatch.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/union/UnionAllRecordBatch.java
@@ -27,6 +27,7 @@
 import org.apache.drill.common.expression.SchemaPath;
 import org.apache.drill.common.types.TypeProtos;
 import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.ExecConstants;
 import org.apache.drill.exec.exception.ClassTransformationException;
 import org.apache.drill.exec.exception.OutOfMemoryException;
 import org.apache.drill.exec.exception.SchemaChangeException;
@@ -40,6 +41,7 @@
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.MaterializedField;
 import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.RecordBatchMemoryManager;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessibleUtilities;
@@ -68,6 +70,10 @@
 
   public UnionAllRecordBatch(UnionAll config, List<RecordBatch> children, 
FragmentContext context) throws OutOfMemoryException {
 super(config, context, true, children.get(0), children.get(1));
+
+// get the output batch size from config.
+int configuredBatchSize = (int) 
context.getOptions().getOption(ExecConstants.OUTPUT_BATCH_SIZE_VALIDATOR);
+batchMemoryManager = new RecordBatchMemoryManager(numInputs, 
configuredBatchSize);
   }
 
   @Override
@@ -106,9 +112,9 @@ public IterOutcome innerNext() {
   return IterOutcome.NONE;
 }
 
-Pair<IterOutcome, RecordBatch> nextBatch = unionInputIterator.next();
+Pair<IterOutcome, BatchStatusWrappper> nextBatch = 
unionInputIterator.next();
 IterOutcome upstream = nextBatch.left;
-RecordBatch incoming = nextBatch.right;
+BatchStatusWrappper batchStatus = nextBatch.right;
 
 switch (upstream) {
 case NONE:
@@ -116,14 +122,14 @@ public IterOutcome innerNext() {
 case STOP:
   return upstream;
 case OK_NEW_SCHEMA:
-  return doWork(nextBatch.right, true);
+  return doWork(batchStatus, true);
 case OK:
   // skip batches with same schema as the previous one yet having 0 
row.
-  if (incoming.getRecordCount() == 0) {
-VectorAccessibleUtilities.clear(incoming);
+   

[jira] [Resolved] (DRILL-6402) Repeated Value Vectors copyFrom methods are not updating the value count and writer index correctly for values vector

2018-05-30 Thread Boaz Ben-Zvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi resolved DRILL-6402.
-
Resolution: Fixed

commit 31be83ebf5c58ca9ce3862369b4a828589f496f8

as part of PR #1255 

 

> Repeated Value Vectors copyFrom methods are not updating the value count and 
> writer index correctly for values vector
> -
>
> Key: DRILL-6402
> URL: https://issues.apache.org/jira/browse/DRILL-6402
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> copyFrom and copyFromSafe methods of repeated value vectors do not update the 
> value count after values are copied; we update it before the copy starts. The 
> offset vector's value count is updated correctly, but the values vector's 
> value count is not. As a result, the writer index for the values vector will 
> have the wrong value, and we get an index-out-of-bounds error when trying to 
> read after the copy. This problem is seen when we do a split-and-transfer of 
> a repeated value vector. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-5929) Misleading error for text file with blank line delimiter

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5929:
-
Fix Version/s: (was: 1.14.0)

> Misleading error for text file with blank line delimiter
> 
>
> Key: DRILL-5929
> URL: https://issues.apache.org/jira/browse/DRILL-5929
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Consider the following functional test query:
> {code}
> select * from 
> table(`table_function/colons.txt`(type=>'text',lineDelimiter=>'\\'))
> {code}
> For some reason (yet to be determined), when running this from Java, the line 
> delimiter ended up empty. This causes the following line to fail with an 
> {{ArrayIndexOutOfBoundsException}}:
> {code}
> class TextInput ...
>   public final byte nextChar() throws IOException {
> if (byteChar == lineSeparator[0]) { // but, lineSeparator.length == 0
> {code}
> We then translate the exception:
> {code}
> class TextReader ...
>   public final boolean parseNext() throws IOException {
> ...
> } catch (Exception ex) {
>   try {
> throw handleException(ex);
> ...
>   private TextParsingException handleException(Exception ex) throws 
> IOException {
> ...
> if (ex instanceof ArrayIndexOutOfBoundsException) {
>   // Not clear this exception is still thrown...
>   ex = UserException
>   .dataReadError(ex)
>   .message(
>   "Drill failed to read your text file.  Drill supports up to %d 
> columns in a text file.  Your file appears to have more than that.",
>   MAXIMUM_NUMBER_COLUMNS)
>   .build(logger);
> }
> {code}
> That is, due to a missing delimiter, we get an index out of bounds exception, 
> which we translate to an error about having too many fields. But, the file 
> itself has only a handful of fields. Thus, the error is completely wrong.
> Then, we compound the error:
> {code}
>   private TextParsingException handleException(Exception ex) throws 
> IOException {
> ...
> throw new TextParsingException(context, message, ex);
> class CompliantTextReader ...
>   public boolean next() {
> ...
> } catch (IOException | TextParsingException e) {
>   throw UserException.dataReadError(e)
>   .addContext("Failure while reading file %s. Happened at or shortly 
> before byte position %d.",
> split.getPath(), reader.getPos())
>   .build(logger);
> {code}
> That is, our AIOB exception became a user exception that became a text 
> parsing exception that became a data read error.
> But, this is not a data read error. It is an error in Drill's own validation 
> logic. Not clear we should be wrapping user exceptions in other errors that 
> we wrap in other user exceptions.
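> A minimal defensive check (a sketch of one possible fix, not the committed 
> one) would validate the delimiter up front and fail with an accurate message:
> {code}
> class TextInput ...
>   // Sketch: reject an empty line delimiter before scanning starts, instead
>   // of letting lineSeparator[0] throw ArrayIndexOutOfBoundsException later.
>   if (lineSeparator.length == 0) {
>     throw UserException.validationError()
>         .message("The text format line delimiter cannot be empty.")
>         .build(logger);
>   }
> {code}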



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-5805) External Sort runs out of memory

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5805:
-
Fix Version/s: (was: 1.14.0)

> External Sort runs out of memory
> 
>
> Key: DRILL-5805
> URL: https://issues.apache.org/jira/browse/DRILL-5805
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 2645d135-4222-d752-2609-c95568ff6e93.sys.drill, 
> drillbit.log.gz
>
>
> Query is:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_node` = 5;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.width.max_per_query` = 100;
> select count(*) from (select * from (select id, flatten(str_list) str from 
> dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by 
> d.str) d1 where d1.id=0;
> {noformat}
> Error is:
> {noformat}
> java.sql.SQLException: RESOURCE ERROR: Unable to allocate sv2 buffer
> Fragment 0:0
> [Error Id: d67e087f-30e3-4861-8d3a-ddd952ddacc1 on atsqa6c83.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate 
> sv2 buffer
> 
> org.apache.drill.exec.physical.impl.xsort.managed.BufferedBatches.newSV2():157
> 
> org.apache.drill.exec.physical.impl.xsort.managed.BufferedBatches.makeSelectionVector():142
> org.apache.drill.exec.physical.impl.xsort.managed.BufferedBatches.add():97
> org.apache.drill.exec.physical.impl.xsort.managed.SortImpl.addBatch():265
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():422
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():358
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():303
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():151
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():744
>   

[jira] [Updated] (DRILL-6313) ScanBatch.Mutator does not report new schema for empty first batch

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6313:
-
Fix Version/s: (was: 1.14.0)

> ScanBatch.Mutator does not report new schema for empty first batch
> --
>
> Key: DRILL-6313
> URL: https://issues.apache.org/jira/browse/DRILL-6313
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Create a format plugin that honors an empty select list by returning no 
> columns. This case occurs in a {{COUNT(\*)}} query.
> When run, the query fails with:
> {noformat}
> SYSTEM ERROR: IllegalStateException: next() returned OK without first 
> returning OK_NEW_SCHEMA [#2, ScanBatch]
> {noformat}
> The reason is that the {{Mutator}} class uses a flag, {{schemaChanged}}, 
> which defaults to {{false}}. It is set to {{true}} only when a field 
> is added. But, since the query requested no fields, no field is added.
> The fix is simple: just default {{schemaChanged}} to {{true}}.
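> For illustration, the one-line change this amounts to (field shape assumed 
> from the description above):
> {code}
> // In ScanBatch.Mutator: report a new schema on the first batch even when the
> // reader adds no columns, e.g. for a COUNT(*) query with an empty select list.
> private boolean schemaChanged = true;  // was: false
> {code}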



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-5778) Drill seems to run out of memory but completes execution

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5778:
-
Fix Version/s: (was: 1.14.0)

> Drill seems to run out of memory but completes execution
> 
>
> Key: DRILL-5778
> URL: https://issues.apache.org/jira/browse/DRILL-5778
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0.sys.drill, 
> drillbit.log
>
>
> Query is:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.width.max_per_query` = 1;
> alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
> select count(*) from (select * from (select id, flatten(str_list) str from 
> dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by 
> d.str) d1 where d1.id=0;
> {noformat}
> Plan is:
> {noformat}
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0])
> 00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
> 00-03  UnionExchange
> 01-01StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 01-02  Project($f0=[0])
> 01-03SelectionVectorRemover
> 01-04  Filter(condition=[=($0, 0)])
> 01-05SingleMergeExchange(sort0=[1 ASC])
> 02-01  SelectionVectorRemover
> 02-02Sort(sort0=[$1], dir0=[ASC])
> 02-03  Project(id=[$0], str=[$1])
> 02-04HashToRandomExchange(dist0=[[$1]])
> 03-01  UnorderedMuxExchange
> 04-01Project(id=[$0], str=[$1], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)])
> 04-02  Flatten(flattenField=[$1])
> 04-03Project(id=[$0], str=[$1])
> 04-04  Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/drill/testdata/resource-manager/flatten-large-small.json,
>  numFiles=1, columns=[`id`, `str_list`], 
> files=[maprfs:///drill/testdata/resource-manager/flatten-large-small.json]]])
> {noformat}
> From drillbit.log:
> {noformat}
> 2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
>   str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134, 
> data size: 548360)
>   id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data 
> size: 36864)
>   Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width: 
> 262163, Net row width: 143, Density: 1}
> 2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] ERROR 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches. 
> Incoming batch size: 1073819648, available memory: 2147483648
> 2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] INFO  
> o.a.d.e.c.ClassCompilerSelector - Java compiler policy: DEFAULT, Debug 
> option: true
> 2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.compile.JaninoClassCompiler - Compiling (source size=3.3 KiB):
> ...
> 2017-09-08 05:07:21,536 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.exec.compile.ClassTransformer - Compiled and merged 
> SingleBatchSorterGen2677: bytecode size = 3.6 KiB, time = 19 ms.
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 5608 us to sort 4096 records
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 143 
> bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048476 bytes, 
> gross = 1572714 bytes, records = 7332; spill file = 268435456 bytes
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Output batch size: net = 9371505 bytes, 
> gross = 14057257 bytes, records = 65535
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Available memory: 2147483648, buffer 
> memory = 2143289744, merge memory = 2128740638
> 2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.t.g.SingleBatchSort

[jira] [Updated] (DRILL-5988) Revise operator stats for OperatorFixture

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5988:
-
Fix Version/s: (was: 1.14.0)

> Revise operator stats for OperatorFixture
> -
>
> Key: DRILL-5988
> URL: https://issues.apache.org/jira/browse/DRILL-5988
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> DRILL-5842 refactored operator contexts to simplify unit testing. This ticket 
> continues that work by modifying the standard {{OperatorStats}} so it can be 
> used in the {{OperatorFixture}} used for "sub-operator" unit tests.
> This ticket also includes many small code cleanups so that the next "batch 
> size control" PR can focus just on the core functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6360) Document the typeof() function

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6360:
-
Issue Type: Task  (was: Improvement)

> Document the typeof() function
> --
>
> Key: DRILL-6360
> URL: https://issues.apache.org/jira/browse/DRILL-6360
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc-impacting
>
> Drill has a {{typeof()}} function that returns the data type (but not mode) 
> of a column. It was discussed on the dev list recently. However, a search of 
> the Drill web site, and a scan by hand, failed to turn up documentation about 
> the function.
> As a general suggestion, it would be great to have an alphabetical list of all 
> functions so we don't have to hunt all over the site to find which functions 
> are available.
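> For illustration, a typical call looks like this (sample file assumed to 
> exist under {{dfs.tmp}}):
> {noformat}
> SELECT typeof(columns[0]) FROM `dfs.tmp`.`test.csv` LIMIT 1;
> {noformat}
> Against a text file this typically returns VARCHAR, since Drill reads CSV 
> columns as VARCHAR.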



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6360) Document the typeof() function

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6360:
-
Fix Version/s: (was: 1.14.0)

> Document the typeof() function
> --
>
> Key: DRILL-6360
> URL: https://issues.apache.org/jira/browse/DRILL-6360
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc-impacting
>
> Drill has a {{typeof()}} function that returns the data type (but not mode) 
> of a column. It was discussed on the dev list recently. However, a search of 
> the Drill web site, and a scan by hand, failed to turn up documentation about 
> the function.
> As a general suggestion, it would be great to have an alphabetical list of all 
> functions so we don't have to hunt all over the site to find which functions 
> are available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6060) JDBC-all excludes files required for date/time vectors

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6060:
-
Fix Version/s: (was: 1.14.0)

> JDBC-all excludes files required for date/time vectors
> --
>
> Key: DRILL-6060
> URL: https://issues.apache.org/jira/browse/DRILL-6060
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> The vector package contains the file 
> {{org.apache.drill.exec.expr.fn.impl.DateUtility}}. It contains formatting 
> code along with a set of date constants (such as the number of hours in a 
> day.) The date constants are used in the generated value vector code, such as 
> for the {{IntervalVector}} class:
> {code}
> public StringBuilder getAsStringBuilder(int index) {
>   ...
>   final int years  = (months / 
> org.apache.drill.exec.expr.fn.impl.DateUtility.yearsToMonths);
>   months = (months % 
> org.apache.drill.exec.expr.fn.impl.DateUtility.yearsToMonths);
> {code}
> Thus, the {{DateUtility}} class is required in order for the date/time 
> vectors to work.
> Yet, recent changes to the JDBC driver now excludes the package that contains 
> the {{DateUtility}} class. In {{dependency-reduced-pom.xml}}:
> {code}
> org/apache/drill/exec/expr/fn/**
> {code}
> A refactoring exercise moved more of the date/time code out of generated 
> code and into the {{DateUtility}} class, so that the code can be reused. The 
> result is runtime errors in unit tests.
> {noformat}
> Caused by: java.lang.NoClassDefFoundError: 
> oadd/org/apache/drill/exec/expr/fn/impl/DateUtility
>   at 
> oadd.org.apache.drill.exec.vector.IntervalDayVector$Accessor.getObject(IntervalDayVector.java:450)
>   at 
> oadd.org.apache.drill.exec.vector.accessor.IntervalDayAccessor.getObject(IntervalDayAccessor.java:125)
> {noformat}
> Since the intent is to exclude functions only needed by Drill execution, the 
> solution is to move the code required by vectors out of the {{fn}} package. 
> The safe bet is to put it in the {{org.apache.drill.exec.vector}} package, 
> which can't be excluded (it includes the value vector code).
> The larger issue is that the concept of excluding bits of Drill code is 
> problematic: there is no good way to ensure that the code is not needed.
> The traditional (and reliable) solution is to design a client-only library 
> that is designed to include only the required dependencies. Build up the list 
> of dependencies from zero (as is common practice in Maven) rather than trying 
> to add things and then throw them overboard.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-3202) Count(*) fails on JSON wrapped up in single array - JSON parsing error

2018-05-30 Thread Adrian-Bogdan Ionescu (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495641#comment-16495641
 ] 

Adrian-Bogdan Ionescu commented on DRILL-3202:
--

You can use SELECT COUNT([ALL] <column>) FROM dfs.`<file>` to get the 
desired result.

Tested in Drill 1.13.0
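
For example, against the sample file from this report, counting a concrete 
column avoids the failing count(*) path:
{noformat}
select count(Name) from 
dfs.`default`.`/Users/nrentachintala/Downloads/yelp/uspointsofinterestshort.json`;
{noformat}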

> Count(*) fails on JSON wrapped up in single array - JSON parsing error
> --
>
> Key: DRILL-3202
> URL: https://issues.apache.org/jira/browse/DRILL-3202
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.0.0
>Reporter: Neeraja
>Assignee: Steven Phillips
>Priority: Major
> Fix For: Future
>
> Attachments: DRILL-3202.patch
>
>
> I have a JSON document as follows.
> [
> {
> "Category": "1,2",
> "Comments": "Total sites: 20, RV sites: 20, Elec sites: 20, Water at 
> site, RV Dump, Showers, Flush Toilets, RV Fee: $14, Tent Fee: $14, Elev: 
> 545', Tel: 256-577-9619, Nearest town: Muscle Shoals",
> "Latitude": "34.800446",
> "Longitude": "-87.498242",
> "Name": "Alloys Co Park",
> "State": "AL",
> "Type": "cp",
> "URL": 
> "http://www.campingroadtrip.com/campgrounds/campground/campground/23478/alabama/colbert-county-alloys-park-campground";
> }
> ]
> Drill has the ability to unwrap the array (without the user specifying it) and 
> perform some SQL operations on it. However, count(*) specifically fails on 
> these documents.
> 0: jdbc:drill:zk=local> select * from 
> dfs.`default`.`/Users/nrentachintala/Downloads/yelp/uspointsofinterestshort.json`
>  limit 10;
> The single row returned:
> Category:   1,2
> Comments:   Total sites: 20, RV sites: 20, Elec sites: 20, Water at site,
>             RV Dump, Showers, Flush Toilets, RV Fee: $14, Tent Fee: $14,
>             Elev: 545', Tel: 256-577-9619, Nearest town: Muscle Shoals
> Latitude:   34.800446
> Longitude:  -87.498242
> Name:       Alloys Co Park
> State:      AL
> Type:       cp
> URL:        http://www.campingroadtrip.com/campgrounds/campground/campground/23478/alabama/colbert-county-alloys-park-campground
> 1 row selected (0.197 seconds)
> 0: jdbc:drill:zk=local> select distinct type from 
> dfs.`default`.`/Users/nrentachintala/Downloads/yelp/uspointsofinterestshort.json`
>  limit 10;
> +-------+
> | type  |
> +-------+
> | cp    |
> +-------+
> 1 row selected (0.193 seconds)
> 0: jdbc:drill:zk=local> 
> 0: jdbc:drill:zk=local> select count(*) from 
> dfs.`default`.`/Users/nrentachintala/Downloads/yelp/uspointsofinterestshort.json`
>  limit 10;
> Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a 
> record. Current token was START_ARRAY
> File  /Users/nrentachintala/Downloads/yelp/uspointsofinterestshort.json
> Record  1
> Fragment 0:0
> [Error Id: 4742f738-1d43-4fef-af48-110065c9dd83 on 172.16.1.82:31010] 
> (state=,code=0)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6058) Define per connection level OptionManager

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6058:
-
Fix Version/s: (was: 1.14.0)

> Define per connection level OptionManager
> -
>
> Key: DRILL-6058
> URL: https://issues.apache.org/jira/browse/DRILL-6058
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Minor
>
> We want to control the running frequency of some queries, which need to be 
> identified by some options.
> One requirement: we want to limit how often download queries run while 
> allowing normal queries to proceed. So we need to define a connection-level 
> OptionManager.
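> A minimal sketch of the intended lookup order (class and field names are 
> hypothetical, not the actual Drill option-manager hierarchy):
> {code}
> import java.util.HashMap;
> import java.util.Map;
>
> class ConnectionOptionManagerSketch {
>   private final Map<String, Object> connectionOptions = new HashMap<>();
>   private final Map<String, Object> sessionOptions;
>
>   ConnectionOptionManagerSketch(Map<String, Object> sessionOptions) {
>     this.sessionOptions = sessionOptions;
>   }
>
>   // Resolve at the connection level first, then fall back to the session
>   // level, the way session options fall back to system options.
>   Object getOption(String name) {
>     Object v = connectionOptions.get(name);
>     return (v != null) ? v : sessionOptions.get(name);
>   }
> }
> {code}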



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-5774) Excessive memory allocation

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5774:
-
Fix Version/s: (was: 1.14.0)

> Excessive memory allocation
> ---
>
> Key: DRILL-5774
> URL: https://issues.apache.org/jira/browse/DRILL-5774
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Paul Rogers
>Priority: Major
>
> This query exhibits excessive memory allocation:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.width.max_per_query` = 1;
> select count(*) from (select * from (select id, flatten(str_list) str from 
> dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by 
> d.str) d1 where d1.id=0;
> {noformat}
> This query does a flatten on a large table.  The result is 160M records.  
> Half the records have a one-byte string, and half have a 253-byte string.  
> And then there are 40K records with 223-byte strings.
> {noformat}
> select length(str), count(*) from (select id, flatten(str_list) str from 
> dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) group by 
> length(str);
> +---------+-----------+
> | EXPR$0  |  EXPR$1   |
> +---------+-----------+
> | 223     | 40000     |
> | 1       | 80042001  |
> | 253     | 80000000  |
> +---------+-----------+
> {noformat}
> From the drillbit.log:
> {noformat}
> 2017-09-02 11:43:44,598 [26550427-6adf-a52e-2ea8-dc52d8d8433f:frag:0:0] DEBUG 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
>   str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134, 
> data size: 548360)
>   id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data 
> size: 36864)
>   Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width: 
> 262163, Net row width: 143, Density: 1}
> {noformat}
> The data size is 585K, but the batch size is 1 GB.  The density is 1%.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6310) limit batch size for hash aggregate

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker resolved DRILL-6310.
--
Resolution: Duplicate

> limit batch size for hash aggregate
> ---
>
> Key: DRILL-6310
> URL: https://issues.apache.org/jira/browse/DRILL-6310
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6452) document steps to execute SQL queries from Postman (chrome extension) on Drill

2018-05-30 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495649#comment-16495649
 ] 

Bridget Bevens commented on DRILL-6452:
---

Hey [~khfaraaz], I've edited your Google doc. Can you please decline or accept 
my changes in the Google doc? Once you do that I can copy the content into a 
MarkDown (.md) file and post it to the Apache Drill site. 

Thanks,
Bridget

> document steps to execute SQL queries from Postman (chrome extension) on Drill
> --
>
> Key: DRILL-6452
> URL: https://issues.apache.org/jira/browse/DRILL-6452
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
>Priority: Minor
>
> We need documentation to list the steps with screen shots about executing SQL 
> queries from Postman (chrome extension) on Drill.
> Here are the steps to execute SQL queries from Postman
> {noformat}
> 1. Install the Postman extension for the Chrome browser from
> https://chrome.google.com/webstore/detail/postman/fhbjgbiflinjbdggehcddcbncdddomop?hl=en
> and click the ADD TO CHROME button.
> 2. At the top right of your Chrome browser window, click the Postman icon.
> In Postman:
> 3. Set the request type to "POST" and enter the request URL as
> "http://<drillbit ip>:8047/query.json".
> 4. In the Headers tab, add an entry with "Content-Type" as key and
> "application/json" as value.
> Add another entry with "User-Name" as key and "mapr" as value.
> 5. In the Body tab, select "raw"; a new dropdown list should appear next to
> "raw". In that dropdown, select "JSON".
> 6. In the Body box, enter your request body in JSON format. The file
> test.csv is expected to reside under the /tmp folder (i.e. in the dfs.tmp
> schema):
> {
>   "queryType": "SQL",
>   "query": "select * from `dfs.tmp`.`test.csv`"
> }
> 7. Press Send!
> {noformat}
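> For reference, the same request can be issued without Postman. A sketch 
> using java.net.http (JDK 11+); the drillbit host below is a stand-in you 
> must replace:
> {code}
> import java.net.URI;
> import java.net.http.HttpClient;
> import java.net.http.HttpRequest;
> import java.net.http.HttpResponse;
>
> public class DrillRestSketch {
>   public static void main(String[] args) throws Exception {
>     String body = "{\"queryType\": \"SQL\", "
>         + "\"query\": \"select * from `dfs.tmp`.`test.csv`\"}";
>     HttpRequest request = HttpRequest.newBuilder()
>         .uri(URI.create("http://localhost:8047/query.json"))  // replace host
>         .header("Content-Type", "application/json")
>         .header("User-Name", "mapr")
>         .POST(HttpRequest.BodyPublishers.ofString(body))
>         .build();
>     HttpResponse<String> response = HttpClient.newHttpClient()
>         .send(request, HttpResponse.BodyHandlers.ofString());
>     System.out.println(response.body());
>   }
> }
> {code}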



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6113) Limit batch size for Merge Receiver

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6113:
-
Fix Version/s: (was: 1.14.0)

> Limit batch size for Merge Receiver
> ---
>
> Key: DRILL-6113
> URL: https://issues.apache.org/jira/browse/DRILL-6113
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>
> The merge receiver has a hard-coded limit of 32K rows per batch. Since rows can 
> be of varying width, it is difficult to predict the output batch size (in terms 
> of memory) for this operator. Change this to derive the row count from the 
> actual memory available. We are introducing a new option called outputBatchSize 
> to limit the batch size of each operator. Use the memory configured by that 
> option. Figure out the average row width of the outgoing batch based on the 
> averages of batches coming from the incoming streams. Limit the row count based 
> on the memory available and the average row width, as in the sketch below.
>  
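> A minimal sketch of the calculation (names are illustrative, not the actual 
> operator code):
> {code}
> class MergeReceiverSizing {
>   // Derive the outgoing row limit from the configured output batch size
>   // and the average row width observed across the incoming streams.
>   static int limitRowCount(long outputBatchSizeBytes, int avgRowWidthBytes) {
>     long byMemory = outputBatchSizeBytes / Math.max(1, avgRowWidthBytes);
>     return (int) Math.min(byMemory, 32 * 1024);  // keep 32K as a hard ceiling
>   }
> }
> {code}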



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6375) ANY_VALUE aggregate function

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495659#comment-16495659
 ] 

ASF GitHub Bot commented on DRILL-6375:
---

gparai commented on a change in pull request #1256: DRILL-6375 : Support for 
ANY_VALUE aggregate function
URL: https://github.com/apache/drill/pull/1256#discussion_r191919998
 
 

 ##
 File path: exec/java-exec/src/main/codegen/data/AggrTypes1.tdd
 ##
 @@ -88,6 +88,52 @@
   {inputType: "Interval", outputType: "NullableInterval", runningType: 
"Interval", major: "Date", initialValue: "0"},
   {inputType: "NullableInterval", outputType: "NullableInterval", 
runningType: "Interval", major: "Date", initialValue: "0"}
  ]
+   },
+   {className: "AnyValue", funcName: "any_value", types: [
+   {inputType: "Bit", outputType: "NullableBit", runningType: "Bit", 
major: "Numeric"},
+   {inputType: "Int", outputType: "NullableInt", runningType: "Int", 
major: "Numeric"},
+   {inputType: "BigInt", outputType: "NullableBigInt", runningType: 
"BigInt", major: "Numeric"},
+   {inputType: "NullableBit", outputType: "NullableBit", runningType: 
"Bit", major: "Numeric"},
+   {inputType: "NullableInt", outputType: "NullableInt", runningType: 
"Int", major: "Numeric"},
+   {inputType: "NullableBigInt", outputType: "NullableBigInt", 
runningType: "BigInt", major: "Numeric"},
+   {inputType: "Float4", outputType: "NullableFloat4", runningType: 
"Float4", major: "Numeric"},
+   {inputType: "Float8", outputType: "NullableFloat8", runningType: 
"Float8", major: "Numeric"},
+   {inputType: "NullableFloat4", outputType: "NullableFloat4", 
runningType: "Float4", major: "Numeric"},
+   {inputType: "NullableFloat8", outputType: "NullableFloat8", 
runningType: "Float8", major: "Numeric"},
+   {inputType: "Date", outputType: "NullableDate", runningType: "Date", 
major: "Date", initialValue: "0"},
+   {inputType: "NullableDate", outputType: "NullableDate", runningType: 
"Date", major: "Date", initialValue: "0"},
+   {inputType: "TimeStamp", outputType: "NullableTimeStamp", runningType: 
"TimeStamp", major: "Date", initialValue: "0"},
+   {inputType: "NullableTimeStamp", outputType: "NullableTimeStamp", 
runningType: "TimeStamp", major: "Date", initialValue: "0"},
+   {inputType: "Time", outputType: "NullableTime", runningType: "Time", 
major: "Date", initialValue: "0"},
+   {inputType: "NullableTime", outputType: "NullableTime", runningType: 
"Time", major: "Date", initialValue: "0"},
+   {inputType: "IntervalDay", outputType: "NullableIntervalDay", 
runningType: "IntervalDay", major: "Date", initialValue: "0"},
+   {inputType: "NullableIntervalDay", outputType: "NullableIntervalDay", 
runningType: "IntervalDay", major: "Date", initialValue: "0"},
+   {inputType: "IntervalYear", outputType: "NullableIntervalYear", 
runningType: "IntervalYear", major: "Date", initialValue: "0"},
+   {inputType: "NullableIntervalYear", outputType: "NullableIntervalYear", 
runningType: "IntervalYear", major: "Date", initialValue: "0"},
+   {inputType: "Interval", outputType: "NullableInterval", runningType: 
"Interval", major: "Date", initialValue: "0"},
+   {inputType: "NullableInterval", outputType: "NullableInterval", 
runningType: "Interval", major: "Date", initialValue: "0"},
+   {inputType: "VarChar", outputType: "NullableVarChar", runningType: 
"VarChar", major: "VarBytes", initialValue: ""},
+   {inputType: "NullableVarChar", outputType: "NullableVarChar", 
runningType: "VarChar", major: "VarBytes", initialValue: ""},
+   {inputType: "VarBinary", outputType: "NullableVarBinary", runningType: 
"VarBinary", major: "VarBytes"},
+   {inputType: "NullableVarBinary", outputType: "NullableVarBinary", 
runningType: "VarBinary", major: "VarBytes"}
+   {inputType: "List", outputType: "List", runningType: "List", major: 
"Complex"}
+   {inputType: "Map", outputType: "Map", runningType: "Map", major: 
"Complex"}
+   {inputType: "RepeatedBit", outputType: "RepeatedNullableBit", 
runningType: "RepeatedBit", major: "Complex"},
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ANY_VALUE aggregate function
> 
>
> Key: DRILL-6375
> URL: https://issues.apache.org/jira/browse/DRILL-6375
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Prior

[jira] [Commented] (DRILL-6375) ANY_VALUE aggregate function

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495660#comment-16495660
 ] 

ASF GitHub Bot commented on DRILL-6375:
---

gparai commented on a change in pull request #1256: DRILL-6375 : Support for 
ANY_VALUE aggregate function
URL: https://github.com/apache/drill/pull/1256#discussion_r191920093
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java
 ##
 @@ -228,11 +228,301 @@ public static void writeToMapFromReader(FieldReader 
fieldReader, BaseWriter.MapW
   
fieldReader.copyAsValue(mapWriter.list(MappifyUtility.fieldValue).list());
   break;
 default:
-  throw new DrillRuntimeException(String.format("kvgen does not 
support input of type: %s", valueMinorType));
+  throw new DrillRuntimeException(String.format(caller
+  + " does not support input of type: %s", valueMinorType));
   }
 } catch (ClassCastException e) {
   final MaterializedField field = fieldReader.getField();
-  throw new DrillRuntimeException(String.format(TYPE_MISMATCH_ERROR, 
field.getName(), field.getType()));
+  throw new DrillRuntimeException(String.format(caller + 
TYPE_MISMATCH_ERROR, field.getName(), field.getType()));
+}
+  }
+
+  public static void writeToMapFromReader(FieldReader fieldReader, 
BaseWriter.MapWriter mapWriter,
+  String fieldName, String caller) {
+try {
+  MajorType valueMajorType = fieldReader.getType();
+  MinorType valueMinorType = valueMajorType.getMinorType();
+  boolean repeated = false;
+
+  if (valueMajorType.getMode() == TypeProtos.DataMode.REPEATED) {
+repeated = true;
+  }
+
+  switch (valueMinorType) {
+case TINYINT:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).tinyInt());
+  } else {
+fieldReader.copyAsValue(mapWriter.tinyInt(fieldName));
+  }
+  break;
+case SMALLINT:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).smallInt());
+  } else {
+fieldReader.copyAsValue(mapWriter.smallInt(fieldName));
+  }
+  break;
+case BIGINT:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).bigInt());
+  } else {
+fieldReader.copyAsValue(mapWriter.bigInt(fieldName));
+  }
+  break;
+case INT:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).integer());
+  } else {
+fieldReader.copyAsValue(mapWriter.integer(fieldName));
+  }
+  break;
+case UINT1:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).uInt1());
+  } else {
+fieldReader.copyAsValue(mapWriter.uInt1(fieldName));
+  }
+  break;
+case UINT2:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).uInt2());
+  } else {
+fieldReader.copyAsValue(mapWriter.uInt2(fieldName));
+  }
+  break;
+case UINT4:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).uInt4());
+  } else {
+fieldReader.copyAsValue(mapWriter.uInt4(fieldName));
+  }
+  break;
+case UINT8:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).uInt8());
+  } else {
+fieldReader.copyAsValue(mapWriter.uInt8(fieldName));
+  }
+  break;
+case DECIMAL9:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).decimal9());
+  } else {
+fieldReader.copyAsValue(mapWriter.decimal9(fieldName));
+  }
+  break;
+case DECIMAL18:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).decimal18());
+  } else {
+fieldReader.copyAsValue(mapWriter.decimal18(fieldName));
+  }
+  break;
+case DECIMAL28SPARSE:
+  if (repeated) {
+
fieldReader.copyAsValue(mapWriter.list(fieldName).decimal28Sparse());
+  } else {
+fieldReader.copyAsValue(mapWriter.decimal28Sparse(fieldName));
+  }
+  break;
+case DECIMAL38SPARSE:
+  if (repeated) {
+
fieldReader.copyAsValue(mapWriter.list(fieldName).decimal38Sparse());
+  } else {
+fieldReader.copyAsValue(mapWriter.decimal38Sparse(fieldName));
+  }
+  break;
+case DATE:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).date());
+  } else {
+fieldReader.copyAsValue(mapWriter.dat

[jira] [Commented] (DRILL-6375) ANY_VALUE aggregate function

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495662#comment-16495662
 ] 

ASF GitHub Bot commented on DRILL-6375:
---

gparai commented on a change in pull request #1256: DRILL-6375 : Support for 
ANY_VALUE aggregate function
URL: https://github.com/apache/drill/pull/1256#discussion_r191920232
 
 

 ##
 File path: exec/java-exec/src/main/codegen/data/DecimalAggrTypes1.tdd
 ##
 @@ -35,6 +35,11 @@
{inputType: "VarDecimal", outputType: "NullableVarDecimal"},
{inputType: "NullableVarDecimal", outputType: "NullableVarDecimal"}
   ]
+   },
+   {className: "AnyValue", funcName: "any_value", types: [
+   {inputType: "VarDecimal", outputType: "NullableVarDecimal"},
+   {inputType: "NullableVarDecimal", outputType: "NullableVarDecimal"}
 
 Review comment:
   Modified testcase to use the parquet file.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ANY_VALUE aggregate function
> 
>
> Key: DRILL-6375
> URL: https://issues.apache.org/jira/browse/DRILL-6375
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.14.0
>
>
> We had discussions on the Apache Calcite [1] and Apache Drill [2] mailing 
> lists regarding an equivalent for DISTINCT ON. The community seems to prefer 
> ANY_VALUE. This Jira is a placeholder for implementing the ANY_VALUE 
> aggregate function in Apache Drill. We should also eventually contribute it 
> to Apache Calcite.
> [1]https://lists.apache.org/thread.html/f2007a489d3a5741875bcc8a1edd8d5c3715e5114ac45058c3b3a42d@%3Cdev.calcite.apache.org%3E
> [2]https://lists.apache.org/thread.html/2517eef7410aed4e88b9515f7e4256335215c1ad39a2676a08d21cb9@%3Cdev.drill.apache.org%3E
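> As an illustration, ANY_VALUE returns one arbitrary value per group, which is 
> what DISTINCT ON-style queries need (the table and columns below are 
> hypothetical):
> {noformat}
> SELECT dept_id, ANY_VALUE(emp_name)
> FROM `dfs.tmp`.`employee.json`
> GROUP BY dept_id;
> {noformat}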



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6375) ANY_VALUE aggregate function

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495661#comment-16495661
 ] 

ASF GitHub Bot commented on DRILL-6375:
---

gparai commented on a change in pull request #1256: DRILL-6375 : Support for 
ANY_VALUE aggregate function
URL: https://github.com/apache/drill/pull/1256#discussion_r191920153
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java
 ##
 @@ -228,11 +228,301 @@ public static void writeToMapFromReader(FieldReader 
fieldReader, BaseWriter.MapW
   
fieldReader.copyAsValue(mapWriter.list(MappifyUtility.fieldValue).list());
   break;
 default:
-  throw new DrillRuntimeException(String.format("kvgen does not 
support input of type: %s", valueMinorType));
+  throw new DrillRuntimeException(String.format(caller
+  + " does not support input of type: %s", valueMinorType));
   }
 } catch (ClassCastException e) {
   final MaterializedField field = fieldReader.getField();
-  throw new DrillRuntimeException(String.format(TYPE_MISMATCH_ERROR, 
field.getName(), field.getType()));
+  throw new DrillRuntimeException(String.format(caller + 
TYPE_MISMATCH_ERROR, field.getName(), field.getType()));
+}
+  }
+
+  public static void writeToMapFromReader(FieldReader fieldReader, 
BaseWriter.MapWriter mapWriter,
+  String fieldName, String caller) {
+try {
+  MajorType valueMajorType = fieldReader.getType();
+  MinorType valueMinorType = valueMajorType.getMinorType();
+  boolean repeated = false;
+
+  if (valueMajorType.getMode() == TypeProtos.DataMode.REPEATED) {
+repeated = true;
+  }
+
+  switch (valueMinorType) {
+case TINYINT:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).tinyInt());
+  } else {
+fieldReader.copyAsValue(mapWriter.tinyInt(fieldName));
+  }
+  break;
+case SMALLINT:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).smallInt());
+  } else {
+fieldReader.copyAsValue(mapWriter.smallInt(fieldName));
+  }
+  break;
+case BIGINT:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).bigInt());
+  } else {
+fieldReader.copyAsValue(mapWriter.bigInt(fieldName));
+  }
+  break;
+case INT:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).integer());
+  } else {
+fieldReader.copyAsValue(mapWriter.integer(fieldName));
+  }
+  break;
+case UINT1:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).uInt1());
+  } else {
+fieldReader.copyAsValue(mapWriter.uInt1(fieldName));
+  }
+  break;
+case UINT2:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).uInt2());
+  } else {
+fieldReader.copyAsValue(mapWriter.uInt2(fieldName));
+  }
+  break;
+case UINT4:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).uInt4());
+  } else {
+fieldReader.copyAsValue(mapWriter.uInt4(fieldName));
+  }
+  break;
+case UINT8:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).uInt8());
+  } else {
+fieldReader.copyAsValue(mapWriter.uInt8(fieldName));
+  }
+  break;
+case DECIMAL9:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).decimal9());
+  } else {
+fieldReader.copyAsValue(mapWriter.decimal9(fieldName));
+  }
+  break;
+case DECIMAL18:
+  if (repeated) {
+fieldReader.copyAsValue(mapWriter.list(fieldName).decimal18());
+  } else {
+fieldReader.copyAsValue(mapWriter.decimal18(fieldName));
+  }
+  break;
+case DECIMAL28SPARSE:
+  if (repeated) {
+
fieldReader.copyAsValue(mapWriter.list(fieldName).decimal28Sparse());
+  } else {
+fieldReader.copyAsValue(mapWriter.decimal28Sparse(fieldName));
+  }
+  break;
+case DECIMAL38SPARSE:
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ANY_VALUE aggregate function
> 
>
> 

[jira] [Commented] (DRILL-6375) ANY_VALUE aggregate function

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495663#comment-16495663
 ] 

ASF GitHub Bot commented on DRILL-6375:
---

gparai commented on issue #1256: DRILL-6375 : Support for ANY_VALUE aggregate 
function
URL: https://github.com/apache/drill/pull/1256#issuecomment-393316793
 
 
   @vvysotskyi I have addressed your review comments. Please take a look.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ANY_VALUE aggregate function
> 
>
> Key: DRILL-6375
> URL: https://issues.apache.org/jira/browse/DRILL-6375
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.14.0
>
>
> We had discussions on the Apache Calcite [1] and Apache Drill [2] mailing 
> lists regarding an equivalent for DISTINCT ON. The community seems to prefer 
> ANY_VALUE. This Jira is a placeholder for implementing the ANY_VALUE 
> aggregate function in Apache Drill. We should also eventually contribute it 
> to Apache Calcite.
> [1]https://lists.apache.org/thread.html/f2007a489d3a5741875bcc8a1edd8d5c3715e5114ac45058c3b3a42d@%3Cdev.calcite.apache.org%3E
> [2]https://lists.apache.org/thread.html/2517eef7410aed4e88b9515f7e4256335215c1ad39a2676a08d21cb9@%3Cdev.drill.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5365) FileNotFoundException when reading a parquet file

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495692#comment-16495692
 ] 

ASF GitHub Bot commented on DRILL-5365:
---

ilooner commented on a change in pull request #796: DRILL-5365: DrillFileSystem 
setConf in constructor. DrillFileSystem c…
URL: https://github.com/apache/drill/pull/796#discussion_r191925657
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java
 ##
 @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws 
IOException {
   }
 
   public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) 
throws IOException {
-this.underlyingFs = FileSystem.get(fsConf);
+this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), 
operatorStats);
+  }
+
+  public DrillFileSystem(Configuration fsConf, URI Uri, OperatorStats 
operatorStats) throws IOException {
 
 Review comment:
   Removed this constructor


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FileNotFoundException when reading a parquet file
> -
>
> Key: DRILL-5365
> URL: https://issues.apache.org/jira/browse/DRILL-5365
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.10.0
>Reporter: Chun Chang
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue: 1) two or more nodes cluster; 2) enable 
> impersonation; 3) set "fs.default.name": "file:///" in hive storage plugin; 
> 4) restart drillbits; 5) as a regular user, on node A, drop the table/file; 
> 6) ctas from a large enough hive table as source to recreate the table/file; 
> 7) query the table from node A should work; 8) query from node B as same user 
> should reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5365) FileNotFoundException when reading a parquet file

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495693#comment-16495693
 ] 

ASF GitHub Bot commented on DRILL-5365:
---

ilooner commented on a change in pull request #796: DRILL-5365: DrillFileSystem 
setConf in constructor. DrillFileSystem c…
URL: https://github.com/apache/drill/pull/796#discussion_r191925766
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java
 ##
 @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws 
IOException {
   }
 
   public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) 
throws IOException {
-this.underlyingFs = FileSystem.get(fsConf);
+this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), 
operatorStats);
 
 Review comment:
   Removed the other constructor and am now creating the underlying filesystem directly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FileNotFoundException when reading a parquet file
> -
>
> Key: DRILL-5365
> URL: https://issues.apache.org/jira/browse/DRILL-5365
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.10.0
>Reporter: Chun Chang
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue: 1) two or more nodes cluster; 2) enable 
> impersonation; 3) set "fs.default.name": "file:///" in hive storage plugin; 
> 4) restart drillbits; 5) as a regular user, on node A, drop the table/file; 
> 6) ctas from a large enough hive table as source to recreate the table/file; 
> 7) query the table from node A should work; 8) query from node B as same user 
> should reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5365) FileNotFoundException when reading a parquet file

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495694#comment-16495694
 ] 

ASF GitHub Bot commented on DRILL-5365:
---

ilooner commented on a change in pull request #796: DRILL-5365: DrillFileSystem 
setConf in constructor. DrillFileSystem c…
URL: https://github.com/apache/drill/pull/796#discussion_r191926018
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java
 ##
 @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws 
IOException {
   }
 
   public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) 
throws IOException {
-this.underlyingFs = FileSystem.get(fsConf);
+this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), 
operatorStats);
 
 Review comment:
   This constructor is removed now, so the only thing getting this value is a 
logging statement. I made the logging statement use getTrimmed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FileNotFoundException when reading a parquet file
> -
>
> Key: DRILL-5365
> URL: https://issues.apache.org/jira/browse/DRILL-5365
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.10.0
>Reporter: Chun Chang
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue: 1) two or more nodes cluster; 2) enable 
> impersonation; 3) set "fs.default.name": "file:///" in hive storage plugin; 
> 4) restart drillbits; 5) as a regular user, on node A, drop the table/file; 
> 6) ctas from a large enough hive table as source to recreate the table/file; 
> 7) query the table from node A should work; 8) query from node B as same user 
> should reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5365) FileNotFoundException when reading a parquet file

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495707#comment-16495707
 ] 

ASF GitHub Bot commented on DRILL-5365:
---

ilooner commented on a change in pull request #796: DRILL-5365: DrillFileSystem 
setConf in constructor. DrillFileSystem c…
URL: https://github.com/apache/drill/pull/796#discussion_r191928922
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java
 ##
 @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws 
IOException {
   }
 
   public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) 
throws IOException {
-this.underlyingFs = FileSystem.get(fsConf);
+this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), 
operatorStats);
+  }
+
+  public DrillFileSystem(Configuration fsConf, URI Uri, OperatorStats 
operatorStats) throws IOException {
+this.underlyingFs = FileSystem.get(Uri, fsConf);
+logger.trace("Configuration for the DrillFileSystem " + 
fsConf.getRaw(FS_DEFAULT_NAME_KEY) +
+", underlyingFs: " + this.underlyingFs.getUri());
 this.codecFactory = new CompressionCodecFactory(fsConf);
 this.operatorStats = operatorStats;
+setConf(fsConf);
   }
 
   @Override
   public void setConf(Configuration conf) {
 
 Review comment:
   Agreed. Unfortunately setConf() is a public method declared in Hadoop's 
FileSystem class. Additionally, setConf() is called with a null argument 
every time DrillFileSystem is initialized, because the FileSystem constructor 
calls the Configured(Configuration conf) constructor and passes null for the 
configuration.
   
   What I can do is make the semantics of DrillFileSystem explicit by 
documenting that DrillFileSystem.setConf should never be called and by throwing 
an exception if DrillFileSystem.setConf is ever called after a DrillFileSystem 
object is constructed.
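   
   A sketch of that guard (illustrative only, not the final patch):
   {code}
   @Override
   public void setConf(Configuration conf) {
     // The FileSystem/Configured super constructor calls setConf(null)
     // before underlyingFs is assigned; tolerate only that early call.
     if (underlyingFs == null) {
       return;
     }
     throw new IllegalStateException(
         "DrillFileSystem.setConf() must not be called after construction");
   }
   {code}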


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FileNotFoundException when reading a parquet file
> -
>
> Key: DRILL-5365
> URL: https://issues.apache.org/jira/browse/DRILL-5365
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.10.0
>Reporter: Chun Chang
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue: 1) two or more nodes cluster; 2) enable 
> impersonation; 3) set "fs.default.name": "file:///" in hive storage plugin; 
> 4) restart drillbits; 5) as a regular user, on node A, drop the table/file; 
> 6) ctas from a large enough hive table as source to recreate the table/file; 
> 7) query the table from node A should work; 8) query from node B as same user 
> should reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5365) FileNotFoundException when reading a parquet file

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495709#comment-16495709
 ] 

ASF GitHub Bot commented on DRILL-5365:
---

ilooner commented on a change in pull request #796: DRILL-5365: DrillFileSystem 
setConf in constructor. DrillFileSystem c…
URL: https://github.com/apache/drill/pull/796#discussion_r191929425
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java
 ##
 @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws 
IOException {
   }
 
   public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) 
throws IOException {
-this.underlyingFs = FileSystem.get(fsConf);
+this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), 
operatorStats);
+  }
+
+  public DrillFileSystem(Configuration fsConf, URI Uri, OperatorStats 
operatorStats) throws IOException {
+this.underlyingFs = FileSystem.get(Uri, fsConf);
+logger.trace("Configuration for the DrillFileSystem " + 
fsConf.getRaw(FS_DEFAULT_NAME_KEY) +
+", underlyingFs: " + this.underlyingFs.getUri());
 this.codecFactory = new CompressionCodecFactory(fsConf);
 this.operatorStats = operatorStats;
+setConf(fsConf);
 
 Review comment:
   Agreed I will remove this call, and make setConf effectively a noop which 
will throw an exception if anyone calls it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FileNotFoundException when reading a parquet file
> -
>
> Key: DRILL-5365
> URL: https://issues.apache.org/jira/browse/DRILL-5365
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.10.0
>Reporter: Chun Chang
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue: 1) two or more nodes cluster; 2) enable 
> impersonation; 3) set "fs.default.name": "file:///" in hive storage plugin; 
> 4) restart drillbits; 5) as a regular user, on node A, drop the table/file; 
> 6) ctas from a large enough hive table as source to recreate the table/file; 
> 7) query the table from node A should work; 8) query from node B as same user 
> should reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5365) FileNotFoundException when reading a parquet file

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495711#comment-16495711
 ] 

ASF GitHub Bot commented on DRILL-5365:
---

ilooner commented on a change in pull request #796: DRILL-5365: DrillFileSystem 
setConf in constructor. DrillFileSystem c…
URL: https://github.com/apache/drill/pull/796#discussion_r191929851
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java
 ##
 @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws 
IOException {
   }
 
   public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) 
throws IOException {
-this.underlyingFs = FileSystem.get(fsConf);
+this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), 
operatorStats);
+  }
+
+  public DrillFileSystem(Configuration fsConf, URI Uri, OperatorStats 
operatorStats) throws IOException {
+this.underlyingFs = FileSystem.get(Uri, fsConf);
+logger.trace("Configuration for the DrillFileSystem " + 
fsConf.getRaw(FS_DEFAULT_NAME_KEY) +
+", underlyingFs: " + this.underlyingFs.getUri());
 this.codecFactory = new CompressionCodecFactory(fsConf);
 this.operatorStats = operatorStats;
+setConf(fsConf);
   }
 
   @Override
   public void setConf(Configuration conf) {
 // Guard against setConf(null) call that is called as part of superclass 
constructor (Configured) of the
 // DrillFileSystem, at which point underlyingFs is null.
-if (conf != null && underlyingFs != null) {
-  underlyingFs.setConf(conf);
+if(conf != null) {
+  super.setConf(conf);
+  if (underlyingFs != null && underlyingFs.getConf() == null) {
+underlyingFs.setConf(conf);
+  }
+
 }
   }
 
   @Override
   public Configuration getConf() {
+if (super.getConf() != null) {
 
 Review comment:
   No idea, this is extremely confusing. I will just return the conf on the 
underlying file system. I'll update the other PR if I observe any regressions 
when I do this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FileNotFoundException when reading a parquet file
> -
>
> Key: DRILL-5365
> URL: https://issues.apache.org/jira/browse/DRILL-5365
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.10.0
>Reporter: Chun Chang
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue: 1) two or more nodes cluster; 2) enable 
> impersonation; 3) set "fs.default.name": "file:///" in hive storage plugin; 
> 4) restart drillbits; 5) as a regular user, on node A, drop the table/file; 
> 6) ctas from a large enough hive table as source to recreate the table/file; 
> 7) query the table from node A should work; 8) query from node B as same user 
> should reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6452) document steps to execute SQL queries from Postman (chrome extension) on Drill

2018-05-30 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495713#comment-16495713
 ] 

Khurram Faraaz commented on DRILL-6452:
---

[~bbevens] thanks for your review and edits. I have accepted your changes. 
Looks good!

> document steps to execute SQL queries from Postman (chrome extension) on Drill
> --
>
> Key: DRILL-6452
> URL: https://issues.apache.org/jira/browse/DRILL-6452
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
>Priority: Minor
>
> We need documentation to list the steps with screen shots about executing SQL 
> queries from Postman (chrome extension) on Drill.
> Here are the steps to execute SQL queries from Postman
> {noformat}
> 1. Install the Postman extension for the Chrome browser from
> https://chrome.google.com/webstore/detail/postman/fhbjgbiflinjbdggehcddcbncdddomop?hl=en
> and click the ADD TO CHROME button.
> 2. At the top right of your Chrome browser window, click the Postman icon.
> In Postman:
> 3. Set the request type to "POST" and enter the request URL as
> "http://<drillbit ip>:8047/query.json".
> 4. In the Headers tab, add an entry with "Content-Type" as key and
> "application/json" as value.
> Add another entry with "User-Name" as key and "mapr" as value.
> 5. In the Body tab, select "raw"; a new dropdown list should appear next to
> "raw". In that dropdown, select "JSON".
> 6. In the Body box, enter your request body in JSON format. The file
> test.csv is expected to reside under the /tmp folder (i.e. in the dfs.tmp
> schema):
> {
>   "queryType": "SQL",
>   "query": "select * from `dfs.tmp`.`test.csv`"
> }
> 7. Press Send!
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6236) batch sizing for hash join

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495712#comment-16495712
 ] 

ASF GitHub Bot commented on DRILL-6236:
---

ppadma commented on issue #1227: DRILL-6236: batch sizing for hash join
URL: https://github.com/apache/drill/pull/1227#issuecomment-393327326
 
 
   @Ben-Zvi I rebased and updated the PR. Please review the latest diffs.  


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> batch sizing for hash join
> --
>
> Key: DRILL-6236
> URL: https://issues.apache.org/jira/browse/DRILL-6236
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit output batch size for hash join based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5365) FileNotFoundException when reading a parquet file

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495729#comment-16495729
 ] 

ASF GitHub Bot commented on DRILL-5365:
---

ilooner commented on a change in pull request #796: DRILL-5365: DrillFileSystem 
setConf in constructor. DrillFileSystem c…
URL: https://github.com/apache/drill/pull/796#discussion_r191933248
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
 ##
 @@ -74,6 +74,7 @@ public FileSystemPlugin(FileSystemConfig config, 
DrillbitContext context, String
   fsConf.set(s, config.config.get(s));
 }
   }
+  fsConf.set("fs.default.name", config.connection);
 
 Review comment:
   Okay I think I have a better handle on this now. The original issue was that 
Drill's hive storage plugin had a configuration option of fs.default.name = 
file:// . Somehow when a hive table was dropped and then recreated with a ctas 
statement in drill, the CTAS statement picked up the fs.default.name 
configuration from the hive storage plugin and passed that on to 
DrillFileSystem. And apparently, if both fs.default.name and fs.defaultFS are 
present with different values, the value for fs.default.name wins even though it 
is deprecated. So the CTAS statement would end up creating the table on a Drill 
node's local filesystem.
   
   I believe the crux of this PR is to force "fs.default.name" to have the 
correct value in the event that a different value is defined in the HiveStorage 
plugin.
   
   With that said, there are several questions. 
   
1. How the heck does a property in the HiveStoragePlugin make its way into 
the FileSystem configuration? I spent a good amount of time looking at the code 
and for the life of me I can't figure that out.
2. The follow-up to (1): do we actually want that behavior? We can force 
fs.default.name to have the right value but what about other properties we 
might suck in from a HiveStoragePlugin configuration?
3. If we don't want this behavior what would be the real fix?
   
   In the face of all this ambiguity I think we should move forward with a 
minimal PR that forces fs.default.name to be correct now. We can have a 
follow-up Jira that actually fixes the underlying problem of sucking in stray 
configs down the road if someone complains about it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FileNotFoundException when reading a parquet file
> -
>
> Key: DRILL-5365
> URL: https://issues.apache.org/jira/browse/DRILL-5365
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.10.0
>Reporter: Chun Chang
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue: 1) two or more nodes cluster; 2) enable 
> impersonation; 3) set "fs.default.name": "file:///" in hive storage plugin; 
> 4) restart drillbits; 5) as a regular user, on node A, drop the table/file; 
> 6) ctas from a large enough hive table as source to recreate the table/file; 
> 7) query the table from node A should work; 8) query from node B as same user 
> should reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5365) FileNotFoundException when reading a parquet file

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495731#comment-16495731
 ] 

ASF GitHub Bot commented on DRILL-5365:
---

ilooner commented on a change in pull request #796: DRILL-5365: DrillFileSystem 
setConf in constructor. DrillFileSystem c…
URL: https://github.com/apache/drill/pull/796#discussion_r191933248
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
 ##
 @@ -74,6 +74,7 @@ public FileSystemPlugin(FileSystemConfig config, 
DrillbitContext context, String
   fsConf.set(s, config.config.get(s));
 }
   }
+  fsConf.set("fs.default.name", config.connection);
 
 Review comment:
   Okay, I think I have a better handle on this now. The original issue was 
that Drill's hive storage plugin had a configuration option of fs.default.name 
= file://. Somehow, when a hive table was dropped and then recreated with a 
CTAS statement in Drill, the CTAS statement picked up the fs.default.name 
configuration from the hive storage plugin and passed it on to 
DrillFileSystem. And apparently, if both **fs.default.name** and 
**fs.defaultFS** are present with different values, the value for 
**fs.default.name** wins even though it is deprecated. So the CTAS statement 
would end up creating the table on a Drill node's local filesystem.
   
   I believe the crux of this PR is to force "fs.default.name" to have the 
correct value in the event that a different value is defined in the 
HiveStoragePlugin.
   
   With that said, there are several questions. 
   
1. How the heck does a property in the HiveStoragePlugin make its way into 
the FileSystem configuration? I spent a good amount of time looking at the 
code and for the life of me I can't figure that out.
2. The follow-up to (1) is: do we actually want that behavior? We can force 
fs.default.name to have the right value, but what about other properties we 
might suck in from a HiveStoragePlugin configuration?
3. If we don't want this behavior, what would be the real fix?
   
   In the face of all this ambiguity, I think we should move forward with a 
minimal PR that forces fs.default.name to be correct now. We can have a 
follow-up Jira that actually fixes the underlying problem of sucking in stray 
configs down the road if someone complains about it.
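   For context on the precedence claim above, a minimal standalone sketch of 
how one could check which key a Hadoop Configuration resolves when both are 
set (this assumes hadoop-common on the classpath; the outcome depends on the 
Hadoop version's deprecation mapping and on the order in which the keys are 
applied, which is exactly the ambiguity under discussion):
   ```
   import org.apache.hadoop.conf.Configuration;

   public class DefaultFsPrecedence {
     public static void main(String[] args) {
       // Start from an empty Configuration so only our two keys are in play.
       Configuration conf = new Configuration(false);
       // Deliberately give the deprecated key and its replacement different values.
       conf.set("fs.default.name", "file:///");
       conf.set("fs.defaultFS", "hdfs://namenode:8020");
       // Print what actually resolves; Hadoop's deprecation map links the two keys.
       System.out.println("fs.defaultFS    -> " + conf.get("fs.defaultFS"));
       System.out.println("fs.default.name -> " + conf.get("fs.default.name"));
     }
   }
   ```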


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FileNotFoundException when reading a parquet file
> -
>
> Key: DRILL-5365
> URL: https://issues.apache.org/jira/browse/DRILL-5365
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.10.0
>Reporter: Chun Chang
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue: 1) a cluster with two or more nodes; 2) enable 
> impersonation; 3) set "fs.default.name": "file:///" in the hive storage plugin; 
> 4) restart drillbits; 5) as a regular user, on node A, drop the table/file; 
> 6) CTAS from a large enough hive table as the source to recreate the table/file; 
> 7) querying the table from node A should work; 8) querying from node B as the 
> same user should reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6446) Support for EMIT outcome in TopN

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495747#comment-16495747
 ] 

ASF GitHub Bot commented on DRILL-6446:
---

parthchandra commented on a change in pull request #1293: DRILL-6446: Support 
for EMIT outcome in TopN
URL: https://github.com/apache/drill/pull/1293#discussion_r191927146
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractSV4Copier.java
 ##
 @@ -22,10 +22,12 @@
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.drill.exec.vector.ValueVector;
 
 public abstract class AbstractSV4Copier extends AbstractCopier {
-  protected ValueVector[][] vvIn;
+  // Storing VectorWrapper reference instead of ValueVector[]. With EMIT 
outcome support underlying operator
 
 Review comment:
   Does this change affect RemovingRecordBatch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for EMIT outcome in TopN
> 
>
> Key: DRILL-6446
> URL: https://issues.apache.org/jira/browse/DRILL-6446
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Relational Operators
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.14.0
>
>
> With Lateral and Unnest, if TopN is present in the sub-query, then it needs to 
> handle the EMIT outcome correctly. This means that when an EMIT is received, 
> TopN performs the TopN operation on the records buffered so far and produces 
> output with it. After EMIT, TopN should refresh its state and again work on 
> the next batches of incoming records until an EMIT is seen again.
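
A minimal, self-contained sketch of that contract (all names are hypothetical 
and greatly simplified; this is not Drill's TopNBatch code): buffer rows while 
the upstream returns OK, flush the top N at each EMIT, then reset for the next 
record boundary.
```
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EmitAwareTopN {
  enum Outcome { OK, EMIT, NONE }

  private final int limit;
  private final List<Integer> buffer = new ArrayList<>();

  EmitAwareTopN(int limit) { this.limit = limit; }

  // Buffer incoming rows; on EMIT (or the final NONE), produce the top N
  // buffered so far and reset state for the next record boundary.
  List<Integer> onBatch(Outcome outcome, List<Integer> rows) {
    buffer.addAll(rows);
    if (outcome == Outcome.EMIT || outcome == Outcome.NONE) {
      List<Integer> sorted = new ArrayList<>(buffer);
      sorted.sort(Collections.reverseOrder());
      List<Integer> topN =
          new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
      buffer.clear();
      return topN;
    }
    return Collections.emptyList();  // keep buffering on OK
  }

  public static void main(String[] args) {
    EmitAwareTopN topN = new EmitAwareTopN(2);
    System.out.println(topN.onBatch(Outcome.OK, List.of(5, 1, 9)));  // []
    System.out.println(topN.onBatch(Outcome.EMIT, List.of(7)));      // [9, 7]
    System.out.println(topN.onBatch(Outcome.EMIT, List.of(3, 4)));   // [4, 3]
  }
}
```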



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6446) Support for EMIT outcome in TopN

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495749#comment-16495749
 ] 

ASF GitHub Bot commented on DRILL-6446:
---

parthchandra commented on a change in pull request #1293: DRILL-6446: Support 
for EMIT outcome in TopN
URL: https://github.com/apache/drill/pull/1293#discussion_r191933080
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ##
 @@ -162,56 +173,67 @@ public void buildSchema() throws SchemaChangeException {
 return;
   case NONE:
 state = BatchState.DONE;
+  case EMIT:
+throw new IllegalStateException("Unexpected EMIT outcome received in 
buildSchema phase");
   default:
-return;
+throw new IllegalStateException("Unexpected outcome received in 
buildSchema phase");
 }
   }
 
   @Override
   public IterOutcome innerNext() {
 recordCount = 0;
 if (state == BatchState.DONE) {
-  return IterOutcome.NONE;
+  return NONE;
 }
-if (schema != null) {
-  if (getSelectionVector4().next()) {
+
+// If both schema and priorityQueue are non-null and priority queue is not 
reset, that means we still have data
+// to be sent downstream for the current record boundary
+if (schema != null && priorityQueue != null && 
priorityQueue.isInitialized()) {
+  if (sv4.next()) {
 recordCount = sv4.getCount();
-return IterOutcome.OK;
+container.setRecordCount(recordCount);
   } else {
 recordCount = 0;
-return IterOutcome.NONE;
+container.setRecordCount(0);
   }
+  return getFinalOutcome();
 }
 
 try{
   outer: while (true) {
 Stopwatch watch = Stopwatch.createStarted();
-IterOutcome upstream;
 if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
+  laskKnownOutcome = IterOutcome.OK_NEW_SCHEMA;
   first = false;
 } else {
-  upstream = next(incoming);
+  laskKnownOutcome = next(incoming);
 }
-if (upstream == IterOutcome.OK && schema == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
+if (laskKnownOutcome == OK && schema == null) {
+  laskKnownOutcome = IterOutcome.OK_NEW_SCHEMA;
   container.clear();
 }
 logger.debug("Took {} us to get next", 
watch.elapsed(TimeUnit.MICROSECONDS));
-switch (upstream) {
+switch (laskKnownOutcome) {
 case NONE:
   break outer;
 case NOT_YET:
   throw new UnsupportedOperationException();
 case OUT_OF_MEMORY:
 case STOP:
-  return upstream;
+  return laskKnownOutcome;
 case OK_NEW_SCHEMA:
   // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
+  // schema change handling in case when EMIT is also seen is same as 
without EMIT. i.e. only if union type
+  // is enabled it will be handled.
+  container.clear();
+  firstBatchForSchema = true;
   if (!incoming.getSchema().equals(schema)) {
 
 Review comment:
   equals() or isEquivalent() ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for EMIT outcome in TopN
> 
>
> Key: DRILL-6446
> URL: https://issues.apache.org/jira/browse/DRILL-6446
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Relational Operators
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.14.0
>
>
> With Lateral and Unnest, if TopN is present in the sub-query, then it needs to 
> handle the EMIT outcome correctly. This means that when an EMIT is received, 
> TopN performs the TopN operation on the records buffered so far and produces 
> output with it. After EMIT, TopN should refresh its state and again work on 
> the next batches of incoming records until an EMIT is seen again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6446) Support for EMIT outcome in TopN

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495746#comment-16495746
 ] 

ASF GitHub Bot commented on DRILL-6446:
---

parthchandra commented on a change in pull request #1293: DRILL-6446: Support 
for EMIT outcome in TopN
URL: https://github.com/apache/drill/pull/1293#discussion_r191927924
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java
 ##
 @@ -119,15 +119,38 @@ public boolean next() {
 return true;
   }
 
+  public boolean isEmpty() {
 
 Review comment:
   I would expect any method named isXYZ() to return a boolean without 
modifying the internal state.  What was wrong with leaving it the way it is?
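   The convention being invoked can be illustrated with a short, hypothetical 
sketch (the names below are illustrative only, not Drill's SelectionVector4 
API): an isXYZ() method reads as a pure predicate, while a state change 
belongs behind a name that signals it.
   ```
   // Hypothetical illustration of the naming convention in question.
   class Cursor {
     private int position;
     private final int count;

     Cursor(int count) { this.count = count; }

     // A pure predicate: safe to call any number of times, no side effects.
     boolean isEmpty() {
       return count == 0;
     }

     // A mutator: advances internal state, so its name signals the change.
     boolean next() {
       if (position >= count) {
         return false;
       }
       position++;
       return true;
     }
   }
   ```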


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for EMIT outcome in TopN
> 
>
> Key: DRILL-6446
> URL: https://issues.apache.org/jira/browse/DRILL-6446
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Relational Operators
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.14.0
>
>
> With Lateral and Unnest, if TopN is present in the sub-query, then it needs to 
> handle the EMIT outcome correctly. This means that when an EMIT is received, 
> TopN performs the TopN operation on the records buffered so far and produces 
> output with it. After EMIT, TopN should refresh its state and again work on 
> the next batches of incoming records until an EMIT is seen again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6446) Support for EMIT outcome in TopN

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495748#comment-16495748
 ] 

ASF GitHub Bot commented on DRILL-6446:
---

parthchandra commented on a change in pull request #1293: DRILL-6446: Support 
for EMIT outcome in TopN
URL: https://github.com/apache/drill/pull/1293#discussion_r191922685
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ##
 @@ -162,56 +173,67 @@ public void buildSchema() throws SchemaChangeException {
 return;
   case NONE:
 state = BatchState.DONE;
+  case EMIT:
+throw new IllegalStateException("Unexpected EMIT outcome received in 
buildSchema phase");
   default:
-return;
+throw new IllegalStateException("Unexpected outcome received in 
buildSchema phase");
 }
   }
 
   @Override
   public IterOutcome innerNext() {
 recordCount = 0;
 if (state == BatchState.DONE) {
-  return IterOutcome.NONE;
+  return NONE;
 }
-if (schema != null) {
-  if (getSelectionVector4().next()) {
+
+// If both schema and priorityQueue are non-null and priority queue is not 
reset, that means we still have data
+// to be sent downstream for the current record boundary
+if (schema != null && priorityQueue != null && 
priorityQueue.isInitialized()) {
+  if (sv4.next()) {
 recordCount = sv4.getCount();
-return IterOutcome.OK;
+container.setRecordCount(recordCount);
   } else {
 recordCount = 0;
-return IterOutcome.NONE;
+container.setRecordCount(0);
   }
+  return getFinalOutcome();
 }
 
 try{
   outer: while (true) {
 Stopwatch watch = Stopwatch.createStarted();
-IterOutcome upstream;
 if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
+  laskKnownOutcome = IterOutcome.OK_NEW_SCHEMA;
 
 Review comment:
   Thought this was a typo in one place until I realised this is how it is 
spelt all over. :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for EMIT outcome in TopN
> 
>
> Key: DRILL-6446
> URL: https://issues.apache.org/jira/browse/DRILL-6446
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Relational Operators
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.14.0
>
>
> With Lateral and Unnest, if TopN is present in the sub-query, then it needs to 
> handle the EMIT outcome correctly. This means that when an EMIT is received, 
> TopN performs the TopN operation on the records buffered so far and produces 
> output with it. After EMIT, TopN should refresh its state and again work on 
> the next batches of incoming records until an EMIT is seen again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6446) Support for EMIT outcome in TopN

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495750#comment-16495750
 ] 

ASF GitHub Bot commented on DRILL-6446:
---

parthchandra commented on a change in pull request #1293: DRILL-6446: Support 
for EMIT outcome in TopN
URL: https://github.com/apache/drill/pull/1293#discussion_r191936372
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/AbstractSV4Copier.java
 ##
 @@ -22,10 +22,12 @@
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.drill.exec.vector.ValueVector;
 
 public abstract class AbstractSV4Copier extends AbstractCopier {
-  protected ValueVector[][] vvIn;
+  // Storing VectorWrapper reference instead of ValueVector[]. With EMIT 
outcome support underlying operator
+  // operator can generate multiple output batches with no schema changes 
which will change the ValueVector[]
+  // reference but not VectorWrapper reference.
+  protected VectorWrapper[] vvIn;
 
 Review comment:
   Does this require a change to RemovingRecordBatch()?
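   The rationale in the comment above can be shown with a small, hypothetical 
sketch (plain Java, not Drill's VectorWrapper API): caching the raw array goes 
stale when the producer swaps it, while caching the wrapper always observes 
the live payload.
   ```
   // Hypothetical sketch: a stable wrapper whose payload can be replaced.
   class Wrapper<T> {
     private T payload;
     Wrapper(T payload) { this.payload = payload; }
     T get() { return payload; }
     void replace(T fresh) { payload = fresh; }
   }

   public class WrapperIndirection {
     public static void main(String[] args) {
       Wrapper<int[]> wrapper = new Wrapper<>(new int[] {1, 2, 3});
       int[] cached = wrapper.get();          // copier caches the raw array
       wrapper.replace(new int[] {4, 5, 6});  // producer swaps vectors on EMIT
       System.out.println(cached[0]);         // 1 -- stale reference
       System.out.println(wrapper.get()[0]);  // 4 -- wrapper sees the swap
     }
   }
   ```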


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for EMIT outcome in TopN
> 
>
> Key: DRILL-6446
> URL: https://issues.apache.org/jira/browse/DRILL-6446
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Relational Operators
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.14.0
>
>
> With Lateral and Unnest, if TopN is present in the sub-query, then it needs to 
> handle the EMIT outcome correctly. This means that when an EMIT is received, 
> TopN performs the TopN operation on the records buffered so far and produces 
> output with it. After EMIT, TopN should refresh its state and again work on 
> the next batches of incoming records until an EMIT is seen again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6356) batch sizing for union all

2018-05-30 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6356:
-
Labels: ready-to-commit  (was: )

> batch sizing for union all
> --
>
> Key: DRILL-6356
> URL: https://issues.apache.org/jira/browse/DRILL-6356
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> batch sizing changes for union all operator



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6236) batch sizing for hash join

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495864#comment-16495864
 ] 

ASF GitHub Bot commented on DRILL-6236:
---

Ben-Zvi commented on a change in pull request #1227: DRILL-6236: batch sizing 
for hash join
URL: https://github.com/apache/drill/pull/1227#discussion_r191950133
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinMemoryCalculatorImpl.java
 ##
 @@ -448,8 +442,7 @@ private void calculateMemoryUsage()
 safetyFactor,
 reserveHash);
 
-  maxOutputBatchSize = computeMaxOutputBatchSize(buildValueSizes, 
probeValueSizes, keySizes,
-outputBatchNumRecords, safetyFactor, fragmentationFactor);
+  maxOutputBatchSize = (long) (outputBatchSize * fragmentationFactor * 
safetyFactor);
 
 Review comment:
   Maybe the "outputBatchSize" needs to be casted to (double) to ensure that 
the whole multiplication is performed as a double-multiplication. 
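   For reference, a standalone demonstration of Java's numeric promotion here 
(the declared types below are assumptions; the actual field types are not 
shown in the diff). An explicit (double) cast only changes the result if every 
operand is integral, since one double operand already promotes the 
left-to-right multiplication to double:
   ```
   public class PromotionDemo {
     public static void main(String[] args) {
       long outputBatchSize = 5;
       double fragmentationFactor = 1.5;
       double safetyFactor = 1.5;

       // Evaluated left to right: 5 * 1.5 promotes to double immediately,
       // so the full product is computed as 11.25 and truncated to 11.
       long max = (long) (outputBatchSize * fragmentationFactor * safetyFactor);
       System.out.println(max);  // 11
     }
   }
   ```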


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> batch sizing for hash join
> --
>
> Key: DRILL-6236
> URL: https://issues.apache.org/jira/browse/DRILL-6236
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit output batch size for hash join based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6236) batch sizing for hash join

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495881#comment-16495881
 ] 

ASF GitHub Bot commented on DRILL-6236:
---

ppadma commented on a change in pull request #1227: DRILL-6236: batch sizing 
for hash join
URL: https://github.com/apache/drill/pull/1227#discussion_r191953253
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinMemoryCalculatorImpl.java
 ##
 @@ -448,8 +442,7 @@ private void calculateMemoryUsage()
 safetyFactor,
 reserveHash);
 
-  maxOutputBatchSize = computeMaxOutputBatchSize(buildValueSizes, 
probeValueSizes, keySizes,
-outputBatchNumRecords, safetyFactor, fragmentationFactor);
+  maxOutputBatchSize = (long) (outputBatchSize * fragmentationFactor * 
safetyFactor);
 
 Review comment:
   done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> batch sizing for hash join
> --
>
> Key: DRILL-6236
> URL: https://issues.apache.org/jira/browse/DRILL-6236
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit output batch size for hash join based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6236) batch sizing for hash join

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495973#comment-16495973
 ] 

ASF GitHub Bot commented on DRILL-6236:
---

Ben-Zvi commented on a change in pull request #1227: DRILL-6236: batch sizing 
for hash join
URL: https://github.com/apache/drill/pull/1227#discussion_r191964302
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/JoinBatchMemoryManager.java
 ##
 @@ -85,13 +71,50 @@ public int update(int inputIndex, int outputPosition) {
   }
 
   @Override
-  public RecordBatchSizer.ColumnSize getColumnSize(String name) {
-RecordBatchSizer leftSizer = getRecordBatchSizer(LEFT_INDEX);
-RecordBatchSizer rightSizer = getRecordBatchSizer(RIGHT_INDEX);
+  public int update(int inputIndex, int outputPosition, boolean useAggregate) {
+switch (inputIndex) {
+  case LEFT_INDEX:
 
 Review comment:
   A cleanup suggestion: there are too many "update()" methods, the LEFT 
never uses aggregate, and the RIGHT always uses aggregate. So how about 
instead:
   ```
   private int foo(RecordBatch batch, int inputIndex, boolean useAggregate) {
     setRecordBatchSizer(inputIndex, new RecordBatchSizer(batch));
     updateIncomingStats(inputIndex);
     return useAggregate ? (int) getAvgInputRowWidth(inputIndex)
                         : getRecordBatchSizer(inputIndex).getRowAllocSize();
   }

   public int updateRight(RecordBatch batch, int outputPosition) {
     rightRowWidth = foo(batch, RIGHT_INDEX, true);
     return updateInternal(outputPosition);
   }

   public int updateLeft(RecordBatch batch, int outputPosition) {
     leftRowWidth = foo(batch, LEFT_INDEX, false);
     return updateInternal(outputPosition);
   }
   ```
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> batch sizing for hash join
> --
>
> Key: DRILL-6236
> URL: https://issues.apache.org/jira/browse/DRILL-6236
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit output batch size for hash join based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6236) batch sizing for hash join

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495972#comment-16495972
 ] 

ASF GitHub Bot commented on DRILL-6236:
---

Ben-Zvi commented on a change in pull request #1227: DRILL-6236: batch sizing 
for hash join
URL: https://github.com/apache/drill/pull/1227#discussion_r191955603
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinProbeTemplate.java
 ##
 @@ -262,6 +272,7 @@ private void executeProbePhase() throws 
SchemaChangeException {
 probeBatch.getSchema());
 }
   case OK:
+
setTargetOutputCount(outgoingJoinBatch.getBatchMemoryManager().update(probeBatch,
 LEFT_INDEX,outputRecords));
 
 Review comment:
   This code is called when a new LEFT incoming batch is read. At this point 
the outgoing batch may be "half full". It looks like this call is modifying 
the "targetOutputRecords" variable. If so, it would no longer match the 
allocated size for the outgoing batch. For example, if it were made bigger, 
the code above would try to add rows (to the outgoing) beyond the original 
allocation size!
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> batch sizing for hash join
> --
>
> Key: DRILL-6236
> URL: https://issues.apache.org/jira/browse/DRILL-6236
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit output batch size for hash join based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6236) batch sizing for hash join

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495971#comment-16495971
 ] 

ASF GitHub Bot commented on DRILL-6236:
---

Ben-Zvi commented on a change in pull request #1227: DRILL-6236: batch sizing 
for hash join
URL: https://github.com/apache/drill/pull/1227#discussion_r191961382
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/JoinBatchMemoryManager.java
 ##
 @@ -85,13 +71,50 @@ public int update(int inputIndex, int outputPosition) {
   }
 
   @Override
-  public RecordBatchSizer.ColumnSize getColumnSize(String name) {
 
 Review comment:
   Why is the overriding method deleted? It is used by Lateral-Join and 
Merge-Join. By deleting it, they are going to use the one from the superclass 
(RecordBatchMemoryManager).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> batch sizing for hash join
> --
>
> Key: DRILL-6236
> URL: https://issues.apache.org/jira/browse/DRILL-6236
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit output batch size for hash join based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6236) batch sizing for hash join

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495995#comment-16495995
 ] 

ASF GitHub Bot commented on DRILL-6236:
---

ppadma commented on a change in pull request #1227: DRILL-6236: batch sizing 
for hash join
URL: https://github.com/apache/drill/pull/1227#discussion_r191974514
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinProbeTemplate.java
 ##
 @@ -262,6 +272,7 @@ private void executeProbePhase() throws 
SchemaChangeException {
 probeBatch.getSchema());
 }
   case OK:
+
setTargetOutputCount(outgoingJoinBatch.getBatchMemoryManager().update(probeBatch,
 LEFT_INDEX,outputRecords));
 
 Review comment:
   It will not make it bigger. It will look at the remaining memory and adjust 
the row count based on that. 
   
   Here is the relevant code from the updateInternal function:
   final long remainingMemory = Math.max(configOutputBatchSize - memoryUsed, 0);
   // This is the number of rows we can fit in the remaining memory based on
   // the new outgoing row width.
   final int numOutputRowsRemaining =
       RecordBatchSizer.safeDivide(remainingMemory, newOutgoingRowWidth);
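   As a standalone rendering of that arithmetic (safeDivide below is a 
hypothetical stand-in for RecordBatchSizer.safeDivide, and the sizes are made 
up for illustration):
   ```
   public class RemainingMemoryRows {
     // Hypothetical stand-in for RecordBatchSizer.safeDivide: integer
     // division guarded against a zero divisor.
     static int safeDivide(long numerator, long denominator) {
       return denominator == 0 ? 0 : (int) (numerator / denominator);
     }

     public static void main(String[] args) {
       long configOutputBatchSize = 16 * 1024 * 1024;  // 16 MB target batch size
       long memoryUsed = 12 * 1024 * 1024;             // already consumed
       long newOutgoingRowWidth = 256;                 // bytes per outgoing row

       long remainingMemory = Math.max(configOutputBatchSize - memoryUsed, 0);
       int rowsRemaining = safeDivide(remainingMemory, newOutgoingRowWidth);
       System.out.println(rowsRemaining);  // 16384 rows fit in the last 4 MB
     }
   }
   ```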


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> batch sizing for hash join
> --
>
> Key: DRILL-6236
> URL: https://issues.apache.org/jira/browse/DRILL-6236
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit output batch size for hash join based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6236) batch sizing for hash join

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495996#comment-16495996
 ] 

ASF GitHub Bot commented on DRILL-6236:
---

ppadma commented on a change in pull request #1227: DRILL-6236: batch sizing 
for hash join
URL: https://github.com/apache/drill/pull/1227#discussion_r191974732
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/JoinBatchMemoryManager.java
 ##
 @@ -85,13 +71,50 @@ public int update(int inputIndex, int outputPosition) {
   }
 
   @Override
-  public RecordBatchSizer.ColumnSize getColumnSize(String name) {
 
 Review comment:
   Because it is redundant; the superclass does the same thing. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> batch sizing for hash join
> --
>
> Key: DRILL-6236
> URL: https://issues.apache.org/jira/browse/DRILL-6236
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit output batch size for hash join based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6236) batch sizing for hash join

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495997#comment-16495997
 ] 

ASF GitHub Bot commented on DRILL-6236:
---

ppadma commented on a change in pull request #1227: DRILL-6236: batch sizing 
for hash join
URL: https://github.com/apache/drill/pull/1227#discussion_r191975252
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/JoinBatchMemoryManager.java
 ##
 @@ -85,13 +71,50 @@ public int update(int inputIndex, int outputPosition) {
   }
 
   @Override
-  public RecordBatchSizer.ColumnSize getColumnSize(String name) {
-RecordBatchSizer leftSizer = getRecordBatchSizer(LEFT_INDEX);
-RecordBatchSizer rightSizer = getRecordBatchSizer(RIGHT_INDEX);
+  public int update(int inputIndex, int outputPosition, boolean useAggregate) {
+switch (inputIndex) {
+  case LEFT_INDEX:
 
 Review comment:
   I rearranged the code and got rid of left and right. Instead, I am using an 
array called rowWidth which can be indexed by input index. It is better now. 
Unfortunately, each operator calls update with different parameters, so we 
have different versions of the same function.
   
   Right is not always "use aggregate"; it depends on the operator. For 
example, for merge join we do not use aggregate. It is batch by batch.
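   A minimal sketch of that arrangement (the field and constant names are 
hypothetical, not the actual JoinBatchMemoryManager members): one array 
indexed by side replaces separate left/right fields, so a single code path 
serves both inputs.
   ```
   public class RowWidthBySide {
     static final int LEFT_INDEX = 0;
     static final int RIGHT_INDEX = 1;

     // One array indexed by input side instead of leftRowWidth/rightRowWidth.
     private final int[] rowWidth = new int[2];

     void update(int inputIndex, int measuredWidth) {
       rowWidth[inputIndex] = measuredWidth;
     }

     int width(int inputIndex) {
       return rowWidth[inputIndex];
     }

     public static void main(String[] args) {
       RowWidthBySide manager = new RowWidthBySide();
       manager.update(LEFT_INDEX, 24);
       manager.update(RIGHT_INDEX, 48);
       System.out.println(manager.width(LEFT_INDEX));   // 24
       System.out.println(manager.width(RIGHT_INDEX));  // 48
     }
   }
   ```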


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> batch sizing for hash join
> --
>
> Key: DRILL-6236
> URL: https://issues.apache.org/jira/browse/DRILL-6236
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit output batch size for hash join based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6456) Planner shouldn't create any exchanges on the right side of Lateral Join.

2018-05-30 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6456:
--

 Summary: Planner shouldn't create any exchanges on the right side 
of Lateral Join.
 Key: DRILL-6456
 URL: https://issues.apache.org/jira/browse/DRILL-6456
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.14.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


Currently, there is no restriction placed on the right side of the LateralJoin. 
This causes the planner to generate an Exchange when there are operators like 
Agg, Limit, Sort, etc. 

Due to this, the unnest operator cannot retrieve the row from the lateral's 
left side to process the pipeline further. Enhance the planner so that it does 
not generate exchanges on the right side of the LateralJoin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6455) JDBC Scan Operator does not appear in profile

2018-05-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496017#comment-16496017
 ] 

ASF GitHub Bot commented on DRILL-6455:
---

amansinha100 commented on a change in pull request #1297: DRILL-6455: Add 
missing JDBC Scan Operator for profiles
URL: https://github.com/apache/drill/pull/1297#discussion_r191978693
 
 

 ##
 File path: 
protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java
 ##
 @@ -24327,11 +24336,11 @@ public Builder clearStatus() {
   "$\022\021\n\rPCAP_SUB_SCAN\020%\022\022\n\016KAFKA_SUB_SCAN\020&" +
   
"\022\021\n\rKUDU_SUB_SCAN\020\'\022\013\n\007FLATTEN\020(\022\020\n\014LATE" +
   "RAL_JOIN\020)\022\n\n\006UNNEST\020*\022,\n(HIVE_DRILL_NAT" +
-  "IVE_PARQUET_ROW_GROUP_SCAN\020+*g\n\nSaslStat" +
-  
"us\022\020\n\014SASL_UNKNOWN\020\000\022\016\n\nSASL_START\020\001\022\024\n\020"
 +
-  
"SASL_IN_PROGRESS\020\002\022\020\n\014SASL_SUCCESS\020\003\022\017\n\013" +
-  "SASL_FAILED\020\004B.\n\033org.apache.drill.exec.p" +
-  "rotoB\rUserBitSharedH\001"
+  "IVE_PARQUET_ROW_GROUP_SCAN\020+\022\r\n\tJDBC_SCA" +
 
 Review comment:
   These changes look like generated output, which should not be checked in.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> JDBC Scan Operator does not appear in profile
> -
>
> Key: DRILL-6455
> URL: https://issues.apache.org/jira/browse/DRILL-6455
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.13.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
> Fix For: 1.14.0
>
>
> It seems that the Operator is not defined, though it appears in the text plan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)