Re: Reply: Is Drill query execution processing model just the same idea with the Spark whole-stage codegen improvement

2018-08-09 Thread Parth Chandra
In addition to directory and row group pruning, the physical plan
generation looks at data locality for every row group and schedules the
scan for the row group on the node where data is local. (Remote reads can
kill performance like nothing else can).
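
For illustration only (not Drill's planner code): a minimal sketch, using the
Hadoop FileSystem API, of how the hosts holding a row group's byte range can be
discovered so the scan can be scheduled where the data is local. The file path
and offsets below are made up.

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RowGroupLocality {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Hypothetical Parquet file and row-group byte range.
    Path file = new Path("/data/sales/2018-07-01/part-0.parquet");
    long rowGroupStart = 4L;                  // offset of the row group in the file
    long rowGroupLength = 128L * 1024 * 1024; // size of the row group in bytes

    FileStatus status = fs.getFileStatus(file);
    // Nodes holding the blocks that overlap the row group; a planner can prefer
    // these nodes when assigning the scan to avoid remote reads.
    for (BlockLocation location : fs.getFileBlockLocations(status, rowGroupStart, rowGroupLength)) {
      System.out.println(Arrays.toString(location.getHosts()));
    }
  }
}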

Short answer: query planning requires metadata, with schema, statistics, and
locality being the most important pieces. Drill will try to infer the schema
where it is not available, but if the user can provide a schema, Drill will do
a better job. The same goes for statistics: Drill can plan a query without
stats, but if stats are available, the execution plan will be much more
efficient.

So the job boils down to gathering metadata efficiently. As the metadata
size increases, the task becomes similar to that of executing a query on a
large data set. The metadata cache is a starting point for collecting
metadata. Essentially, it means we have pre-fetched metadata. However, with
a large amount of metadata, say several GB, this approach becomes the
bottleneck and the solution is really to distribute the metadata collection
as well. That's where the metastore comes in.

The next level of complexity will then shift to the planning process. A
single thread processing a few GB of metadata could itself become the
bottleneck. So the next step in Drill's evolution will be to partition and
distribute the planning process itself. That will be fun :)
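
As an illustration (not part of the original message): a minimal JDBC sketch of
pre-fetching Parquet metadata with Drill's REFRESH TABLE METADATA command, which
builds the metadata cache files so that planning reads pre-fetched metadata
rather than every footer. The connection URL and table path are assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PrefetchMetadata {
  public static void main(String[] args) throws Exception {
    // Assumes the Drill JDBC driver is on the classpath and a Drillbit runs locally.
    try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
         Statement stmt = conn.createStatement()) {
      // Builds/refreshes the Parquet metadata cache for the (hypothetical) table directory.
      stmt.execute("REFRESH TABLE METADATA dfs.`/data/sales`");
    }
  }
}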


On Thu, Aug 9, 2018 at 10:51 AM, Paul Rogers 
wrote:

> Hi Alex,
>
> Perhaps Parth can jump in here as he has deeper knowledge of Parquet.
>
> My understanding is that the planning-time metadata is used for partition
> (directory) and row group pruning. By scanning the footer of each Parquet
> row group, Drill can determine whether that group contains data that the
> query must process. Only groups that could contain target data are included
> in the list of groups to be scanned by the query's scan operators.
>
> For example, suppose we do a query over a date range and a US state:
>
> SELECT saleDate, prodId, quantity FROM `sales`
> WHERE dir0 BETWEEN '2018-07-01' AND '2018-07-31' AND state = 'CA'
>
> Suppose sales is a directory that contains subdirectories partitioned by
> date, and that state is a dictionary-encoded field with metadata in the
> Parquet footer.
>
> In this case, Drill reads the directory structure to do the partition
> pruning at planning time. Only the matching directories are added to the
> list of files to be scanned at run time.
>
> Then, Drill scans the footers of each Parquet row group to find those with
> `state` fields that contain the value of 'CA'. (I believe the footer
> scanning is done only for matching partitions, but I've not looked at that
> code recently.)
>
> Finally, Drill passes the matching set of files to the distribution
> planner, which assigns files to minor fragments distributed across the
> Drill cluster. The result attempts to maximize locality and minimize skew.
>
> Could this have been done differently? There is a trade-off. If the work
> is done in the planner, then it can be very expensive for large data sets.
> But, on the other hand, the resulting distribution plan has minimal skew:
> each file that is scheduled for scanning does, in fact, need to be scanned.
>
> Suppose that Drill pushed the partition and row group filtering to the
> scan operator. In this case, operators that received files outside of the
> target date range would do nothing, while those with files in the range
> would need to do further work, resulting in possible skew. (One could argue
> that, if files are randomly distributed across nodes, then, on average,
> each scan operator will see roughly the same number of included and
> excluded files, reducing skew. This would be an interesting experiment to
> try.)
>
> Another approach might be to do a two-step execution. First distribute the
> work of doing the partition pruning and row group footer checking. Gather
> the results. Then, finish planning the query and distribute the execution
> as is done today. Here, much new code would be needed to implement this new
> form of task, which is probably why it has not yet been done.
>
> Might be interesting for someone to prototype that to see if we get
> improvement in planning performance. My guess is that, a prototype could be
> built that does the "pre-selection" work as a Drill query, just to see if
> distribution of the work helps. If that is a win, then perhaps a more
> robust solution could be developed from there.
>
> The final factor is Drill's crude-but-effective metadata caching
> mechanism. Drill reads footer and directory information for all Parquet
> files in a directory structure and stores it in a series of possibly-large
> JSON files, one in each directory. This mechanism has no good concurrency
> control. (This is a long-standing, hard-to-fix bug.) Centralizing access in
> the Foreman reduces the potential for concurrent writes on metadata refresh
> (but does not eliminate them).

[GitHub] paul-rogers commented on issue #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-09 Thread GitBox
paul-rogers commented on issue #1429: DRILL-6676: Add Union, List and Repeated 
List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#issuecomment-411933865
 
 
   @ilooner, this work is related to your resource management efforts. Please 
review at your convenience. 




[GitHub] paul-rogers opened a new pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-09 Thread GitBox
paul-rogers opened a new pull request #1429: DRILL-6676: Add Union, List and 
Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429
 
 
   Previous commits provided the core "result set loader" (RSL) structure and 
support for the "mainstream" vector types, including structured types such as 
maps and lists.
   
   This PR adds the "obscure" (and partly implemented) types used for JSON: 
(non-repeated) list, repeated list and union.
   
   The union type is complex: it is a bundle of vectors keyed by type, and can 
accept new types as a run proceeds. A (non-repeated) list is highly complex:
it can act like a repeated list, but with the ability to specify a null state 
for each entry. The non-repeated List can also act like a union type. This 
dual/morphing nature of a list required some rather complex magic behind the 
scenes to support the simple JSON-like interface used by the row set and result 
set loader mechanisms.
   
   This PR introduces the idea of a "variant" to model unions and 
non-repeated-lists-as-list-of-unions. The name is taken from Microsoft Basic 
and simply means a tagged union. (Where "union" is taken from "C".)
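
   (A conceptual sketch only, not the PR's actual classes: a "variant" modeled
as a tagged union whose set of member types can grow as values arrive. All
names here are made up for illustration.)

import java.util.EnumMap;
import java.util.Map;

public class VariantSketch {
  enum TypeTag { INT, VARCHAR, MAP }   // the "tag" of the tagged union

  private final Map<TypeTag, Integer> valueCountByType = new EnumMap<>(TypeTag.class);

  // Accepts a value of any member type; a previously unseen type simply adds a
  // new member, mirroring how a union vector grows new typed vectors at run time.
  public void write(TypeTag tag, Object value) {
    valueCountByType.merge(tag, 1, Integer::sum);
    // A real implementation would route the value to a per-type column writer here.
  }

  public static void main(String[] args) {
    VariantSketch variant = new VariantSketch();
    variant.write(TypeTag.INT, 10);         // first row holds an INT
    variant.write(TypeTag.VARCHAR, "ten");  // a later row introduces a new type
    System.out.println(variant.valueCountByType);  // {INT=1, VARCHAR=1}
  }
}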
   
   Changes include fixing a number of issues with the list vectors, adding 
support in the column accessors and metadata layers, and adding support for 
creating vectors from metadata and metadata from vectors.
   
   Unit tests demonstrate how to use the resulting behavior as well as 
verifying that the behavior is correct.
   
   The focus of this PR is to enable union, list and repeated list support in 
the RSL and associated mechanisms. It is known that support of these vector 
types is incomplete: some operators fail when presented with such vectors. It 
is not the goal here to fix those issues: this is not a PR to fully support 
these types. Rather, the scope of this PR is limited to the RSL and associated 
classes.
   
   For more information, see [this wiki 
entry](https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades).
   
   This PR completes the result set loader work. The next PR in this series 
will introduce revisions to the scan operator that allow readers to use the 
RSL. After that, there are revised implementations for the delimited text (e.g. 
CSV) and JSON readers.




[GitHub] Ben-Zvi commented on issue #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin

2018-08-09 Thread GitBox
Ben-Zvi commented on issue #1408: DRILL-6453: Resolve deadlock when reading 
from build and probe sides simultaneously in HashJoin
URL: https://github.com/apache/drill/pull/1408#issuecomment-411933587
 
 
   OK - tag it ready-to-commit ...




[GitHub] HanumathRao commented on issue #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …

2018-08-09 Thread GitBox
HanumathRao commented on issue #1426: DRILL-6671: Multi level lateral unnest 
join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#issuecomment-411931640
 
 
   @vvysotskyi I have made the changes.




[jira] [Created] (DRILL-6678) Improve SelectionVectorRemover to pack output batch based on BatchSizing

2018-08-09 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6678:


 Summary: Improve SelectionVectorRemover to pack output batch based 
on BatchSizing
 Key: DRILL-6678
 URL: https://issues.apache.org/jira/browse/DRILL-6678
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia


SelectionVectorRemover is, in most cases, downstream of a Filter, which 
reduces the number of records to be copied into the output container. In those 
cases, if SelectionVectorRemover packs the output batch to maximum utilization, 
it will reduce the number of output batches it produces and help improve 
performance. During Lateral & Unnest performance evaluation we noticed a 
significant decrease in performance as the number of batches increases for the 
same number of rows (i.e., batches are not fully packed).
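
A conceptual sketch of the proposed behavior (not Drill's SelectionVectorRemover;
names and the target size are made up): instead of emitting one possibly sparse
output batch per filtered incoming batch, keep accumulating selected rows until
a target output batch size is reached.

import java.util.ArrayList;
import java.util.List;

public class PackingCopierSketch {
  private final int targetRowsPerOutputBatch;   // would come from batch sizing
  private final List<int[]> pendingSelections = new ArrayList<>();
  private int pendingRows = 0;

  public PackingCopierSketch(int targetRowsPerOutputBatch) {
    this.targetRowsPerOutputBatch = targetRowsPerOutputBatch;
  }

  // selectedRows holds the row indexes chosen by the filter's selection vector
  // for one incoming batch. Returns a packed "output batch" only once enough
  // rows have accumulated; returns null while still packing.
  public List<int[]> addSelection(int[] selectedRows) {
    pendingSelections.add(selectedRows);
    pendingRows += selectedRows.length;
    if (pendingRows < targetRowsPerOutputBatch) {
      return null;
    }
    List<int[]> packedBatch = new ArrayList<>(pendingSelections);
    pendingSelections.clear();
    pendingRows = 0;
    return packedBatch;
  }
}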





[GitHub] ilooner commented on issue #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on issue #1415: DRILL-6656: Disallow extra semicolons in 
import statements.
URL: https://github.com/apache/drill/pull/1415#issuecomment-411910893
 
 
   @vvysotskyi Made additional style changes.




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209091287
 
 

 ##
 File path: exec/jdbc/src/test/java/org/apache/drill/jdbc/StatementTest.java
 ##
 @@ -83,7 +83,7 @@ public static void tearDownStatement() throws SQLException {
   public void testDefaultGetQueryTimeout() throws SQLException {
 try(Statement stmt = connection.createStatement()) {
   int timeoutValue = stmt.getQueryTimeout();
-  assertEquals( 0 , timeoutValue );
+  assertEquals( 0, timeoutValue );
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209091229
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetSimpleTestFileGenerator.java
 ##
 @@ -375,7 +376,8 @@ public static void writeSimpleValues(SimpleGroupFactory 
sgf, ParquetWriter

[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209091196
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetSimpleTestFileGenerator.java
 ##
 @@ -292,7 +292,8 @@ public static void writeComplexValues(GroupFactory gf, 
ParquetWriter comp
   .append("_INT_64", 0x7FFFL)
   .append("_UINT_64", 0xL)
   .append("_DECIMAL_decimal18", 0xL);
-  byte[] bytes = new byte[30]; Arrays.fill(bytes, (byte)1);
+  byte[] bytes = new byte[30];
+  Arrays.fill(bytes, (byte)1);
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209091066
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/protocol/TestOperatorRecordBatch.java
 ##
 @@ -99,7 +99,10 @@ public BatchAccessor batchAccessor() {
 }
 
 @Override
-public boolean buildSchema() { buildSchemaCalled = true; return ! 
schemaEOF; }
+public boolean buildSchema() {
+  buildSchemaCalled = true;
+  return ! schemaEOF;
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209090720
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -101,7 +104,9 @@ public void oneBitOneExchangeTwoEntryRunLogical() throws 
Exception{
 public void twoBitOneExchangeTwoEntryRun() throws Exception{
   RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-  try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Drillbit bit2 = 
new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator());){
+  try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209090647
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -82,16 +84,17 @@ public void oneBitOneExchangeTwoEntryRun() throws 
Exception{
 public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient 
client = new DrillClient(CONFIG, serviceSet.getCoordinator());){
-bit1.run();
-client.connect();
-List results = client.runQuery(QueryType.LOGICAL, 
Files.toString(DrillFileUtils.getResourceAsFile("/scan_screen_logical.json"), 
Charsets.UTF_8));
-int count = 0;
-for(QueryDataBatch b : results){
-count += b.getHeader().getRowCount();
-b.release();
-}
-assertEquals(100, count);
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209090761
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -101,7 +104,9 @@ public void oneBitOneExchangeTwoEntryRunLogical() throws 
Exception{
 public void twoBitOneExchangeTwoEntryRun() throws Exception{
   RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-  try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Drillbit bit2 = 
new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator());){
+  try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
+  Drillbit bit2 = new Drillbit(CONFIG, serviceSet);
+  DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator())){
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209090671
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -82,16 +84,17 @@ public void oneBitOneExchangeTwoEntryRun() throws 
Exception{
 public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient 
client = new DrillClient(CONFIG, serviceSet.getCoordinator());){
-bit1.run();
-client.connect();
-List results = client.runQuery(QueryType.LOGICAL, 
Files.toString(DrillFileUtils.getResourceAsFile("/scan_screen_logical.json"), 
Charsets.UTF_8));
-int count = 0;
-for(QueryDataBatch b : results){
-count += b.getHeader().getRowCount();
-b.release();
-}
-assertEquals(100, count);
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
+DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator())){
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209090584
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -63,7 +64,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{
   public void oneBitOneExchangeTwoEntryRun() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = 
new DrillClient(CONFIG, serviceSet.getCoordinator());){
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
+DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator())){
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209090505
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -63,7 +64,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{
   public void oneBitOneExchangeTwoEntryRun() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = 
new DrillClient(CONFIG, serviceSet.getCoordinator());){
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209090387
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -43,7 +43,8 @@
   public void oneBitOneExchangeOneEntryRun() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = 
new DrillClient(CONFIG, serviceSet.getCoordinator());){
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
+DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator())){
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209090432
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -43,7 +43,8 @@
   public void oneBitOneExchangeOneEntryRun() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = 
new DrillClient(CONFIG, serviceSet.getCoordinator());){
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209089770
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/ExpressionInterpreterTest.java
 ##
 @@ -227,7 +227,7 @@ private ValueVector evalExprWithInterpreter(String 
expression, RecordBatch batch
   }
 
   private void showValueVectorContent(ValueVector vw) {
-for (int row = 0; row < vw.getAccessor().getValueCount(); row ++ ) {
+for (int row = 0; row < vw.getAccessor().getValueCount(); row++ ) {
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209089299
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/explain/NumberingRelWriter.java
 ##
 @@ -84,7 +84,9 @@ protected void explain_(
 s.append("  ");
 
 if (id != null && id.opId == 0) {
-  for(int i =0; i < spacer.get(); i++){ s.append('-');}
+  for(int i = 0; i < spacer.get(); i++) {
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209089217
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ##
 @@ -302,9 +302,12 @@ private LogicalExpression 
getDrillCastFunctionFromOptiq(RexCall call){
 castType = 
Types.required(MinorType.VARCHAR).toBuilder().setPrecision(call.getType().getPrecision()).build();
 break;
 
-  case "INTEGER": castType = Types.required(MinorType.INT); break;
-  case "FLOAT": castType = Types.required(MinorType.FLOAT4); break;
-  case "DOUBLE": castType = Types.required(MinorType.FLOAT8); break;
+  case "INTEGER": castType = Types.required(MinorType.INT);
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209088852
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
 ##
 @@ -235,7 +235,7 @@ public static boolean isScalarSubquery(RelNode root) {
   if (currentrel instanceof DrillAggregateRel) {
 agg = (DrillAggregateRel)currentrel;
   } else if (currentrel instanceof RelSubset) {
-currentrel = ((RelSubset) currentrel).getBest() ;
+currentrel = ((RelSubset)currentrel).getBest();
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209088799
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java
 ##
 @@ -88,6 +88,6 @@ public void setMaxAllocation(long maxAllocation) {
   public boolean isBufferedOperator(QueryContext queryContext) {
 // In case forced to use a single partition - do not consider this a 
buffered op (when memory is divided)
 return queryContext == null ||
-  1 < 
(int)queryContext.getOptions().getOption(ExecConstants.HASHJOIN_NUM_PARTITIONS_VALIDATOR)
 ;
+  1 < 
(int)queryContext.getOptions().getOption(ExecConstants.HASHJOIN_NUM_PARTITIONS_VALIDATOR);
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209085677
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashAggregate.java
 ##
 @@ -104,6 +104,6 @@ public void setMaxAllocation(long maxAllocation) {
   public boolean isBufferedOperator(QueryContext queryContext) {
 // In case forced to use a single partition - do not consider this a 
buffered op (when memory is divided)
 return queryContext == null ||
-  1 < 
(int)queryContext.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR)
 ;
+  1 < 
(int)queryContext.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR);
 
 Review comment:
   done




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209085506
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
 ##
 @@ -713,7 +713,7 @@ public void eval() {
 if (length.value > 0) {
   charLen = Math.min((int)length.value, charCount);  //left('abc', 5) 
-> 'abc'
 } else if (length.value < 0) {
-  charLen = Math.max(0, charCount + (int)length.value) ; // 
left('abc', -5) ==> ''
+  charLen = Math.max(0, charCount + (int)length.value); // left('abc', 
-5) ==> ''
 
 Review comment:
   done




[GitHub] kkhatua commented on issue #1279: DRILL-5735: Allow search/sort in the Options webUI

2018-08-09 Thread GitBox
kkhatua commented on issue #1279: DRILL-5735: Allow search/sort in the Options 
webUI
URL: https://github.com/apache/drill/pull/1279#issuecomment-411901380
 
 
   @arina-ielchiieva Rebased this PR with additional changes on the latest 
master. Please review.




[GitHub] ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1415: DRILL-6656: Disallow extra 
semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r209081893
 
 

 ##
 File path: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBasePushFilterIntoScan.java
 ##
 @@ -125,7 +125,7 @@ protected void doPushFilterToScan(final RelOptRuleCall 
call, final FilterPrel fi
 final ScanPrel newScanPrel = ScanPrel.create(scan, filter.getTraitSet(), 
newGroupsScan, scan.getRowType());
 
 // Depending on whether is a project in the middle, assign either scan or 
copy of project to childRel.
-final RelNode childRel = project == null ? newScanPrel : 
project.copy(project.getTraitSet(), ImmutableList.of((RelNode)newScanPrel));;
+final RelNode childRel = project == null ? newScanPrel : 
project.copy(project.getTraitSet(), ImmutableList.of((RelNode)newScanPrel));
 
 Review comment:
   Removed the cast. IntelliJ said it was unnecessary.




[jira] [Resolved] (DRILL-6674) Minor fixes to avoid auto boxing cost in logging in LateralJoinBatch

2018-08-09 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia resolved DRILL-6674.
--
Resolution: Fixed

> Minor fixes to avoid auto boxing cost in logging in LateralJoinBatch
> 
>
> Key: DRILL-6674
> URL: https://issues.apache.org/jira/browse/DRILL-6674
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>






[GitHub] sohami closed pull request #1427: DRILL-6674: Minor fixes to avoid auto boxing cost in logging in Later…

2018-08-09 Thread GitBox
sohami closed pull request #1427: DRILL-6674: Minor fixes to avoid auto boxing 
cost in logging in Later…
URL: https://github.com/apache/drill/pull/1427
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java
index 18843b5b4b0..ff33e2f3c05 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java
@@ -276,8 +276,9 @@ public int getRecordCount() {
*/
   @Override
   public RecordBatch getIncoming() {
-Preconditions.checkState (left != null, "Retuning null left batch. It's 
unexpected since right side will only be " +
-  "called iff there is any valid left batch");
+Preconditions.checkState (left != null,
+  "Retuning null left batch. It's unexpected since right side will only be 
called iff " +
+"there is any valid left batch");
 return left;
   }
 
@@ -348,7 +349,8 @@ protected void buildSchema() throws SchemaChangeException {
 if (!prefetchFirstBatchFromBothSides()) {
   return;
 }
-Preconditions.checkState(right.getRecordCount() == 0, "Unexpected 
non-empty first right batch received");
+Preconditions.checkState(right.getRecordCount() == 0,
+  "Unexpected non-empty first right batch received");
 
 // Setup output container schema based on known left and right schema
 setupNewSchema();
@@ -728,7 +730,6 @@ private void finalizeOutputContainer() {
 
 // Set the record count in the container
 container.setRecordCount(outputIndex);
-container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
 
 batchMemoryManager.updateOutgoingStats(outputIndex);
 
@@ -976,12 +977,19 @@ private void crossJoinAndOutputRecords() {
   int leftRowId = leftJoinIndex + 1;
   int numRowsCopied = 0;
 
-  Preconditions.checkState(currentRowId <= leftRecordCount || 
leftJoinIndex <= leftRecordCount,
-"Either RowId in right batch is greater than total records in left 
batch or all rows in left batch " +
-  "is processed but there are still rows in right batch. 
Details[RightRowId: %s, LeftRecordCount: %s, " +
-  "LeftJoinIndex: %s, RightJoinIndex: %s]", currentRowId, 
leftRecordCount, leftJoinIndex, rightJoinIndex);
+  if (currentRowId > leftRecordCount || leftJoinIndex > leftRecordCount) {
+// Not using Preconditions.checkState here since along with condition 
evaluation there will be cost of boxing
+// the arguments.
+throw new IllegalStateException(String.format("Either RowId in right 
batch is greater than total records in " +
+  "left batch or all rows in left batch is processed but there are 
still rows in right batch. " +
+  "Details[RightRowId: %s, LeftRecordCount: %s, LeftJoinIndex: %s, 
RightJoinIndex: %s]",
+  currentRowId, leftRecordCount, leftJoinIndex, rightJoinIndex));
+  }
 
-  logger.trace("leftRowId and currentRowId are: {}, {}", leftRowId, 
currentRowId);
+  if (logger.isTraceEnabled()) {
+// Inside the if condition to eliminate parameter boxing cost
+logger.trace("leftRowId and currentRowId are: {}, {}", leftRowId, 
currentRowId);
+  }
 
   // If leftRowId matches the rowId in right row then emit left and right 
row. Increment outputIndex, rightJoinIndex
   // and numRowsCopied. Also set leftMatchFound to true to indicate when 
to increase leftJoinIndex.
@@ -1089,10 +1097,14 @@ private void setupInputOutputVectors(RecordBatch batch, 
int startVectorIndex, in
   private void copyDataToOutputVectors(int fromRowIndex, int toRowIndex, 
Map mapping,
int numRowsToCopy, boolean 
isRightBatch) {
 for (Map.Entry entry : mapping.entrySet()) {
-  logger.trace("Copying data from incoming batch vector to outgoing batch 
vector. [IncomingBatch: " +
+
+  if (logger.isTraceEnabled()) {
+// Inside the if condition to eliminate parameter boxing cost
+logger.trace("Copying data from incoming batch vector to outgoing 
batch vector. [IncomingBatch: " +
   "(RowIndex: {}, ColumnName: {}), OutputBatch: (RowIndex: {}, 
ColumnName: {}) and Other: (TimeEachValue: {})]",
-fromRowIndex, entry.getKey().getField().getName(), toRowIndex, 
entry.getValue().getField().getName(),
-numRowsToCopy);
+  fromRowIndex, entry.getKey().getField().getName(), toRowIndex, 
entry.getValue().getField().getName(),
+  numRowsToCopy);
+  }
 
   // Copy data from input vector to output 

[GitHub] ilooner commented on issue #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin

2018-08-09 Thread GitBox
ilooner commented on issue #1408: DRILL-6453: Resolve deadlock when reading 
from build and probe sides simultaneously in HashJoin
URL: https://github.com/apache/drill/pull/1408#issuecomment-411890416
 
 
   @Ben-Zvi Thanks for the review and suggestions. The code is much cleaner now.




[GitHub] ilooner commented on a change in pull request #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1408: DRILL-6453: Resolve 
deadlock when reading from build and probe sides simultaneously in HashJoin
URL: https://github.com/apache/drill/pull/1408#discussion_r209072214
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinMemoryCalculatorImpl.java
 ##
 @@ -365,17 +299,20 @@ protected void initialize(boolean autoTune,
   double loadFactor) {
   Preconditions.checkState(!firstInitialized);
   Preconditions.checkArgument(initialPartitions >= 1);
+  // If we had probe data before there should still be probe data now.
+  // If we didn't have probe data before we could get some new data now.
+  Preconditions.checkState(probeSizePredictor.hasData() && !probeEmpty || 
!probeSizePredictor.hasData());
 
 Review comment:
   Done




[GitHub] ilooner commented on a change in pull request #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1408: DRILL-6453: Resolve 
deadlock when reading from build and probe sides simultaneously in HashJoin
URL: https://github.com/apache/drill/pull/1408#discussion_r209066972
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/BatchSizePredictor.java
 ##
 @@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.join;
+
+import org.apache.drill.exec.record.RecordBatch;
+
+/**
+ * This class predicts the sizes of batches given an input batch.
+ *
+ * Invariants
+ * 
+ *   The {@link BatchSizePredictor} assumes that a {@link RecordBatch} is 
in a state where it can return a valid record count.
+ * 
+ */
+public interface BatchSizePredictor {
+  /**
+   * Gets the batchSize computed in the call to {@link #updateStats()}. 
Returns 0 if {@link #hasData()} is false.
+   * @return Gets the batchSize computed in the call to {@link 
#updateStats()}. Returns 0 if {@link #hasData()} is false.
+   * @throws IllegalStateException if {@link #updateStats()} was never called.
+   */
+  long getBatchSize();
+
+  /**
+   * Gets the number of records computed in the call to {@link 
#updateStats()}. Returns 0 if {@link #hasData()} is false.
+   * @return Gets the number of records computed in the call to {@link 
#updateStats()}. Returns 0 if {@link #hasData()} is false.
+   * @throws IllegalStateException if {@link #updateStats()} was never called.
+   */
+  int getNumRecords();
+
+  /**
+   * True if the input batch had records in the last call to {@link 
#updateStats()}. False otherwise.
+   * @return True if the input batch had records in the last call to {@link 
#updateStats()}. False otherwise.
+   */
+  boolean hasData();
 
 Review comment:
   Renamed to hadDataLastTime




[GitHub] ilooner commented on a change in pull request #1408: DRILL-6453: Resolve deadlock when reading from build and probe sides simultaneously in HashJoin

2018-08-09 Thread GitBox
ilooner commented on a change in pull request #1408: DRILL-6453: Resolve 
deadlock when reading from build and probe sides simultaneously in HashJoin
URL: https://github.com/apache/drill/pull/1408#discussion_r209066765
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinMemoryCalculatorImpl.java
 ##
 @@ -420,31 +357,32 @@ public long getMaxReservedMemory() {
 private void calculateMemoryUsage()
 {
   // Adjust based on number of records
-  maxBuildBatchSize = computeMaxBatchSizeNoHash(buildBatchSize, 
buildNumRecords,
-maxBatchNumRecordsBuild, fragmentationFactor, safetyFactor);
-  maxProbeBatchSize = computeMaxBatchSizeNoHash(probeBatchSize, 
probeNumRecords,
-maxBatchNumRecordsProbe, fragmentationFactor, safetyFactor);
-
-  // Safety factor can be multiplied at the end since these batches are 
coming from exchange operators, so no excess value vector doubling
-  partitionBuildBatchSize = computeMaxBatchSize(buildBatchSize,
-buildNumRecords,
-recordsPerPartitionBatchBuild,
-fragmentationFactor,
-safetyFactor,
-reserveHash);
+  maxBuildBatchSize = 
buildSizePredictor.predictBatchSize(maxBatchNumRecordsBuild, false);
 
-  // Safety factor can be multiplied at the end since these batches are 
coming from exchange operators, so no excess value vector doubling
-  partitionProbeBatchSize = computeMaxBatchSize(
-probeBatchSize,
-probeNumRecords,
-recordsPerPartitionBatchProbe,
-fragmentationFactor,
-safetyFactor,
-reserveHash);
+  if (probeSizePredictor.hasData()) {
+// We have probe data and we can compute the max incoming size.
+maxProbeBatchSize = 
probeSizePredictor.predictBatchSize(maxBatchNumRecordsProbe, false);
+  } else {
+// We don't have probe data
+if (probeEmpty) {
+  // We know the probe has no data, so we don't need to reserve any 
space for the incoming probe
+  maxProbeBatchSize = 0;
+} else {
+  // The probe side may have data, so assume it is the max incoming 
batch size. This assumption
+  // can fail in some cases since the batch sizing project is 
incomplete.
+  maxProbeBatchSize = maxIncomingBatchSize;
+}
+  }
+
+  partitionBuildBatchSize = 
buildSizePredictor.predictBatchSize(recordsPerPartitionBatchBuild, reserveHash);
+
+  if (probeSizePredictor.hasData()) {
+partitionProbeBatchSize = 
probeSizePredictor.predictBatchSize(recordsPerPartitionBatchProbe, reserveHash);
+  }
 
   maxOutputBatchSize = (long) ((double)outputBatchSize * 
fragmentationFactor * safetyFactor);
 
-  long probeReservedMemory;
+  long probeReservedMemory = -1;
 
 Review comment:
   Removed -1




Re: [ANNOUNCE] Apache Drill Release 1.14.0

2018-08-09 Thread Boaz Ben-Zvi
   Thanks Vlad for the advice and the link. Looking back at prior releases,
the 1.9.0 announcement (by Sudheesh) was also sent to
annou...@apache.org, but he then got feedback like "What's this project
about?" and "Why should I care?", so the following announcements avoided
the Apache general list.
   The link does suggest "put 3-5 lines blurb for the project", which
could address such feedback.
   We should try to compose a good "blurb" first.

  Boaz


On Thu, Aug 9, 2018 at 11:14 AM, Vlad Rozov  wrote:

> I'd recommend announcing the release on the ASF-wide mailing list (
> annou...@apache.org) as well, as there may be other community members who
> may get interested in the new functionality. Please see [1] for the
> requirements.
>
> Thank you,
>
> Vlad
>
> [1] http://www.apache.org/legal/release-policy.html#release-announcements
>
> On 2018/08/06 06:15:55, Charles Givre  wrote:
> > Thanks Boaz and great work everyone!
> >
> > Sent from my iPhone
> >
> > > On Aug 5, 2018, at 21:52, Abhishek Girish  wrote:
> > >
> > > Congratulations, everyone! And Boaz, thanks so much for coordinating
> the
> > > release.
> > >
> > > Folks, please try out 1.14 - it's our best release yet!
> > >
> > >> On Sat, Aug 4, 2018 at 11:35 PM Boaz Ben-Zvi  wrote:
> > >>
> > >> On behalf of the Apache Drill community, I am happy to announce the
> > >> release of Apache Drill 1.14.0.
> > >>
> > >> For information about Apache Drill, and to get involved, visit the
> project
> > >> website [1].
> > >>
> > >> This release of Drill provides the following many new features and
> > >> improvements:
> > >>
> > >> 
> =
> > >>
> > >> - Ability to run Drill in a Docker container. (DRILL-6346)
> > >>
> > >> - Ability to export and save your storage plugin configurations to a
> JSON
> > >> file for reuse. (DRILL-4580)
> > >>
> > >> - Ability to manage storage plugin configurations in the Drill
> > >> configuration file, storage-plugins-override.conf. (DRILL-6494)
> > >>
> > >> - Functions that return data type information. (DRILL-6361)
> > >>
> > >> - The Drill kafka storage plugin supports filter pushdown for query
> > >> conditions on certain Kafka metadata fields in messages. (DRILL-5977)
> > >>
> > >> - Spill to disk for the Hash Join operator. (DRILL-6027)
> > >>
> > >> - The dfs storage plugin supports a Logfile plugin extension that
> enables
> > >> Drill to directly read and query log files of any format. (DRILL-6104)
> > >>
> > >> - Phonetic and string distance functions. (DRILL-6519)
> > >>
> > >> - The store.hive.conf.properties option enables you to specify Hive
> > >> properties at the session level using the SET command. (DRILL-6575)
> > >>
> > >> - Drill can directly manage the CPU resources through the Drill
> start-up
> > >> script, drill-env.sh; you no longer have to manually add the PID to
> the
> > >> cgroup.procs file each time a Drillbit restarts. (DRILL-143)
> > >>
> > >> - Drill can query the metadata in various image formats with the image
> > >> metadata format plugin. (DRILL-4364)
> > >>
> > >> - Enhanced decimal data type support. (DRILL-6094)
> > >>
> > >> - Option to push LIMIT(0) on top of SCAN. (DRILL-6574)
> > >>
> > >> - Parquet filter pushdown improvements. (DRILL-6174)
> > >>
> > >> - Drill can infer filter conditions for join queries and push the
> filter
> > >> conditions down to the data source. (DRILL-6173)
> > >>
> > >> - Drill uses a native reader to read Hive tables when you enable the
> > >> store.hive.optimize_scan_with_native_readers option. When enabled,
> Drill
> > >> reads data faster and applies filter pushdown optimizations.
> (DRILL-6331)
> > >>
> > >> - Early release of lateral join. (DRILL-5999)
> > >>
> > >> ===
> > >>
> > >> For the full list please see the release notes at [2].
> > >>
> > >> The binary and source artifacts are available here [3].
> > >>
> > >> 1. https://drill.apache.org/
> > >> 2. https://drill.apache.org/docs/apache-drill-1-14-0-release-notes/
> > >> 3. https://drill.apache.org/download/

Re: [ANNOUNCE] Apache Drill Release 1.14.0

2018-08-09 Thread Vlad Rozov
I'd recommend announcing the release on the ASF-wide mailing list 
(annou...@apache.org) as well, as there may be other community members who may 
be interested in the new functionality. Please see [1] for the requirements.

Thank you,

Vlad

[1] http://www.apache.org/legal/release-policy.html#release-announcements

On 2018/08/06 06:15:55, Charles Givre  wrote: 
> Thanks Boaz and great work everyone!
> 
> Sent from my iPhone
> 
> > On Aug 5, 2018, at 21:52, Abhishek Girish  wrote:
> > 
> > Congratulations, everyone! And Boaz, thanks so much for coordinating the
> > release.
> > 
> > Folks, please try out 1.14 - it's our best release yet!
> > 
> >> On Sat, Aug 4, 2018 at 11:35 PM Boaz Ben-Zvi  wrote:
> >> 
> >> On behalf of the Apache Drill community, I am happy to announce the
> >> release of Apache Drill 1.14.0.
> >> 
> >> For information about Apache Drill, and to get involved, visit the project
> >> website [1].
> >> 
> >> This release of Drill provides the following many new features and
> >> improvements:
> >> 
> >> =
> >> 
> >> - Ability to run Drill in a Docker container. (DRILL-6346)
> >> 
> >> - Ability to export and save your storage plugin configurations to a JSON
> >> file for reuse. (DRILL-4580)
> >> 
> >> - Ability to manage storage plugin configurations in the Drill
> >> configuration file, storage-plugins-override.conf. (DRILL-6494)
> >> 
> >> - Functions that return data type information. (DRILL-6361)
> >> 
> >> - The Drill kafka storage plugin supports filter pushdown for query
> >> conditions on certain Kafka metadata fields in messages. (DRILL-5977)
> >> 
> >> - Spill to disk for the Hash Join operator. (DRILL-6027)
> >> 
> >> - The dfs storage plugin supports a Logfile plugin extension that enables
> >> Drill to directly read and query log files of any format. (DRILL-6104)
> >> 
> >> - Phonetic and string distance functions. (DRILL-6519)
> >> 
> >> - The store.hive.conf.properties option enables you to specify Hive
> >> properties at the session level using the SET command. (DRILL-6575)
> >> 
> >> - Drill can directly manage the CPU resources through the Drill start-up
> >> script, drill-env.sh; you no longer have to manually add the PID to the
> >> cgroup.procs file each time a Drillbit restarts. (DRILL-143)
> >> 
> >> - Drill can query the metadata in various image formats with the image
> >> metadata format plugin. (DRILL-4364)
> >> 
> >> - Enhanced decimal data type support. (DRILL-6094)
> >> 
> >> - Option to push LIMIT(0) on top of SCAN. (DRILL-6574)
> >> 
> >> - Parquet filter pushdown improvements. (DRILL-6174)
> >> 
> >> - Drill can infer filter conditions for join queries and push the filter
> >> conditions down to the data source. (DRILL-6173)
> >> 
> >> - Drill uses a native reader to read Hive tables when you enable the
> >> store.hive.optimize_scan_with_native_readers option. When enabled, Drill
> >> reads data faster and applies filter pushdown optimizations. (DRILL-6331)
> >> 
> >> - Early release of lateral join. (DRILL-5999)
> >> 
> >> ===
> >> 
> >> For the full list please see the release notes at [2].
> >> 
> >> The binary and source artifacts are available here [3].
> >> 
> >> 1. https://drill.apache.org/
> >> 2. https://drill.apache.org/docs/apache-drill-1-14-0-release-notes/
> >> 3. https://drill.apache.org/download/
> >> 
> >> Thanks to everyone in the community who contributed to this release!
> >> 
> >>Boaz
> >> 
> >> 
> >> 
> >> 
> 


Re: 回复:Is Drill query execution processing model just the same idea with the Spark whole-stage codegen improvement

2018-08-09 Thread Paul Rogers
Hi Alex,

Perhaps Parth can jump in here as he has deeper knowledge of Parquet.

My understanding is that the planning-time metadata is used for partition 
(directory) and row group pruning. By scanning the footer of each Parquet row 
group, Drill can determine whether that group contains data that the query must 
process. Only groups that could contain target data are included in the list of 
groups to be scanned by the query's scan operators.

For example, suppose we do a query over a date range and a US state:

SELECT saleDate, prodId, quantity FROM `sales` WHERE dir0 BETWEEN '2018-07-01' 
AND '2018-07-31' AND state = 'CA'

Suppose sales is a directory that contains subdirectories partitioned by date, 
and that state is a dictionary-encoded field with metadata in the Parquet 
footer.

In this case, Drill reads the directory structure to do the partition pruning 
at planning time. Only the matching directories are added to the list of files 
to be scanned at run time.

Then, Drill scans the footers of each Parquet row group to find those with 
`state` fields that contain the value of 'CA'. (I believe the footer scanning 
is done only for matching partitions, but I've not looked at that code 
recently.)
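To make the footer-based check concrete, here is a minimal, self-contained sketch of the idea in plain Java. It is not Drill's planner code: the RowGroupStats type, its fields, and the string-typed min/max values are made up purely for illustration.

```java
// Conceptual sketch only -- not Drill's planner. Assumes the per-row-group
// min/max statistics have already been read from the Parquet footers into a
// simple illustrative structure (RowGroupStats is a made-up type).
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RowGroupStats {
  final String file;               // file (or row group) this footer describes
  final Map<String, String> min;   // column -> min value from the footer
  final Map<String, String> max;   // column -> max value from the footer

  RowGroupStats(String file, Map<String, String> min, Map<String, String> max) {
    this.file = file;
    this.min = min;
    this.max = max;
  }
}

class FooterPruner {
  /** Keep only row groups whose [min, max] range for the column can contain the value. */
  static List<RowGroupStats> prune(List<RowGroupStats> groups, String column, String value) {
    List<RowGroupStats> kept = new ArrayList<>();
    for (RowGroupStats g : groups) {
      String lo = g.min.get(column);
      String hi = g.max.get(column);
      // No statistics for the column means the group cannot be ruled out.
      boolean mightMatch = lo == null || hi == null
          || (lo.compareTo(value) <= 0 && value.compareTo(hi) <= 0);
      if (mightMatch) {
        kept.add(g);
      }
    }
    return kept;
  }
}
```

Only the row groups that survive a check like this would be handed to the distribution planner described next.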

Finally, Drill passes the matching set of files to the distribution planner, 
which assigns files to minor fragments distributed across the Drill cluster. 
The result attempts to maximize locality and minimize skew.

Could this have been done differently? There is a trade-off. If the work is 
done in the planner, then it can be very expensive for large data sets. But, on 
the other hand, the resulting distribution plan has minimal skew: each file 
that is scheduled for scanning does, in fact, need to be scanned.

Suppose that Drill pushed the partition and row group filtering to the scan 
operator. In this case, operators that received files outside of the target 
date range would do nothing, while those with files in the range would need to 
do further work, resulting in possible skew. (One could argue that, if files 
are randomly distributed across nodes, then, on average, each scan operator 
will see roughly the same number of included and excluded files, reducing skew. 
This would be an interesting experiment to try.)

Another approach might be to do a two-step execution. First distribute the work 
of doing the partition pruning and row group footer checking. Gather the 
results. Then, finish planning the query and distribute the execution as is 
done today. Here, much new code would be needed to implement this new form of 
task, which is probably why it has not yet been done.

Might be interesting for someone to prototype that to see if we get improvement 
in planning performance. My guess is that a prototype could be built that does 
the "pre-selection" work as a Drill query, just to see if distribution of the 
work helps. If that is a win, then perhaps a more robust solution could be 
developed from there.

The final factor is Drill's crude-but-effective metadata caching mechanism. 
Drill reads footer and directory information for all Parquet files in a 
directory structure and stores it in a series of possibly-large JSON files, one 
in each directory. This mechanism has no good concurrency control. (This is a 
long-standing, hard-to-fix bug.) Centralizing access in the Foreman reduces the 
potential for concurrent writes on metadata refresh (but does not eliminate 
them since the Foremen are themselves distributed.) Further, since the metadata 
is in one big file (in the root directory), and is in JSON (which does not 
support block-splitting), then it is actually not possible to distribute the 
planning-time selection work. I understand that the team is looking at 
alternative metadata implementations to avoid some of these existing issues. 
(There was some discussion on the dev list over the last few months.)

In theory it would be possible to adopt a different metadata structure that 
would better support distributed selection planning. Perhaps farm out work at 
each directory level: in each directory, farm out work for the subdirectories 
(if they match the partition selection) and files. Recursively apply this down 
the directory tree. But, again, Drill does not have a general purpose task 
distribution (map/reduce) mechanism; Drill only knows how to distribute 
queries. Drill would also need to store metadata in a format that is versioned 
(to avoid concurrent read/write conflicts) and block-oriented (to allow 
splitting the work of reading a large metadata file).
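To illustrate just the recursion shape of that per-directory fan-out, here is a single-process fork/join sketch. It is only an analogue: Drill has no such general task-distribution mechanism today, the partition predicate and footer checks are elided, and a real solution would distribute these tasks across Drillbits rather than threads.

```java
// Single-process analogue of the per-directory fan-out idea. Each subdirectory
// becomes its own task; a distributed version would farm these out to nodes.
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class DirectoryMetadataTask extends RecursiveTask<List<String>> {
  private final File dir;

  DirectoryMetadataTask(File dir) {
    this.dir = dir;
  }

  @Override
  protected List<String> compute() {
    List<String> selected = new ArrayList<>();
    List<DirectoryMetadataTask> subTasks = new ArrayList<>();
    File[] entries = dir.listFiles();
    if (entries == null) {
      return selected;
    }
    for (File entry : entries) {
      if (entry.isDirectory()) {
        // A real planner would apply the partition-pruning predicate here and
        // skip subdirectories that cannot match; this sketch always descends.
        DirectoryMetadataTask task = new DirectoryMetadataTask(entry);
        task.fork();                          // farm out the subdirectory
        subTasks.add(task);
      } else if (entry.getName().endsWith(".parquet")) {
        selected.add(entry.getPath());        // footer/row-group check would go here
      }
    }
    for (DirectoryMetadataTask task : subTasks) {
      selected.addAll(task.join());
    }
    return selected;
  }

  // Usage (illustrative): new ForkJoinPool().invoke(new DirectoryMetadataTask(new File("/data/sales")))
}
```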

It is interesting to note that several systems (including Netflix's MetaCat, 
[1]) also hit bottlenecks in metadata access when metadata is stored in a 
relational DB, as Hive does. Hive-oriented tools have the same kind of metadata 
bottleneck, but for different reasons: because of the huge load placed on the 
Hive metastore. So, there is an opportunity for innovation in this area by 
observing, as you did, that metadata itself is a big data problem and  

Re: Drill + Flink

2018-08-09 Thread Vlad Rozov
From the point of view of integrating Drill and Flink, such a possibility 
sounds feasible. There may be a few technical challenges to solve as the 
queryable state API is still evolving, AFAIK. For now, I'd be more interested 
in a use case for the integration between Drill and Flink, or any other 
streaming platform that provides an ability to query its pipeline state (Apex 
has such an ability as well, even though the API is not formalized).
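For what it is worth, a minimal sketch of the kind of point lookup such an integration might issue on the Drill side, assuming Flink's QueryableStateClient API roughly as documented for recent 1.x releases. The host, port, job id, and state name are placeholders, and no such Drill storage plugin exists today.

```java
// Sketch only: one key lookup against Flink queryable state, the sort of call a
// hypothetical Drill storage plugin could issue per key. Host, port, job id and
// state name are placeholders.
import java.util.concurrent.CompletableFuture;

import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.queryablestate.client.QueryableStateClient;

public class FlinkStateLookup {
  public static Long lookup(String key) throws Exception {
    // Connects to the queryable state proxy on a Flink task manager (default port 9069).
    QueryableStateClient client = new QueryableStateClient("flink-proxy-host", 9069);
    ValueStateDescriptor<Long> descriptor = new ValueStateDescriptor<>("count", Types.LONG);

    CompletableFuture<ValueState<Long>> future = client.getKvState(
        JobID.fromHexString("00000000000000000000000000000000"),  // placeholder job id
        "query-name",             // the name registered via asQueryableState(...)
        key, Types.STRING, descriptor);

    return future.get().value();  // blocks until the state value arrives
  }
}
```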

Thank you,

Vlad

On 2018/08/09 14:09:38, Charles Givre  wrote: 
> Hi Vlad, 
> I was thinking of Flink’s queriable state.  A few people were asking me 
> questions about Flink so I started doing some digging into its internals and 
> it seemed like an interesting possibility. 
> — C
> 
> > On Aug 8, 2018, at 11:28, Vlad Rozov  wrote:
> > 
> > Do you refer to Flink's queriable state? If not, can you describe a use 
> > case?
> > 
> > Thank you,
> > 
> > Vlad
> > 
> > On 2018/08/08 14:45:50, Charles Givre  wrote: 
> >> Hello all, 
> >> Has there been any discussion of creating a storage plugin for Drill to be 
> >> able to query Apache Flink?
> >> —C
> 
> 


[GitHub] vrozov commented on a change in pull request #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions + minor fixes

2018-08-09 Thread GitBox
vrozov commented on a change in pull request #1428: DRILL-6670: align Parquet 
TIMESTAMP_MICROS logical type handling with earlier versions + minor fixes
URL: https://github.com/apache/drill/pull/1428#discussion_r209014522
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
 ##
 @@ -108,7 +108,7 @@ protected ScanBatch getBatch(ExecutorFragmentContext 
context, AbstractParquetRow
 
 if 
(!context.getOptions().getBoolean(ExecConstants.PARQUET_NEW_RECORD_READER)
 && !ParquetReaderUtility.containsComplexColumn(footer, 
rowGroupScan.getColumns())) {
-  logger.debug("Query {} qualifies for new Parquet reader",
+  logger.debug("Query {} qualifies for ParquetRecordReader",
 
 Review comment:
   IMO, it will be much better to log `PARQUET_NEW_RECORD_READER` option and 
whether or not there are complex columns prior to creating a reader and reader 
instance class after the reader is created:
   ```
   log.debug("PARQUET_NEW_RECORD_READER is {}. Complex columns {}.", 
"enabled"/"disabled", "found"/"not found");
   Reader reader;
   if (...) {
  reader = new ParquetRecordReader();
   } else {
 reader = new DrillParquetReader();
   }
   log.debug("Query {} uses {});
   readers.add(reader);
   ```
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sohami commented on issue #1423: Updated URL for Apache Parquet logical types

2018-08-09 Thread GitBox
sohami commented on issue #1423: Updated URL for Apache Parquet logical types
URL: https://github.com/apache/drill/pull/1423#issuecomment-411812573
 
 
   @davechallis - Thanks for your changes. Since it's already merged by 
@bbevens on the doc side, I think we can close this PR now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: 回复:Is Drill query execution processing model just the same idea with the Spark whole-stage codegen improvement

2018-08-09 Thread Oleksandr Kalinin
Hi Paul and Drill developers,

I am sorry for going slightly off-topic maybe, but I noticed that Drill's foreman
collects metadata of all queried files in the PLANNING state (ref. class
e.g. MetadataGatherer), at least in the case of Parquet when using the dfs plugin.
That costs a lot of time when the number of queried files is substantial, since
the MetadataGatherer task is not distributed across cluster nodes. What is the
reason behind this collection? It doesn't seem to match the schema-on-read
philosophy, but then maybe it's just me or my setup; I am still very new to
Drill.

Also, I appreciate the advantage of metastore-free operation - in theory it
makes Drill more reliable and less costly to run. At the same time there is the
Drill Metastore project, meaning that the evolution is actually shifting away
from a metastore-free system. What are the reasons for that evolution? Or is it
going to be an additional, optional feature?

Thanks,
Best Regards,
Alex

On Tue, Aug 7, 2018 at 10:25 PM, Paul Rogers 
wrote:

> Hi Qiaoyi,
>
> In general, optimal performance occurs when a system knows the schema at
> the start and can fully optimize based on that schema. Think C or C++
> compilers compared with Java or Python.
>
> On the other hand, the JVM HotSpot optimizer has shown that one can
> achieve very good performance via incremental optimization, but at the cost
> of extreme runtime complexity. The benefit is that Java is much more
> flexible, machine independent, and simpler than C or C++ (at least for
> non-system applications.)
>
> Python is the other extreme: it is so dynamic that the literature has
> shown that it is very difficult to optimize Python at the compiler or
> runtime level. (Though, there is some interesting research on this topic.
> See [1], [2].)
>
> Drill is somewhere in the middle. Drill does not do code generation at the
> start like Impala or Spark do. Nor is it fully interpreted. Rather, Drill
> is roughly like Java: code generation is done at runtime based on the
> observed data types. (The JVM does machine code generation based on
> observed execution patterns.) The advantage is that Drill is able to
> achieve its pretty-good performance without the cost of a metadata system
> to provide schema at plan time.
>
> So, to get the absolute fastest performance (think Impala), you must pay
> the cost of a metadata system for all queries. Drill gets nearly as good
> performance without the complexity of the Hive metastore -- a pretty good
> tradeoff.
>
> If I may ask, what is your area of interest? Are you looking to use Drill
> for some project? Or, just interested in how Drill works?
>
> Thanks,
> - Paul
>
> [1] Towards Practical Gradual Typing: https://blog.acolyer.
> org/2015/08/03/towards-practical-gradual-typing/
> [2] Is Sound Gradual Typing Dead? https://blog.acolyer.
> org/2016/02/05/is-sound-gradual-typing-dead/
>
>
>
>
>
>
>
> On Sunday, August 5, 2018, 11:36:52 PM PDT, 丁乔毅(智乔) <
> qiaoyi.din...@alibaba-inc.com> wrote:
>
>  Thanks Paul, good to know the design principles of the Drill query
> execution process model.
> I am very new to Drill, please bear with me.
>
> One more question.
> As you mentioned, schema-free processing is the key feature giving Drill an
> advantage over Spark; is there any performance consideration behind this
> design other than the techniques of dynamic codegen and vectorized
> computation?
>
> Regards,
> Qiaoyi
>
>
> --
> From: Paul Rogers 
> Sent: Saturday, August 4, 2018, 02:27
> To: dev 
> Subject: Re: Is Drill query execution processing model just the same idea with
> the Spark whole-stage codegen improvement
>
> Hi Qiaoyi,
> As you noted, Drill and Spark have similar models -- but with important
> differences.
> Drill is schema-on-read (also called "schema less"). In particular, this
> means that Drill does not know the schema of the data until the first row
> (actually "record batch") arrives at each operator. Once Drill sees that
> first batch, it has a data schema, and can generate the corresponding code;
> but only for that one operator.
> The above process repeats up the fragment ("fragment" is Drill's term for
> a Spark stage.)
> I believe that Spark requires (or at least allows) the user to define a
> schema up front. This is particularly true for the more modern data frame
> APIs.
> Do you think the Spark improvement would apply to Drill's case of
> determining the schema operator-by-operator up the DAG?
> Thanks,
> - Paul
>
>
>
> On Friday, August 3, 2018, 8:57:29 AM PDT, 丁乔毅(智乔) <
> qiaoyi.din...@alibaba-inc.com> wrote:
>
>
> Hi, all.
>
> I'm very new to Apache Drill.
>
> I'm quite interested in Drill's query execution implementation.
> After a little bit of source code reading, I found it is built on a
> processing model quite like a data-centric, push-based style, which is
> very similar to the idea behind the Spark whole-stage codegen
> improvement (jira ticket https://issues.apache.org/jira/browse/SPARK-12795)
>
> And I 

Re: Drill + Flink

2018-08-09 Thread Charles Givre
Hi Vlad, 
I was thinking of Flink’s queriable state.  A few people were asking me 
questions about Flink so I started doing some digging into its internals and it 
seemed like an interesting possibility. 
— C

> On Aug 8, 2018, at 11:28, Vlad Rozov  wrote:
> 
> Do you refer to Flink's queriable state? If not, can you describe a use case?
> 
> Thank you,
> 
> Vlad
> 
> On 2018/08/08 14:45:50, Charles Givre  wrote: 
>> Hello all, 
>> Has there been any discussion of creating a storage plugin for Drill to be 
>> able to query Apache Flink?
>> —C



[GitHub] okalinin commented on issue #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions

2018-08-09 Thread GitBox
okalinin commented on issue #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS 
logical type handling with earlier versions
URL: https://github.com/apache/drill/pull/1428#issuecomment-411726502
 
 
   I will push another ParquetRecordReader fix for logical type handling on 
Parquet files with dictionary encoding disabled. Please do not review PR yet.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208870590
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -82,16 +84,17 @@ public void oneBitOneExchangeTwoEntryRun() throws 
Exception{
 public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient 
client = new DrillClient(CONFIG, serviceSet.getCoordinator());){
-bit1.run();
-client.connect();
-List<QueryDataBatch> results = client.runQuery(QueryType.LOGICAL, 
Files.toString(DrillFileUtils.getResourceAsFile("/scan_screen_logical.json"), 
Charsets.UTF_8));
-int count = 0;
-for(QueryDataBatch b : results){
-count += b.getHeader().getRowCount();
-b.release();
-}
-assertEquals(100, count);
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
 
 Review comment:
   `try(` -> `try (`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208866703
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashAggregate.java
 ##
 @@ -104,6 +104,6 @@ public void setMaxAllocation(long maxAllocation) {
   public boolean isBufferedOperator(QueryContext queryContext) {
 // In case forced to use a single partition - do not consider this a 
buffered op (when memory is divided)
 return queryContext == null ||
-  1 < 
(int)queryContext.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR)
 ;
+  1 < 
(int)queryContext.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR);
 
 Review comment:
   `(int)queryContext` -> `(int) queryContext`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208870154
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -43,7 +43,8 @@
   public void oneBitOneExchangeOneEntryRun() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = 
new DrillClient(CONFIG, serviceSet.getCoordinator());){
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
+DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator())){
 
 Review comment:
   `){` -> `) {`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208871013
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -101,7 +104,9 @@ public void oneBitOneExchangeTwoEntryRunLogical() throws 
Exception{
 public void twoBitOneExchangeTwoEntryRun() throws Exception{
   RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-  try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Drillbit bit2 = 
new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator());){
+  try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
 
 Review comment:
   `try(` -> `try (`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208870633
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -82,16 +84,17 @@ public void oneBitOneExchangeTwoEntryRun() throws 
Exception{
 public void oneBitOneExchangeTwoEntryRunLogical() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient 
client = new DrillClient(CONFIG, serviceSet.getCoordinator());){
-bit1.run();
-client.connect();
-List<QueryDataBatch> results = client.runQuery(QueryType.LOGICAL, 
Files.toString(DrillFileUtils.getResourceAsFile("/scan_screen_logical.json"), 
Charsets.UTF_8));
-int count = 0;
-for(QueryDataBatch b : results){
-count += b.getHeader().getRowCount();
-b.release();
-}
-assertEquals(100, count);
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
+DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator())){
 
 Review comment:
   `){` -> `) {`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208870431
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -43,7 +43,8 @@
   public void oneBitOneExchangeOneEntryRun() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = 
new DrillClient(CONFIG, serviceSet.getCoordinator());){
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
 
 Review comment:
   `try(` -> `try (`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208871104
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -101,7 +104,9 @@ public void oneBitOneExchangeTwoEntryRunLogical() throws 
Exception{
 public void twoBitOneExchangeTwoEntryRun() throws Exception{
   RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-  try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); Drillbit bit2 = 
new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator());){
+  try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
+  Drillbit bit2 = new Drillbit(CONFIG, serviceSet);
+  DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator())){
 
 Review comment:
   `){` -> `) {`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208870019
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/ExpressionInterpreterTest.java
 ##
 @@ -227,7 +227,7 @@ private ValueVector evalExprWithInterpreter(String 
expression, RecordBatch batch
   }
 
   private void showValueVectorContent(ValueVector vw) {
-for (int row = 0; row < vw.getAccessor().getValueCount(); row ++ ) {
+for (int row = 0; row < vw.getAccessor().getValueCount(); row++ ) {
 
 Review comment:
   Please remove space: `row++ )` -> `row++)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208869017
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/explain/NumberingRelWriter.java
 ##
 @@ -84,7 +84,9 @@ protected void explain_(
 s.append("  ");
 
 if (id != null && id.opId == 0) {
-  for(int i =0; i < spacer.get(); i++){ s.append('-');}
+  for(int i = 0; i < spacer.get(); i++) {
 
 Review comment:
   `for(int i` -> `for (int i`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208870496
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -63,7 +64,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{
   public void oneBitOneExchangeTwoEntryRun() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = 
new DrillClient(CONFIG, serviceSet.getCoordinator());){
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
 
 Review comment:
   `try(` -> `try (`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208871299
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/protocol/TestOperatorRecordBatch.java
 ##
 @@ -99,7 +99,10 @@ public BatchAccessor batchAccessor() {
 }
 
 @Override
-public boolean buildSchema() { buildSchemaCalled = true; return ! 
schemaEOF; }
+public boolean buildSchema() {
+  buildSchemaCalled = true;
+  return ! schemaEOF;
 
 Review comment:
   `! schemaEOF` -> `!schemaEOF`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208865479
 
 

 ##
 File path: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBasePushFilterIntoScan.java
 ##
 @@ -125,7 +125,7 @@ protected void doPushFilterToScan(final RelOptRuleCall 
call, final FilterPrel fi
 final ScanPrel newScanPrel = ScanPrel.create(scan, filter.getTraitSet(), 
newGroupsScan, scan.getRowType());
 
 // Depending on whether is a project in the middle, assign either scan or 
copy of project to childRel.
-final RelNode childRel = project == null ? newScanPrel : 
project.copy(project.getTraitSet(), ImmutableList.of((RelNode)newScanPrel));;
+final RelNode childRel = project == null ? newScanPrel : 
project.copy(project.getTraitSet(), ImmutableList.of((RelNode)newScanPrel));
 
 Review comment:
   Could you please add space: `(RelNode)newScanPrel` -> `(RelNode) newScanPrel`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208871483
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetSimpleTestFileGenerator.java
 ##
 @@ -292,7 +292,8 @@ public static void writeComplexValues(GroupFactory gf, 
ParquetWriter comp
   .append("_INT_64", 0x7FFFL)
   .append("_UINT_64", 0xL)
   .append("_DECIMAL_decimal18", 0xL);
-  byte[] bytes = new byte[30]; Arrays.fill(bytes, (byte)1);
+  byte[] bytes = new byte[30];
+  Arrays.fill(bytes, (byte)1);
 
 Review comment:
   `(byte)1` -> `(byte) 1`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208870539
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ##
 @@ -63,7 +64,8 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{
   public void oneBitOneExchangeTwoEntryRun() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = 
new DrillClient(CONFIG, serviceSet.getCoordinator());){
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
+DrillClient client = new DrillClient(CONFIG, 
serviceSet.getCoordinator())){
 
 Review comment:
   `){` -> `) {`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208866591
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
 ##
 @@ -713,7 +713,7 @@ public void eval() {
 if (length.value > 0) {
   charLen = Math.min((int)length.value, charCount);  //left('abc', 5) 
-> 'abc'
 } else if (length.value < 0) {
-  charLen = Math.max(0, charCount + (int)length.value) ; // 
left('abc', -5) ==> ''
+  charLen = Math.max(0, charCount + (int)length.value); // left('abc', 
-5) ==> ''
 
 Review comment:
   Could you please add space: `(int)length` -> `(int) length`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208868830
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ##
 @@ -302,9 +302,12 @@ private LogicalExpression 
getDrillCastFunctionFromOptiq(RexCall call){
 castType = 
Types.required(MinorType.VARCHAR).toBuilder().setPrecision(call.getType().getPrecision()).build();
 break;
 
-  case "INTEGER": castType = Types.required(MinorType.INT); break;
-  case "FLOAT": castType = Types.required(MinorType.FLOAT4); break;
-  case "DOUBLE": castType = Types.required(MinorType.FLOAT8); break;
+  case "INTEGER": castType = Types.required(MinorType.INT);
 
 Review comment:
   Could you please move the assigning to the next line here and in this class 
below


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208866832
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java
 ##
 @@ -88,6 +88,6 @@ public void setMaxAllocation(long maxAllocation) {
   public boolean isBufferedOperator(QueryContext queryContext) {
 // In case forced to use a single partition - do not consider this a 
buffered op (when memory is divided)
 return queryContext == null ||
-  1 < 
(int)queryContext.getOptions().getOption(ExecConstants.HASHJOIN_NUM_PARTITIONS_VALIDATOR)
 ;
+  1 < 
(int)queryContext.getOptions().getOption(ExecConstants.HASHJOIN_NUM_PARTITIONS_VALIDATOR);
 
 Review comment:
   `(int)queryContext` -> `(int) queryContext`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208871960
 
 

 ##
 File path: exec/jdbc/src/test/java/org/apache/drill/jdbc/StatementTest.java
 ##
 @@ -83,7 +83,7 @@ public static void tearDownStatement() throws SQLException {
   public void testDefaultGetQueryTimeout() throws SQLException {
 try(Statement stmt = connection.createStatement()) {
   int timeoutValue = stmt.getQueryTimeout();
-  assertEquals( 0 , timeoutValue );
+  assertEquals( 0, timeoutValue );
 
 Review comment:
   `( 0, timeoutValue )` -> `(0, timeoutValue)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208871597
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetSimpleTestFileGenerator.java
 ##
 @@ -375,7 +376,8 @@ public static void writeSimpleValues(SimpleGroupFactory 
sgf, ParquetWriter
 
 Review comment:
   `(byte)1` -> `(byte) 1`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow extra semicolons in import statements.

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1415: DRILL-6656: Disallow 
extra semicolons in import statements.
URL: https://github.com/apache/drill/pull/1415#discussion_r208867531
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
 ##
 @@ -235,7 +235,7 @@ public static boolean isScalarSubquery(RelNode root) {
   if (currentrel instanceof DrillAggregateRel) {
 agg = (DrillAggregateRel)currentrel;
   } else if (currentrel instanceof RelSubset) {
-currentrel = ((RelSubset) currentrel).getBest() ;
+currentrel = ((RelSubset)currentrel).getBest();
 
 Review comment:
   `(RelSubset)currentrel` -> `(RelSubset) currentrel`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level 
lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208841312
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java
 ##
 @@ -169,13 +169,31 @@ public void 
testLeftLateral_WithFilterAndLimitInSubQuery() throws Exception {
   @Test
   public void testMultiUnnestAtSameLevel() throws Exception {
 String Sql = "EXPLAIN PLAN FOR SELECT customer.c_name, customer.c_address, 
U1.order_id, U1.order_amt," +
 
 Review comment:
   And here: `Sql` -> `sql`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level 
lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208839659
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java
 ##
 @@ -138,7 +139,7 @@ public PhysicalOperator visitUnnest(UnnestPOP unnest, 
IndexedFragmentNode value)
 final Wrapper info;
 final int minorFragmentId;
 
 Review comment:
   It would be good to make them private too.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level 
lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208840867
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java
 ##
 @@ -169,13 +169,31 @@ public void 
testLeftLateral_WithFilterAndLimitInSubQuery() throws Exception {
   @Test
   public void testMultiUnnestAtSameLevel() throws Exception {
 String Sql = "EXPLAIN PLAN FOR SELECT customer.c_name, customer.c_address, 
U1.order_id, U1.order_amt," +
-  " U1.itemName, U1.itemNum" + " FROM 
cp.`lateraljoin/nested-customer.parquet` customer, LATERAL" +
-  " (SELECT t.ord.o_id AS order_id, t.ord.o_amount AS order_amt, 
U2.item_name AS itemName, U2.item_num AS " +
-"itemNum FROM UNNEST(customer.orders) t(ord) , LATERAL" +
-  " (SELECT t1.ord.i_name AS item_name, t1.ord.i_number AS item_num FROM 
UNNEST(t.ord) AS t1(ord)) AS U2) AS U1";
+" U1.itemName, U1.itemNum" + " FROM 
cp.`lateraljoin/nested-customer.parquet` customer, LATERAL" +
+" (SELECT t.ord.o_id AS order_id, t.ord.o_amount AS order_amt, 
U2.item_name AS itemName, U2.item_num AS " +
+"itemNum FROM UNNEST(customer.orders) t(ord) , LATERAL" +
+" (SELECT t1.ord.i_name AS item_name, t1.ord.i_number AS item_num 
FROM UNNEST(t.ord) AS t1(ord)) AS U2) AS U1";
 runAndLog(Sql);
   }
 
+  @Test
+  public void testMultiUnnestAtSameLevelExec() throws Exception {
 
 Review comment:
   Could you please rewrite this test to validate the query result against an 
expected result, since it is possible that after some changes both these 
queries may return the same incorrect result.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level 
lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208841144
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java
 ##
 @@ -169,13 +169,31 @@ public void 
testLeftLateral_WithFilterAndLimitInSubQuery() throws Exception {
   @Test
   public void testMultiUnnestAtSameLevel() throws Exception {
 String Sql = "EXPLAIN PLAN FOR SELECT customer.c_name, customer.c_address, 
U1.order_id, U1.order_amt," +
-  " U1.itemName, U1.itemNum" + " FROM 
cp.`lateraljoin/nested-customer.parquet` customer, LATERAL" +
-  " (SELECT t.ord.o_id AS order_id, t.ord.o_amount AS order_amt, 
U2.item_name AS itemName, U2.item_num AS " +
-"itemNum FROM UNNEST(customer.orders) t(ord) , LATERAL" +
-  " (SELECT t1.ord.i_name AS item_name, t1.ord.i_number AS item_num FROM 
UNNEST(t.ord) AS t1(ord)) AS U2) AS U1";
+" U1.itemName, U1.itemNum" + " FROM 
cp.`lateraljoin/nested-customer.parquet` customer, LATERAL" +
+" (SELECT t.ord.o_id AS order_id, t.ord.o_amount AS order_amt, 
U2.item_name AS itemName, U2.item_num AS " +
+"itemNum FROM UNNEST(customer.orders) t(ord) , LATERAL" +
+" (SELECT t1.ord.i_name AS item_name, t1.ord.i_number AS item_num 
FROM UNNEST(t.ord) AS t1(ord)) AS U2) AS U1";
 runAndLog(Sql);
   }
 
+  @Test
+  public void testMultiUnnestAtSameLevelExec() throws Exception {
+String Sql = "SELECT customer.c_name, customer.c_address, U1.order_id, 
U1.order_amt," +
 
 Review comment:
   Please rename string in lower case: `Sql` -> `sql`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level 
lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208839935
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/LateralUnnestRowIDVisitor.java
 ##
 @@ -59,16 +65,37 @@ public Prel visitPrel(Prel prel, Boolean isRightOfLateral) 
throws RuntimeExcepti
   }
 
   @Override
-  public Prel visitLateral(LateralJoinPrel prel, Boolean value) throws 
RuntimeException {
+  public Prel visitLateral(LateralJoinPrel prel, Boolean isRightOfLateral) 
throws RuntimeException {
 List<RelNode> children = Lists.newArrayList();
-children.add(((Prel)prel.getInput(0)).accept(this, value));
+children.add(((Prel)prel.getInput(0)).accept(this, isRightOfLateral));
 
 Review comment:
   Please add space here and in the line below: `(Prel)prel` -> `(Prel) prel`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level 
lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208840141
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/LateralUnnestRowIDVisitor.java
 ##
 @@ -59,16 +65,37 @@ public Prel visitPrel(Prel prel, Boolean isRightOfLateral) 
throws RuntimeExcepti
   }
 
   @Override
-  public Prel visitLateral(LateralJoinPrel prel, Boolean value) throws 
RuntimeException {
+  public Prel visitLateral(LateralJoinPrel prel, Boolean isRightOfLateral) 
throws RuntimeException {
 List<RelNode> children = Lists.newArrayList();
-children.add(((Prel)prel.getInput(0)).accept(this, value));
+children.add(((Prel)prel.getInput(0)).accept(this, isRightOfLateral));
 children.add(((Prel) prel.getInput(1)).accept(this, true));
 
-return (Prel) prel.copy(prel.getTraitSet(), children);
+if (!isRightOfLateral) {
+  return (Prel) prel.copy(prel.getTraitSet(), children);
+} else {
+  //Adjust the column numbering due to an additional column 
"$drill_implicit_field$" is added to the inputs.
+  Map<Integer, Integer> requiredColsMap = new HashMap<>();
+  for (Integer corrColIndex : prel.getRequiredColumns()) {
+requiredColsMap.put(corrColIndex, corrColIndex+1);
 
 Review comment:
   Please add spaces: `corrColIndex+1` -> `corrColIndex + 1`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level 
lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208856911
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/LateralUnnestRowIDVisitor.java
 ##
 @@ -59,16 +65,37 @@ public Prel visitPrel(Prel prel, Boolean isRightOfLateral) 
throws RuntimeExcepti
   }
 
   @Override
-  public Prel visitLateral(LateralJoinPrel prel, Boolean value) throws 
RuntimeException {
+  public Prel visitLateral(LateralJoinPrel prel, Boolean isRightOfLateral) 
throws RuntimeException {
 List<RelNode> children = Lists.newArrayList();
-children.add(((Prel)prel.getInput(0)).accept(this, value));
+children.add(((Prel)prel.getInput(0)).accept(this, isRightOfLateral));
 children.add(((Prel) prel.getInput(1)).accept(this, true));
 
-return (Prel) prel.copy(prel.getTraitSet(), children);
+if (!isRightOfLateral) {
+  return (Prel) prel.copy(prel.getTraitSet(), children);
+} else {
+  //Adjust the column numbering due to an additional column 
"$drill_implicit_field$" is added to the inputs.
+  Map<Integer, Integer> requiredColsMap = new HashMap<>();
+  for (Integer corrColIndex : prel.getRequiredColumns()) {
+requiredColsMap.put(corrColIndex, corrColIndex+1);
+  }
+  ImmutableBitSet requiredColumns = prel.getRequiredColumns().shift(1);
+
+  CorrelationId corrId = prel.getCluster().createCorrel();
+  RexCorrelVariable rexCorrel =
+  (RexCorrelVariable) prel.getCluster().getRexBuilder().makeCorrel(
+  children.get(0).getRowType(),
+  corrId);
+  RelNode rightChild = children.get(1).accept(
 
 Review comment:
   Looks like `ProjectCorrelateTransposeRule.RelNodesExprsHandler` does not 
replace RexCorrelVariable for the case when RelNode does not have any input. In 
our case, this is `UnnestPrel`. So we need to extend this class and override 
method `visit(RelNode other)`:
   ```
   @Override
   public RelNode visit(RelNode other) {
 return super.visit(other.accept(rexVisitor));
   }
   ```
   Also, it does not work since `UnnestPrel` uses the `accept(RexShuttle shuttle)` 
method from `AbstractRelNode`, which does nothing, so we should override it 
somehow like:
   ```
 @Override
 public RelNode accept(RexShuttle shuttle) {
   RexNode ref = shuttle.apply(this.ref);
   if (this.ref == ref) {
 return this;
   }
   return new UnnestPrel(getCluster(), traitSet, rowType, ref);
 }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level lateral unnest join is throwing an exception …

2018-08-09 Thread GitBox
vvysotskyi commented on a change in pull request #1426: DRILL-6671: Multi level 
lateral unnest join is throwing an exception …
URL: https://github.com/apache/drill/pull/1426#discussion_r208839528
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java
 ##
 @@ -116,14 +118,13 @@ public PhysicalOperator visitLateralJoin(LateralJoinPOP 
op, IndexedFragmentNode
 
 children.add(op.getLeft().accept(this, iNode));
 children.add(op.getRight().accept(this, iNode));
-UnnestPOP unnestInLeftInput = iNode.getUnnest();
+UnnestPOP unnestForThisLateral = iNode.getUnnest();
 
 PhysicalOperator newOp = op.getNewWithChildren(children);
 newOp.setCost(op.getCost());
 newOp.setOperatorId(Short.MAX_VALUE & op.getOperatorId());
 
-((LateralJoinPOP)newOp).setUnnestForLateralJoin(unnestInLeftInput);
-
+((LateralJoinPOP)newOp).setUnnestForLateralJoin(unnestForThisLateral);
 
 Review comment:
   Please add space: `((LateralJoinPOP)newOp)` -> `((LateralJoinPOP) newOp)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] davechallis commented on issue #1423: Updated URL for Apache Parquet logical types

2018-08-09 Thread GitBox
davechallis commented on issue #1423: Updated URL for Apache Parquet logical 
types
URL: https://github.com/apache/drill/pull/1423#issuecomment-411687191
 
 
   @sohami would you still like me to open a Jira ticket, or has this been 
resolved now?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] okalinin opened a new pull request #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions

2018-08-09 Thread GitBox
okalinin opened a new pull request #1428: DRILL-6670: align Parquet 
TIMESTAMP_MICROS logical type handling with earlier versions
URL: https://github.com/apache/drill/pull/1428
 
 
   ## DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with 
earlier versions
   ### Background
   Parquet dependencies were upgraded with DRILL-6353, which changed how Parquet 
handles the `TIMESTAMP_MICROS` logical type. Previously, calling 
`SchemaElement.getConverted_type()` returned null and the logical type was 
ignored. With the updated Parquet version, the above call now returns 
`TIMESTAMP_MICROS`, which triggers a query exception.
   
   Issue impacts both ParquetRecordReader and DrillParquetReader.
   
   This change aims at restoring original functionality - handling 
`TIMESTAMP_MICROS` as `INT64` with no logical type in both Parquet readers. It 
doesn't seem to make sense to do more since `TIMESTAMP_MICROS` is a deprecated 
logical type as per Parquet's [current 
documentation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md).
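   As a rough sketch of the direction only (not the actual diff in this PR), the INT64 mapping can simply fall back to plain BIGINT when the deprecated annotation is seen; the class and method names below are made up for illustration.

```java
// Sketch only, not the DRILL-6670 change itself: map an INT64 Parquet column,
// treating the deprecated TIMESTAMP_MICROS annotation as if no logical type
// were present, matching the pre-upgrade behaviour described above.
import org.apache.drill.common.types.TypeProtos.MinorType;
import org.apache.parquet.format.ConvertedType;
import org.apache.parquet.format.SchemaElement;

class Int64TypeMapping {
  static MinorType minorTypeForInt64(SchemaElement schemaElement) {
    ConvertedType convertedType = schemaElement.getConverted_type();
    if (convertedType == null || convertedType == ConvertedType.TIMESTAMP_MICROS) {
      return MinorType.BIGINT;      // plain 64-bit integer, as before the upgrade
    }
    if (convertedType == ConvertedType.TIMESTAMP_MILLIS) {
      return MinorType.TIMESTAMP;   // millisecond timestamps keep their logical type
    }
    return MinorType.BIGINT;        // other INT64 annotations omitted in this sketch
  }
}
```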
   ### Change description
   - Added `TIMESTAMP_MICROS` logical type in both Parquet readers in order to 
handle it as regular `INT64`
   - Modified `ParquetSimpleTestFileGenerator` to include `TIMESTAMP_MICROS` 
columns in Parquet test files and updated existing test data files
   - Modified respective tests with minor regrouping
   - Fixed `TestDrillParquetReader` to make sure correct reader is actually 
used to handle test queries
   - Improved debug lines to avoid using ambiguous 'new/old Parquet reader' 
terminology
   ### Level of testing
   - Sample file and query provided by issue reporter
   - Unit tests
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (DRILL-6677) Check style unused import check conflicts with Eclipse

2018-08-09 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6677:
--

 Summary: Check style unused import check conflicts with Eclipse
 Key: DRILL-6677
 URL: https://issues.apache.org/jira/browse/DRILL-6677
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.15.0
Reporter: Paul Rogers


Consider the following Java snippet:

{code}
import com.foo.Bar;

/**
  This is a reference to {@link com.foo.Bar}
 */
{code}

Eclipse will notice the reference to {{com.foo.Bar}} in the Javadoc comment and 
its automatic import fixer-upper will include the import.

But, check style appears to ignore Javadoc imports. So, Check style reports the 
import as unused.

The only way, at present, to make Check style happy is to turn off the Eclipse 
import fixer-upper and do everything manually.

Request: modify check style to also check for class references in Javadoc 
comments as such references are required for the Javadoc to build.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6676) Add Union, List and Repeated List types to Result Set Loader

2018-08-09 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6676:
--

 Summary: Add Union, List and Repeated List types to Result Set 
Loader
 Key: DRILL-6676
 URL: https://issues.apache.org/jira/browse/DRILL-6676
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.15.0
Reporter: Paul Rogers
Assignee: Paul Rogers
 Fix For: 1.15.0


Add support for the "obscure" vector types to the {{ResultSetLoader}}:

* Union
* List
* Repeated List



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)