[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52476961
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
@@ -0,0 +1,107 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.util.Pair;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.logical.partition.PruneScanRule;
+import org.apache.drill.exec.store.parquet.ParquetGroupScan;
+
+import java.io.IOException;
+import java.util.concurrent.TimeUnit;
+
+public abstract class DrillPushLimitToScanRule extends RelOptRule {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillPushLimitToScanRule.class);
+
+  private DrillPushLimitToScanRule(RelOptRuleOperand operand, String description) {
+    super(operand, description);
+  }
+
+  public static DrillPushLimitToScanRule LIMIT_ON_SCAN = new DrillPushLimitToScanRule(
+      RelOptHelper.some(DrillLimitRel.class, RelOptHelper.any(DrillScanRel.class)), "DrillPushLimitToScanRule_LimitOnScan") {
+    @Override
+    public boolean matches(RelOptRuleCall call) {
+      DrillScanRel scanRel = call.rel(1);
+      return scanRel.getGroupScan() instanceof ParquetGroupScan; // It only applies to Parquet.
+    }
+
+    @Override
+    public void onMatch(RelOptRuleCall call) {
+      DrillLimitRel limitRel = call.rel(0);
+      DrillScanRel scanRel = call.rel(1);
+      doOnMatch(call, limitRel, scanRel, null);
+    }
+  };
+
+  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = new DrillPushLimitToScanRule(
+      RelOptHelper.some(DrillLimitRel.class, RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))), "DrillPushLimitToScanRule_LimitOnProject") {
+    @Override
+    public boolean matches(RelOptRuleCall call) {
+      DrillScanRel scanRel = call.rel(2);
+      return scanRel.getGroupScan() instanceof ParquetGroupScan; // It only applies to Parquet.
+    }
+
+    @Override
+    public void onMatch(RelOptRuleCall call) {
+      DrillLimitRel limitRel = call.rel(0);
+      DrillProjectRel projectRel = call.rel(1);
+      DrillScanRel scanRel = call.rel(2);
+      doOnMatch(call, limitRel, scanRel, projectRel);
+    }
+  };
+
+
+  protected void doOnMatch(RelOptRuleCall call, DrillLimitRel limitRel, DrillScanRel scanRel, DrillProjectRel projectRel) {
+    try {
+      final int rowCountRequested = (int) limitRel.getRows();
+
+      final Pair newGroupScanPair = ParquetGroupScan.filterParquetScanByLimit((ParquetGroupScan) (scanRel.getGroupScan()), rowCountRequested);
--- End diff --

I can change it to use applyLimit(int maxRecords). This method will modify 
the internal state of the groupscan and return true if the limit is applied; 
otherwise, it leaves the groupscan instance unchanged and returns false.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52465362
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
--- End diff --

How about:

boolean applyLimit(int maxRecords)

Returns whether the limit was applied. The default implementation in 
AbstractGroupScan returns false.
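A minimal sketch of this mutable proposal, using hypothetical stand-in classes that only echo the names from the discussion (AbstractGroupScan, ParquetGroupScan) — not Drill's real implementations:

```java
// Stand-in sketch of boolean applyLimit(int maxRecords); not Drill's real code.
abstract class AbstractGroupScan {
  // Default: a scan that cannot prune leaves itself unchanged.
  public boolean applyLimit(int maxRecords) {
    return false;
  }
}

class ParquetGroupScan extends AbstractGroupScan {
  private int rowCount; // total rows the scan would produce

  ParquetGroupScan(int rowCount) { this.rowCount = rowCount; }

  int getRowCount() { return rowCount; }

  @Override
  public boolean applyLimit(int maxRecords) {
    if (maxRecords >= rowCount) {
      return false; // limit is a no-op; state left untouched
    }
    rowCount = maxRecords; // mutates internal state in place
    return true;
  }
}

public class MutableApplyLimitSketch {
  public static void main(String[] args) {
    ParquetGroupScan scan = new ParquetGroupScan(1000);
    System.out.println(scan.applyLimit(10)); // true: limit applied
    System.out.println(scan.getRowCount()); // 10
    System.out.println(scan.applyLimit(50)); // false: nothing left to prune
  }
}
```

Note that the override mutates the scan in place, which is what the immutability concern raised later in the thread is about.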




[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52455844
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
--- End diff --

It seems like we can move this functionality into a method on GroupScan and 
make it generic, with only the Parquet group scan implementing it.




Re: Unresolved dependencies

2016-02-10 Thread Abdel Hakim Deneche
Are you using Maven? What error message are you seeing?

On Wed, Feb 10, 2016 at 4:39 AM, Vitalii Diravka 
wrote:

> I saw unresolved dependencies in
> /drill/contrib/storage-hive/core/src/test/java/org/apache/drill/exec/
> in the latest version github.com/apache/drill.
> Which library must be added to pom.xml?
>



-- 

Abdelhakim Deneche

Software Engineer

  


Now Available - Free Hadoop On-Demand Training



Re: Unresolved dependencies

2016-02-10 Thread Vitalii Diravka
I use IntelliJ IDEA

2016-02-10 19:38 GMT+02:00 Abdel Hakim Deneche :

> Are you using Eclipse or IntelliJ ?
>
> On Wed, Feb 10, 2016 at 9:29 AM, Vitalii Diravka <
> vitalii.dira...@gmail.com>
> wrote:
>
> > Yes, I am. Version of maven is 3.3.3.
> > No error. "mvn clean install -DskipTests" is built successfully.
> > But import statements have missing classes (in HiveTestUDFImpls,
> > TestSqlStdBasedAuthorization,
> > TestStorageBasedHiveAuthorization classes).
> > Looks like some library is missing in
> >
> /home/vitalii/ProjectSource/drillforhive/contrib/storage-hive/core/pom.xml
> >
> > 2016-02-10 18:38 GMT+02:00 Abdel Hakim Deneche :
> >
> > > Are you using maven ? what error message are you seeing ?
> > >
> > > On Wed, Feb 10, 2016 at 4:39 AM, Vitalii Diravka <
> > > vitalii.dira...@gmail.com>
> > > wrote:
> > >
> > > > I saw unresolved dependencies in
> > > > /drill/contrib/storage-hive/core/src/test/java/org/apache/drill/exec/
> > > > in the latest version github.com/apache/drill.
> > > > Which library must be added to pom.xml?
> > > >


Re: Unresolved dependencies

2016-02-10 Thread Vitalii Diravka
Yes, I am. The Maven version is 3.3.3.
No error: "mvn clean install -DskipTests" builds successfully.
But the import statements have missing classes (in the HiveTestUDFImpls,
TestSqlStdBasedAuthorization, and
TestStorageBasedHiveAuthorization classes).
It looks like some library is missing in
/home/vitalii/ProjectSource/drillforhive/contrib/storage-hive/core/pom.xml

2016-02-10 18:38 GMT+02:00 Abdel Hakim Deneche :

> Are you using maven ? what error message are you seeing ?
>
> On Wed, Feb 10, 2016 at 4:39 AM, Vitalii Diravka <
> vitalii.dira...@gmail.com>
> wrote:
>
> > I saw unresolved dependencies in
> > /drill/contrib/storage-hive/core/src/test/java/org/apache/drill/exec/
> > in the latest version github.com/apache/drill.
> > Which library must be added to pom.xml?
> >


[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52494516
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
--- End diff --

Oh wait, GroupScan should be treated as immutable; I shouldn't have 
suggested that interface.

How about:

GroupScan applyLimit(int maxRecords)

which returns a new GroupScan when the limit is applied, and null otherwise.
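A sketch of this immutable variant, again with hypothetical stand-in classes rather than Drill's actual ones — the scan returns a pruned copy instead of mutating itself:

```java
// Stand-in sketch of GroupScan applyLimit(int maxRecords) returning a new
// scan (or null when the limit cannot be applied); not Drill's real code.
abstract class AbstractGroupScan {
  // Default: no limit pushdown; caller keeps the original scan.
  public AbstractGroupScan applyLimit(int maxRecords) {
    return null;
  }
}

final class ParquetGroupScan extends AbstractGroupScan {
  private final int rowCount; // immutable once constructed

  ParquetGroupScan(int rowCount) { this.rowCount = rowCount; }

  int getRowCount() { return rowCount; }

  @Override
  public AbstractGroupScan applyLimit(int maxRecords) {
    if (maxRecords >= rowCount) {
      return null; // limit would not prune anything
    }
    return new ParquetGroupScan(maxRecords); // fresh instance; original untouched
  }
}

public class ImmutableApplyLimitSketch {
  public static void main(String[] args) {
    ParquetGroupScan original = new ParquetGroupScan(1000);
    AbstractGroupScan pruned = original.applyLimit(10);
    System.out.println(((ParquetGroupScan) pruned).getRowCount()); // 10
    System.out.println(original.getRowCount()); // still 1000
    System.out.println(original.applyLimit(5000)); // null: limit not applied
  }
}
```

Because the planner may hold several alternatives that share the original scan, returning a fresh instance avoids mutating state that other alternatives still reference.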




Re: Unresolved dependencies

2016-02-10 Thread Abdel Hakim Deneche
Are you using Eclipse or IntelliJ?

On Wed, Feb 10, 2016 at 9:29 AM, Vitalii Diravka 
wrote:

> Yes, I am. Version of maven is 3.3.3.
> No error. "mvn clean install -DskipTests" is built successfully.
> But import statements have missing classes (in HiveTestUDFImpls,
> TestSqlStdBasedAuthorization,
> TestStorageBasedHiveAuthorization classes).
> Looks like some library is missing in
> /home/vitalii/ProjectSource/drillforhive/contrib/storage-hive/core/pom.xml
>
> 2016-02-10 18:38 GMT+02:00 Abdel Hakim Deneche :
>
> > Are you using maven ? what error message are you seeing ?
> >
> > On Wed, Feb 10, 2016 at 4:39 AM, Vitalii Diravka <
> > vitalii.dira...@gmail.com>
> > wrote:
> >
> > > I saw unresolved dependencies in
> > > /drill/contrib/storage-hive/core/src/test/java/org/apache/drill/exec/
> > > in the latest version github.com/apache/drill.
> > > Which library must be added to pom.xml?
> > >






[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52494612
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
--- End diff --

If we don't treat GroupScan as immutable, we're going to have an issue with 
unintentionally changing other plan alternatives.




Re: Unresolved dependencies

2016-02-10 Thread Sudheesh Katkam
I have the same issue. Sometimes, in IntelliJ, I have to manually add the Hive 
jars to the class path (press option + return on Mac). And sometimes, the hive 
subproject pom file changes, which I do not commit. I have not investigated 
what the underlying issue is.

> On Feb 10, 2016, at 9:39 AM, Vitalii Diravka  
> wrote:
> 
> I use IntelliJ IDEA
> 
> 2016-02-10 19:38 GMT+02:00 Abdel Hakim Deneche :
> 
>> Are you using Eclipse or IntelliJ ?
>> 
>> On Wed, Feb 10, 2016 at 9:29 AM, Vitalii Diravka <
>> vitalii.dira...@gmail.com>
>> wrote:
>> 
>>> Yes, I am. Version of maven is 3.3.3.
>>> No error. "mvn clean install -DskipTests" is built successfully.
>>> But import statements have missing classes (in HiveTestUDFImpls,
>>> TestSqlStdBasedAuthorization,
>>> TestStorageBasedHiveAuthorization classes).
>>> Looks like some library is missing in
>>> 
>> /home/vitalii/ProjectSource/drillforhive/contrib/storage-hive/core/pom.xml
>>> 
>>> 2016-02-10 18:38 GMT+02:00 Abdel Hakim Deneche :
>>> 
 Are you using maven ? what error message are you seeing ?
 
 On Wed, Feb 10, 2016 at 4:39 AM, Vitalii Diravka <
 vitalii.dira...@gmail.com>
 wrote:
 
> I saw unresolved dependencies in
> /drill/contrib/storage-hive/core/src/test/java/org/apache/drill/exec/
> in the latest version github.com/apache/drill.
> Which library must be added to pom.xml?



[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52498045
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
--- End diff --

Agreed. We need to maintain GroupScan as immutable. Thanks for the suggestion!





Re: Unresolved dependencies

2016-02-10 Thread Vitalii Diravka
Alt+Enter (on Linux) gives a new Maven dependency:

<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.1</version>
  <scope>test</scope>
</dependency>

and after this the import statements aren't highlighted.
But none of the Hive tests run after that!


2016-02-10 19:53 GMT+02:00 Sudheesh Katkam :

> I have the same issue. Sometimes, in IntelliJ, I have to manually add the
> Hive jars to the class path (press option + return on Mac). And sometimes,
> the hive subproject pom file changes, which I do not commit. I have not
> investigated what the underlying issue is.
>
> > On Feb 10, 2016, at 9:39 AM, Vitalii Diravka 
> wrote:
> >
> > I use IntelliJ IDEA
> >
> > 2016-02-10 19:38 GMT+02:00 Abdel Hakim Deneche :
> >
> >> Are you using Eclipse or IntelliJ ?
> >>
> >> On Wed, Feb 10, 2016 at 9:29 AM, Vitalii Diravka <
> >> vitalii.dira...@gmail.com>
> >> wrote:
> >>
> >>> Yes, I am. Version of maven is 3.3.3.
> >>> No error. "mvn clean install -DskipTests" is built successfully.
> >>> But import statements have missing classes (in HiveTestUDFImpls,
> >>> TestSqlStdBasedAuthorization,
> >>> TestStorageBasedHiveAuthorization classes).
> >>> Looks like some library is missing in
> >>>
> >>
> /home/vitalii/ProjectSource/drillforhive/contrib/storage-hive/core/pom.xml
> >>>
> >>> 2016-02-10 18:38 GMT+02:00 Abdel Hakim Deneche  >:
> >>>
>  Are you using maven ? what error message are you seeing ?
> 
>  On Wed, Feb 10, 2016 at 4:39 AM, Vitalii Diravka <
>  vitalii.dira...@gmail.com>
>  wrote:
> 
> > I saw unresolved dependencies in
> > /drill/contrib/storage-hive/core/src/test/java/org/apache/drill/exec/
> > in the latest version github.com/apache/drill.
> > Which library must be added to pom.xml?
> >
> 
> 
> 
>  --
> 
>  Abdelhakim Deneche
> 
>  Software Engineer
> 
>   
> 
> 
>  Now Available - Free Hadoop On-Demand Training
>  <
> 
> >>>
> >>
> http://www.mapr.com/training?utm_source=Email_medium=Signature_campaign=Free%20available
> >
> 
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> Abdelhakim Deneche
> >>
> >> Software Engineer
> >>
> >>  
> >>
> >>
> >> Now Available - Free Hadoop On-Demand Training
> >> <
> >>
> http://www.mapr.com/training?utm_source=Email_medium=Signature_campaign=Free%20available
> >>>
> >>
>
>


[GitHub] drill pull request: DRILL-4287: During initial DrillTable creation...

2016-02-10 Thread adeneche
Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/345#discussion_r52516038
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -182,10 +183,24 @@ public ParquetGroupScan( //
 }
 
 this.selectionRoot = selectionRoot;
-if (selection instanceof ParquetFileSelection) {
-  final ParquetFileSelection pfs = 
ParquetFileSelection.class.cast(selection);
-  this.parquetTableMetadata = pfs.getParquetMetadata();
+
+FileSelection newSelection = null;
+if (!selection.isExpanded()) {
+  FileStatus firstPath = selection.getFirstPath(fs);
+  Path p = new Path(firstPath.getPath(), Metadata.METADATA_FILENAME);
+  if (!fs.exists(p)) { // no metadata cache
+if (selection.checkedForDirectories() && 
selection.hasDirectories()) {
--- End diff --

This won't work if `checkedForDirectories==false`, right?
`hasDirectories()` already uses `checkedForDirectories` internally; the
following should work:
 
if (selection.hasDirectories()) {
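For context, the pattern being suggested — letting `hasDirectories()` run the directory check lazily on demand, so callers never consult a separate `checkedForDirectories` flag — can be sketched like this (a hypothetical, simplified stand-in for `FileSelection`, not the actual Drill class):

```java
import java.util.List;

public class FileSelectionSketch {
    private final List<String> paths;
    private Boolean hasDirs; // null until the check has run (lazily computed)

    public FileSelectionSketch(List<String> paths) {
        this.paths = paths;
    }

    // Callers never need to ask "was the check done?"; the method
    // performs the check on first use and caches the result.
    public boolean hasDirectories() {
        if (hasDirs == null) {
            // stand-in heuristic for illustration only
            hasDirs = paths.stream().anyMatch(p -> !p.endsWith(".parquet"));
        }
        return hasDirs;
    }

    public static void main(String[] args) {
        FileSelectionSketch sel =
            new FileSelectionSketch(List.of("/a/b/c.parquet", "/a/b/dir"));
        System.out.println(sel.hasDirectories());
    }
}
```

With this shape, `if (selection.hasDirectories())` is safe regardless of whether the check has already happened.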




[GitHub] drill pull request: DRILL-4287: During initial DrillTable creation...

2016-02-10 Thread adeneche
Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/345#discussion_r52519003
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -521,6 +543,25 @@ public void setEndpointByteMap(EndpointByteMap 
byteMap) {
 }
   }
 
+  private FileSelection
+  getSelectionFromMetadataCache(DrillFileSystem fs, FileSelection 
selection) throws IOException {
+FileStatus metaRootDir = selection.getFirstPath(fs);
+Path metaFilePath = new Path(metaRootDir.getPath(), 
Metadata.METADATA_FILENAME);
+
+// get the metadata for the directory by reading the metadata file
+Metadata.ParquetTableMetadataBase metadata  = 
Metadata.readBlockMeta(fs, metaFilePath.toString());
+List<String> fileNames = Lists.newArrayList();
+for (Metadata.ParquetFileMetadata file : metadata.getFiles()) {
+  fileNames.add(file.getPath());
+}
+// when creating the file selection, set the selection root in the 
form /a/b instead of
+// file:/a/b.  The reason is that the file names above have been 
created in the form
+// /a/b/c.parquet and the format of the selection root must match that 
of the file names
+// otherwise downstream operations such as partition pruning can break.
+final Path metaRootPath = 
Path.getPathWithoutSchemeAndAuthority(metaRootDir.getPath());
+return FileSelection.create(selection.getStatuses(fs), fileNames, 
metaRootPath.toString());
--- End diff --

`FileSelection.create()` expects either a list of statuses or a list of 
filenames, but not both.
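The comment quoted in the diff explains why the selection root must be stored as `/a/b` rather than `file:/a/b`: the file names are recorded without a scheme, and the root must match their form. Hadoop provides `Path.getPathWithoutSchemeAndAuthority` for this; a dependency-free sketch of the same normalization using only `java.net.URI` (hypothetical helper name):

```java
import java.net.URI;

public class PathNormalize {
    // Strip scheme and authority so "file:/a/b" and "/a/b" compare equal,
    // mirroring Hadoop's Path.getPathWithoutSchemeAndAuthority.
    static String withoutSchemeAndAuthority(String path) {
        return URI.create(path).getPath();
    }

    public static void main(String[] args) {
        System.out.println(withoutSchemeAndAuthority("file:/a/b")); // /a/b
        System.out.println(withoutSchemeAndAuthority("/a/b"));      // /a/b
    }
}
```

If the root kept its `file:` prefix while the file names did not, prefix-based matching in downstream steps such as partition pruning would silently fail.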




[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52539112
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
@@ -0,0 +1,107 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.util.Pair;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.logical.partition.PruneScanRule;
+import org.apache.drill.exec.store.parquet.ParquetGroupScan;
+
+import java.io.IOException;
+import java.util.concurrent.TimeUnit;
+
+public abstract class DrillPushLimitToScanRule extends RelOptRule {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillPushLimitToScanRule.class);
+
+  private DrillPushLimitToScanRule(RelOptRuleOperand operand, String 
description) {
+super(operand, description);
+  }
+
+  public static DrillPushLimitToScanRule LIMIT_ON_SCAN = new 
DrillPushLimitToScanRule(
+  RelOptHelper.some(DrillLimitRel.class, 
RelOptHelper.any(DrillScanRel.class)), "DrillPushLimitToScanRule_LimitOnScan") {
+@Override
+public boolean matches(RelOptRuleCall call) {
+  DrillScanRel scanRel = call.rel(1);
+  return scanRel.getGroupScan() instanceof ParquetGroupScan; // It 
only applies to Parquet.
+}
+
+@Override
+public void onMatch(RelOptRuleCall call) {
+DrillLimitRel limitRel = call.rel(0);
+DrillScanRel scanRel = call.rel(1);
+doOnMatch(call, limitRel, scanRel, null);
+}
+  };
+
+  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = new 
DrillPushLimitToScanRule(
+  RelOptHelper.some(DrillLimitRel.class, 
RelOptHelper.some(DrillProjectRel.class, 
RelOptHelper.any(DrillScanRel.class))), 
"DrillPushLimitToScanRule_LimitOnProject") {
+@Override
+public boolean matches(RelOptRuleCall call) {
+  DrillScanRel scanRel = call.rel(2);
+  return scanRel.getGroupScan() instanceof ParquetGroupScan; // It 
only applies to Parquet.
+}
+
+@Override
+public void onMatch(RelOptRuleCall call) {
+  DrillLimitRel limitRel = call.rel(0);
+  DrillProjectRel projectRel = call.rel(1);
+  DrillScanRel scanRel = call.rel(2);
+  doOnMatch(call, limitRel, scanRel, projectRel);
+}
+  };
+
+
+  protected void doOnMatch(RelOptRuleCall call, DrillLimitRel limitRel, 
DrillScanRel scanRel, DrillProjectRel projectRel){
+try {
+  final int rowCountRequested = (int) limitRel.getRows();
+
+  final Pair  newGroupScanPair = 
ParquetGroupScan.filterParquetScanByLimit((ParquetGroupScan)(scanRel.getGroupScan()),
 rowCountRequested);
--- End diff --

@jacques-n, I made changes based on your review comments. Could you please
take another look? Thanks.





[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52540911
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -485,12 +486,14 @@ public void populatePruningVector(ValueVector v, int 
index, SchemaPath column, S
 private EndpointByteMap byteMap;
 private int rowGroupIndex;
 private String root;
+private long rowCount;
 
 @JsonCreator
 public RowGroupInfo(@JsonProperty("path") String path, 
@JsonProperty("start") long start,
-@JsonProperty("length") long length, 
@JsonProperty("rowGroupIndex") int rowGroupIndex) {
+@JsonProperty("length") long length, 
@JsonProperty("rowGroupIndex") int rowGroupIndex, long rowCount) {
--- End diff --

Can you add a comment that rowCount = -1 means to include all rows? 
Otherwise, LGTM +1
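The convention being requested — document that rowCount = -1 is a sentinel meaning "include all rows" — could look like this (a hypothetical trimmed-down stand-in for RowGroupInfo, not the actual Drill class):

```java
public class RowGroupInfoSketch {
    /** Number of rows to read from this row group; -1 means include all rows. */
    private final long rowCount;

    public RowGroupInfoSketch(long rowCount) {
        this.rowCount = rowCount;
    }

    // Convenience check so the sentinel value is interpreted in one place.
    public boolean includesAllRows() {
        return rowCount == -1;
    }

    public static void main(String[] args) {
        System.out.println(new RowGroupInfoSketch(-1).includesAllRows()); // true
    }
}
```

Centralizing the sentinel behind a named method keeps callers from scattering `== -1` checks across the code.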




[GitHub] drill pull request: DRILL-4382: Remove dependency on drill-logical...

2016-02-10 Thread StevenMPhillips
GitHub user StevenMPhillips opened a pull request:

https://github.com/apache/drill/pull/373

DRILL-4382: Remove dependency on drill-logical from vector package



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StevenMPhillips/drill arrow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/373.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #373


commit e2be7853aa49467c6db0ab960fe2b11a24ccb84b
Author: Steven Phillips 
Date:   2016-02-05T01:43:17Z

DRILL-4382: Remove dependency on drill-logical from vector package






[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52564032
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
@@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.util.Pair;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.logical.partition.PruneScanRule;
+import org.apache.drill.exec.store.parquet.ParquetGroupScan;
+
+import java.io.IOException;
+import java.util.concurrent.TimeUnit;
+
+public abstract class DrillPushLimitToScanRule extends RelOptRule {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillPushLimitToScanRule.class);
+
+  private DrillPushLimitToScanRule(RelOptRuleOperand operand, String 
description) {
+super(operand, description);
+  }
+
+  public static DrillPushLimitToScanRule LIMIT_ON_SCAN = new 
DrillPushLimitToScanRule(
+  RelOptHelper.some(DrillLimitRel.class, 
RelOptHelper.any(DrillScanRel.class)), "DrillPushLimitToScanRule_LimitOnScan") {
+@Override
+public boolean matches(RelOptRuleCall call) {
+  DrillScanRel scanRel = call.rel(1);
+  return scanRel.getGroupScan() instanceof ParquetGroupScan; // It 
only applies to Parquet.
--- End diff --

Sounds good to me. Will add that API to GroupScan. 




[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52565738
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -485,12 +486,14 @@ public void populatePruningVector(ValueVector v, int 
index, SchemaPath column, S
 private EndpointByteMap byteMap;
 private int rowGroupIndex;
 private String root;
+private long rowCount;
 
 @JsonCreator
 public RowGroupInfo(@JsonProperty("path") String path, 
@JsonProperty("start") long start,
-@JsonProperty("length") long length, 
@JsonProperty("rowGroupIndex") int rowGroupIndex) {
+@JsonProperty("length") long length, 
@JsonProperty("rowGroupIndex") int rowGroupIndex, long rowCount) {
--- End diff --

Added the comment.

I pass -1 as rowCount in an unused method,
TestAffinityCalculator.buildRowGroups(); that is just to make the code compile.




[GitHub] drill pull request: DRILL-4382: Remove dependency on drill-logical...

2016-02-10 Thread StevenMPhillips
Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/373#issuecomment-182692303
  
There are two main changes made in this commit:

1. Removed SchemaPath from MaterializedField. Now, a MaterializedField 
contains a name (String), and a list of children (MaterializedField). Each 
MaterializedField instance knows its own name, but has no knowledge of its 
parents. While making this change, I also got rid of MaterializedField.Key, and 
made sure that MaterializedField was not used as a Map key anywhere in the code.

2. TransferPair no longer takes a FieldReference, but instead will take a 
String for the field name.

With those two changes, I was able to remove the dependency on 
drill-logical.

The rest of the changes in the patch are simply making the rest of the code 
conform to this new interface.

I should note that this will break external StoragePlugins. They will need 
to be modified and recompiled.
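The first change described above — a MaterializedField that carries only its own name and its children, with no SchemaPath and no knowledge of its parents — can be sketched as follows (a hypothetical simplification, not the actual class):

```java
import java.util.ArrayList;
import java.util.List;

public class FieldSketch {
    private final String name;                                    // own name only
    private final List<FieldSketch> children = new ArrayList<>(); // no parent link

    public FieldSketch(String name) { this.name = name; }

    public FieldSketch addChild(FieldSketch child) {
        children.add(child);
        return this; // fluent style for building nested fields
    }

    public String getName() { return name; }
    public List<FieldSketch> getChildren() { return children; }

    public static void main(String[] args) {
        FieldSketch map = new FieldSketch("address")
            .addChild(new FieldSketch("city"))
            .addChild(new FieldSketch("zip"));
        System.out.println(map.getChildren().size()); // 2
    }
}
```

Dropping the parent pointer is what makes the type self-contained enough to live outside drill-logical.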




[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52563417
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -791,6 +799,43 @@ public FileGroupScan clone(FileSelection selection) 
throws IOException {
   }
 
   @Override
+  public GroupScan applyLimit(long maxRecords) {
--- End diff --

We do see small Parquet files (typically Hive generated via partitioning on 
date), although probably bigger than the typical LIMIT value.  I am ok with not 
doing this optimization for now. 




[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52563924
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -791,6 +799,43 @@ public FileGroupScan clone(FileSelection selection) 
throws IOException {
   }
 
   @Override
+  public GroupScan applyLimit(long maxRecords) {
--- End diff --

I gave some thought to this optimization as well. Then I realized that until 
we have some performance measurements, it's not clear which way we want to go. 
For example, I'm not sure whether 1000 small parquet files are better than 1 
large parquet file: 1000 files might have bigger metadata overhead than 1 
large file (?). But 1000 small files might be the better option, in case we 
do want to parallelize the execution.

I'll add some comment saying further optimization could be done in terms of 
how subset of files are chosen.
  




[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52563951
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScan.java
 ---
@@ -128,4 +128,12 @@ public int getOperatorType() {
   public List getPartitionColumns() {
 return Lists.newArrayList();
   }
+
+  /**
+   * By default, return null to indicate rowcount based prune is not 
supported. Each groupscan subclass should override, if it supports rowcount 
based prune.
+   */
+  public GroupScan applyLimit(long maxRecords) {
--- End diff --

Will do.




Unresolved dependencies

2016-02-10 Thread Vitalii Diravka
I saw unresolved dependencies in
/drill/contrib/storage-hive/core/src/test/java/org/apache/drill/exec/
in the latest version github.com/apache/drill.
Which library must be added to pom.xml?


[GitHub] drill pull request: DRILL-4020: The not-equal operator returns inc...

2016-02-10 Thread nagix
Github user nagix commented on the pull request:

https://github.com/apache/drill/pull/309#issuecomment-182678346
  
Can anyone merge this?




[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52560352
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -791,6 +799,43 @@ public FileGroupScan clone(FileSelection selection) 
throws IOException {
   }
 
   @Override
+  public GroupScan applyLimit(long maxRecords) {
--- End diff --

I was thinking about that as well. Theoretically, it would be best to do a 
sort on record count and then binary search to the row group that has the 
closest number greater than the requested amount (too small means multiple 
files, larger files require more metadata reading/parsing. However, it kind of 
seems like premature optimization to me. Are you seeing lots of people with 
many small Parquet files? That generally seems counter to the Parquet design.
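The strategy described above — sort row groups by record count, then prefer the single group whose count is closest above the requested limit, falling back to several groups when no single one suffices — might look like this (a hypothetical standalone sketch, not Drill code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RowGroupPick {
    // Prefer the smallest single row group that alone satisfies the limit
    // (fewest files, least metadata to parse); otherwise greedily take the
    // largest groups until the limit is met.
    static List<Long> pick(List<Long> rowCounts, long limit) {
        List<Long> sorted = new ArrayList<>(rowCounts);
        Collections.sort(sorted);
        for (long count : sorted) {
            if (count >= limit) {
                return List.of(count); // one group is enough
            }
        }
        List<Long> chosen = new ArrayList<>();
        long total = 0;
        for (int i = sorted.size() - 1; i >= 0 && total < limit; i--) {
            chosen.add(sorted.get(i));
            total += sorted.get(i);
        }
        return chosen;
    }

    public static void main(String[] args) {
        System.out.println(pick(List.of(10L, 5000L, 200L), 1000)); // [5000]
        System.out.println(pick(List.of(10L, 20L, 30L), 1000));    // all three
    }
}
```

The sorted-list scan stands in for the binary search mentioned in the comment; with counts already sorted, either finds the same "closest above the limit" group.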




[GitHub] drill pull request: DRILL-4382: Remove dependency on drill-logical...

2016-02-10 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/373#issuecomment-182681649
  
@StevenMPhillips, can you please provide a narrative overview of the nature 
of these changes?




[jira] [Created] (DRILL-4382) Remove dependency on drill-logical from vector submodule

2016-02-10 Thread Steven Phillips (JIRA)
Steven Phillips created DRILL-4382:
--

 Summary: Remove dependency on drill-logical from vector submodule
 Key: DRILL-4382
 URL: https://issues.apache.org/jira/browse/DRILL-4382
 Project: Apache Drill
  Issue Type: Bug
Reporter: Steven Phillips
Assignee: Steven Phillips


This is in preparation for transitioning the code to the Apache Arrow project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] drill pull request: DRILL-4275: Refactor e/pstore interfaces and t...

2016-02-10 Thread hnfgns
Github user hnfgns closed the pull request at:

https://github.com/apache/drill/pull/325




[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52557262
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScan.java
 ---
@@ -128,4 +128,12 @@ public int getOperatorType() {
   public List getPartitionColumns() {
 return Lists.newArrayList();
   }
+
+  /**
+   * By default, return null to indicate rowcount based prune is not 
supported. Each groupscan subclass should override, if it supports rowcount 
based prune.
+   */
+  public GroupScan applyLimit(long maxRecords) {
--- End diff --

Add Override and JsonIgnore ?




[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52557443
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
@@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.util.Pair;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.logical.partition.PruneScanRule;
+import org.apache.drill.exec.store.parquet.ParquetGroupScan;
+
+import java.io.IOException;
+import java.util.concurrent.TimeUnit;
+
+public abstract class DrillPushLimitToScanRule extends RelOptRule {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillPushLimitToScanRule.class);
+
+  private DrillPushLimitToScanRule(RelOptRuleOperand operand, String 
description) {
+super(operand, description);
+  }
+
+  public static DrillPushLimitToScanRule LIMIT_ON_SCAN = new 
DrillPushLimitToScanRule(
+  RelOptHelper.some(DrillLimitRel.class, 
RelOptHelper.any(DrillScanRel.class)), "DrillPushLimitToScanRule_LimitOnScan") {
+@Override
+public boolean matches(RelOptRuleCall call) {
+  DrillScanRel scanRel = call.rel(1);
+  return scanRel.getGroupScan() instanceof ParquetGroupScan; // It 
only applies to Parquet.
--- End diff --

It should not be necessary to check instanceof ParquetGroupScan since the 
rule is not actually casting to ParquetGroupScan. Better to have another API 
such as GroupScan.supportsLimitPushdown() and override it in ParquetGroupScan.
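The suggested API shape — a capability method on the base class that defaults to "unsupported" and is overridden by scans that opt in — can be sketched like this (hypothetical class names; `supportsLimitPushdown()` is the suggestion from the comment, not an existing Drill API):

```java
public class LimitPushdownSketch {
    // Hypothetical base class: limit pushdown unsupported by default.
    static abstract class GroupScanBase {
        public boolean supportsLimitPushdown() { return false; }
    }

    // A Parquet-like subclass opts in by overriding.
    static class ParquetScan extends GroupScanBase {
        @Override
        public boolean supportsLimitPushdown() { return true; }
    }

    // A scan that never overrides keeps the default.
    static class TextScan extends GroupScanBase { }

    // The rule's matches() then needs no instanceof check or cast.
    static boolean ruleMatches(GroupScanBase scan) {
        return scan.supportsLimitPushdown();
    }

    public static void main(String[] args) {
        System.out.println(ruleMatches(new ParquetScan())); // true
        System.out.println(ruleMatches(new TextScan()));    // false
    }
}
```

This keeps the planner rule decoupled from any particular GroupScan implementation.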




[GitHub] drill pull request: DRILL-4275: create TransientStore for short-li...

2016-02-10 Thread hnfgns
GitHub user hnfgns opened a pull request:

https://github.com/apache/drill/pull/374

DRILL-4275: create TransientStore for short-lived objects; refactor 
PersistentStore to introduce pagination mechanism

ps: removed PR#395 mistakenly so starting over.

collections/
introducing immutable entry

coord/ClusterCoordinator
add a factory method to create transient store

coord/store
introduce transient store and other classes around: factory, config, 
event, event type
introduce base transient store implementation

coord/zk
introducing path utils for zk
introducing general purpose zk client, unit tested
complete rewrite of ZkPersistentStore
complete rewrite of ZkEphemeralStore, unit tested
introducing event dispatcher used by ZkEphemeralStore -- externalized 
for unit testing, unit tested

coord/local/MapBackedStore
introduces a local, map backed transient store

coord/*
updates to adapt new subclasses

serialization/ (both transient & persistent store uses this package)
introducing instance serializer
introducing two concrete implementations: proto and jackson serializers

all of PersistentStore subclasses
implements new pagination logic

java-exec/pom.xml
adds curator-test dependency for unit tests

server/
update so that transient store is acquired, properly closed.

*/
misc renamings to reflect class name changes, to remove unneeded import
misc unit test fixes
misc minor clean-ups

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hnfgns/incubator-drill DRILL-4275

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/374.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #374


commit e077a2d6ba59a6abfe526bd8f38259d3959be5a7
Author: Hanifi Gunes 
Date:   2016-01-15T01:06:21Z

DRILL-4275: create TransientStore for short-lived objects; refactor 
PersistentStore to introduce pagination mechanism






[GitHub] drill pull request: DRILL-4363: Row count based pruning for parque...

2016-02-10 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/371#discussion_r52558347
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -791,6 +799,43 @@ public FileGroupScan clone(FileSelection selection) 
throws IOException {
   }
 
   @Override
+  public GroupScan applyLimit(long maxRecords) {
--- End diff --

Suppose the query has LIMIT 1000 and the first set of row groups considered 
by this rule have a small row count, then in the worst case there could be 1000 
files, whereas a single larger file may be sufficient to meet the limit.  Is 
that something we should consider here ?  

