[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-16 Thread GitBox


QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456176045



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##
@@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
   }
 }
 BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-for (int i = 0; i < pageNumbers; i++) {
-  BitSet set = new BitSet(numberOfRows[i]);
-  RowIntf row = new RowImpl();
-  BitSet prvBitset = null;
-  // if bitset pipe line is enabled then use rowid from previous bitset
-  // otherwise use older flow
-  if (!useBitsetPipeLine ||
-  null == rawBlockletColumnChunks.getBitSetGroup() ||
-  null == bitSetGroup.getBitSet(i) ||
-  rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
+if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]

Review comment:
   It will be hard to read this code once we add more `if` conditions here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-16 Thread GitBox


QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456175101



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##
@@ -865,7 +870,33 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
 Some(CarbonContainsWith(c))
   case c@Literal(v, t) if (v == null) =>
 Some(FalseExpr())
-  case others => None
+  case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+a.dataType match {
+  case arrayType: ArrayType =>
+arrayType.elementType match {

Review comment:
   How about extracting this match block into a method, `isPrimitiveDataType`, and moving it into a util class?
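A minimal sketch of such a util, assuming a simplified stand-in for Spark's `DataType` hierarchy (the real classes live in `org.apache.spark.sql.types`) and assuming that strings, integers, longs, doubles, booleans, dates and timestamps count as primitive; the object name `ComplexTypeUtil` and the helper `isArrayOfPrimitive` are hypothetical:

```scala
// Simplified stand-ins for Spark SQL's DataType hierarchy, for illustration
// only; the real classes live in org.apache.spark.sql.types.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object DoubleType extends DataType
case object BooleanType extends DataType
case object DateType extends DataType
case object TimestampType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType

// Hypothetical util object; the set of "primitive" types is an assumption.
object ComplexTypeUtil {
  /** True for non-complex types that array_contains could push down. */
  def isPrimitiveDataType(dataType: DataType): Boolean = dataType match {
    case StringType | IntegerType | LongType | DoubleType |
         BooleanType | DateType | TimestampType => true
    case _ => false
  }

  /** True for an array whose element type is primitive. */
  def isArrayOfPrimitive(dataType: DataType): Boolean = dataType match {
    case ArrayType(elementType) => isPrimitiveDataType(elementType)
    case _ => false
  }
}
```

With a helper like this, the `case arrayType: ArrayType` branch in the strategy could reduce to a single `isPrimitiveDataType(arrayType.elementType)` check, and other callers could reuse it.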









[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-16 Thread GitBox


QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456167613



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestArrayContainsPushDown.scala
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.complexType
+
+import java.sql.{Date, Timestamp}
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestArrayContainsPushDown extends QueryTest with BeforeAndAfterAll {
+
+  override protected def afterAll(): Unit = {
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
+CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)
+sql("DROP TABLE IF EXISTS compactComplex")
+  }
+
+  test("test array contains pushdown for array of string") {
+sql("drop table if exists complex1")
+sql("create table complex1 (arr array<string>) stored as carbondata")
+sql("insert into complex1 select array('as') union all " +
+"select array('sd','df','gh') union all " +
+"select array('rt','ew','rtyu','jk',null) union all " +
+"select array('ghsf','dbv','','ty') union all " +
+"select array('hjsd','fggb','nhj','sd','asd')")
+
+checkExistence(sql(" explain select * from complex1 where array_contains(arr,'sd')"),
+  true,
+  "PushedFilters: [*EqualTo(arr,sd)]")
+
+checkExistence(sql(" explain select count(*) from complex1 where array_contains(arr,'sd')"),
+  true,
+  "PushedFilters: [*EqualTo(arr,sd)]")
+
+checkAnswer(sql(" select * from complex1 where array_contains(arr,'sd')"),

Review comment:
   Can you add a test case like the query below?
   
   select * from complex1 where arr[0] = 'sd'
   
   Can we push down this filter too?
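Note that `arr[0] = 'sd'` is not the same predicate as `array_contains(arr, 'sd')`: Spark SQL's `arr[0]` compares only the first element (the index is 0-based), so it could not simply reuse the `EqualTo(arr,sd)` filter that the explain check in this test expects. The plain-Scala sketch below uses no Spark or Carbon API (the helper names are hypothetical) and evaluates both predicates over the rows this test inserts:

```scala
// Rows inserted into complex1 by the test, modelled as plain Scala sequences.
val rows: Seq[Seq[String]] = Seq(
  Seq("as"),
  Seq("sd", "df", "gh"),
  Seq("rt", "ew", "rtyu", "jk", null),
  Seq("ghsf", "dbv", "", "ty"),
  Seq("hjsd", "fggb", "nhj", "sd", "asd"))

// array_contains(arr, 'sd'): true if any element equals the value.
def arrayContains(arr: Seq[String], value: String): Boolean = arr.contains(value)

// arr[0] = 'sd': only the element at index 0 is compared.
def firstElementEquals(arr: Seq[String], value: String): Boolean =
  arr.headOption.contains(value)

val byContains = rows.filter(arrayContains(_, "sd"))
val byIndex = rows.filter(firstElementEquals(_, "sd"))
```

`byContains` keeps two rows, while `byIndex` keeps only `('sd','df','gh')`, because the last row has `'sd'` at index 3. Pushing down the indexed form would therefore need a filter that carries the index.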









[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-06-09 Thread GitBox


QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r437812719



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##
@@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks 
rawBlockletColumnChunks,
   }
 }
 BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-for (int i = 0; i < pageNumbers; i++) {
-  BitSet set = new BitSet(numberOfRows[i]);
-  RowIntf row = new RowImpl();
-  BitSet prvBitset = null;
-  // if bitset pipe line is enabled then use rowid from previous bitset
-  // otherwise use older flow
-  if (!useBitsetPipeLine ||
-  null == rawBlockletColumnChunks.getBitSetGroup() ||
-  null == bitSetGroup.getBitSet(i) ||
-  rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
+if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]

Review comment:
   1.  It would be better to add a new Expression, such as ArrayContainsExpression.
   2.  How about also taking the filter BitSetPipeLine into account?
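A minimal sketch combining both points, assuming each row of a page is modelled as `Seq[String]`. `ArrayContainsExpression` follows the reviewer's suggested name but is not an existing CarbonData class here, and `applyOnPage` is hypothetical. It mirrors the condition in the removed lines above: when an earlier filter produced a non-empty bitset, only the rows set in it are evaluated; otherwise every row is.

```scala
import java.util.BitSet

// Hypothetical expression named after the reviewer's suggestion; it is not an
// existing CarbonData class. Each row of a page is modelled as Seq[String].
final case class ArrayContainsExpression(value: String) {
  def evaluate(row: Seq[String]): Boolean = row.contains(value)
}

// Evaluates the expression over one page and returns the matching row ids.
// If an earlier filter produced a non-empty bitset (bitset pipeline), only the
// rows set in it are checked; otherwise every row in the page is checked.
def applyOnPage(
    expr: ArrayContainsExpression,
    page: IndexedSeq[Seq[String]],
    previous: Option[BitSet]): BitSet = {
  val result = new BitSet(page.length)
  previous match {
    case Some(prev) if !prev.isEmpty() =>
      var rowId = prev.nextSetBit(0)
      while (rowId >= 0 && rowId < page.length) {
        if (expr.evaluate(page(rowId))) result.set(rowId)
        rowId = prev.nextSetBit(rowId + 1)
      }
    case _ =>
      for (rowId <- page.indices if expr.evaluate(page(rowId))) result.set(rowId)
  }
  result
}
```

Keeping the array semantics inside one expression class would also keep the extra `if` branches out of `RowLevelFilterExecuterImpl.applyFilter`.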

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##
@@ -679,18 +681,20 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
 // In case of ComplexType dataTypes no filters should be pushed down. IsNotNull is being
 // explicitly added by spark and pushed. That also has to be handled and pushed back to
 // Spark for handling.
-val predicatesWithoutComplex = predicates.filter(predicate =>
+// allow array_contains() push down
+val filteredPredicates = predicates.filter(predicate =>

Review comment:
   Use '{' instead of '(' here.
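For reference, the brace form for a multi-line lambda, shown on a hypothetical predicate model. The case classes, the `complexColumns` set, and the keep/drop rule are invented for illustration and are not the PR's actual logic:

```scala
// Hypothetical predicate model; which predicates are kept is invented for
// illustration and is not the PR's actual rule.
sealed trait Predicate
final case class IsNotNull(column: String) extends Predicate
final case class ArrayContains(column: String, value: String) extends Predicate
final case class GreaterThan(column: String, value: Int) extends Predicate

val complexColumns = Set("arr")
val predicates: Seq[Predicate] =
  Seq(IsNotNull("arr"), ArrayContains("arr", "sd"), GreaterThan("id", 1))

// A multi-line lambda body passed in braces rather than parentheses.
val filteredPredicates = predicates.filter { predicate =>
  predicate match {
    case _: ArrayContains => true // allow array_contains() push down
    case IsNotNull(column) => !complexColumns.contains(column)
    case _ => true
  }
}
```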

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##
@@ -517,7 +518,8 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
   val supportBatch =
 supportBatchedDataSource(relation.relation.sqlContext,
   updateRequestedColumns) && extraRdd.getOrElse((null, true))._2
-  if (!vectorPushRowFilters && !supportBatch && !implicitExisted) {
+  if (!vectorPushRowFilters && !supportBatch && !implicitExisted && filterSet.nonEmpty &&

Review comment:
   Why does this need to change?

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
##
@@ -152,13 +152,25 @@ object CarbonFilters {
 }
 
 def getCarbonExpression(name: String) = {

Review comment:
   In the `createFilter` method, convert the CarbonArrayContains filter to an ArrayContainsExpression.
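A minimal sketch of that conversion step, using hypothetical stand-ins for both sides (these are not CarbonData's real filter or expression classes, and the real `createFilter` signature may differ):

```scala
// Hypothetical stand-ins: filters produced by the Spark strategy and the
// Carbon-side expressions that createFilter turns them into.
sealed trait SourceFilter
final case class EqualTo(attribute: String, value: Any) extends SourceFilter
final case class CarbonArrayContains(attribute: String, value: Any) extends SourceFilter

sealed trait CarbonExpression
final case class EqualToExpression(column: String, value: Any) extends CarbonExpression
final case class ArrayContainsExpression(column: String, value: Any) extends CarbonExpression

// Each source filter maps to its own expression, so the executor can tell
// "some element equals v" apart from "the column equals v".
def createFilter(filter: SourceFilter): CarbonExpression = filter match {
  case EqualTo(name, value) => EqualToExpression(name, value)
  case CarbonArrayContains(name, value) => ArrayContainsExpression(name, value)
}
```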

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##
@@ -865,6 +869,27 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
 Some(CarbonContainsWith(c))
   case c@Literal(v, t) if (v == null) =>
 Some(FalseExpr())
+  case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+a.dataType match {
+  case arrayType: ArrayType =>
+arrayType.elementType match {
+  case StringType => Some(sources.EqualTo(a.name, v))

Review comment:
   How about using a new filter, CarbonArrayContains, instead?
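With `sources.EqualTo(a.name, v)`, the pushed filter reads `EqualTo(arr,sd)` (as the test's explain check expects), which has the same shape as a scalar equality filter. A minimal sketch of the dedicated-filter alternative follows. It uses hypothetical stand-ins for the Catalyst `ArrayContains`/`Attribute` match (not Spark's real classes) and supports only string elements, for brevity:

```scala
// Simplified stand-ins for the Catalyst pieces matched in the diff.
sealed trait ElementType
case object StringElementType extends ElementType
case object StructElementType extends ElementType // any complex element type

final case class ArrayAttribute(name: String, elementType: ElementType)
final case class ArrayContainsCall(array: ArrayAttribute, literal: Any)

// Hypothetical dedicated filter, used instead of sources.EqualTo.
final case class CarbonArrayContains(attribute: String, value: Any)

// Translate only arrays of primitive elements; leave the rest to Spark.
def translate(call: ArrayContainsCall): Option[CarbonArrayContains] =
  call.array.elementType match {
    case StringElementType => Some(CarbonArrayContains(call.array.name, call.literal))
    case _ => None
  }
```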




