[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-19 Thread GitBox


ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r457075875



##
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestArrayContainsPushDown.scala
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.complexType
+
+import java.sql.{Date, Timestamp}
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestArrayContainsPushDown extends QueryTest with BeforeAndAfterAll {
+
+  override protected def afterAll(): Unit = {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
+        CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)
+    sql("DROP TABLE IF EXISTS compactComplex")
+  }
+
+  test("test array contains pushdown for array of string") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<string>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk',null) union all " +
+        "select array('ghsf','dbv','','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+
+    checkExistence(sql(" explain select * from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkExistence(sql(" explain select count(*) from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkAnswer(sql(" select * from complex1 where array_contains(arr,'sd')"),

Review comment:
   This PR is only for UDF pushdown





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-19 Thread GitBox


ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r457075708



##
File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestArrayContainsPushDown.scala
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.complexType
+
+import java.sql.{Date, Timestamp}
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestArrayContainsPushDown extends QueryTest with BeforeAndAfterAll {
+
+  override protected def afterAll(): Unit = {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
+        CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)
+    sql("DROP TABLE IF EXISTS compactComplex")
+  }
+
+  test("test array contains pushdown for array of string") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<string>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk',null) union all " +
+        "select array('ghsf','dbv','','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+
+    checkExistence(sql(" explain select * from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkExistence(sql(" explain select count(*) from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkAnswer(sql(" select * from complex1 where array_contains(arr,'sd')"),

Review comment:
   Currently carbon doesn't support pushdown of arr[0] = 'sd', because that pushdown would be based on the array index. It needs separate handling; I have yet to analyze the required changes.
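The distinction drawn above can be sketched with a small self-contained model (the case classes here are illustrative stand-ins, not CarbonData's or Spark's actual classes): `array_contains(arr, v)` is position-independent, so it can be rewritten into an equal-to filter, while `arr[0] = v` constrains a specific index and is left to Spark.

```scala
// Illustrative model only: simplified stand-ins for Catalyst expressions.
sealed trait Expr
case class ArrayContains(column: String, value: String) extends Expr
case class GetArrayItem(column: String, index: Int, value: String) extends Expr

// array_contains asks "does ANY element equal v" and maps onto an equal-to
// filter; arr(i) = v depends on element position, which the current filter
// layer cannot express, so it is not pushed down.
def pushedFilter(e: Expr): Option[String] = e match {
  case ArrayContains(col, v) => Some(s"*EqualTo($col,$v)")
  case _: GetArrayItem       => None // index-based: needs separate handling
}
```

Under this model `pushedFilter(ArrayContains("arr", "sd"))` yields `Some("*EqualTo(arr,sd)")`, matching the `PushedFilters` string asserted in the test file.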









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-19 Thread GitBox


ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r457070775



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##
@@ -865,7 +870,33 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
      Some(CarbonContainsWith(c))
    case c@Literal(v, t) if (v == null) =>
      Some(FalseExpr())
-    case others => None
+    case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+      a.dataType match {
+        case arrayType: ArrayType =>
+          arrayType.elementType match {

Review comment:
   ok. moved









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-06-30 Thread GitBox


ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r447457038



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##
@@ -865,6 +869,27 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
      Some(CarbonContainsWith(c))
    case c@Literal(v, t) if (v == null) =>
      Some(FalseExpr())
+    case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+      a.dataType match {
+        case arrayType: ArrayType =>
+          arrayType.elementType match {
+            case StringType => Some(sources.EqualTo(a.name, v))

Review comment:
   I want to reuse the existing equalTo code; I don't see any advantage in creating a new expression.
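The element-type dispatch in the hunk above can be modeled in isolation. This sketch uses stand-in types rather than Spark's actual classes, to show the point being made: only arrays of supported primitive element types are translated, and they reuse the existing `EqualTo` source filter instead of a new expression.

```scala
// Stand-ins for Spark SQL data types and source filters (illustrative only).
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
case class ArrayType(elementType: DataType) extends DataType

case class EqualTo(attribute: String, value: Any)

// Translate array_contains(attr, literal) into the existing EqualTo source
// filter when the array's element type is a supported primitive; anything
// else (nested arrays, structs, ...) is not pushed down.
def translateArrayContains(name: String, dt: DataType, v: Any): Option[EqualTo] =
  dt match {
    case ArrayType(StringType) | ArrayType(IntegerType) => Some(EqualTo(name, v))
    case _ => None
  }
```

Reusing `EqualTo` keeps the carbon side unchanged: the filter executor already knows how to match a value against column data, so no new expression class is needed.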

##
File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
##
@@ -152,13 +152,25 @@ object CarbonFilters {
 }
 
 def getCarbonExpression(name: String) = {

Review comment:
   I want to reuse the existing equalTo code; I don't see any advantage in creating a new expression.









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-06-25 Thread GitBox


ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r445361657



##
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##
@@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
   }
 }
 BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]

Review comment:
   I think that by using the equalTo expression I can reuse most of the code. What do you think, @QiangCai?
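The bitset-pipeline branch visible in the hunk above reduces to the following sketch (method and parameter names here are assumed, not the exact CarbonData signatures): when an earlier filter in the pipeline already produced a bitset for a page, only its set row ids are re-evaluated; otherwise every row in the page is scanned.

```scala
import java.util.BitSet

// Evaluate a row-level predicate over one page, optionally restricted by the
// bitset an earlier filter in the pipeline produced for the same page.
def applyRowFilter(rowsInPage: Int, previous: Option[BitSet])
                  (matches: Int => Boolean): BitSet = {
  val out = new BitSet(rowsInPage)
  previous match {
    case Some(prev) =>
      // bitset pipeline: visit only the row ids the earlier filter kept
      var i = prev.nextSetBit(0)
      while (i >= 0) {
        if (matches(i)) out.set(i)
        i = prev.nextSetBit(i + 1)
      }
    case None =>
      // older flow: evaluate every row in the page
      (0 until rowsInPage).foreach(i => if (matches(i)) out.set(i))
  }
  out
}
```

This is why reusing the equal-to path is attractive: the page/bitset iteration machinery is identical; only the per-row `matches` predicate differs for an array column.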









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-06-24 Thread GitBox


ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r44482



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##
@@ -679,18 +681,20 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
    // In case of ComplexType dataTypes no filters should be pushed down. IsNotNull is being
    // explicitly added by spark and pushed. That also has to be handled and pushed back to
    // Spark for handling.
-    val predicatesWithoutComplex = predicates.filter(predicate =>
+    // allow array_contains() push down
+    val filteredPredicates = predicates.filter(predicate =>

Review comment:
   ok
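The predicate split being renamed in the hunk above can be sketched as follows (the predicate shapes are simplified stand-ins, not Catalyst classes): filters on complex-typed columns are still rejected, but array_contains is now allowed through to the pushdown path.

```scala
// Illustrative predicate model: stand-ins for Spark predicate expressions.
sealed trait Predicate { def column: String }
case class IsNotNull(column: String) extends Predicate
case class ArrayContainsPred(column: String, value: String) extends Predicate
case class OtherPred(column: String) extends Predicate

// Keep a predicate for pushdown unless it touches a complex-typed column;
// array_contains on an array column is the new exception.
def filteredPredicates(preds: Seq[Predicate],
                       complexCols: Set[String]): Seq[Predicate] =
  preds.filter {
    case ArrayContainsPred(_, _) => true            // newly allowed pushdown
    case p                       => !complexCols(p.column)
  }
```

So an `IsNotNull(arr)` that Spark adds alongside the array filter is still handed back to Spark, while the `array_contains` predicate itself survives into the pushed filter set.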









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-06-24 Thread GitBox


ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r444824720



##
File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##
@@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
   }
 }
 BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]

Review comment:
   @QiangCai: can you please tell me why a new expression is required? Why is equalTo not enough?









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-06-09 Thread GitBox


ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r437867873



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##
@@ -517,7 +518,8 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
      val supportBatch =
        supportBatchedDataSource(relation.relation.sqlContext,
          updateRequestedColumns) && extraRdd.getOrElse((null, true))._2
-      if (!vectorPushRowFilters && !supportBatch && !implicitExisted) {
+      if (!vectorPushRowFilters && !supportBatch && !implicitExisted && filterSet.nonEmpty &&

Review comment:
   This is for the count(*) with array_contains() query. The array_contains() filter was being reverted back to Spark here, so I avoided that.




