[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [WIP] Improve partition pruning performance in presto carbon integration

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-687335279


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2239/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [WIP] Improve partition pruning performance in presto carbon integration

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-687331617


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3979/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3861:
URL: https://github.com/apache/carbondata/pull/3861#issuecomment-687324981


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2238/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3861:
URL: https://github.com/apache/carbondata/pull/3861#issuecomment-687324192


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3978/
   







[GitHub] [carbondata] ajantha-bhat opened a new pull request #3913: [WIP] Improve partition pruning performance in presto carbon integration

2020-09-04 Thread GitBox


ajantha-bhat opened a new pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913


### Why is this PR needed?
   a) For a table with 200K segments in the cloud, a Presto partition query 
was taking more than 5 hours because it read all the segment files for 
partition pruning. With this change it takes less than a minute.
   
### What changes were proposed in this PR?
   a) HiveTableHandle already has the partition spec matching the filters 
(it has queried the metastore for all partitions and pruned them), so create 
the partitionSpec from that. This is handled for both prestodb and prestosql.
   b) #3885 broke prestodb compilation; only prestosql compiles.
   c) #3887 also did not handle prestodb.
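
The idea in (a) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `PrunedPartitionSketch`, `toPartitionLocations`, and the `tablePath/partitionName` layout are hypothetical. It only shows that partition locations can be derived from the partition names the metastore has already pruned, so no segment file needs to be opened.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from CarbonData or Presto.
final class PrunedPartitionSketch {

  private PrunedPartitionSketch() { }

  // Turns partition names that the metastore already matched against the
  // query filters (for example "country=IN/year=2020") into partition
  // locations under the table path, without reading any segment file.
  static List<String> toPartitionLocations(String tablePath, List<String> prunedPartitionNames) {
    // Drop a single trailing slash so the joined path has exactly one separator.
    String base = tablePath.endsWith("/")
        ? tablePath.substring(0, tablePath.length() - 1)
        : tablePath;
    List<String> locations = new ArrayList<>();
    for (String name : prunedPartitionNames) {
      locations.add(base + "/" + name);
    }
    return locations;
  }
}
```

In this sketch the work grows with the number of partitions that matched the filter rather than with the number of segments in the table.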
   
   
### Does this PR introduce any user interface change?
- No
   
   
### Is any new testcase added?
- No. Verified manually. [TODO: add Spark support and create a better UT 
for Presto]
   
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-687270749


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3977/
   







[GitHub] [carbondata] ajantha-bhat commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries

2020-09-04 Thread GitBox


ajantha-bhat commented on pull request #3861:
URL: https://github.com/apache/carbondata/pull/3861#issuecomment-687269685


   retest this please







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-687269650


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2237/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3861:
URL: https://github.com/apache/carbondata/pull/3861#issuecomment-687265603


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2236/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3861:
URL: https://github.com/apache/carbondata/pull/3861#issuecomment-687265069


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3976/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3912: [WIP] Global sort partitions should be determined dynamically

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3912:
URL: https://github.com/apache/carbondata/pull/3912#issuecomment-687140085


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3975/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3912: [WIP] Global sort partitions should be determined dynamically

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3912:
URL: https://github.com/apache/carbondata/pull/3912#issuecomment-687139428


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2235/
   







[GitHub] [carbondata] maheshrajus opened a new pull request #3912: [WIP] Global sort partitions should be determined dynamically

2020-09-04 Thread GitBox


maheshrajus opened a new pull request #3912:
URL: https://github.com/apache/carbondata/pull/3912


### Why is this PR needed?

[WIP] Global sort partitions should be determined dynamically
### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-687014802


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2234/
   







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-04 Thread GitBox


akashrn5 commented on a change in pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#discussion_r483475663



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##
@@ -2105,7 +2105,7 @@ public int getMaxSIRepairLimit(String dbName, String tableName) {
     // Check if user has enabled/disabled the use of property for the current db and table using
     // the set command
     String thresholdValue = getSessionPropertyValue(
-        CarbonCommonConstants.CARBON_LOAD_SI_REPAIR + "." + dbName + "." + tableName);

Review comment:
   I think this was changed by mistake.
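
For context, the removed line in the hunk above builds a table-scoped session key by joining a property constant, the database name, and the table name with dots. A minimal sketch of that lookup pattern, assuming a plain in-memory map in place of the real session parameters; `SessionPropertySketch`, `tableScopedKey`, and `getInt` are hypothetical names, and the fallback behaviour is an assumption rather than what `CarbonProperties` does:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain map stands in for the session parameters.
final class SessionPropertySketch {

  private final Map<String, String> sessionParams = new HashMap<>();

  // Stores a session-level value, as a SET command would in this sketch.
  void set(String key, String value) {
    sessionParams.put(key, value);
  }

  // Builds the table-scoped key "<property>.<dbName>.<tableName>".
  static String tableScopedKey(String property, String dbName, String tableName) {
    return property + "." + dbName + "." + tableName;
  }

  // Returns the table-scoped session value, or defaultValue when the key is
  // unset or its value is not a valid integer.
  int getInt(String property, String dbName, String tableName, int defaultValue) {
    String raw = sessionParams.get(tableScopedKey(property, dbName, tableName));
    if (raw == null) {
      return defaultValue;
    }
    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }
}
```

The sketch makes the reviewer's concern concrete: if the wrong property constant is used to build the key, the lookup silently misses and the default is returned.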









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-687009894


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2233/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-687009718


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3974/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-687006962


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3973/
   







[GitHub] [carbondata] QiangCai commented on a change in pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning

2020-09-04 Thread GitBox


QiangCai commented on a change in pull request #3908:
URL: https://github.com/apache/carbondata/pull/3908#discussion_r483458848



##
File path: 
integration/spark/src/main/scala/org/apache/spark/util/PartitionCacheManger.scala
##
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.net.URI
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.log4j.Logger
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTablePartition}
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.cache.{Cache, Cacheable, CarbonLRUCache}
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.metadata.SegmentFileStore
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.core.util.path.CarbonTablePath
+
+object PartitionCacheManager extends Cache[PartitionCacheKey, CacheablePartitionSpec] {
+
+  private val CACHE = new CarbonLRUCache(
+    CarbonCommonConstants.CARBON_PARTITION_MAX_DRIVER_LRU_CACHE_SIZE,
+    CarbonCommonConstants.CARBON_MAX_LRU_CACHE_SIZE_DEFAULT)
+
+  val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getName)
+
+  def get(identifier: PartitionCacheKey): CacheablePartitionSpec = {
+    val cacheablePartitionSpec =
+      CACHE.get(identifier.tableId).asInstanceOf[CacheablePartitionSpec]
+    val tableStatusModifiedTime = FileFactory
+      .getCarbonFile(CarbonTablePath.getTableStatusFilePath(identifier.tablePath))
+      .getLastModifiedTime
+    if (cacheablePartitionSpec != null) {
+      if (tableStatusModifiedTime > cacheablePartitionSpec.timestamp) {
+        readPartitions(identifier, tableStatusModifiedTime)
+      } else {
+        cacheablePartitionSpec
+      }
+    } else {
+      readPartitions(identifier, tableStatusModifiedTime)
+    }
+  }
+
+  override def getAll(keys: util.List[PartitionCacheKey]):
+  util.List[CacheablePartitionSpec] = {
+    keys.asScala.map(get).toList.asJava
+  }
+
+  override def getIfPresent(key: PartitionCacheKey): CacheablePartitionSpec = {
+    CACHE.get(key.tableId).asInstanceOf[CacheablePartitionSpec]
+  }
+
+  override def invalidate(partitionCacheKey: PartitionCacheKey): Unit = {
+    CACHE.remove(partitionCacheKey.tableId)
+  }
+
+  private def readPartitions(identifier: PartitionCacheKey, tableStatusModifiedTime: Long) = {

Review comment:
   Please check the query flow; it also uses incremental index loading.
   Maybe the segment info cache can be reused here.









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3911: [wip]Fix update and delete issue when multiple partition columns are present

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3911:
URL: https://github.com/apache/carbondata/pull/3911#issuecomment-686985522


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3972/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3911: [wip]Fix update and delete issue when multiple partition columns are present

2020-09-04 Thread GitBox


CarbonDataQA1 commented on pull request #3911:
URL: https://github.com/apache/carbondata/pull/3911#issuecomment-686984791


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2232/
   







[GitHub] [carbondata] kunal642 commented on a change in pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning

2020-09-04 Thread GitBox


kunal642 commented on a change in pull request #3908:
URL: https://github.com/apache/carbondata/pull/3908#discussion_r483445661



##
File path: 
integration/spark/src/main/scala/org/apache/spark/util/PartitionCacheManger.scala
##
@@ -0,0 +1,143 @@
+  private def readPartitions(identifier: PartitionCacheKey, 
tableStatusModifiedTime: Long) = {

Review comment:
   ok got it
   









[GitHub] [carbondata] QiangCai commented on a change in pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning

2020-09-04 Thread GitBox


QiangCai commented on a change in pull request #3908:
URL: https://github.com/apache/carbondata/pull/3908#discussion_r483443382



##
File path: 
integration/spark/src/main/scala/org/apache/spark/util/PartitionCacheManger.scala
##
@@ -0,0 +1,143 @@
+  private def readPartitions(identifier: PartitionCacheKey, 
tableStatusModifiedTime: Long) = {

Review comment:
   "readPartitions" will read all .segment files after every load. It would 
be better to load only the new .segment files.
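
The incremental approach suggested here could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's `PartitionCacheManager`: `IncrementalSegmentCacheSketch`, `refresh`, and the `reader` callback (standing in for parsing one .segment file) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: not the PartitionCacheManager from the PR.
final class IncrementalSegmentCacheSketch {

  // Segment file name -> partitions parsed from that file.
  private final Map<String, List<String>> partitionsBySegmentFile = new LinkedHashMap<>();
  private int filesRead = 0;

  // Reads only segment files that were not seen before and drops entries for
  // files that are no longer listed; `reader` parses a single segment file.
  void refresh(List<String> currentSegmentFiles, Function<String, List<String>> reader) {
    partitionsBySegmentFile.keySet().retainAll(currentSegmentFiles);
    for (String file : currentSegmentFiles) {
      if (!partitionsBySegmentFile.containsKey(file)) {
        partitionsBySegmentFile.put(file, reader.apply(file));
        filesRead++;
      }
    }
  }

  // Total number of segment files parsed so far.
  int filesRead() {
    return filesRead;
  }

  // Number of segment files currently held in the cache.
  int cachedFiles() {
    return partitionsBySegmentFile.size();
  }
}
```

With this shape, a load that adds one segment costs one file read on the next refresh instead of a re-read of every segment file.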









[GitHub] [carbondata] kunal642 commented on a change in pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-04 Thread GitBox


kunal642 commented on a change in pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#discussion_r483419204



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala
##
@@ -72,7 +72,9 @@ case class CarbonDatasourceHadoopRelation(
       projects: Seq[NamedExpression],
       filters: Array[Filter],
       partitions: Seq[PartitionSpec]): RDD[InternalRow] = {
-    val filterExpression: Option[Expression] = filters.flatMap { filter =>
+    val reorderedFilter = filters.map(CarbonFilters.reorderFilter(_, carbonTable)).sortBy(_._2)
+    val filterExpression: Option[Expression] = reorderedFilter.flatMap { tuple =>

Review comment:
   done









[GitHub] [carbondata] kunal642 commented on a change in pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-04 Thread GitBox


kunal642 commented on a change in pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#discussion_r483419324



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala
##
@@ -72,7 +72,9 @@ case class CarbonDatasourceHadoopRelation(
       projects: Seq[NamedExpression],
       filters: Array[Filter],
       partitions: Seq[PartitionSpec]): RDD[InternalRow] = {
-    val filterExpression: Option[Expression] = filters.flatMap { filter =>
+    val reorderedFilter = filters.map(CarbonFilters.reorderFilter(_, carbonTable)).sortBy(_._2)

Review comment:
   Handled in a slightly different way; please check again.

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
##
@@ -373,6 +375,148 @@ object CarbonFilters {
     val carbonTable = CarbonEnv.getCarbonTable(identifier)(sparkSession)
     getPartitions(partitionFilters, sparkSession, carbonTable)
   }
+
+  def getStorageOrdinal(filter: Filter, carbonTable: CarbonTable): Int = {
+    val column = filter.references.map(carbonTable.getColumnByName)
+    if (column.isEmpty) {
+      -1
+    } else {
+      if (column.head.isDimension) {
+        column.head.getOrdinal
+      } else {
+        column.head.getOrdinal + carbonTable.getAllDimensions.size()
+      }
+    }
+  }
+
+  def collectSimilarExpressions(filter: Filter, table: CarbonTable): Seq[(Filter, Int)] = {
+    filter match {
+      case sources.And(left, right) =>
+        collectSimilarExpressions(left, table) ++ collectSimilarExpressions(right, table)
+      case sources.Or(left, right) => collectSimilarExpressions(left, table) ++
+        collectSimilarExpressions(right, table)
+      case others => Seq((others, getStorageOrdinal(others, table)))
+    }
+  }
+
+  /**
+   * This method will reorder the filter based on the Storage Ordinal of the column references.
+   *
+   * Example1:
+   *             And                                   And
+   *       Or          And             =>        Or          And
+   *   col3  col1  col2  col1                col1  col3  col1  col2
+   *
+   *  **Mixed expression filter reordered locally, but wont be reordered globally.**
+   *
+   * Example2:
+   *             And                                   And
+   *       And         And             =>        And         And
+   *   col3  col1  col2  col1                col1  col1  col2  col3
+   *
+   *             Or                                    Or
+   *       Or          Or              =>        Or          Or
+   *   col3  col1  col2  col1                col1  col1  col2  col3
+   *
+   *  **Similar expression filters are reordered globally**
+   *
+   * @param filter the filter expression to be reordered
+   * @return The reordered filter with the current ordinal
+   */
+  def reorderFilter(filter: Filter, table: CarbonTable): (Filter, Int) = {
+    val filterMap = mutable.HashMap[String, List[(Filter, Int)]]()
+    // If the filter size is one or the user has disabled reordering then no need to reorder.
+    if (filter.references.toSet.size == 1 ||
+        !CarbonProperties.isFilterReorderingEnabled) {

Review comment:
   agree, good catch
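
As a rough illustration of the ordering rule in the hunk above (dimensions keep their ordinal, measures are offset by the number of dimensions), here is a simplified sketch. It is not the PR's `CarbonFilters` code: `FilterReorderSketch` and `LeafFilter` are hypothetical stand-ins for Spark filters and Carbon columns, and it only covers the "similar expression" case where all leaves sit under one flat AND or one flat OR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: simplified stand-ins for Spark filters and Carbon columns.
final class FilterReorderSketch {

  private FilterReorderSketch() { }

  // One leaf predicate that references a single column.
  static final class LeafFilter {
    final String column;
    final boolean isDimension;
    final int ordinal; // ordinal within the dimensions, or within the measures

    LeafFilter(String column, boolean isDimension, int ordinal) {
      this.column = column;
      this.isDimension = isDimension;
      this.ordinal = ordinal;
    }
  }

  // Dimensions keep their ordinal; measures are placed after all dimensions.
  static int storageOrdinal(LeafFilter f, int dimensionCount) {
    return f.isDimension ? f.ordinal : f.ordinal + dimensionCount;
  }

  // Returns a copy of the leaves sorted by storage ordinal. List.sort is
  // stable, so leaves on the same column keep their relative order.
  static List<LeafFilter> reorder(List<LeafFilter> leaves, int dimensionCount) {
    List<LeafFilter> sorted = new ArrayList<>(leaves);
    sorted.sort(Comparator.comparingInt(f -> storageOrdinal(f, dimensionCount)));
    return sorted;
  }
}
```

Mixed AND/OR trees are deliberately left out; per the Javadoc in the hunk, those are only reordered locally within each sub-expression.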









[GitHub] [carbondata] kunal642 commented on a change in pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning

2020-09-04 Thread GitBox


kunal642 commented on a change in pull request #3908:
URL: https://github.com/apache/carbondata/pull/3908#discussion_r483419069



##
File path: 
integration/spark/src/main/scala/org/apache/spark/util/PartitionCacheManger.scala
##
@@ -0,0 +1,143 @@
+  private def readPartitions(identifier: PartitionCacheKey, 
tableStatusModifiedTime: Long) = {

Review comment:
   Do you mean caching of partitions after load?









[GitHub] [carbondata] Indhumathi27 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-09-04 Thread GitBox


Indhumathi27 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-686944027


   retest this please







[jira] [Created] (CARBONDATA-3971) Session level dynamic properties for repair(carbon.load.si.repair and carbon.si.repair.limit) are not updated in https://github.com/apache/carbondata/blob/master/doc

2020-09-04 Thread Chetan Bhat (Jira)
Chetan Bhat created CARBONDATA-3971:
---

 Summary: Session level dynamic properties for 
repair(carbon.load.si.repair and carbon.si.repair.limit) are not updated in 
https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md
 Key: CARBONDATA-3971
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3971
 Project: CarbonData
  Issue Type: Bug
  Components: docs
Affects Versions: 2.1.0
Reporter: Chetan Bhat


Session level dynamic properties for repair (carbon.load.si.repair and 
carbon.si.repair.limit) are not mentioned in the GitHub link - 
https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)