[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Include bloom...

2019-01-10 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r246663278
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ---
@@ -184,6 +184,9 @@ object CarbonEnv {
   .addListener(classOf[LoadTablePostExecutionEvent], new 
MergeIndexEventListener)
   .addListener(classOf[AlterTableCompactionPostEvent], new 
MergeIndexEventListener)
   .addListener(classOf[AlterTableMergeIndexEvent], new 
MergeIndexEventListener)
+  .addListener(classOf[LoadTablePreStatusUpdateEvent], new 
MergeBloomIndexEventListener)
--- End diff --

@jackylk @ravipesala @KanakaKumar Please check and review


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246376652
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -609,6 +609,14 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_SIZE_FIRST;
 } else {
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_NUM_FIRST;
+  // fall back to BLOCK_NUM_FIRST strategy need to reset
+  // the average expected size for each node
+  if (blockInfos.size() > 0) {
--- End diff --

The expected size could already have been set to some value when NODE_MIN_SIZE_FIRST was used but we fall back to BLOCK_NUM_FIRST, so it needs to be reset.
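
To illustrate the point (a hedged sketch with placeholder names, not the actual CarbonLoaderUtil fields): when falling back to BLOCK_NUM_FIRST, the per-node expectation should be recomputed from the block count rather than left at whatever the NODE_MIN_SIZE_FIRST path had set.

```scala
// Hedged sketch only; names are placeholders, not CarbonLoaderUtil's.
// On falling back to BLOCK_NUM_FIRST, recompute the per-node expectation
// from the number of blocks instead of reusing the size-based value.
def expectedBlocksPerNode(totalBlocks: Int, numNodes: Int): Int =
  if (numNodes <= 0) 0 else math.ceil(totalBlocks.toDouble / numNodes).toInt
```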


---


[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

2019-01-07 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r245902021
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/util/ExampleUtils.scala
 ---
@@ -30,13 +30,20 @@ object ExampleUtils {
   .getCanonicalPath
   val storeLocation: String = currentPath + "/target/store"
 
-  def createCarbonSession(appName: String, workThreadNum: Int = 1): 
SparkSession = {
+  def createCarbonSession (appName: String, workThreadNum: Int = 1,
+  storePath: String = null): SparkSession = {
 val rootPath = new File(this.getClass.getResource("/").getPath
-+ "../../../..").getCanonicalPath
-val storeLocation = s"$rootPath/examples/spark2/target/store"
+  + "../../../..").getCanonicalPath
+
 val warehouse = s"$rootPath/examples/spark2/target/warehouse"
 val metaStoreDB = s"$rootPath/examples/spark2/target"
 
+val storeLocation = if (null != storePath) {
+  storePath;
--- End diff --

no need for `;`
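
For reference, a minimal sketch of the Scala-idiomatic form (the `if` used as an expression, no trailing semicolon; names mirror the example code being reviewed):

```scala
// Sketch only; rootPath/storePath mirror the parameters in the quoted example code.
def resolveStoreLocation(rootPath: String, storePath: String): String =
  if (storePath != null) storePath
  else s"$rootPath/examples/spark2/target/store"
```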


---


[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Optimize carbonData using a...

2019-01-07 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r245641130
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/util/ExampleUtils.scala
 ---
@@ -30,13 +30,17 @@ object ExampleUtils {
   .getCanonicalPath
   val storeLocation: String = currentPath + "/target/store"
 
-  def createCarbonSession(appName: String, workThreadNum: Int = 1): 
SparkSession = {
+  def createCarbonSession (appName: String, workThreadNum: Int = 1,
+  storePath: String = null): SparkSession = {
 val rootPath = new File(this.getClass.getResource("/").getPath
-+ "../../../..").getCanonicalPath
-val storeLocation = s"$rootPath/examples/spark2/target/store"
+  + "../../../..").getCanonicalPath
+var storeLocation = s"$rootPath/examples/spark2/target/store"
 val warehouse = s"$rootPath/examples/spark2/target/warehouse"
 val metaStoreDB = s"$rootPath/examples/spark2/target"
 
+if (storePath != null) {
+  storeLocation = storePath;
+}
--- End diff --

```suggestion
val storeLocation = if (null != storePath) {
  storePath
} else {
  s"$rootPath/examples/spark2/target/store"
}
```


---


[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Optimize carbonData using a...

2019-01-07 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r245639998
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala
 ---
@@ -26,48 +30,88 @@ import org.apache.carbondata.examples.util.ExampleUtils
 
 
 /**
- * configure alluxio:
- * 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail 
at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
- */
+  * configure alluxio:
+  * 1.start alluxio
+  * 2.upload the jar: "/alluxio_path/core/client/target/
+  * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
+  * 3.Get more detail 
at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+  */
 
 object AlluxioExample {
-  def main(args: Array[String]) {
-val spark = ExampleUtils.createCarbonSession("AlluxioExample")
-exampleBody(spark)
-spark.close()
+  def main (args: Array[String]) {
+val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+  storePath = "alluxio://localhost:19998/carbondata")
+exampleBody(carbon)
+carbon.close()
   }
 
-  def exampleBody(spark : SparkSession): Unit = {
+  def exampleBody (spark: SparkSession): Unit = {
+val rootPath = new File(this.getClass.getResource("/").getPath
+  + "../../../..").getCanonicalPath
 spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", 
"alluxio.hadoop.FileSystem")
 FileFactory.getConfiguration.set("fs.alluxio.impl", 
"alluxio.hadoop.FileSystem")
 
 // Specify date format based on raw data
 CarbonProperties.getInstance()
   .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/MM/dd")
 
-spark.sql("DROP TABLE IF EXISTS alluxio_table")
+val mFsShell = new FileSystemShell()
+val localFile = rootPath + "/hadoop/src/test/resources/data.csv"
+val remotePath = "/carbon_alluxio.csv"
+val remoteFile = "alluxio://localhost:19998/carbon_alluxio.csv"
+mFsShell.run("rm", remotePath)
--- End diff --

Since this is an example, I think we should not do this operation


---


[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Optimize carbonData using a...

2019-01-07 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r245639533
  
--- Diff: docs/Integration/alluxio-guide.md ---
@@ -0,0 +1,44 @@
+
+
+
+# Presto guide
+This tutorial provides a quick introduction to using Alluxio.
+
+## How to use Alluxio for CarbonData?
+### Install and start Alluxio
+Please refer to 
[https://www.alluxio.org/docs/1.8/en/Getting-Started.html#starting-alluxio](https://www.alluxio.org/docs/1.8/en/Getting-Started.html#starting-alluxio)
   
+Access the Alluxio web: 
[http://localhost:1/home](http://localhost:1/home)   
+By command, for example: 
+```$xslt
+./bin/alluxio fs ls /
+```
+Result:
+```
+drwxr-xr-x xubo   staff1   NOT_PERSISTED 
01-07-2019 15:39:24:960  DIR /carbondata
+-rw-r--r-- xubo   staff50686   NOT_PERSISTED 
01-07-2019 11:37:48:924 100% /data.csv
+```
+### Upload Alluxio jar to CarbonData
+Upload the jar "/alluxio_path/client/alluxio-YOUR-VERSION-client.jar" to 
CarbonData
--- End diff --

"Upload to CarbonData" is confusing. What we need to do is to add the 
alluxio client jar to classpath, right? 
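
For instance (a hedged sketch, not the guide's wording): the Alluxio client jar just needs to be visible to Spark, e.g. via `spark-submit --jars` or the `spark.jars` config; the jar path below is a placeholder.

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: put the Alluxio client jar on the Spark classpath; the path is a placeholder.
object AlluxioClasspathSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AlluxioClasspathSketch")
      .master("local[*]")
      .config("spark.jars", "/alluxio_path/client/alluxio-YOUR-VERSION-client.jar")
      .getOrCreate()
    spark.stop()
  }
}
```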


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-06 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r245511058
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -110,7 +110,29 @@ case class PreAggregateTableHelper(
 // Datamap table name and columns are automatically added prefix with 
parent table name
 // in carbon. For convenient, users can type column names same as the 
ones in select statement
 // when config dmproperties, and here we update column names with 
prefix.
-val longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+// If longStringColumn is not present in dm properties then we take 
long_string_columns from
+// the parent table.
+var longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
--- End diff --

The code is not consistent with the comment at line 113. Please fix it.


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-06 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r245511008
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -110,7 +110,29 @@ case class PreAggregateTableHelper(
 // Datamap table name and columns are automatically added prefix with 
parent table name
 // in carbon. For convenient, users can type column names same as the 
ones in select statement
 // when config dmproperties, and here we update column names with 
prefix.
-val longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+// If longStringColumn is not present in dm properties then we take 
long_string_columns from
+// the parent table.
+var longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
--- End diff --

So users no longer have to configure dmproperties for long string columns, and this 
line is of no use after this PR?
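
A hedged sketch of the fallback under discussion (the property maps and key are stand-ins, not the command's actual variables): if the dm properties do not specify long string columns, take them from the parent table's properties.

```scala
// Sketch only; the key mirrors CarbonCommonConstants.LONG_STRING_COLUMNS.
def resolveLongStringColumns(dmProperties: Map[String, String],
    parentTableProperties: Map[String, String]): Option[String] = {
  dmProperties.get("long_string_columns")
    .orElse(parentTableProperties.get("long_string_columns"))
}
```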


---


[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Include bloom...

2019-01-05 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r245480571
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeBloomIndexEventListener.scala
 ---
@@ -24,59 +24,96 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 
 import org.apache.carbondata.common.logging.LogServiceFactory
-import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.datamap.{DataMapStoreManager, 
TableDataMap}
 import 
org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.datamap.CarbonMergeBloomIndexFilesRDD
 import org.apache.carbondata.events._
+import 
org.apache.carbondata.processing.loading.events.LoadEvents.LoadTablePreStatusUpdateEvent
+import org.apache.carbondata.processing.merger.CarbonDataMergerUtil
 
 class MergeBloomIndexEventListener extends OperationEventListener with 
Logging {
   val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
   override def onEvent(event: Event, operationContext: OperationContext): 
Unit = {
+val sparkSession = SparkSession.getActiveSession.get
 event match {
+  case loadPreStatusUpdateEvent: LoadTablePreStatusUpdateEvent =>
+LOGGER.info("LoadTablePreStatusUpdateEvent called for bloom index 
merging")
+// For loading process, segment can not be accessed at this time
+val loadModel = loadPreStatusUpdateEvent.getCarbonLoadModel
+val carbonTable = loadModel.getCarbonDataLoadSchema.getCarbonTable
+val segmentId = loadModel.getSegmentId
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
+  case compactPreStatusUpdateEvent: 
AlterTableCompactionPreStatusUpdateEvent =>
+LOGGER.info("AlterTableCompactionPreStatusUpdateEvent called for 
bloom index merging")
+// For compact process, segment can not be accessed at this time
+val carbonTable = compactPreStatusUpdateEvent.carbonTable
--- End diff --

The code follows similar steps in each case. For L44~L46, it depends on what info 
we can get from the Event. For L49~L52, do you mean we can extract a 
common method to get the datamaps of a table with additional constraints?
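
For example, a hedged sketch of such a helper, built only from the calls already visible in this diff, could look like:

```scala
import scala.collection.JavaConverters._

import org.apache.carbondata.core.datamap.{DataMapStoreManager, TableDataMap}
import org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider
import org.apache.carbondata.core.metadata.schema.table.CarbonTable

// Sketch only: fetch the table's bloom datamaps and apply an extra constraint,
// e.g. skipping lazy datamaps or matching a rebuilt datamap's name.
def bloomDataMapsOf(table: CarbonTable)
    (extraFilter: TableDataMap => Boolean): List[TableDataMap] = {
  DataMapStoreManager.getInstance().getAllDataMap(table).asScala
    .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
      DataMapClassProvider.BLOOMFILTER.getShortName))
    .filter(extraFilter)
    .toList
}
```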


---


[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Merge bloom i...

2019-01-05 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r245480089
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeBloomIndexEventListener.scala
 ---
@@ -24,59 +24,96 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 
 import org.apache.carbondata.common.logging.LogServiceFactory
-import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.datamap.{DataMapStoreManager, 
TableDataMap}
 import 
org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.datamap.CarbonMergeBloomIndexFilesRDD
 import org.apache.carbondata.events._
+import 
org.apache.carbondata.processing.loading.events.LoadEvents.LoadTablePreStatusUpdateEvent
+import org.apache.carbondata.processing.merger.CarbonDataMergerUtil
 
 class MergeBloomIndexEventListener extends OperationEventListener with 
Logging {
   val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
   override def onEvent(event: Event, operationContext: OperationContext): 
Unit = {
+val sparkSession = SparkSession.getActiveSession.get
 event match {
+  case loadPreStatusUpdateEvent: LoadTablePreStatusUpdateEvent =>
+LOGGER.info("LoadTablePreStatusUpdateEvent called for bloom index 
merging")
+// For loading process, segment can not be accessed at this time
+val loadModel = loadPreStatusUpdateEvent.getCarbonLoadModel
+val carbonTable = loadModel.getCarbonDataLoadSchema.getCarbonTable
+val segmentId = loadModel.getSegmentId
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
+  case compactPreStatusUpdateEvent: 
AlterTableCompactionPreStatusUpdateEvent =>
+LOGGER.info("AlterTableCompactionPreStatusUpdateEvent called for 
bloom index merging")
+// For compact process, segment can not be accessed at this time
+val carbonTable = compactPreStatusUpdateEvent.carbonTable
+val mergedLoadName = compactPreStatusUpdateEvent.mergedLoadName
+val segmentId = 
CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName)
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
   case datamapPostEvent: BuildDataMapPostExecutionEvent =>
-LOGGER.info("Load post status event-listener called for merge 
bloom index")
+LOGGER.info("BuildDataMapPostExecutionEvent called for bloom index 
merging")
+// For rebuild datamap process, datamap is disabled when rebuilding
+if (!datamapPostEvent.isFromRebuild || null == 
datamapPostEvent.dmName) {
+  // ignore datamapPostEvent from loading and compaction for bloom 
index merging
+  // they use LoadTablePreStatusUpdateEvent and 
AlterTableCompactionPreStatusUpdateEvent
+  LOGGER.info("Ignore BuildDataMapPostExecutionEvent from loading 
and compaction")
+  return
+}
+
 val carbonTableIdentifier = datamapPostEvent.identifier
 val carbonTable = 
DataMapStoreManager.getInstance().getCarbonTable(carbonTableIdentifier)
-val tableDataMaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable)
-val sparkSession = SparkSession.getActiveSession.get
 
-// filter out bloom datamap
-var bloomDatamaps = tableDataMaps.asScala.filter(
-  _.getDataMapSchema.getProviderName.equalsIgnoreCase(
+// filter out current rebuilt bloom datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
 DataMapClassProvider.BLOOMFILTER.getShortName))
-
-if (datamapPostEvent.isFromRe

[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Merge bloom i...

2019-01-05 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r245480061
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ---
@@ -184,6 +184,9 @@ object CarbonEnv {
   .addListener(classOf[LoadTablePostExecutionEvent], new 
MergeIndexEventListener)
   .addListener(classOf[AlterTableCompactionPostEvent], new 
MergeIndexEventListener)
   .addListener(classOf[AlterTableMergeIndexEvent], new 
MergeIndexEventListener)
+  .addListener(classOf[LoadTablePreStatusUpdateEvent], new 
MergeBloomIndexEventListener)
--- End diff --

Yes. This is what this PR wants to do. 


---


[GitHub] carbondata issue #2973: [WIP][CARBONDATA-3144] CarbonData support spark-2.4....

2019-01-04 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2973
  
I tested this PR by running `CarbonSessionExample`. Please check the following 
exception:
```
java.lang.ClassCastException: 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener cannot be 
cast to org.apache.spark.sql.hive.HiveExternalCatalog
at 
org.apache.spark.sql.hive.CarbonSessionStateBuilder.org$apache$spark$sql$hive$CarbonSessionStateBuilder$$externalCatalog(CarbonSessionState.scala:232)
at 
org.apache.spark.sql.hive.CarbonSessionStateBuilder.resourceLoader$lzycompute(CarbonSessionState.scala:238)
at 
org.apache.spark.sql.hive.CarbonSessionStateBuilder.resourceLoader(CarbonSessionState.scala:237)
at 
org.apache.spark.sql.hive.CarbonSessionStateBuilder.catalog$lzycompute(CarbonSessionState.scala:226)
at 
org.apache.spark.sql.hive.CarbonSessionStateBuilder.catalog(CarbonSessionState.scala:217)
at 
org.apache.spark.sql.hive.CarbonSessionStateBuilder.catalog(CarbonSessionState.scala:196)
at 
org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$1.apply(BaseSessionStateBuilder.scala:291)
at 
org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$1.apply(BaseSessionStateBuilder.scala:291)
at 
org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:77)
at 
org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:77)
at org.apache.spark.sql.CarbonEnv$.getInstance(CarbonEnv.scala:134)
at 
org.apache.spark.sql.CarbonSession$.updateSessionInfoToCurrentThread(CarbonSession.scala:325)
at 
org.apache.spark.sql.parser.CarbonSparkSqlParser.parsePlan(CarbonSparkSqlParser.scala:48)
at 
org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:125)
at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:88)
```


---


[GitHub] carbondata pull request #2996: [WIP] Fix Rename-Fail & Datamap-creation-Fail

2019-01-02 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2996#discussion_r244906592
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala
 ---
@@ -165,15 +167,22 @@ private[sql] case class CarbonAlterTableRenameCommand(
   case e: ConcurrentOperationException =>
 throw e
   case e: Exception =>
+if (hiveRenameSuccess) {
+  
sparkSession.sessionState.catalog.asInstanceOf[CarbonSessionCatalog].alterTableRename(
--- End diff --

This is a revert operation for the case where L139-142 (hive update) succeeds but 
L145-146 (carbon meta store update) fails.


---


[GitHub] carbondata pull request #2996: [WIP] Fix Rename-Fail & Datamap-creation-Fail

2019-01-02 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2996#discussion_r244906489
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala
 ---
@@ -165,15 +167,22 @@ private[sql] case class CarbonAlterTableRenameCommand(
   case e: ConcurrentOperationException =>
 throw e
   case e: Exception =>
+if (hiveRenameSuccess) {
+  
sparkSession.sessionState.catalog.asInstanceOf[CarbonSessionCatalog].alterTableRename(
+TableIdentifier(newTableName, Some(oldDatabaseName)),
+TableIdentifier(oldTableName, Some(oldDatabaseName)),
--- End diff --

Can we reuse `oldTableIdentifier` and `newTableIdentifier` in 
`alterTableRenameModel` ? 


---


[GitHub] carbondata pull request #3017: [HOTFIX] remove this useless assignment

2018-12-26 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3017#discussion_r244096878
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/locks/LocalFileLock.java ---
@@ -112,7 +112,7 @@ public LocalFileLock(String lockFileLocation, String 
lockFile) {
 status = true;
   }
 } catch (IOException e) {
-  status = false;
+  // status = false;
--- End diff --

Please review PR #2878 at your convenience. The changes follow the rules in the 
table in the description.


---


[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Merge bloom i...

2018-12-25 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r243928456
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeBloomIndexEventListener.scala
 ---
@@ -24,59 +24,88 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 
 import org.apache.carbondata.common.logging.LogServiceFactory
-import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.datamap.{DataMapStoreManager, 
TableDataMap}
 import 
org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.datamap.CarbonMergeBloomIndexFilesRDD
 import org.apache.carbondata.events._
+import 
org.apache.carbondata.processing.loading.events.LoadEvents.LoadTablePreStatusUpdateEvent
+import org.apache.carbondata.processing.merger.CarbonDataMergerUtil
 
 class MergeBloomIndexEventListener extends OperationEventListener with 
Logging {
   val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
   override def onEvent(event: Event, operationContext: OperationContext): 
Unit = {
+val sparkSession = SparkSession.getActiveSession.get
 event match {
+  case loadPreStatusUpdateEvent: LoadTablePreStatusUpdateEvent =>
+// For loading process, segment can not be accessed at this time
+val loadModel = loadPreStatusUpdateEvent.getCarbonLoadModel
+val carbonTable = loadModel.getCarbonDataLoadSchema.getCarbonTable
+val segmentId = loadModel.getSegmentId
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
+  case compactPreStatusUpdateEvent: 
AlterTableCompactionPreStatusUpdateEvent =>
+// For compact process, segment can not be accessed at this time
+val carbonTable = compactPreStatusUpdateEvent.carbonTable
+val mergedLoadName = compactPreStatusUpdateEvent.mergedLoadName
+val segmentId = 
CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName)
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
   case datamapPostEvent: BuildDataMapPostExecutionEvent =>
-LOGGER.info("Load post status event-listener called for merge 
bloom index")
+// For rebuild datamap process, datamap is disabled when rebuilding
+if (!datamapPostEvent.isFromRebuild || null == 
datamapPostEvent.dmName) {
+  // ignore datamapPostEvent from loading and compaction
+  return
+}
+
 val carbonTableIdentifier = datamapPostEvent.identifier
 val carbonTable = 
DataMapStoreManager.getInstance().getCarbonTable(carbonTableIdentifier)
-val tableDataMaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable)
-val sparkSession = SparkSession.getActiveSession.get
 
-// filter out bloom datamap
-var bloomDatamaps = tableDataMaps.asScala.filter(
-  _.getDataMapSchema.getProviderName.equalsIgnoreCase(
+// filter out current rebuilt bloom datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
 DataMapClassProvider.BLOOMFILTER.getShortName))
-
-if (datamapPostEvent.isFromRebuild) {
-  if (null != datamapPostEvent.dmName) {
-// for rebuild process
-bloomDatamaps = bloomDatamaps.filter(
-  
_.getDataMapSchema.getDataMapName.equalsIgnoreCase(datamapPostEvent.dmName))
-  }
-} else {
-  // for load process, skip lazy datamap
-  bloomDatamaps = bloomDatamaps.filter(!_.getDataMapSchema.isLazy)
-}
+  
.filter(_.getDataMapSchema.getDataMapName.equalsIgnoreCase(datamapPostEvent.dmName))
+  .toList
   

[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Merge bloom i...

2018-12-25 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r243928449
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeBloomIndexEventListener.scala
 ---
@@ -24,59 +24,88 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 
 import org.apache.carbondata.common.logging.LogServiceFactory
-import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.datamap.{DataMapStoreManager, 
TableDataMap}
 import 
org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.datamap.CarbonMergeBloomIndexFilesRDD
 import org.apache.carbondata.events._
+import 
org.apache.carbondata.processing.loading.events.LoadEvents.LoadTablePreStatusUpdateEvent
+import org.apache.carbondata.processing.merger.CarbonDataMergerUtil
 
 class MergeBloomIndexEventListener extends OperationEventListener with 
Logging {
   val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
   override def onEvent(event: Event, operationContext: OperationContext): 
Unit = {
+val sparkSession = SparkSession.getActiveSession.get
 event match {
+  case loadPreStatusUpdateEvent: LoadTablePreStatusUpdateEvent =>
+// For loading process, segment can not be accessed at this time
+val loadModel = loadPreStatusUpdateEvent.getCarbonLoadModel
+val carbonTable = loadModel.getCarbonDataLoadSchema.getCarbonTable
+val segmentId = loadModel.getSegmentId
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
+  case compactPreStatusUpdateEvent: 
AlterTableCompactionPreStatusUpdateEvent =>
+// For compact process, segment can not be accessed at this time
+val carbonTable = compactPreStatusUpdateEvent.carbonTable
+val mergedLoadName = compactPreStatusUpdateEvent.mergedLoadName
+val segmentId = 
CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName)
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
   case datamapPostEvent: BuildDataMapPostExecutionEvent =>
-LOGGER.info("Load post status event-listener called for merge 
bloom index")
+// For rebuild datamap process, datamap is disabled when rebuilding
+if (!datamapPostEvent.isFromRebuild || null == 
datamapPostEvent.dmName) {
+  // ignore datamapPostEvent from loading and compaction
--- End diff --

Added


---


[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Merge bloom i...

2018-12-25 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r243905862
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeBloomIndexEventListener.scala
 ---
@@ -24,59 +24,88 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 
 import org.apache.carbondata.common.logging.LogServiceFactory
-import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.datamap.{DataMapStoreManager, 
TableDataMap}
 import 
org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.datamap.CarbonMergeBloomIndexFilesRDD
 import org.apache.carbondata.events._
+import 
org.apache.carbondata.processing.loading.events.LoadEvents.LoadTablePreStatusUpdateEvent
+import org.apache.carbondata.processing.merger.CarbonDataMergerUtil
 
 class MergeBloomIndexEventListener extends OperationEventListener with 
Logging {
   val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
   override def onEvent(event: Event, operationContext: OperationContext): 
Unit = {
+val sparkSession = SparkSession.getActiveSession.get
 event match {
+  case loadPreStatusUpdateEvent: LoadTablePreStatusUpdateEvent =>
+// For loading process, segment can not be accessed at this time
+val loadModel = loadPreStatusUpdateEvent.getCarbonLoadModel
+val carbonTable = loadModel.getCarbonDataLoadSchema.getCarbonTable
+val segmentId = loadModel.getSegmentId
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
+  case compactPreStatusUpdateEvent: 
AlterTableCompactionPreStatusUpdateEvent =>
+// For compact process, segment can not be accessed at this time
+val carbonTable = compactPreStatusUpdateEvent.carbonTable
+val mergedLoadName = compactPreStatusUpdateEvent.mergedLoadName
+val segmentId = 
CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName)
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
   case datamapPostEvent: BuildDataMapPostExecutionEvent =>
-LOGGER.info("Load post status event-listener called for merge 
bloom index")
+// For rebuild datamap process, datamap is disabled when rebuilding
+if (!datamapPostEvent.isFromRebuild || null == 
datamapPostEvent.dmName) {
+  // ignore datamapPostEvent from loading and compaction
--- End diff --

What information do you expect? 
We ignore the datamapPostEvent from loading and compaction because the bloom 
index files are already merged by the events above. 


---


[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Merge bloom i...

2018-12-25 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r243905231
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeBloomIndexEventListener.scala
 ---
@@ -24,59 +24,88 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 
 import org.apache.carbondata.common.logging.LogServiceFactory
-import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.datamap.{DataMapStoreManager, 
TableDataMap}
 import 
org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.datamap.CarbonMergeBloomIndexFilesRDD
 import org.apache.carbondata.events._
+import 
org.apache.carbondata.processing.loading.events.LoadEvents.LoadTablePreStatusUpdateEvent
+import org.apache.carbondata.processing.merger.CarbonDataMergerUtil
 
 class MergeBloomIndexEventListener extends OperationEventListener with 
Logging {
   val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
   override def onEvent(event: Event, operationContext: OperationContext): 
Unit = {
+val sparkSession = SparkSession.getActiveSession.get
 event match {
+  case loadPreStatusUpdateEvent: LoadTablePreStatusUpdateEvent =>
+// For loading process, segment can not be accessed at this time
+val loadModel = loadPreStatusUpdateEvent.getCarbonLoadModel
+val carbonTable = loadModel.getCarbonDataLoadSchema.getCarbonTable
+val segmentId = loadModel.getSegmentId
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
+  case compactPreStatusUpdateEvent: 
AlterTableCompactionPreStatusUpdateEvent =>
+// For compact process, segment can not be accessed at this time
+val carbonTable = compactPreStatusUpdateEvent.carbonTable
+val mergedLoadName = compactPreStatusUpdateEvent.mergedLoadName
+val segmentId = 
CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName)
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
   case datamapPostEvent: BuildDataMapPostExecutionEvent =>
-LOGGER.info("Load post status event-listener called for merge 
bloom index")
+// For rebuild datamap process, datamap is disabled when rebuilding
+if (!datamapPostEvent.isFromRebuild || null == 
datamapPostEvent.dmName) {
+  // ignore datamapPostEvent from loading and compaction
+  return
+}
+
 val carbonTableIdentifier = datamapPostEvent.identifier
 val carbonTable = 
DataMapStoreManager.getInstance().getCarbonTable(carbonTableIdentifier)
-val tableDataMaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable)
-val sparkSession = SparkSession.getActiveSession.get
 
-// filter out bloom datamap
-var bloomDatamaps = tableDataMaps.asScala.filter(
-  _.getDataMapSchema.getProviderName.equalsIgnoreCase(
+// filter out current rebuilt bloom datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
 DataMapClassProvider.BLOOMFILTER.getShortName))
-
-if (datamapPostEvent.isFromRebuild) {
-  if (null != datamapPostEvent.dmName) {
-// for rebuild process
-bloomDatamaps = bloomDatamaps.filter(
-  
_.getDataMapSchema.getDataMapName.equalsIgnoreCase(datamapPostEvent.dmName))
-  }
-} else {
-  // for load process, skip lazy datamap
-  bloomDatamaps = bloomDatamaps.filter(!_.getDataMapSchema.isLazy)
--- End diff --

The original implementation handled all three scenarios (load/compact/rebuild), so it used 
if-else to distinguish them. Now, thre

[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Merge bloom i...

2018-12-25 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/3023

[CARBONDATA-3197][BloomDataMap] Merge bloom index before accessible

**Problem**
Currently carbon allows queries while bloom index files are being merged, but 
this causes problems when the index files' state changes from multiple shards 
to a merged shard.

Timeline to explain the problem:
- load data into a table with a bloom datamap; data is loaded, bloom index files 
are generated along with loading, and bloom index file merging is in progress
- a query is fired
- `BloomCoarseGrainDataMapFactory.getAllShardPaths` finds multiple shards while 
bloom index file merging is in progress, so a `BloomCoarseGrainDataMap` with the 
detailed shard name is created
- bloom index file merging completes, and the folders with the detailed shard names 
are deleted
- an exception occurs when `BloomCoarseGrainDataMap` tries to read the bloom 
index files from the folders with the detailed shard names during pruning

**Analyse**
The root cause is that we allow queries on a datamap which is not in a stable state. 
One solution is to disable the datamap while merging bloom index files, but this 
would affect all the segments many times. Another solution is to make the bloom 
index file merging part of loading, so that queries cannot access unstable 
bloom index files until they are ready.

**Solution**
Change the events that `MergeBloomIndexEventListener` watches, and do the 
merging work before the segment status is updated for access.

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata mergeBloomIndexEvent

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3023.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3023


commit 881a510de38e30cfa4ae8a84be1f003c6254d9ab
Author: Manhua 
Date:   2018-12-25T08:21:40Z

merge bloom index before accessible




---


[GitHub] carbondata pull request #2161: [CARBONDATA-2218] AlluxioCarbonFile while try...

2018-12-24 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2161#discussion_r243819959
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -960,7 +960,7 @@ private CarbonCommonConstants() {
* If set to GLOBAL_SORT, the sorting scope is bigger and one index tree 
per task will be
* created, thus loading is slower but query is faster.
*/
-  public static final String LOAD_SORT_SCOPE_DEFAULT = "LOCAL_SORT";
+  public static final String LOAD_SORT_SCOPE_DEFAULT = "NO_SORT";
--- End diff --

OK


---


[GitHub] carbondata pull request #2161: [CARBONDATA-2218] AlluxioCarbonFile while try...

2018-12-22 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2161#discussion_r243732085
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -960,7 +960,7 @@ private CarbonCommonConstants() {
* If set to GLOBAL_SORT, the sorting scope is bigger and one index tree 
per task will be
* created, thus loading is slower but query is faster.
*/
-  public static final String LOAD_SORT_SCOPE_DEFAULT = "LOCAL_SORT";
+  public static final String LOAD_SORT_SCOPE_DEFAULT = "NO_SORT";
--- End diff --

Why should this default value be changed?

Some more advice:
1. The whitespace-only changes in this file are unnecessary. 
2. There are too many commits in this PR; better to merge them if history does 
not matter. Refer to the `git rebase` command to squash the commits into one.

Hope this PR can be merged soon


---


[GitHub] carbondata issue #2161: [CARBONDATA-2218] AlluxioCarbonFile while trying to ...

2018-12-20 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2161
  
> Since user normally uses Alluxio as a read cache, I think we can firstly 
verify carbon on alluxio for the query scenario. As I am still not very sure 
what is the correct way to implement rename for Alluxio, in the meantime, we 
can merge this PR first. So please rebase it. @chandrasaripaka

@jackylk My first attempt was to use Alluxio for queries only, as you said. But 
Hive stores the table location and carbon blocks the `alter location` command, so I 
have to create the table after carbon runs on Alluxio. Please keep this problem in mind.


---


[GitHub] carbondata pull request #2161: [CARBONDATA-2218] AlluxioCarbonFile while try...

2018-12-20 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2161#discussion_r243254949
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/locks/AlluxioFileLock.java ---
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.locks;
+
+import java.io.DataOutputStream;
+import java.io.IOException;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.AlluxioCarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.log4j.Logger;
+
+/**
+ * This class is used to handle the S3 File locking.
+ * This is acheived using the concept of acquiring the data out stream 
using Append option.
+ */
+public class AlluxioFileLock extends AbstractCarbonLock {
+
+  /**
+   * LOGGER
+   */
+  private static final Logger LOGGER =
+  
LogServiceFactory.getLogService(AlluxioCarbonFile.class.getName());
+  /**
+   * lockFilePath is the location of the lock file.
+   */
+  private String lockFilePath;
+
+  /**
+   * lockFileDir is the directory of the lock file.
+   */
+  private String lockFileDir;
+
+  private DataOutputStream dataOutputStream;
+
+  /**
+   * @param tableIdentifier
+   * @param lockFile
+   */
+  public AlluxioFileLock(AbsoluteTableIdentifier tableIdentifier, String 
lockFile) {
+this(tableIdentifier.getTablePath(), lockFile);
+  }
+
+  /**
+   * @param lockFileLocation
+   * @param lockFile
+   */
+  public AlluxioFileLock(String lockFileLocation, String lockFile) {
+this.lockFileDir = 
CarbonTablePath.getLockFilesDirPath(lockFileLocation);
+this.lockFilePath = CarbonTablePath.getLockFilePath(lockFileLocation, 
lockFile);
+LOGGER.info("Alluxio lock path:" + this.lockFilePath);
+initRetry();
+  }
+
+  /* (non-Javadoc)
+   * @see org.apache.carbondata.core.locks.ICarbonLock#unlock()
+   */
+  @Override
+  public boolean unlock() {
+boolean status = false;
+if (null != dataOutputStream) {
+  try {
+dataOutputStream.close();
+status = true;
+  } catch (IOException e) {
+status = false;
+  }
+}
+return status;
+  }
+
+  /* (non-Javadoc)
+   * @see org.apache.carbondata.core.locks.ICarbonLock#lock()
+   */
+  @Override
+  public boolean lock() {
+try {
+  if (!FileFactory.isFileExist(lockFileDir)) {
+FileFactory.mkdirs(lockFileDir, 
FileFactory.getFileType(lockFileDir));
+  }
+  if (!FileFactory.isFileExist(lockFilePath)) {
+FileFactory.createNewLockFile(lockFilePath, 
FileFactory.getFileType(lockFilePath));
+  }
+  dataOutputStream =
+  FileFactory.getDataOutputStreamUsingAppend(lockFilePath,
+  FileFactory.getFileType(lockFilePath));
--- End diff --

@xubo245 "tablestatus.lock already exists" is caused by this


---


[GitHub] carbondata pull request #2161: [CARBONDATA-2218] AlluxioCarbonFile while try...

2018-12-19 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2161#discussion_r243168473
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/locks/AlluxioFileLock.java ---
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.locks;
+
+import java.io.DataOutputStream;
+import java.io.IOException;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.AlluxioCarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+import org.apache.log4j.Logger;
+
+/**
+ * This class is used to handle the S3 File locking.
+ * This is acheived using the concept of acquiring the data out stream 
using Append option.
+ */
+public class AlluxioFileLock extends AbstractCarbonLock {
+
+  /**
+   * LOGGER
+   */
+  private static final Logger LOGGER =
+  
LogServiceFactory.getLogService(AlluxioCarbonFile.class.getName());
+  /**
+   * lockFilePath is the location of the lock file.
+   */
+  private String lockFilePath;
+
+  /**
+   * lockFileDir is the directory of the lock file.
+   */
+  private String lockFileDir;
+
+  private DataOutputStream dataOutputStream;
+
+  /**
+   * @param tableIdentifier
+   * @param lockFile
+   */
+  public AlluxioFileLock(AbsoluteTableIdentifier tableIdentifier, String 
lockFile) {
+this(tableIdentifier.getTablePath(), lockFile);
+  }
+
+  /**
+   * @param lockFileLocation
+   * @param lockFile
+   */
+  public AlluxioFileLock(String lockFileLocation, String lockFile) {
+this.lockFileDir = 
CarbonTablePath.getLockFilesDirPath(lockFileLocation);
+this.lockFilePath = CarbonTablePath.getLockFilePath(lockFileLocation, 
lockFile);
+LOGGER.info("Alluxio lock path:" + this.lockFilePath);
+initRetry();
+  }
+
+  /* (non-Javadoc)
+   * @see org.apache.carbondata.core.locks.ICarbonLock#unlock()
+   */
+  @Override
+  public boolean unlock() {
+boolean status = false;
+if (null != dataOutputStream) {
+  try {
+dataOutputStream.close();
+status = true;
+  } catch (IOException e) {
+status = false;
+  }
+}
+return status;
+  }
+
+  /* (non-Javadoc)
+   * @see org.apache.carbondata.core.locks.ICarbonLock#lock()
+   */
+  @Override
+  public boolean lock() {
+try {
+  if (!FileFactory.isFileExist(lockFileDir)) {
+FileFactory.mkdirs(lockFileDir, 
FileFactory.getFileType(lockFileDir));
+  }
+  if (!FileFactory.isFileExist(lockFilePath)) {
+FileFactory.createNewLockFile(lockFilePath, 
FileFactory.getFileType(lockFilePath));
+  }
+  dataOutputStream =
+  FileFactory.getDataOutputStreamUsingAppend(lockFilePath,
+  FileFactory.getFileType(lockFilePath));
--- End diff --

Can you insert data into the table? I found that the implementation of the `append` 
function in Alluxio requires creating a new file; it will fail if the file 
already exists. At line 100, carbon makes sure the file exists, so in the end we 
always fail to acquire the lock.

The code I mention in alluxio is this: 

https://github.com/Alluxio/alluxio/blob/21a3a87747010f087d1937f48bba3caa883885f8/core/client/hdfs/src/main/java/alluxio/hadoop/AbstractFileSystem.java#L127-L138



---


[GitHub] carbondata pull request #2161: [CARBONDATA-2218] AlluxioCarbonFile while try...

2018-12-19 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2161#discussion_r243167033
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AlluxioCarbonFile.java
 ---
@@ -94,14 +93,9 @@ public CarbonFile getParentFile() {
   public boolean renameForce(String changeToName) {
 FileSystem fs;
 try {
-  fs = 
fileStatus.getPath().getFileSystem(FileFactory.getConfiguration());
-  if (fs instanceof DistributedFileSystem) {
-((DistributedFileSystem) fs).rename(fileStatus.getPath(), new 
Path(changeToName),
-org.apache.hadoop.fs.Options.Rename.OVERWRITE);
-return true;
-  } else {
-return false;
-  }
+  fs = fileStatus.getPath().getFileSystem(hadoopConf);
+  fs.delete(new Path(changeToName), true);
+  return fs.rename(fileStatus.getPath(), new Path(changeToName));
--- End diff --

This problem also exists for `S3CarbonFile`, `ViewFSCarbonFile`, and 
`LocalCarbonFile`.


---


[GitHub] carbondata pull request #3005: [CARBONDATA-3185] Fix alluxio file rename

2018-12-19 Thread kevinjmh
Github user kevinjmh closed the pull request at:

https://github.com/apache/carbondata/pull/3005


---


[GitHub] carbondata issue #3005: [CARBONDATA-3185] Fix alluxio file rename

2018-12-19 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/3005
  
> There is a PR to fix the same issue: #2161

I see. It also fixes the LockFile problem I am facing now. 
Hope it gets fixed soon. I am closing this PR now.


---


[GitHub] carbondata pull request #3005: [CARBONDATA-3185] Fix alluxio file rename

2018-12-19 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3005#discussion_r243150578
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AlluxioCarbonFile.java
 ---
@@ -95,15 +94,13 @@ public boolean renameForce(String changeToName) {
 FileSystem fs;
 try {
   fs = 
fileStatus.getPath().getFileSystem(FileFactory.getConfiguration());
-  if (fs instanceof DistributedFileSystem) {
-((DistributedFileSystem) fs).rename(fileStatus.getPath(), new 
Path(changeToName),
-org.apache.hadoop.fs.Options.Rename.OVERWRITE);
-return true;
-  } else {
-return false;
+  Path targetPath = new Path(changeToName);
+  if (fs.exists(targetPath)) {
+fs.delete(targetPath, true);
--- End diff --

This should be managed by the developer. If the developer wants a simple rename, they 
should call `renameTo` instead of `renameForce`. The code in this PR implements 
OVERWRITE (delete before rename), as `S3CarbonFile`, `ViewFSCarbonFile`, and 
`LocalCarbonFile` do.


---


[GitHub] carbondata pull request #3005: [CARBONDATA-3185] Fix alluxio file rename

2018-12-19 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/3005

[CARBONDATA-3185] Fix alluxio file rename

**Problem**
An exception is thrown when creating a table on Alluxio because renaming the schema file 
failed.

**Analyse**
Re-running the command after adding some logs, I found that the file system 
object it uses is `alluxio.hadoop.FileSystem`, which is not an instance of 
`DistributedFileSystem`, so the renameForce method returns false.

**Solution**
By checking the class hierarchy of 
[`alluxio.hadoop.FileSystem`](https://github.com/Alluxio/alluxio/blob/branch-1.8/core/client/hdfs/src/main/java/alluxio/hadoop/FileSystem.java),
 it is OK to use the Hadoop FileSystem methods directly. We can fix it by removing 
the instance type check and writing the overwrite-before-rename code by hand.
```
FileSystem (org.apache.hadoop.fs)
|__ AbstractFileSystem (alluxio.hadoop)
  |__FileSystem (alluxio.hadoop)
```
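
A hedged sketch of the overwrite-then-rename described above, written against the plain Hadoop FileSystem API (mirroring the diff in this PR; paths and the method shape are placeholders):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: resolve whichever FileSystem backs the path (alluxio.hadoop.FileSystem
// included), then implement OVERWRITE semantics as delete-before-rename.
def renameForce(src: Path, changeToName: String, conf: Configuration): Boolean = {
  val fs: FileSystem = src.getFileSystem(conf)
  val target = new Path(changeToName)
  if (fs.exists(target)) {
    fs.delete(target, true)
  }
  fs.rename(src, target)
}
```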



*Log*
```
18/12/20 09:31:50 ERROR thriftserver.SparkExecuteStatementOperation: Error 
executing query, currentState RUNNING,
org.apache.carbondata.spark.exception.ProcessMetaDataException: operation 
failed for default.alluxio: Create table'alluxio' in database 'default' failed, 
temporary file renaming failed, 
src=alluxio://localhost:19998/user/hive/warehouse/carbon.store/default/alluxio/Metadata/schema.write,
 
dest=alluxio://localhost:19998/user/hive/warehouse/carbon.store/default/alluxio/Metadata/schema
at 
org.apache.spark.sql.execution.command.MetadataProcessOpeation$class.throwMetadataException(package.scala:55)
at 
org.apache.spark.sql.execution.command.MetadataCommand.throwMetadataException(package.scala:120)
at 
org.apache.spark.sql.execution.command.table.CarbonCreateTableCommand.processMetadata(CarbonCreateTableCommand.scala:179)
at 
org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:122)
at 
org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:122)
at 
org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:104)
at 
org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:120)
at 
org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:122)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
at org.apache.spark.sql.Dataset.(Dataset.scala:183)
at 
org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:91)
at 
org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:90)
at 
org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:136)
at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:88)
```

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata alluxio_fs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3005.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3005


commit 78f5d72a5b169b38d7d6bd0f7fbd9ebf8bfa6d55
Author: Manhua 
Date:   2018-12-20T02:34:48Z

fix alluxio rename fail




---


[GitHub] carbondata pull request #3000: [CARBONDATA-3181][BloomDataMap] Fix access fi...

2018-12-19 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3000#discussion_r243130074
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/hadoop/util/bloom/CarbonBloomFilter.java 
---
@@ -49,27 +49,23 @@ public CarbonBloomFilter(int vectorSize, int nbHash, 
int hashType, boolean compr
 
   @Override
   public boolean membershipTest(Key key) {
-if (key == null) {
-  throw new NullPointerException("key cannot be null");
-}
-
-int[] h = hash.hash(key);
-hash.clear();
 if (compress) {
   // If it is compressed check in roaring bitmap
+  if (key == null) {
--- End diff --

The reason I coded it like this is to keep the procedure consistent. And I don't 
think it reduces calls for general cases: null is only one special case, and 
all the others are non-null cases. If I change it as you advised, all non-null 
cases will have to do a double null check when the bloom filter is not compressed.


---


[GitHub] carbondata pull request #3000: [CARBONDATA-3181][BloomDataMap] Fix access fi...

2018-12-18 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/3000

[CARBONDATA-3181][BloomDataMap] Fix access field error for BitSet in bloom 
filter

**Problem**
java.lang.IllegalAccessError is thrown when querying on a bloom filter without 
compression on CarbonThriftServer. The detailed log is pasted at the bottom.

**Analyse**
A similar problem occurred when getting/setting the BitSet in CarbonBloomFilter; it was 
solved with reflection, and we could do the same here. But since we have already set the 
BitSet, an easier way is to call the super class method and avoid accessing the field 
from CarbonBloomFilter.

**Solution**
If the bloom filter is not compressed, call the super method to test membership.
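
A minimal sketch of the idea (a hypothetical simplified subclass, not the real CarbonBloomFilter, which also keeps a roaring bitmap for the compressed case):
```
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;

// When the filter is not compressed, delegate to the parent implementation instead
// of reading the parent's package-private BitSet from another package.
public class UncompressedBloomSketch extends BloomFilter {
  private final boolean compress;

  public UncompressedBloomSketch(int vectorSize, int nbHash, int hashType, boolean compress) {
    super(vectorSize, nbHash, hashType);
    this.compress = compress;
  }

  @Override
  public boolean membershipTest(Key key) {
    if (!compress) {
      // the parent tests its own bits field, so no IllegalAccessError is possible here
      return super.membershipTest(key);
    }
    // the compressed case would test the roaring bitmap; omitted in this sketch
    throw new UnsupportedOperationException("compressed path not shown");
  }
}
```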

---
*Detail log*
```
18/12/19 11:16:07 ERROR thriftserver.SparkExecuteStatementOperation: Error 
executing query, currentState RUNNING,
java.lang.IllegalAccessError: tried to access field 
org.apache.hadoop.util.bloom.BloomFilter.bits from class 
org.apache.hadoop.util.bloom.CarbonBloomFilter
at 
org.apache.hadoop.util.bloom.CarbonBloomFilter.membershipTest(CarbonBloomFilter.java:70)
at 
org.apache.carbondata.datamap.bloom.BloomCoarseGrainDataMap.prune(BloomCoarseGrainDataMap.java:202)
at 
org.apache.carbondata.core.datamap.TableDataMap.pruneWithFilter(TableDataMap.java:185)
at 
org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:160)
at 
org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapperImpl.prune(DataMapExprWrapperImpl.java:53)
at 
org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:517)
at 
org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:412)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:529)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:220)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:127)
at 
org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:66)
at 
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at 
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
```

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata bloom_bits_access

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3000.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3000


commit b7df6abe73a633c0df79c5b426df9416d7a4c1bc
Author: Manhua 
Date:   2018-12-19T03:30:46Z

fix access error for bits in bloom filter




---


[GitHub] carbondata pull request #2713: [WIP][CARBONDATA-2931][BloomDataMap] Optimize...

2018-12-18 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2713#discussion_r242507682
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java
 ---
@@ -218,56 +218,46 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName,
 this.bloomFilterSize, this.bloomFilterFpp, bloomCompress);
   }
 
-  /**
-   * returns all shard directories of bloom index files for query
-   * if bloom index files are merged we should get only one shard path
-   */
-  private Set getAllShardPaths(String tablePath, String segmentId) 
{
-String dataMapStorePath = CarbonTablePath.getDataMapStorePath(
-tablePath, segmentId, dataMapName);
-CarbonFile[] carbonFiles = 
FileFactory.getCarbonFile(dataMapStorePath).listFiles();
-Set shardPaths = new HashSet<>();
+
+  private boolean isAllShardsMerged(String dmSegmentPath) {
+boolean mergeShardExist = false;
 boolean mergeShardInprogress = false;
-CarbonFile mergeShardFile = null;
+CarbonFile[] carbonFiles = 
FileFactory.getCarbonFile(dmSegmentPath).listFiles();
 for (CarbonFile carbonFile : carbonFiles) {
-  if 
(carbonFile.getName().equals(BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME)) 
{
-mergeShardFile = carbonFile;
-  } else if 
(carbonFile.getName().equals(BloomIndexFileStore.MERGE_INPROGRESS_FILE)) {
+  String fileName = carbonFile.getName();
+  if 
(fileName.equals(BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME)) {
+mergeShardExist = true;
+  } else if 
(fileName.equals(BloomIndexFileStore.MERGE_INPROGRESS_FILE)) {
 mergeShardInprogress = true;
--- End diff --

One more idea is that we can delay the deletion of the original shards during query, 
referring to how segment management works. That is, when the mergeShard exists and no 
merge-in-progress file is found during a query, we can be sure it is safe to delete 
the original shards.


---


[GitHub] carbondata pull request #2917: [WIP]Show load/insert/update/delete row numbe...

2018-12-18 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2917#discussion_r242500505
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/TestShowIUDRowCount.scala
 ---
@@ -0,0 +1,60 @@
+package org.apache.carbondata.spark.testsuite.iud
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+
+class TestShowIUDRowCount extends QueryTest with BeforeAndAfterEach with 
BeforeAndAfterAll {
+
+  override protected def beforeAll(): Unit = {
+dropTable("iud_rows")
+  }
+
+  override protected def beforeEach(): Unit = {
+dropTable("iud_rows")
+  }
+
+  override protected def afterEach(): Unit = {
+dropTable("iud_rows")
+  }
+
+  test("Test show load row count") {
+sql("""create table iud_rows (c1 string,c2 int,c3 string,c5 string)
+|STORED BY 'org.apache.carbondata.format'""".stripMargin)
+checkAnswer(
+  sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/IUD/dest.csv' INTO 
table iud_rows"""),
--- End diff --

I found that this test will pass if I add OPTIONS() (with or without detailed 
config) to the load command. The logical plan then changes from 
`LoadDataCommand` (a class in Spark) to `CarbonLoadDataCommand` (a class in Carbon). 
If we want to show the processed row count, we need to change the value 
`output` in these command/query plans, but we cannot do that for Spark's 
`LoadDataCommand`.



---


[GitHub] carbondata pull request #2713: [WIP][CARBONDATA-2931][BloomDataMap] Optimize...

2018-12-18 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2713#discussion_r242454233
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java
 ---
@@ -218,56 +218,46 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName,
 this.bloomFilterSize, this.bloomFilterFpp, bloomCompress);
   }
 
-  /**
-   * returns all shard directories of bloom index files for query
-   * if bloom index files are merged we should get only one shard path
-   */
-  private Set getAllShardPaths(String tablePath, String segmentId) 
{
-String dataMapStorePath = CarbonTablePath.getDataMapStorePath(
-tablePath, segmentId, dataMapName);
-CarbonFile[] carbonFiles = 
FileFactory.getCarbonFile(dataMapStorePath).listFiles();
-Set shardPaths = new HashSet<>();
+
+  private boolean isAllShardsMerged(String dmSegmentPath) {
+boolean mergeShardExist = false;
 boolean mergeShardInprogress = false;
-CarbonFile mergeShardFile = null;
+CarbonFile[] carbonFiles = 
FileFactory.getCarbonFile(dmSegmentPath).listFiles();
 for (CarbonFile carbonFile : carbonFiles) {
-  if 
(carbonFile.getName().equals(BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME)) 
{
-mergeShardFile = carbonFile;
-  } else if 
(carbonFile.getName().equals(BloomIndexFileStore.MERGE_INPROGRESS_FILE)) {
+  String fileName = carbonFile.getName();
+  if 
(fileName.equals(BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME)) {
+mergeShardExist = true;
+  } else if 
(fileName.equals(BloomIndexFileStore.MERGE_INPROGRESS_FILE)) {
 mergeShardInprogress = true;
--- End diff --

Yes, you are right. We need to fix this. If we allow the bloom filter to be used 
while the index files are merging, an IO exception may occur in the 
following steps once the merging is done. 

Some simple ideas for this:
1. the datamap does not choose bloom while merging is in progress
2. change the pruning logic to be segment independent: any datamap except the 
default datamap can reject or fail the segment pruning (by returning null or ?), 
with no further result blocklet intersection for that datamap, so that it does 
not affect the final result


---


[GitHub] carbondata pull request #2713: [WIP][CARBONDATA-2931][BloomDataMap] Optimize...

2018-12-17 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2713#discussion_r242394373
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java
 ---
@@ -178,15 +178,9 @@ private String getAncestorTablePath(CarbonTable 
currentTable) {
 for (BloomQueryModel bloomQueryModel : bloomQueryModels) {
   Set tempHitBlockletsResult = new HashSet<>();
   LOGGER.debug("prune blocklet for query: " + bloomQueryModel);
-  BloomCacheKeyValue.CacheKey cacheKey = new 
BloomCacheKeyValue.CacheKey(
-  this.indexPath.toString(), bloomQueryModel.columnName);
-  BloomCacheKeyValue.CacheValue cacheValue = cache.get(cacheKey);
-  List bloomIndexList = 
cacheValue.getBloomFilters();
-  for (CarbonBloomFilter bloomFilter : bloomIndexList) {
-if (needShardPrune && 
!filteredShard.contains(bloomFilter.getShardName())) {
--- End diff --

The usage of this info is moved to `getBloomFilters`.


---


[GitHub] carbondata issue #2878: [CARBONDATA-3107] Optimize error/exception coding fo...

2018-12-02 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2878
  
conflicts fixed.


---


[GitHub] carbondata pull request #2917: [WIP]Show load/insert/update/delete row numbe...

2018-11-22 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2917#discussion_r235654816
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/TestShowIUDRowCount.scala
 ---
@@ -0,0 +1,60 @@
+package org.apache.carbondata.spark.testsuite.iud
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+
+class TestShowIUDRowCount extends QueryTest with BeforeAndAfterEach with 
BeforeAndAfterAll {
+
+  override protected def beforeAll(): Unit = {
+dropTable("iud_rows")
+  }
+
+  override protected def beforeEach(): Unit = {
+dropTable("iud_rows")
+  }
+
+  override protected def afterEach(): Unit = {
+dropTable("iud_rows")
+  }
+
+  test("Test show load row count") {
+sql("""create table iud_rows (c1 string,c2 int,c3 string,c5 string)
+|STORED BY 'org.apache.carbondata.format'""".stripMargin)
+checkAnswer(
+  sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/IUD/dest.csv' INTO 
table iud_rows"""),
--- End diff --

Why does the `sql()` function in QueryTest get a different plan from a self-made Spark 
context or Beeline?


---


[GitHub] carbondata pull request #2917: [WIP]Show load/insert/update/delete row numbe...

2018-11-13 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2917

[WIP]Show load/insert/update/delete row number

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata ProceededRowCount

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2917.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2917


commit f7d199e7d6d28c72124a2ad45635949976816d04
Author: Manhua 
Date:   2018-11-13T09:27:10Z

show load/insert/update/delete row number




---


[GitHub] carbondata pull request #2900: [CARBONDATA-3078] Disable explain collector f...

2018-11-05 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2900#discussion_r231014871
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
@@ -575,6 +576,8 @@ private BitSet setMatchedPartitions(String 
partitionIds, Expression filter,
*/
   public BlockMappingVO getBlockRowCount(Job job, CarbonTable table,
   List partitions) throws IOException {
+// no useful information for count star query without filter, so 
disable explain collector
+ExplainCollector.remove();
--- End diff --

OK


---


[GitHub] carbondata pull request #2900: [CARBONDATA-3078] Disable explain collector f...

2018-11-05 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2900#discussion_r231010531
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
@@ -575,6 +576,8 @@ private BitSet setMatchedPartitions(String 
partitionIds, Expression filter,
*/
   public BlockMappingVO getBlockRowCount(Job job, CarbonTable table,
   List partitions) throws IOException {
+// no useful information for count star query without filter, so 
disable explain collector
+ExplainCollector.remove();
--- End diff --

It does not remove anything; that method's implementation disables the collector.


---


[GitHub] carbondata pull request #2900: [CARBONDATA-3078] Disable explain collector f...

2018-11-05 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2900#discussion_r230972236
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
@@ -575,6 +576,8 @@ private BitSet setMatchedPartitions(String 
partitionIds, Expression filter,
*/
   public BlockMappingVO getBlockRowCount(Job job, CarbonTable table,
   List partitions) throws IOException {
+// no useful information for count star query without filter, so 
disable explain collector
+ExplainCollector.remove();
--- End diff --

You are right. 
The normal query flow goes to `CarbonInputFormat#getPrunedBlocklets` and 
initializes the pruning info for the table we queried. But a count star query without 
filter uses a different query plan; it does not go into that method, so the 
pruning info is not initialized. When it calls the default data map to prune (with a null 
filter), an exception occurs while setting the pruning info.

One solution is to initialize the pruning info for this type of query in the method 
`getBlockRowCount`. But considering
there is no useful information about block/blocklet pruning for such a query (actually 
no pruning happens), I chose to disable the explain collector instead.


---


[GitHub] carbondata issue #2900: [CARBONDATA-3078] Disable explain collector for coun...

2018-11-05 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2900
  
retest this please


---


[GitHub] carbondata pull request #2900: [CARBONDATA-3078] Disable explain collector f...

2018-11-05 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2900

[CARBONDATA-3078] Disable explain collector for count star query without 
filter

An issue was found with the count star query without filter in the explain command. 
It is a special case and uses a different plan. 
Considering
there is no useful information about block/blocklet pruning for a count star query 
without filter, disable the explain collector and avoid the exception in 
https://issues.apache.org/jira/browse/CARBONDATA-3078

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata explain_countstar

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2900.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2900


commit 46f07a25924e98d43dd9dcfad3a1c7cb0cd4d895
Author: Manhua 
Date:   2018-11-05T12:17:59Z

disable explain collector for count star query without filter




---


[GitHub] carbondata issue #2894: [CARBONDATA-3074] Change default sort temp compresso...

2018-11-05 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2894
  
retest this please


---


[GitHub] carbondata pull request #2894: [CARBONDATA-3074] Change default sort temp co...

2018-11-05 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2894

[CARBONDATA-3074] Change default sort temp compressor to SNAPPY

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata sorttempDefault

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2894.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2894


commit 7877c797230b58fd4c51ffa21a026bc13493ac1f
Author: Manhua 
Date:   2018-11-05T09:28:07Z

change default sort temp compressor from empty to snappy




---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaut

2018-11-02 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
@akashrn5 thanks for the reply. 

1. Let's take a detailed case; you can check whether it is right.
 
 In `DictDimensionIndexCodec#createEncoder`, with the settings I mentioned above:
 `isSort` = false
 `isDoInvertedIndex` = true
 `isInvertedIndex` = `isSort` && `isDoInvertedIndex` = false
So it will go to `indexStorage = new 
BlockIndexerStorageForNoInvertedIndexForShort(data, false);`.
In the constructor, we can see that it only assigns the dataPage 
value. No RLE.
```
  public BlockIndexerStorageForNoInvertedIndexForShort(byte[][] dataPage, 
boolean applyRLE) {
this.dataPage = dataPage;
if (applyRLE) {
  List actualDataList = new ArrayList<>();
  for (int i = 0; i < dataPage.length; i++) {
actualDataList.add(dataPage[i]);
  }
  rleEncodeOnData(actualDataList);
}
  }
```

2. If isInvertedIndex is TRUE, then the isSort check must be TRUE

```
isInvertedIndex=  isSort&&isDoInvertedIndex;
   ^ ^   ^
   | |   |
internalUsed  SORT_COLUMNS   INVERTED_INDEX

```





---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaut

2018-11-01 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
The InvertedIndex/NoInvertedIndex setting is confusing.
1. The value `isInvertedIndex` assigned to the different IndexCodecs in 
`createEncoderForDimensionLegacy` requires us to both put the column in 
SortColumns and use InvertedIndex. What if I set it in INVERTED_INDEX but not 
in SORT_COLUMNS?
2. Is the only thing the boolean value `isInvertedIndex` in IndexCodec does to control 
whether to do RLE on the data page?

These make the setting not a direct switch to control how the data is processed.


---


[GitHub] carbondata pull request #2886: [CARBONDATA-3065]make inverted index false by...

2018-11-01 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2886#discussion_r230260481
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -359,8 +359,13 @@ private CarbonCommonConstants() {
   public static final String TABLE_BLOCKSIZE = "table_blocksize";
   // table blocklet size in MB
   public static final String TABLE_BLOCKLET_SIZE = "table_blocklet_size";
-  // set in column level to disable inverted index
+  /**
+   * set in column level to disable inverted index
+   * @Deprecated :This property is deprecated, it is kep just for 
compatibility
--- End diff --

spelling: kept


---


[GitHub] carbondata issue #2862: [HOTFIX] Enable Local dictionary by default

2018-11-01 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2862
  
please remember to update the doc too.


---


[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...

2018-10-30 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2879#discussion_r229287888
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java
 ---
@@ -308,7 +312,7 @@ private void processBatch(CarbonRowBatch batch, 
CarbonFactHandler dataHandler, i
   }
   writeCounter[iteratorIndex] += batch.getSize();
 } catch (Exception e) {
-  throw new CarbonDataLoadingException("unable to generate the mdkey", 
e);
+  throw new CarbonDataLoadingException(e);
--- End diff --

KeyGenException extends Exception; it needs a 
CarbonDataLoadingException (a RuntimeException) to wrap it and throw. 


---


[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...

2018-10-30 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2879#discussion_r229285339
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java
 ---
@@ -212,7 +212,11 @@ private void finish(CarbonFactHandler dataHandler, int 
iteratorIndex) {
 try {
   processingComplete(dataHandler);
 } catch (CarbonDataLoadingException e) {
-  exception = new CarbonDataWriterException(e.getMessage(), e);
+  // only assign when exception is null
+  // else it will erase original root cause
+  if (null == exception) {
--- End diff --

Not for the statistics. It is better to read the whole method. It has two stages: 
finish the handler and close the handler. The exception could be assigned in 
either stage.


---


[GitHub] carbondata issue #2879: [CARBONDATA-3058] Fix some exception coding in data ...

2018-10-30 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2879
  
retest this please


---


[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...

2018-10-30 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2879

[CARBONDATA-3058] Fix some exception coding in data loading

1. When an exception occurs in `dataHandler.finish();`, Carbon does not propagate 
it immediately. Carbon keeps the exception and calls a method to close the 
dataHandler. But the kept exception would be overwritten if another exception occurs 
while closing the dataHandler, which makes us lose the root cause.
Referring to `AbstractFactDataWriter.closeExecutorService()` and 
`CarbonFactDataWriterImplV3.closeWriter()`, we add a null check before the second 
assignment in `CarbonRowDataWriterProcessorStepImpl.finish()` and 
`DataWriterBatchProcessorStepImpl` to avoid overwriting the exception. A minimal 
sketch of this pattern is shown after point 2 below.

2. Remove the irrelevant exception message "unable to generate the mdkey" and use 
the exception itself directly. The message in the exception will be retrieved 
automatically when logging.
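
A minimal, self-contained sketch of the keep-the-first-exception pattern from point 1 (the interface and class names are hypothetical, not Carbon's):
```
// Stage 1 may fail with the real root cause; stage 2 must not overwrite it.
public class FinishCloseExample {

  interface Handler {
    void finish() throws Exception;
    void close() throws Exception;
  }

  static void finishAndClose(Handler handler) {
    RuntimeException exception = null;
    try {
      handler.finish();                        // stage 1: finish the handler
    } catch (Exception e) {
      exception = new RuntimeException(e);     // root cause captured here
    }
    try {
      handler.close();                         // stage 2: close the handler
    } catch (Exception e) {
      if (exception == null) {                 // assign only if nothing failed yet,
        exception = new RuntimeException(e);   // so the root cause is never overwritten
      }
    }
    if (exception != null) {
      throw exception;
    }
  }
}
```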

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata exceptionFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2879.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2879


commit a36f9c98338e1ae38a9c4d9afb4bcd5ab1eb9b23
Author: Manhua 
Date:   2018-10-29T11:05:03Z

fix exception




---


[GitHub] carbondata pull request #2878: [WIP] Modification of error/exception for bet...

2018-10-30 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2878

[WIP] Modification of error/exception for better debugging

Changes in this PR follow these rules: 


| Code Case | Problem | Suggest Modification |
| --- | ---: | :---: |
| `LOGGER.error(e);` | no stack trace | `LOGGER.error(e.getMessage(), e);` |
| `LOGGER.error(e.getMessage());` | no stack trace | `LOGGER.error(e.getMessage(), e);` |
| `catch ... throw new Exception("Error occur")` | useless message, no stack trace | `throw new Exception(e)` |
| `catch ... throw new Exception(e.getMessage())` | no stack trace | `throw new Exception(e)` |
| `catch ... throw new Exception(e.getMessage(), e)` | no need to call `getMessage()` | `throw new Exception(e)` |
| `catch ... throw new Exception("Error occur: " + e.getMessage(), e)` | useless message | `throw new Exception(e)` |
| `catch ... throw new Exception("DataLoad fail: " + e.getMessage())` | no stack trace | `throw new Exception("DataLoad fail: " + e.getMessage(), e)` |

Some exceptions, such as MalformedCarbonCommandException, 
InterruptException, NumberFormatException, InvalidLoadOptionException and 
NoRetryException, do not have a constructor that takes a `Throwable`, so we do not 
change them.
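
A hypothetical before/after of the rules in the table (slf4j is used here for illustration; Carbon has its own logger, and `loadData()` is a placeholder, not a Carbon API):
```
import java.io.IOException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ExceptionPatternExample {
  private static final Logger LOGGER = LoggerFactory.getLogger(ExceptionPatternExample.class);

  static void loadData() throws IOException {
    throw new IOException("disk full");        // placeholder failure
  }

  public static void main(String[] args) {
    try {
      loadData();
    } catch (IOException e) {
      // before: LOGGER.error(e); throw new RuntimeException("DataLoad fail: " + e.getMessage());
      // after: keep the cause so the full stack trace survives in the log and the rethrow
      LOGGER.error(e.getMessage(), e);
      throw new RuntimeException("DataLoad fail: " + e.getMessage(), e);
    }
  }
}
```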


Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinjmh/carbondata error_exception

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2878.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2878


commit 78792ba19a0b09ca02581890e708ca0ce4cc
Author: Manhua 
Date:   2018-10-30T01:44:59Z

LOGGER.error(e); ==> LOGGER.error(e.getMessage(), e);

commit 4a7c58cab1f30178003e55b47978daf8d474fb31
Author: Manhua 
Date:   2018-10-30T01:48:33Z

LOGGER.error(e.getMessage()); ==> LOGGER.error(e.getMessage(), e);

commit dd42cba52500725265fbb0577702c79c2d138f6f
Author: Manhua 
Date:   2018-10-30T01:50:24Z

LOG.error(e); ==> LOG.error(e.getMessage(), e);

commit b8716e8d33c7535d1698409bd91802f27d72890d
Author: Manhua 
Date:   2018-10-30T02:01:29Z

Exception(e.getMessage(), e); ==> Exception(e);

commit f137aa7da667dcbfe867b5c83ebd6735b8c5f0f1
Author: Manhua 
Date:   2018-10-30T02:09:31Z

Exception(e.getMessage()); ==> Exception(e);

commit da0df82a9a43f54f7cf7072a4c3497a731ca261a
Author: Manhua 
Date:   2018-10-30T02:18:37Z

 + e.getMessage()); ==>  + e.getMessage(), e);

commit 4c1ecaf636391327a57621721e2a5318fb8cc11f
Author: Manhua 
Date:   2018-10-30T02:36:17Z

fix style

commit 26fa06c96aeaa7418424df6f5089315d95cfadbb
Author: Manhua 
Date:   2018-10-30T02:53:22Z

some more fix for .getMessage()




---


[GitHub] carbondata issue #2732: [CARBONDATA-3020] support lz4 as column compressor

2018-10-28 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2732
  
retest this please


---


[GitHub] carbondata pull request #2866: [CARBONDATA-3050][Doc] Remove unused paramete...

2018-10-28 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2866

[CARBONDATA-3050][Doc] Remove unused parameter doc

Remove documentation of parameter 'carbon.use.multiple.temp.dir'

*Related PR:* 
> #2824 - removed the parameter 'carbon.use.multiple.temp.dir'



Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata doc_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2866.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2866


commit d9f9a3bf694837f43817689a5122bfcc2ba8ae65
Author: Manhua 
Date:   2018-10-29T02:08:15Z

remove unused parameter doc




---


[GitHub] carbondata pull request #2732: [CARBONDATA-3020] support lz4 as column compr...

2018-10-28 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2732#discussion_r228778578
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/Lz4Compressor.java
 ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+import java.util.Arrays;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import net.jpountz.lz4.LZ4Compressor;
+import net.jpountz.lz4.LZ4Factory;
+import net.jpountz.lz4.LZ4FastDecompressor;
+
+
+public class Lz4Compressor implements Compressor {
--- End diff --

added


---


[GitHub] carbondata issue #2851: [CARBONDATA-3040][BloomDataMap] Add checking before ...

2018-10-24 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2851
  
description updated


---


[GitHub] carbondata pull request #2851: [CARBONDATA-3040][BloomDataMap] Add checking ...

2018-10-24 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2851

[CARBONDATA-3040][BloomDataMap] Add checking before merging bloom index

*Scene*
There is a bug which causes query failure when we create two bloom datamaps 
on the same table that already has data.

*Analyse*
Since we already have data, each create datamap command triggers a rebuild 
datamap task, which then triggers bloom index file merging. By debugging, we found 
the first datamap's bloom index files would be merged twice, and the second 
time left the bloom index file empty.

*Solution*
Send the datamap name in the rebuild event so listeners can filter on it, and add a 
file check before merging. A hypothetical sketch of this idea is shown below.
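
A hypothetical sketch of the idea (the classes below are illustrative only, not Carbon's actual event API):
```
// The rebuild event carries the name of the datamap being rebuilt, and the merge
// listener skips any datamap it was not asked to rebuild.
public class RebuildFilterSketch {

  static class DataMapRebuildEvent {
    final String targetDataMapName;
    DataMapRebuildEvent(String targetDataMapName) {
      this.targetDataMapName = targetDataMapName;
    }
  }

  static class MergeBloomIndexListener {
    void onEvent(DataMapRebuildEvent event, String currentDataMapName) {
      if (!currentDataMapName.equals(event.targetDataMapName)) {
        return;  // only merge index files of the datamap that triggered the rebuild
      }
      // ... check the index files exist and are not merged yet, then merge them
    }
  }
}
```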

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata fix_multi_bloom

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2851.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2851


commit bcab5ac630e39a7dadee09d5b9157642d061b5e1
Author: Manhua 
Date:   2018-10-24T08:20:13Z

only rebuild target datamap and add file check




---


[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Assign to datamap ...

2018-10-07 Thread kevinjmh
Github user kevinjmh closed the pull request at:

https://github.com/apache/carbondata/pull/2665


---


[GitHub] carbondata issue #2781: [CARBONDATA-2983][BloomDataMap] Change bloom query m...

2018-09-28 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2781
  
retest this please


---


[GitHub] carbondata pull request #2781: [CARBONDATA-2983][BloomDataMap] Change bloom ...

2018-09-28 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2781

[CARBONDATA-2983][BloomDataMap] Change bloom query model to proceed 
multiple filter values

Currently, bloom generates a separate query model for each value of an 
InExpression.
This PR changes the query model to process multiple filter values.
Also, results are intersected between query models/expressions to support 
AndExpression optimization, as sketched below.
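
A hypothetical sketch of the behaviour (illustrative classes only, not Carbon's BloomQueryModel):
```
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One query model carries all IN values (hits are unioned per model), and the
// hit sets of AND-ed models/expressions are intersected.
public class BloomPruneSketch {

  // stand-in for a bloom membership test: blocklets that might contain the value
  static Set<Integer> possibleBlocklets(String value) {
    return new HashSet<>(value.equals("a") ? Arrays.asList(1, 3) : Arrays.asList(3));
  }

  // one model, many filter values: union the hits of each value
  static Set<Integer> pruneOneModel(List<String> filterValues) {
    Set<Integer> hits = new HashSet<>();
    for (String value : filterValues) {
      hits.addAll(possibleBlocklets(value));
    }
    return hits;
  }

  public static void main(String[] args) {
    Set<Integer> left = pruneOneModel(Arrays.asList("a", "b"));  // col1 IN ('a','b') -> {1, 3}
    Set<Integer> right = pruneOneModel(Arrays.asList("b"));      // col2 = 'b'        -> {3}
    left.retainAll(right);                                       // AND -> intersect  -> {3}
    System.out.println(left);
  }
}
```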


Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata query_model

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2781.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2781


commit 25ab0270021ec1c58219690c7389003c553768bf
Author: Manhua 
Date:   2018-09-28T03:39:39Z

build 1 query model for InExp;
intersect result between query model/expression




---


[GitHub] carbondata pull request #2767: [CARBONDATA-2974] Fixed multiple expressions ...

2018-09-27 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2767#discussion_r221123795
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
@@ -282,7 +280,9 @@ private void extractColumnExpression(Expression 
expression,
   List children = expression.getChildren();
   if (children != null && children.size() > 0) {
 for (Expression exp : children) {
-  extractColumnExpression(exp, columnExpressions);
+  if (exp != null && exp.getFilterExpressionType() != 
ExpressionType.UNKNOWN) {
--- End diff --

I'm not sure whether any **other** expression exists like 
SparkUnknownExpression (which sits under EqualToExpression/InExpression and has 
a ColumnExpression child).


---


[GitHub] carbondata pull request #2767: [CARBONDATA-2974] Fixed multiple expressions ...

2018-09-27 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2767#discussion_r221117943
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
@@ -177,34 +180,35 @@ private ExpressionTuple selectDataMap(Expression 
expression, List
   // If both left and right has datamap then we can either merge 
both datamaps to single
   // datamap if possible. Otherwise apply AND expression.
   if (left.dataMapExprWrapper != null && right.dataMapExprWrapper 
!= null) {
-filterExpressionTypes.add(
-
left.dataMapExprWrapper.getFilterResolverIntf().getFilterExpression()
-.getFilterExpressionType());
-filterExpressionTypes.add(
-
right.dataMapExprWrapper.getFilterResolverIntf().getFilterExpression()
-.getFilterExpressionType());
+filterExpressionTypes.addAll(left.filterExpressionTypes);
+filterExpressionTypes.addAll(right.filterExpressionTypes);
 List columnExpressions = new ArrayList<>();
 columnExpressions.addAll(left.columnExpressions);
 columnExpressions.addAll(right.columnExpressions);
 // Check if we can merge them to single datamap.
 TableDataMap dataMap =
 chooseDataMap(allDataMap, columnExpressions, 
filterExpressionTypes);
+TrueConditionalResolverImpl resolver = new 
TrueConditionalResolverImpl(
+new AndExpression(left.expression, right.expression), 
false,
+true);
--- End diff --

so this resolver is used for re-organizing the required expressions?


---


[GitHub] carbondata issue #2767: [CARBONDATA-2974] Fixed multiple expressions issue o...

2018-09-27 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2767
  
Hi @ravipesala , I have some questions about this PR:
1. It is a little confusing to only deal with the AND case and ignore the OR case 
if we want the framework to do expression merging in general. 
2. This PR still fails when querying on a STRING column with a bloom filter 
like "column1 = 123". Please refer to case 1 in the description of PR #2665 for 
details.
3. One more failing case found if this PR is applied:  
  3.1 query with a filter like `(Col1='a'  or Col2 ='b')  and (Col3='c' or 
Col4='d' )  `
  3.2 ensure the test result without bloom is not empty
  3.3 create a bloom filter on multiple columns in the same datamap
  3.4 the test result with bloom would be empty, because OrExpression is not 
processed when creating the bloom query model now.
Please check it.

One more thing I think we could do to improve datamap pruning efficiency is 
to reuse the pruning result between filter conditions. For example, if we query with two 
filter conditions on a segment with 5 blocklets and get 2 hit blocklets after 
applying the first filter condition, then the second filter condition only needs to be 
applied on those 2 blocklets instead of 5. A hypothetical sketch of this idea is shown below.
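
A small, hypothetical sketch of the chained-pruning idea (plain strings stand in for blocklets):
```
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class ChainedPruningExample {
  public static void main(String[] args) {
    List<String> candidates = Arrays.asList("b0", "b1", "b2", "b3", "b4"); // 5 blocklets
    List<Predicate<String>> conditions = Arrays.asList(
        b -> b.equals("b0") || b.equals("b2"),   // first condition: 5 -> 2 hit blocklets
        b -> b.equals("b0"));                    // second condition only sees those 2

    for (Predicate<String> condition : conditions) {
      List<String> survivors = new ArrayList<>();
      for (String blocklet : candidates) {       // prune only the previous survivors
        if (condition.test(blocklet)) {
          survivors.add(blocklet);
        }
      }
      candidates = survivors;
      if (candidates.isEmpty()) {
        break;                                   // nothing left to scan
      }
    }
    System.out.println(candidates);              // [b0]
  }
}
```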


---


[GitHub] carbondata issue #2765: [CARBONDATA-2971] Add shard info of blocklet for deb...

2018-09-26 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2765
  
retest this please


---


[GitHub] carbondata pull request #2765: [CARBONDATA-2971] Add shard info of blocklet ...

2018-09-26 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2765#discussion_r220490591
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/Blocklet.java ---
@@ -92,7 +95,13 @@ public String getFilePath() {
 blocklet.blockletId == null;
   }
 
-  @Override public int hashCode() {
+  @Override
+  public String toString() {
+return String.format("[shard:%s, blockletId:%s]", filePath, 
blockletId);
--- End diff --

updated


---


[GitHub] carbondata pull request #2765: [CARBONDATA-2971] Add shard info of blocklet ...

2018-09-26 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2765#discussion_r220455196
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/Blocklet.java ---
@@ -92,7 +95,13 @@ public String getFilePath() {
 blocklet.blockletId == null;
   }
 
-  @Override public int hashCode() {
+  @Override
+  public String toString() {
+return String.format("[shard:%s, blockletId:%s]", filePath, 
blockletId);
--- End diff --

The shard name in `toString()` is exactly the one in `Blocklet` itself


---


[GitHub] carbondata pull request #2765: [CARBONDATA-2971] Add shard info of blocklet ...

2018-09-25 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2765

[CARBONDATA-2971] Add shard info of blocklet for debugging

Add a `toString` method to print both the shard name and the blocklet id for 
debugging. 


Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata addDebugInfo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2765.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2765


commit 225ea72f16fafe5d16192763d01614822f31e2dd
Author: Manhua 
Date:   2018-09-26T02:34:54Z

add toString




---


[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

2018-09-20 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2665#discussion_r219378413
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java
 ---
@@ -123,6 +129,42 @@ public BloomCoarseGrainDataMapFactory(CarbonTable 
carbonTable, DataMapSchema dat
 }
   }
 
+  @Override
+  public boolean isSupport(Expression expression) {
+ColumnExpression filterColumn = null;
+// First check type of child nodes, and get filter column name
+switch (expression.getFilterExpressionType()) {
+  case IN:
+InExpression inExpr = (InExpression) expression;
+if (inExpr.getLeft() instanceof ColumnExpression &&
+inExpr.getRight() instanceof ListExpression) {
+  filterColumn = (ColumnExpression) inExpr.getLeft();
+} else if (inExpr.getRight() instanceof ColumnExpression &&
+inExpr.getLeft() instanceof ListExpression) {
+  filterColumn = (ColumnExpression) inExpr.getRight();
+}
+break;
+  case EQUALS:
+EqualToExpression equalToExpr = (EqualToExpression) expression;
+if (equalToExpr.getLeft() instanceof ColumnExpression &&
+equalToExpr.getRight() instanceof LiteralExpression) {
+  filterColumn = (ColumnExpression) equalToExpr.getLeft();
+} else if (equalToExpr.getRight() instanceof ColumnExpression &&
+equalToExpr.getLeft() instanceof LiteralExpression) {
+  filterColumn = (ColumnExpression) equalToExpr.getRight();
+}
+break;
+  default:
+break;
--- End diff --

ok


---


[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

2018-09-20 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2665#discussion_r219378217
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java
 ---
@@ -123,6 +129,42 @@ public BloomCoarseGrainDataMapFactory(CarbonTable 
carbonTable, DataMapSchema dat
 }
   }
 
+  @Override
+  public boolean isSupport(Expression expression) {
+ColumnExpression filterColumn = null;
+// First check type of child nodes, and get filter column name
+switch (expression.getFilterExpressionType()) {
+  case IN:
+InExpression inExpr = (InExpression) expression;
+if (inExpr.getLeft() instanceof ColumnExpression &&
+inExpr.getRight() instanceof ListExpression) {
+  filterColumn = (ColumnExpression) inExpr.getLeft();
+} else if (inExpr.getRight() instanceof ColumnExpression &&
+inExpr.getLeft() instanceof ListExpression) {
+  filterColumn = (ColumnExpression) inExpr.getRight();
+}
+break;
+  case EQUALS:
+EqualToExpression equalToExpr = (EqualToExpression) expression;
+if (equalToExpr.getLeft() instanceof ColumnExpression &&
+equalToExpr.getRight() instanceof LiteralExpression) {
+  filterColumn = (ColumnExpression) equalToExpr.getLeft();
+} else if (equalToExpr.getRight() instanceof ColumnExpression &&
+equalToExpr.getLeft() instanceof LiteralExpression) {
+  filterColumn = (ColumnExpression) equalToExpr.getRight();
+}
+break;
+  default:
+break;
+}
+// Then check if filter column is in index columns
+if (null != filterColumn && 
dataMapMeta.getIndexedColumnNames().contains(
--- End diff --

For safety, convert both sides to lowercase when checking.


---


[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

2018-09-20 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2665#discussion_r219376540
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ---
@@ -364,6 +365,13 @@ public DataMapMeta getMeta() {
 return null;
   }
 
+  @Override
+  public boolean isSupport(Expression expression) {
+// this method is used for choosing cg/fg datamap
+// for default datamap always return false
+return false;
--- End diff --

I get your point. I will change it to return true for better code readability. 
I will also add a check of the meta info (the meta of the default datamap is NULL now) 
to avoid a null pointer exception.


---


[GitHub] carbondata issue #2734: [CARBONDATA-2946] Bloom filter backward compatibilit...

2018-09-20 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2734
  
The test case which found this bug passes. Please refer to the earlier PR #2654 in 
the description for tracking.


---


[GitHub] carbondata issue #2732: [WIP] lz4 as column compressor in final store

2018-09-19 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2732
  
retest this please


---


[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

2018-09-19 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2665
  
retest this please


---


[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

2018-09-19 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2654
  
OK


---


[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

2018-09-19 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2665#discussion_r218679293
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
@@ -268,23 +238,38 @@ private ExpressionTuple selectDataMap(Expression 
expression, List
 
   private void extractColumnExpression(Expression expression,
   List columnExpressions) {
-if (expression instanceof ColumnExpression) {
-  columnExpressions.add((ColumnExpression) expression);
-} else if (expression instanceof MatchExpression) {
-  // this is a special case for lucene
-  // build a fake ColumnExpression to filter datamaps which contain 
target column
-  // a Lucene query string is alike "column:query term"
-  String[] queryItems = expression.getString().split(":", 2);
-  if (queryItems.length == 2) {
-columnExpressions.add(new ColumnExpression(queryItems[0], null));
-  }
-} else if (expression != null) {
-  List children = expression.getChildren();
-  if (children != null && children.size() > 0) {
-for (Expression exp : children) {
-  extractColumnExpression(exp, columnExpressions);
+switch (expression.getFilterExpressionType()) {
--- End diff --

Changed to check via the `isSupport` method in the datamap factory.


---


[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

2018-09-18 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2654
  
I ran a test on a table with a bloom datamap created before applying this PR, 
and queried it after this PR was merged, but the answer is not correct. Can you check 
 it?

Procedure to reproduce:

- switch to master code from before this PR was merged
- create a table with a no-dict measure column (set the measure column as a sort 
column)
- create a bloom datamap on the measure column
- load some data into the table
- query on the measure column and record the result
- switch to code after this PR was merged
- run the same query and compare the results


---


[GitHub] carbondata pull request #2732: [WIP] lz4 as column compressor in final store

2018-09-18 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2732#discussion_r218645316
  
--- Diff: core/src/main/java/net/jpountz/lz4/LZ4DecompressorWithLength.java 
---
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// code ported from https://github.com/lz4/lz4-java/issues/119
+// remove this class when new version > 1.4.1 released
+// this is only for test
+
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package net.jpountz.lz4;
+
+import java.nio.ByteBuffer;
+
+// code ported from https://github.com/lz4/lz4-java/issues/119
+// remove this when new version > 1.4.1 released
+
+/**
+ * Convenience class to decompress data compressed by {@link 
LZ4CompressorWithLength}.
+ * This decompressor is NOT compatible with any other compressors in 
lz4-java
+ * or any other lz4 tools.
+ * The user does not need to specify the length of the compressed data or
+ * original data because the length of the original decompressed data is
+ * included in the compressed data.
+ */
+
+public class LZ4DecompressorWithLength {
+
+  private final LZ4FastDecompressor decompressor;
+
+  /**
+   * Returns the decompressed length of compressed data in 
src.
+   *
+   * @param src the compressed data
+   * @return the decompressed length
+   */
+  public static int getDecompressedLength(byte[] src) {
+return getDecompressedLength(src, 0);
+  }
+
+  /**
+   * Returns the decompressed length of compressed data in 
src[srcOff:].
+   *
+   * @param src the compressed data
+   * @param srcOff the start offset in src
+   * @return the decompressed length
+   */
+  public static int getDecompressedLength(byte[] src, int srcOff) {
+return (src[srcOff] & 0xFF) | (src[srcOff + 1] & 0xFF) << 8 |
+(src[srcOff + 2] & 0xFF) << 16 | src[srcOff + 3] << 24;
+  }
+
+  /**
+   * Returns the decompressed length of compressed data in 
src.
+   *
+   * @param src the compressed data
+   * @return the decompressed length
+   */
+  public static int getDecompressedLength(ByteBuffer src) {
+return getDecompressedLength(src, src.position());
+  }
+
+  /**
+   * Returns the decompressed length of compressed data in 
src[srcOff:].
+   *
+   * @param src the compressed data
+   * @param srcOff the start offset in src
+   * @return the decompressed length
+   */
+  public static int getDecompressedLength(ByteBuffer src, int srcOff) {
+return (src.get(srcOff) & 0xFF) | (src.get(srcOff + 1) & 0xFF) << 8 |
+(src.get(srcOff + 2) & 0xFF) << 16 | src.get(srcOff + 3) << 24;
+  }
+
+  /**
+   * Creates a new decompressor to decompress data compressed by {@link 
LZ4CompressorWithLength}.
+   *
+   * @param decompressor decompressor to use
+   */
+  public LZ4DecompressorWithLength(LZ4FastDecompressor decompressor) {
+this.decompressor = decompressor;
+  }
+
+  /**
+   * Convenience method, equivalent to calling
+   * {@link #decompress(byte[], int, byte[], int) decompress(src, 0, dest, 
0)}.
+   *
+   * @param src the compressed data
+   * @param dest the destination buffer to store the decompressed data
+   * @return the number o

[GitHub] carbondata pull request #2732: [WIP] lz4 as column compressor in final store

2018-09-18 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2732#discussion_r218635891
  
--- Diff: core/src/main/java/net/jpountz/lz4/LZ4CompressorWithLength.java 
---
@@ -0,0 +1,225 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// code ported from https://github.com/lz4/lz4-java/issues/119
+// remove this class when new version > 1.4.1 released
+// this is only for test
+
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package net.jpountz.lz4;
+
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+
+/**
+ * Covenience class to include the length of the original decompressed data
+ * in the output compressed data, so that the user does not need to save
+ * the length at anywhere else.  The compressed data must be decompressed 
by
+ * {@link LZ4DecompressorWithLength} and is NOT compatible with any other
+ * decompressors in lz4-java or any other lz4 tools.  This class 
deliberately
+ * does not extend {@link LZ4Compressor} because they are not 
interchangable.
+ */
+
+public class LZ4CompressorWithLength {
--- End diff --

Yes. This code is not packed in a released jar yet. Here we copy it only 
for test. See comment in L18-20


---


[GitHub] carbondata pull request #2732: [WIP] lz4 as column compressor in final store

2018-09-18 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2732

[WIP] lz4 as column compressor in final store

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata lz4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2732.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2732


commit b5a4353c9f7536973f8aa1900757e2266cde31ee
Author: Manhua 
Date:   2018-09-18T11:41:51Z

lz4 test




---


[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-18 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r218403711
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java
 ---
@@ -239,6 +239,7 @@ public void addDataToStore(CarbonRow row) throws 
CarbonDataWriterException {
* @return false if any varchar column page cannot add one more 
value(2MB)
*/
   private boolean isVarcharColumnFull(CarbonRow row) {
+//TODO: test and remove this as now  UnsafeSortDataRows can exceed 2MB
--- End diff --

The original implementation used `2MB` to ensure the next varchar column value 
can be filled safely, because the size of a single column value won't exceed the 
size of a row.
If UnsafeSortDataRows can exceed 2MB (growing dynamically), then we cannot 
check whether we have enough space for the next value, because we are not sure how 
much space the next value will take. So the column page size check should run 
before adding the row to `dataRows`.
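
A minimal, self-contained sketch of that ordering concern (illustrative only, 
not the actual CarbonFactDataHandlerColumnar code; a row is reduced to its 
serialized size here):

```
import scala.collection.mutable.ArrayBuffer

val pageLimitBytes = 2 * 1024 * 1024            // the 2MB varchar page budget
var pageBytes = 0
val dataRows = ArrayBuffer.empty[Array[Byte]]

def addRow(row: Array[Byte]): Unit = {
  // check BEFORE buffering the row: once rows may grow past the limit we can
  // no longer guarantee the next varchar value still fits in the current page
  if (pageBytes + row.length > pageLimitBytes) {
    // flush the buffered rows as a finished column page (omitted here)
    dataRows.clear()
    pageBytes = 0
  }
  dataRows += row
  pageBytes += row.length
}
```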


---


[GitHub] carbondata pull request #2723: [CARBONDATA-2938][DataMap] Update comment of ...

2018-09-14 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2723

[CARBONDATA-2938][DataMap] Update comment of blockletId in 
IndexDataMapRebuildRDD

**Background**:
#2539 Tried to make use of blocklet id information from query when 
rebuilding datamap.
#2565 Reverted #2539 since it caused wrong answers when testing on a table 
with a large amount of data

**What's in this PR**:
Update comment to explain why blocklet id information from query does not 
work



Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata rebuild_blockletid

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2723


commit 786f743703d3dfb824b12b9a29f9c16d3ef3
Author: Manhua 
Date:   2018-09-15T02:30:00Z

update comment




---


[GitHub] carbondata issue #2713: [CARBONDATA-2931][BloomDataMap] Optimize bloom datam...

2018-09-12 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2713
  
retest this please


---


[GitHub] carbondata pull request #2713: [CARBONDATA-2931][BloomDataMap] Optimize bloo...

2018-09-12 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2713

[CARBONDATA-2931][BloomDataMap] Optimize bloom datamap pruning

1. re-use shard pruning info from default datamap
2. create one BloomCoarseGrainDataMap object per segment instead of per 
shard. (This is also  preparation for parallel segment pruning)


Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata bloom_shard_op

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2713


commit 8544ed699424ff6089d28b8e07f582a2f9e25b78
Author: Manhua 
Date:   2018-08-30T01:43:10Z

all shard of one segment use one datamap




---


[GitHub] carbondata pull request #2696: [CARBONDATA-2902][DataMap] Fix showing negati...

2018-09-12 Thread kevinjmh
Github user kevinjmh closed the pull request at:

https://github.com/apache/carbondata/pull/2696


---


[GitHub] carbondata pull request #2711: [CARBONDATA-2929][DataMap] Add block skipped ...

2018-09-11 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2711

[CARBONDATA-2929][DataMap] Add block skipped info for explain command


This PR adds skipped-block info by counting the distinct file paths among the hit 
blocklets (a small sketch of the counting idea follows the examples). It shows like below:
```
|== CarbonData Profiler ==
Table Scan on test
 - total: 125 blocks, 250 blocklets
 - filter: (l_partkey <> null and l_partkey = 1006)
 - pruned by Main DataMap
- skipped: 119 blocks, 238 blocklets
 - pruned by CG DataMap
- name: dm
- provider: bloomfilter
- skipped: 6 blocks, 12 blocklets
```

```
|== CarbonData Profiler ==
Table Scan on test
 - total: 125 blocks, 250 blocklets
 - filter: TEXT_MATCH('l_shipmode:AIR')
 - pruned by Main DataMap
- skipped: 0 blocks, 0 blocklets
 - pruned by FG DataMap
- name: dm
- provider: lucene
- skipped: 12 blocks, 80 blocklets
```
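
A hedged sketch of the counting idea (the type below is a hypothetical stand-in 
for the real pruning result objects):

```
// hypothetical stand-in for the real pruning result type
case class HitBlocklet(filePath: String, blockletId: Short)

// distinct file paths among the hit blocklets = number of hit blocks;
// skipped blocks = total blocks - hit blocks
def hitBlockCount(hitBlocklets: Seq[HitBlocklet]): Int =
  hitBlocklets.map(_.filePath).distinct.size
```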


Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata explain_block_skip

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2711.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2711


commit 0828b4d3f366b02a3f9db89e862fc9bc0b89
Author: Manhua 
Date:   2018-09-12T03:29:46Z

add block skip info




---


[GitHub] carbondata issue #2696: [CARBONDATA-2902][DataMap] Fix showing negative prun...

2018-09-10 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2696
  
I found some material which may be helpful. 

> A PR can be closed without committing it like so: 
https://mahout.apache.org/developers/github.html#closing-a-pr-without-committing-for-committers
> Here's an example that closed 
https://github.com/apache/incubator-airflow/pull/2440 :
> git commit --allow-empty -m "Closes apache/incubator-airflow#2440 
*Already Merged*"



Ref:
 the first comment by Chris Riccomini  in 
https://cwiki.apache.org/confluence/display/AIRFLOW/Committers%27+Guide


---


[GitHub] carbondata issue #2696: [CARBONDATA-2902][DataMap] Fix showing negative prun...

2018-09-10 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2696
  
This PR is already merged in this 
[commit](https://github.com/apache/carbondata/commit/f04850f39d8c42b96ee419140c9506f7df988075),
 but asfgit did not close this. Is it OK if I close this PR directly? @jackylk 


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-09-06 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r215829032
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java
 ---
@@ -331,8 +332,18 @@ private BloomQueryModel 
buildQueryModelInternal(CarbonColumn carbonColumn,
   // for dictionary/date columns, convert the surrogate key to bytes
   internalFilterValue = CarbonUtil.getValueAsBytes(DataTypes.INT, 
convertedValue);
 } else {
-  // for non dictionary dimensions, is already bytes,
-  internalFilterValue = (byte[]) convertedValue;
+  // for non dictionary dimensions, numeric columns will be of 
original data,
+  // so convert the data to bytes
+  if (DataTypeUtil.isPrimitiveColumn(carbonColumn.getDataType())) {
+if (convertedValue == null) {
+  convertedValue = 
DataConvertUtil.getNullValueForMeasure(carbonColumn.getDataType(),
+  carbonColumn.getColumnSchema().getScale());
+}
+internalFilterValue =
+CarbonUtil.getValueAsBytes(carbonColumn.getDataType(), 
convertedValue);
--- End diff --

The above problem is similar to your second commit for minmax


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-09-06 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r215827023
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java
 ---
@@ -331,8 +332,18 @@ private BloomQueryModel 
buildQueryModelInternal(CarbonColumn carbonColumn,
   // for dictionary/date columns, convert the surrogate key to bytes
   internalFilterValue = CarbonUtil.getValueAsBytes(DataTypes.INT, 
convertedValue);
 } else {
-  // for non dictionary dimensions, is already bytes,
-  internalFilterValue = (byte[]) convertedValue;
+  // for non dictionary dimensions, numeric columns will be of 
original data,
+  // so convert the data to bytes
+  if (DataTypeUtil.isPrimitiveColumn(carbonColumn.getDataType())) {
+if (convertedValue == null) {
+  convertedValue = 
DataConvertUtil.getNullValueForMeasure(carbonColumn.getDataType(),
+  carbonColumn.getColumnSchema().getScale());
+}
+internalFilterValue =
+CarbonUtil.getValueAsBytes(carbonColumn.getDataType(), 
convertedValue);
--- End diff --

The result of `getValueAsBytes` conflicts with the existing bloom index data, 
which will affect query results.
For a measure in 'NoDict', the original implementation used 
`NonDictionaryFieldConverterImpl` to convert the value, but now it uses 
`MeasureFieldConverterImpl`.
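
An illustration of why two different byte representations cannot match in an 
already-built bloom index (standalone code; it assumes, purely for illustration, 
that the old path stored the literal's UTF-8 bytes while the new path stores the 
numeric value's 8-byte encoding):

```
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// old no-dictionary style: the literal's text bytes
val asText: Array[Byte] = "5".getBytes(StandardCharsets.UTF_8)
// new measure style: the numeric value's binary encoding
val asLong: Array[Byte] =
  ByteBuffer.allocate(java.lang.Long.BYTES).putLong(5L).array()

// different byte arrays hash to different bloom filter positions, so filters
// built with one representation miss lookups that use the other
println(asText.sameElements(asLong))   // false
```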


---


[GitHub] carbondata pull request #2696: [CARBONDATA-2902][DataMap] Fix showing negati...

2018-09-06 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2696

[CARBONDATA-2902][DataMap] Fix showing negative pruning result for explain 
command

#2676 used the method `ByteBuffer.getShort(int index)` to get the number of 
blocklets in a block, but it passed the wrong parameter: the `index` is a byte 
index, not a short index, so it needs to be multiplied by the byte size of the short type.
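
A small standalone illustration of the `ByteBuffer` behaviour (not CarbonData code):

```
import java.nio.ByteBuffer

// three shorts packed back to back: 5, 7, 9
val buf = ByteBuffer.allocate(3 * java.lang.Short.BYTES)
buf.putShort(5.toShort).putShort(7.toShort).putShort(9.toShort)

// WRONG: 1 is interpreted as a byte offset, so this read straddles the first two shorts
val wrong = buf.getShort(1)                           // 1280, not 7
// RIGHT: multiply the short index by Short.BYTES (2) to get the byte offset
val right = buf.getShort(1 * java.lang.Short.BYTES)   // 7
```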

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata explain_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2696.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2696


commit 1cdfc5c8e592f53170cf942bfbb45cc5e5d67719
Author: Manhua 
Date:   2018-09-06T09:09:21Z

fix negative blocklet skipped when cache level is block




---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-09-05 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r215153728
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/carbondata/datamap/IndexDataMapRebuildRDD.scala
 ---
@@ -264,8 +264,17 @@ class RawBytesReadSupport(segmentProperties: 
SegmentProperties, indexColumns: Ar
   rtn(i) = if (indexCol2IdxInDictArray.contains(col.getColName)) {
 
surrogatKeys(indexCol2IdxInDictArray(col.getColName)).toInt.asInstanceOf[Integer]
   } else if (indexCol2IdxInNoDictArray.contains(col.getColName)) {
-data(0).asInstanceOf[ByteArrayWrapper].getNoDictionaryKeyByIndex(
+val bytes = 
data(0).asInstanceOf[ByteArrayWrapper].getNoDictionaryKeyByIndex(
   indexCol2IdxInNoDictArray(col.getColName))
+// no dictionary primitive columns are expected to be in original 
data while loading,
+// so convert it to original data
+if (DataTypeUtil.isPrimitiveColumn(col.getDataType)) {
+  val dataFromBytes = DataTypeUtil
+.getDataBasedOnDataTypeForNoDictionaryColumn(bytes, 
col.getDataType)
+  dataFromBytes
--- End diff --

if isPrimitiveColumn, need null check and get null value for measure


---


[GitHub] carbondata pull request #2685: [CARBONDATA-2910] Support backward compatabil...

2018-09-04 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2685#discussion_r214886972
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java
 ---
@@ -630,6 +638,24 @@ private boolean 
useMinMaxForExecutorPruning(FilterResolverIntf filterResolverInt
 return useMinMaxForPruning;
   }
 
+  @Override
+  public List prune(Expression expression, SegmentProperties 
segmentProperties,
+  List partitions, AbsoluteTableIdentifier identifier) 
throws IOException {
+FilterResolverIntf filterResolverIntf = null;
+if (expression != null) {
+  SegmentProperties properties = getSegmentProperties();
+  QueryModel.FilterProcessVO processVO =
+  new QueryModel.FilterProcessVO(properties.getDimensions(), 
properties.getMeasures(),
+  new ArrayList());
+  QueryModel.processFilterExpression(processVO, expression, null, 
null);
+  // Optimize Filter Expression and fit RANGE filters is conditions 
apply.
+  FilterOptimizer rangeFilterOptimizer = new 
RangeFilterOptmizer(expression);
+  rangeFilterOptimizer.optimizeFilter();
+  filterResolverIntf = CarbonTable.resolveFilter(expression, 
identifier);
--- End diff --

can we pull up the transformation from expression to filterResolverIntf  so 
that we can reuse most code instead of adding another `prune` method everywhere


---


[GitHub] carbondata pull request #2688: [CARBONDATA-2911] Remove unused BTree related...

2018-09-03 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2688

[CARBONDATA-2911] Remove unused BTree related code

1. BTree related code is only used by a test class 
called`BTreeBlockFinderTest`.
2. BTreeDataRefNodeFinder in AbstractDetailQueryResultIterator never run. 
All dataRefNode are actually  instanceof BlockletDataRefNode

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata remove_btree_related

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2688.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2688


commit 47f18af7277acaec8de4e166d6a268b2fa8b2e7e
Author: Manhua 
Date:   2018-09-04T02:20:07Z

remove btree related code




---


[GitHub] carbondata pull request #2676: [CARBONDATA-2902][DataMap] Fix showing negati...

2018-08-30 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2676

[CARBONDATA-2902][DataMap] Fix showing negative pruning result for explain 
command

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata explain_block_level

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2676


commit 50a15da1e1ab8165a1674e5ed6778ef18316628b
Author: Manhua 
Date:   2018-08-31T03:44:10Z

show different explain result for block/blocklet pruning




---


[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...

2018-08-29 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2598
  
retest this please


---


[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

2018-08-28 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2665
  
retest this please


---


[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

2018-08-28 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2665

[CARBONDATA-2897][DataMap] Optimize datamap chooser

In this PR,
1. Remove the code that merged pruning into one datamap when a datamap hits both 
child nodes of an And/Or expression in DataMapChooser. This lets each datamap 
focus on pruning its single index column without any logical processing; the 
logical combination is left to AndDataMapExprWrapper and OrDataMapExprWrapper.
2. Only extract the ColumnExpressions of an Expression that the datamap can handle 
in DataMapChooser.
3. Add a short circuit in AndDataMapExprWrapper to return the pruned result 
immediately when the result of the left node is empty (see the sketch after this list).
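
A minimal Scala sketch of the short-circuit idea in point 3 (the trait and 
method signature are hypothetical; the real AndDataMapExprWrapper is Java and 
differs in detail):

```
// hypothetical pruning interface; the returned list stands in for hit blocklets
trait DataMapExprWrapper {
  def prune(segments: List[String]): List[String]
}

class AndWrapper(left: DataMapExprWrapper, right: DataMapExprWrapper)
  extends DataMapExprWrapper {
  override def prune(segments: List[String]): List[String] = {
    val leftHits = left.prune(segments)
    // short circuit: an empty left result already decides the AND,
    // so the right child never needs to be evaluated
    if (leftHits.isEmpty) Nil
    else leftHits.intersect(right.prune(segments))
  }
}
```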



Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata dmChooser

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2665


commit 2c03e75f7d77df7609f2b054dcf93bdd7256bfa8
Author: Manhua 
Date:   2018-08-28T11:51:52Z

datamap chooser




---


  1   2   >