[jira] [Comment Edited] (CARBONDATA-3327) Errors lies in query with small blocklet size

2019-03-24 Thread xuchuanyin (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16799944#comment-16799944
 ] 

xuchuanyin edited comment on CARBONDATA-3327 at 3/24/19 8:34 AM:
-

Besides, I noticed that if we do not filter on the sort_columns, the problem 
will appear.

The content of the diff can also be accessed 
[here|https://gist.github.com/xuchuanyin/e5ffa3cca7c0ad62128fbf8dc1844a10]


was (Author: xuchuanyin):
Besides, I noticed that if we do not filter on the sort_columns, the problem 
will appear.

The content of the diff can also be accessed here:
[diff|https://gist.github.com/xuchuanyin/e5ffa3cca7c0ad62128fbf8dc1844a10]
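For context, a minimal sketch of the kind of setup and query involved — table name and
data are made up, and an existing Carbon session `spark` plus the lowered blocklet size
from the diff quoted below are assumed; the real failing cases are the ones in
TestSortColumns:

```scala
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties

// Force very small blocklets so each blocklet holds only a couple of rows.
CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.BLOCKLET_SIZE, "2")

spark.sql("DROP TABLE IF EXISTS repro_small_blocklet")
spark.sql(
  """CREATE TABLE repro_small_blocklet (id INT, name STRING, city STRING)
    |STORED BY 'carbondata'
    |TBLPROPERTIES('sort_columns'='name')""".stripMargin)
spark.sql("INSERT INTO repro_small_blocklet SELECT 1, 'a', 'x' UNION ALL SELECT 2, 'b', 'y'")

// A query that does NOT filter on the sort column is the case that hit the NPE.
spark.sql("SELECT city, count(*) FROM repro_small_blocklet GROUP BY city").show()
```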

> Errors lies in query with small blocklet size
> -
>
> Key: CARBONDATA-3327
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3327
> Project: CarbonData
>  Issue Type: Bug
>Reporter: xuchuanyin
>Priority: Major
>
> While implementing the following patch:
> ```diff
> diff --git 
> a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
>  
> b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> index 69374ad..c6b63a4 100644
> --- 
> a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> +++ 
> b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> @@ -54,7 +54,7 @@ public final class CarbonCommonConstants {
>/**
> * min blocklet size
> */
> -  public static final int BLOCKLET_SIZE_MIN_VAL = 2000;
> +  public static final int BLOCKLET_SIZE_MIN_VAL = 1;
>  
>/**
> * max blocklet size
> diff --git 
> a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
>  
> b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> index df97d0f..ace9fd5 100644
> --- 
> a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> +++ 
> b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> @@ -29,6 +29,7 @@ import 
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandExcepti
>  class TestSortColumns extends QueryTest with BeforeAndAfterAll {
>  
>override def beforeAll {
> +
> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.BLOCKLET_SIZE,
>  "2")
>  CarbonProperties.getInstance().addProperty(
>CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "dd-MM-yyyy")
>  
> ```
> I found that some of the tests in `TestSortColumns` failed with an NPE, and the 
> error logs show:
> ```
> 19/03/23 20:54:30 ERROR Executor: Exception in task 0.0 in stage 104.0 (TID 
> 173)
> java.lang.NullPointerException
> at 
> org.apache.parquet.io.api.Binary$ByteArrayBackedBinary.getBytes(Binary.java:294)
> at 
> org.apache.spark.sql.execution.vectorized.ColumnVector.getUTF8String(ColumnVector.java:646)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:108)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 19/03/23 20:54:30 ERROR TaskSetManager: Task 0 

[jira] [Comment Edited] (CARBONDATA-3327) Errors lies in query with small blocklet size

2019-03-24 Thread xuchuanyin (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16799944#comment-16799944
 ] 

xuchuanyin edited comment on CARBONDATA-3327 at 3/24/19 8:33 AM:
-

Besides, I noticed that if we do not filter on the sort_columns, the problem 
will appear.

The content of the diff can also be accessed here:
[diff|https://gist.github.com/xuchuanyin/e5ffa3cca7c0ad62128fbf8dc1844a10]


was (Author: xuchuanyin):
Besides, I noticed that if we do not filter on the sort_columns, the problem 
will appear.

> Errors lies in query with small blocklet size
> -
>
> Key: CARBONDATA-3327
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3327
> Project: CarbonData
>  Issue Type: Bug
>Reporter: xuchuanyin
>Priority: Major
>
> While implementing the following patch:
> ```diff
> diff --git 
> a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
>  
> b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> index 69374ad..c6b63a4 100644
> --- 
> a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> +++ 
> b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> @@ -54,7 +54,7 @@ public final class CarbonCommonConstants {
>/**
> * min blocklet size
> */
> -  public static final int BLOCKLET_SIZE_MIN_VAL = 2000;
> +  public static final int BLOCKLET_SIZE_MIN_VAL = 1;
>  
>/**
> * max blocklet size
> diff --git 
> a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
>  
> b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> index df97d0f..ace9fd5 100644
> --- 
> a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> +++ 
> b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> @@ -29,6 +29,7 @@ import 
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandExcepti
>  class TestSortColumns extends QueryTest with BeforeAndAfterAll {
>  
>override def beforeAll {
> +
> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.BLOCKLET_SIZE,
>  "2")
>  CarbonProperties.getInstance().addProperty(
>CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "dd-MM-yyyy")
>  
> ```
> I found that some of the tests in `TestSortColumns` failed with an NPE, and the 
> error logs show:
> ```
> 19/03/23 20:54:30 ERROR Executor: Exception in task 0.0 in stage 104.0 (TID 
> 173)
> java.lang.NullPointerException
> at 
> org.apache.parquet.io.api.Binary$ByteArrayBackedBinary.getBytes(Binary.java:294)
> at 
> org.apache.spark.sql.execution.vectorized.ColumnVector.getUTF8String(ColumnVector.java:646)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:108)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 19/03/23 20:54:30 ERROR TaskSetManager: Task 0 in stage 104.0 failed 1 times; 
> aborting job
> 19/03/23 20:54:30 INFO TestSortColumns: 
> = FINISHED 
> org.apach

[jira] [Commented] (CARBONDATA-3327) Errors lies in query with small blocklet size

2019-03-24 Thread xuchuanyin (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16799944#comment-16799944
 ] 

xuchuanyin commented on CARBONDATA-3327:


Besides, I noticed that if we do not filter on the sort_columns, the problem 
will appear.

> Errors lies in query with small blocklet size
> -
>
> Key: CARBONDATA-3327
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3327
> Project: CarbonData
>  Issue Type: Bug
>Reporter: xuchuanyin
>Priority: Major
>
> While implementing the following patch:
> ```diff
> diff --git 
> a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
>  
> b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> index 69374ad..c6b63a4 100644
> --- 
> a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> +++ 
> b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> @@ -54,7 +54,7 @@ public final class CarbonCommonConstants {
>/**
> * min blocklet size
> */
> -  public static final int BLOCKLET_SIZE_MIN_VAL = 2000;
> +  public static final int BLOCKLET_SIZE_MIN_VAL = 1;
>  
>/**
> * max blocklet size
> diff --git 
> a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
>  
> b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> index df97d0f..ace9fd5 100644
> --- 
> a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> +++ 
> b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> @@ -29,6 +29,7 @@ import 
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandExcepti
>  class TestSortColumns extends QueryTest with BeforeAndAfterAll {
>  
>override def beforeAll {
> +
> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.BLOCKLET_SIZE,
>  "2")
>  CarbonProperties.getInstance().addProperty(
>CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "dd-MM-yyyy")
>  
> ```
> I found that some of the tests in `TestSortColumns` failed with an NPE, and the 
> error logs show:
> ```
> 19/03/23 20:54:30 ERROR Executor: Exception in task 0.0 in stage 104.0 (TID 
> 173)
> java.lang.NullPointerException
> at 
> org.apache.parquet.io.api.Binary$ByteArrayBackedBinary.getBytes(Binary.java:294)
> at 
> org.apache.spark.sql.execution.vectorized.ColumnVector.getUTF8String(ColumnVector.java:646)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:108)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 19/03/23 20:54:30 ERROR TaskSetManager: Task 0 in stage 104.0 failed 1 times; 
> aborting job
> 19/03/23 20:54:30 INFO TestSortColumns: 
> = FINISHED 
> org.apache.carbondata.spark.testsuite.sortcolumns.TestSortColumns: 'filter on 
> sort_columns include no-dictionary, direct-dictionary and dictioanry' =
> 19/03/23 20:54:30 INFO TestSortColumns: 
> = TEST OUTPUT FOR 
> org.apache.carbondata.spark.testsuit

[jira] [Created] (CARBONDATA-3327) Errors lies in query with small blocklet size

2019-03-24 Thread xuchuanyin (JIRA)
xuchuanyin created CARBONDATA-3327:
--

 Summary: Errors lies in query with small blocklet size
 Key: CARBONDATA-3327
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3327
 Project: CarbonData
  Issue Type: Bug
Reporter: xuchuanyin


While implementing the following patch:
```diff
diff --git 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index 69374ad..c6b63a4 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -54,7 +54,7 @@ public final class CarbonCommonConstants {
   /**
* min blocklet size
*/
-  public static final int BLOCKLET_SIZE_MIN_VAL = 2000;
+  public static final int BLOCKLET_SIZE_MIN_VAL = 1;
 
   /**
* max blocklet size
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
index df97d0f..ace9fd5 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
@@ -29,6 +29,7 @@ import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandExcepti
 class TestSortColumns extends QueryTest with BeforeAndAfterAll {
 
   override def beforeAll {
+
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.BLOCKLET_SIZE, 
"2")
 CarbonProperties.getInstance().addProperty(
   CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "dd-MM-yyyy")
 
```
I found that some of the tests in `TestSortColumns` failed with an NPE, and the 
error logs show:
```
19/03/23 20:54:30 ERROR Executor: Exception in task 0.0 in stage 104.0 (TID 173)
java.lang.NullPointerException
at 
org.apache.parquet.io.api.Binary$ByteArrayBackedBinary.getBytes(Binary.java:294)
at 
org.apache.spark.sql.execution.vectorized.ColumnVector.getUTF8String(ColumnVector.java:646)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
19/03/23 20:54:30 ERROR TaskSetManager: Task 0 in stage 104.0 failed 1 times; 
aborting job
19/03/23 20:54:30 INFO TestSortColumns: 

= FINISHED 
org.apache.carbondata.spark.testsuite.sortcolumns.TestSortColumns: 'filter on 
sort_columns include no-dictionary, direct-dictionary and dictioanry' =

19/03/23 20:54:30 INFO TestSortColumns: 

= TEST OUTPUT FOR 
org.apache.carbondata.spark.testsuite.sortcolumns.TestSortColumns: 'unsorted 
table creation, query data loading with heap and safe sort config' =


Job aborted due to stage failure: Task 0 in stage 104.0 failed 1 times, most 
recent failure: Lost task 0.0 in stage 104.0 (TID 173, localhost, executor 
driver): java.lang.NullPointerException
at 
org.apache.parquet.io.api.Binary$ByteArrayBackedBinary.getBytes(Binary.java:294)
at 
org.apache.spark.sql.execution.vectorized.ColumnVector.getUTF8String(ColumnVector.java:646)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext

[jira] [Resolved] (CARBONDATA-3281) Limit the LRU cache size

2019-03-07 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-3281.

Resolution: Fixed

> Limit the LRU cache size
> 
>
> Key: CARBONDATA-3281
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3281
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: TaoLi
>Priority: Minor
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> If the configured LRU cache size is bigger than the JVM Xmx size, then use 
> CARBON_MAX_LRU_CACHE_SIZE_DEFAULT instead,
> because if the LRU size is set bigger than the Xmx size and we query a big table with 
> many carbon files, it may cause "Error: java.io.IOException: Problem in 
> loading segment blocks: GC overhead 
> limit exceeded (state=,code=0)" and the JDBC server will restart.
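In code, the safeguard described above is essentially a bounds check against the JVM
heap at configuration time; a minimal sketch of the idea (hypothetical helper, not the
actual CarbonData implementation):

```scala
// Sketch: clamp a configured LRU cache size (in MB) to the JVM max heap size,
// falling back to a default when the configured value cannot fit inside -Xmx.
def effectiveLruCacheSizeMb(configuredMb: Long, defaultMb: Long): Long = {
  val maxHeapMb = Runtime.getRuntime.maxMemory() / (1024L * 1024L) // roughly -Xmx
  if (configuredMb > maxHeapMb) defaultMb else configuredMb
}
```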



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (CARBONDATA-3281) Limit the LRU cache size

2019-03-07 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin reassigned CARBONDATA-3281:
--

Assignee: (was: xuchuanyin)

> Limit the LRU cache size
> 
>
> Key: CARBONDATA-3281
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3281
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: TaoLi
>Priority: Minor
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> If the configured LRU cache size is bigger than the JVM Xmx size, then use 
> CARBON_MAX_LRU_CACHE_SIZE_DEFAULT instead,
> because if the LRU size is set bigger than the Xmx size and we query a big table with 
> many carbon files, it may cause "Error: java.io.IOException: Problem in 
> loading segment blocks: GC overhead 
> limit exceeded (state=,code=0)" and the JDBC server will restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (CARBONDATA-3281) Limit the LRU cache size

2019-03-07 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin reassigned CARBONDATA-3281:
--

Assignee: xuchuanyin

> Limit the LRU cache size
> 
>
> Key: CARBONDATA-3281
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3281
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: TaoLi
>    Assignee: xuchuanyin
>Priority: Minor
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> If the configured LRU cache size is bigger than the JVM Xmx size, then use 
> CARBON_MAX_LRU_CACHE_SIZE_DEFAULT instead,
> because if the LRU size is set bigger than the Xmx size and we query a big table with 
> many carbon files, it may cause "Error: java.io.IOException: Problem in 
> loading segment blocks: GC overhead 
> limit exceeded (state=,code=0)" and the JDBC server will restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-2447) Range Partition Table. When the update operation is performed, the data will be lost.

2019-02-22 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-2447.

   Resolution: Fixed
Fix Version/s: (was: NONE)

> Range Partition Table. When the update operation is performed, the data will 
> be lost.
> 
>
> Key: CARBONDATA-2447
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2447
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.3.1
> Environment: centos6.5
> java8
> Spark2.1.0
> CarbonData1.3.1
>Reporter: duweike
>Priority: Blocker
> Attachments: 微信图片_20180507113738.jpg, 微信图片_20180507113748.jpg
>
>   Original Estimate: 72h
>  Time Spent: 7h 10m
>  Remaining Estimate: 64h 50m
>
> Range Partition Table. When the update operation is performed, the data will 
> be lost.
> As shown in the pictures below, the data loss occurs every time.
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-3107) Optimize error/exception coding for better debugging

2019-02-22 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-3107.

Resolution: Fixed

> Optimize error/exception coding for better debugging
> 
>
> Key: CARBONDATA-3107
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3107
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: jiangmanhua
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-3278) Remove duplicate code to get filter string of date/timestamp

2019-02-22 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-3278.

Resolution: Fixed

> Remove duplicate code to get filter string of date/timestamp
> 
>
> Key: CARBONDATA-3278
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3278
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: jiangmanhua
>Assignee: jiangmanhua
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...

2019-01-09 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3056
  
@manishnalla1994 

> Solution:
Check if any other RDD is sharing the same task context. If so, don't clear the 
resource at that time; the other RDD which shares the context should 
clear the memory once the task is finished.

It seems that in #2591, for the data source table scenario, if the query and insert 
procedures also share the same context, they can also benefit from the 
implementation in #2591 without any changes. Right?
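The quoted solution is essentially reference counting on the shared task context: only
the last user of a task attempt releases the unsafe memory. A rough sketch of that idea
(all names here are hypothetical, not the actual code in this PR or in #2591):

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Sketch: each flow (query, insert) that uses a task's unsafe memory registers itself;
// the memory is freed only when the last registered user of that task deregisters.
object TaskMemoryUsers {
  private val users = new ConcurrentHashMap[Long, AtomicInteger]()

  def register(taskId: Long): Unit = {
    var counter = users.get(taskId)
    if (counter == null) {
      users.putIfAbsent(taskId, new AtomicInteger(0))
      counter = users.get(taskId)
    }
    counter.incrementAndGet()
  }

  def deregister(taskId: Long)(freeMemory: Long => Unit): Unit = {
    val counter = users.get(taskId)
    if (counter != null && counter.decrementAndGet() == 0) {
      users.remove(taskId)
      freeMemory(taskId) // only the last sharer actually clears the resources
    }
  }
}
```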


---


[GitHub] carbondata issue #3046: [CARBONDATA-3231] Fix OOM exception when dictionary ...

2019-01-09 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3046
  
We do not need to expose this threshold to the user. Instead, we can make the 
decision ourselves inside CarbonData.

Step 1: Get the size of the non-dictionary-encoded page (say M) and the 
size of the dictionary-encoded page (say N).
Step 2: If M/N >= 1 (or M/N >= 0.9), we can fall back automatically.

Parquet (and maybe ORC) behaves like this.
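Expressed as code, the proposed check is just a size comparison between the two
encodings of the same page; a minimal sketch following the M/N rule above (the names
and the ratio parameter are illustrative, not an actual page API):

```scala
// Sketch: fall back from local-dictionary encoding when it does not save enough
// space compared with the plain (non-dictionary) encoding of the same column page.
def shouldFallBack(nonDictEncodedSizeM: Long,
                   dictEncodedSizeN: Long,
                   ratio: Double = 1.0): Boolean = {
  // Following the proposal above: fall back when M/N >= ratio (1.0, or a softer 0.9).
  nonDictEncodedSizeM.toDouble / dictEncodedSizeN.toDouble >= ratio
}
```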



---


[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

2019-01-09 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246427391
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala
 ---
@@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
 /**
  * configure alluxio:
  * 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail 
at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+ * 2.Get more detail at: 
https://www.alluxio.org/docs/1.8/en/compute/Spark.html
  */
-
 object AlluxioExample {
-  def main(args: Array[String]) {
-val spark = ExampleUtils.createCarbonSession("AlluxioExample")
-exampleBody(spark)
-spark.close()
+  def main (args: Array[String]) {
+val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+  storePath = "alluxio://localhost:19998/carbondata")
+exampleBody(carbon)
+carbon.close()
   }
 
-  def exampleBody(spark : SparkSession): Unit = {
+  def exampleBody (spark: SparkSession): Unit = {
+val rootPath = new File(this.getClass.getResource("/").getPath
+  + "../../../..").getCanonicalPath
 spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", 
"alluxio.hadoop.FileSystem")
--- End diff --

So you need to mention this in the current document


---


[GitHub] carbondata issue #3046: [CARBONDATA-3231] Fix OOM exception when dictionary ...

2019-01-08 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3046
  
> This PR is just to add the size based limitation so that the map size can 
be controlled.
@kunal642 Yeah, I noticed that. So my proposal is: please leave room for minimal 
changes when we implement that feature (automatic size detection and fallback) 
later.


---


[GitHub] carbondata pull request #3046: [CARBONDATA-3231] Fix OOM exception when dict...

2019-01-08 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3046#discussion_r246056127
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -2076,4 +2076,15 @@ private CarbonCommonConstants() {
*/
   public static final String 
CARBON_QUERY_DATAMAP_BLOOM_CACHE_SIZE_DEFAULT_VAL = "512";
 
+  public static final String CARBON_LOCAL_DICTIONARY_MAX_THRESHOLD =
--- End diff --

It still fails to make clear what kind of size it is supposed to be, 
since we have a storage size and a counting size.


---


[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...

2019-01-08 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3053
  
Does this PR fix two problems?
If so, it would be better to separate it into two. As for the first problem, 
I'm also concerned about the performance decrease. rawCompress can save 
some memory-copy operations; that's why we added a check there and try to use 
that feature if the compressor supports it. It may need more observation of 
the performance decrease, OR we can just add a switch there to control 
the behavior, which would be helpful for comparison.
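If such a switch were added, it would only need to gate which compression path is
taken, so the two can be benchmarked against each other; a rough sketch (the trait,
method names, and the switch are hypothetical, not existing CarbonData APIs):

```scala
// Sketch: choose between a raw (zero-copy, off-heap) compression path and the
// regular byte-array path, controlled by a boolean switch read from configuration.
trait PageCompressor {
  def compressRaw(address: Long, length: Int): Array[Byte] // hypothetical zero-copy path
  def compressBytes(data: Array[Byte]): Array[Byte]        // regular path, extra copies
}

def compressPage(compressor: PageCompressor,
                 useRawPath: Boolean, // would come from the proposed switch
                 address: Long,
                 bytes: Array[Byte]): Array[Byte] = {
  if (useRawPath) compressor.compressRaw(address, bytes.length)
  else compressor.compressBytes(bytes)
}
```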


---


[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

2019-01-08 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246047276
  
--- Diff: docs/documentation.md ---
@@ -29,15 +29,15 @@ Apache CarbonData is a new big data file format for 
faster interactive query usi
 
 **Quick Start:** [Run an example 
program](./quick-start-guide.md#installing-and-configuring-carbondata-to-run-locally-with-spark-shell)
 on your local machine or [study some 
examples](https://github.com/apache/carbondata/tree/master/examples/spark2/src/main/scala/org/apache/carbondata/examples).
 
-**CarbonData SQL Language Reference:** CarbonData extends the Spark SQL 
language and adds several [DDL](./ddl-of-carbondata.md) and 
[DML](./dml-of-carbondata.md) statements to support operations on it.Refer to 
the [Reference Manual](./language-manual.md) to understand the supported 
features and functions.
+**CarbonData SQL Language Reference:** CarbonData extends the Spark SQL 
language and adds several [DDL](./ddl-of-carbondata.md) and 
[DML](./dml-of-carbondata.md) statements to support operations on it. Refer to 
the [Reference Manual](./language-manual.md) to understand the supported 
features and functions.
 
 **Programming Guides:** You can read our guides about [Java APIs 
supported](./sdk-guide.md) or [C++ APIs supported](./csdk-guide.md) to learn 
how to integrate CarbonData with your applications.
 
 
 
 ## Integration
 
-CarbonData can be integrated with popular Execution engines like 
[Spark](./quick-start-guide.md#spark) , [Presto](./quick-start-guide.md#presto) 
and [Hive](./quick-start-guide.md#hive).Refer to the [Installation and 
Configuration](./quick-start-guide.md#integration) section to understand all 
modes of Integrating CarbonData.
+CarbonData can be integrated with popular Execution engines like 
[Spark](./quick-start-guide.md#spark) , [Presto](./quick-start-guide.md#presto) 
and [Hive](./quick-start-guide.md#hive). CarbonData also supports read and 
write with [Alluxio](./quick-start-guide.md#alluxio). Refer to the 
[Installation and Configuration](./quick-start-guide.md#integration) section to 
understand all modes of Integrating CarbonData.
--- End diff --

I think it's not proper to mention Alluxio after e(*not E*)xecution engines 
like SparkSQL/Presto/Hive.

Meanwhile, we can add another paragraph mentioning that CarbonData can integrate 
with other storage engines such as HDFS, S3, OBS, and Alluxio.

@chenliang613 What do you think about it?


---


[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

2019-01-08 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246049322
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala
 ---
@@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
 /**
  * configure alluxio:
  * 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail 
at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+ * 2.Get more detail at: 
https://www.alluxio.org/docs/1.8/en/compute/Spark.html
  */
-
 object AlluxioExample {
-  def main(args: Array[String]) {
-val spark = ExampleUtils.createCarbonSession("AlluxioExample")
-exampleBody(spark)
-spark.close()
+  def main (args: Array[String]) {
+val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+  storePath = "alluxio://localhost:19998/carbondata")
+exampleBody(carbon)
+carbon.close()
   }
 
-  def exampleBody(spark : SparkSession): Unit = {
+  def exampleBody (spark: SparkSession): Unit = {
+val rootPath = new File(this.getClass.getResource("/").getPath
+  + "../../../..").getCanonicalPath
 spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", 
"alluxio.hadoop.FileSystem")
 FileFactory.getConfiguration.set("fs.alluxio.impl", 
"alluxio.hadoop.FileSystem")
 
 // Specify date format based on raw data
 CarbonProperties.getInstance()
   .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
 
-spark.sql("DROP TABLE IF EXISTS alluxio_table")
+val time = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
+
+val mFsShell = new FileSystemShell()
+val localFile = rootPath + "/hadoop/src/test/resources/data.csv"
+val remotePath = "/carbon_alluxio" + time + ".csv"
+val remoteFile = "alluxio://localhost:19998/carbon_alluxio" + time + 
".csv"
--- End diff --

Use 'prefix + remotePath' instead of concatenating the path by hand.
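In other words, something along these lines (a small self-contained sketch reusing the
naming already present in the example):

```scala
import java.text.SimpleDateFormat
import java.util.Date

// Sketch: build the Alluxio URI from a single prefix constant plus the remote path,
// instead of concatenating the scheme, host, and file name by hand in two places.
val alluxioPrefix = "alluxio://localhost:19998"
val time = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
val remotePath = "/carbon_alluxio" + time + ".csv"
val remoteFile = alluxioPrefix + remotePath
```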


---


[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

2019-01-08 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246050916
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala
 ---
@@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
 /**
  * configure alluxio:
  * 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail 
at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+ * 2.Get more detail at: 
https://www.alluxio.org/docs/1.8/en/compute/Spark.html
  */
-
 object AlluxioExample {
-  def main(args: Array[String]) {
-val spark = ExampleUtils.createCarbonSession("AlluxioExample")
-exampleBody(spark)
-spark.close()
+  def main (args: Array[String]) {
+val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+  storePath = "alluxio://localhost:19998/carbondata")
+exampleBody(carbon)
+carbon.close()
   }
 
-  def exampleBody(spark : SparkSession): Unit = {
+  def exampleBody (spark: SparkSession): Unit = {
+val rootPath = new File(this.getClass.getResource("/").getPath
+  + "../../../..").getCanonicalPath
 spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", 
"alluxio.hadoop.FileSystem")
--- End diff --

Only providing an example for DataFrame is not enough. It seems we should add 
some configurations to the carbon property file and the Spark properties to make it 
work through beeline. So we should make this clear in case the user wants to try it 
from beeline.
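For the beeline path, the same settings that the example applies programmatically would
have to reach the session behind the thrift server; a sketch of the programmatic
equivalent (whether extra carbon.properties entries are also needed for beeline is
exactly the open question raised here):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: supply the fs.alluxio.impl setting up front when building the session,
// via Spark's spark.hadoop.* passthrough, instead of mutating hadoopConfiguration later.
val spark = SparkSession.builder()
  .appName("CarbonAlluxioBeelineSketch")
  .config("spark.hadoop.fs.alluxio.impl", "alluxio.hadoop.FileSystem")
  .getOrCreate()
```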


---


[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

2019-01-08 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246044066
  
--- Diff: docs/alluxio-guide.md ---
@@ -0,0 +1,42 @@
+
+
+
+# Presto guide
--- End diff --

presto?


---


[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

2019-01-08 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246047576
  
--- Diff: docs/quick-start-guide.md ---
@@ -54,7 +54,8 @@ CarbonData can be integrated with Spark,Presto and Hive 
Execution Engines. The b
 ### Hive
 [Installing and Configuring CarbonData on 
Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md)
 
-
+### Alluxio
--- End diff --

As mentioned above, we may need to adjust the location for this section.


---


[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...

2019-01-08 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3056
  
> because both the query and load flow were assigned the same taskId, and 
once the query finished it freed the unsafe memory while the insert was still in 
progress.

How do you handle this scenario for a stored-by-carbondata carbon table? In 
that scenario, both the query flow and the load flow use off-heap memory and encounter 
the same problem you described above.
But I remember we handled that differently from the current PR, and I 
think the modifications can be similar. Please check this again.



---


[GitHub] carbondata issue #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap example

2019-01-07 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2963
  
@jackylk Actually, after applying the above commit, the size of the shaded jar 
decreased from 40652 bytes to 40620 bytes.


---


[GitHub] carbondata pull request #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap e...

2019-01-07 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2963#discussion_r245633754
  
--- Diff: pom.xml ---
@@ -527,6 +526,7 @@
 examples/spark2
 datamap/lucene
 datamap/bloom
+datamap/example
--- End diff --

Excluding this will leave the datamap example module outdated, with 
potentially unfixed bugs later, which was the previous status of this module.

Maybe we can exclude it from the assembly jar instead.

---


[GitHub] carbondata issue #3045: [CARBONDATA-3222]Fix dataload failure after creation...

2019-01-06 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3045
  
LGTM


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-06 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r245534858
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -110,7 +110,29 @@ case class PreAggregateTableHelper(
 // Datamap table name and columns are automatically added prefix with 
parent table name
 // in carbon. For convenient, users can type column names same as the 
ones in select statement
 // when config dmproperties, and here we update column names with 
prefix.
-val longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+// If longStringColumn is not present in dm properties then we take 
long_string_columns from
+// the parent table.
+var longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
--- End diff --

fine


---


[GitHub] carbondata issue #3046: [WIP] Added check to start fallback based on size

2019-01-06 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3046
  
Hi @kunal642, in your PR the threshold size for the storage of the local 
dictionary is specified by the system (and maybe later can be specified by the user). But 
this raises an obvious problem: how can the user know the exact 
value?

I've read that Parquet compares the dictionary-encoded size 
with the original encoded size; only if the dictionary-encoded size is smaller 
will Parquet use it, otherwise it falls back.

So can the current implementation suit this scenario well?



---


[GitHub] carbondata pull request #3046: [WIP] Added check to start fallback based on ...

2019-01-06 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3046#discussion_r245510146
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/DecoderBasedFallbackEncoder.java
 ---
@@ -57,10 +57,7 @@ public DecoderBasedFallbackEncoder(EncodedColumnPage 
encodedColumnPage, int page
 int pageSize =
 encodedColumnPage.getActualPage().getPageSize();
 int offset = 0;
-int[] reverseInvertedIndex = new int[pageSize];
--- End diff --

What are these changes for?


---


[GitHub] carbondata pull request #3046: [WIP] Added check to start fallback based on ...

2019-01-06 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3046#discussion_r245510098
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -2076,4 +2076,15 @@ private CarbonCommonConstants() {
*/
   public static final String 
CARBON_QUERY_DATAMAP_BLOOM_CACHE_SIZE_DEFAULT_VAL = "512";
 
+  public static final String CARBON_LOCAL_DICTIONARY_MAX_THRESHOLD =
--- End diff --

I think we should optimize this variable name.
The first time I saw this I thought it duplicated another threshold for the 
local dictionary. One is number-based, the other is storage-size-based. 
Please take care of the readability.


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-06 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r245509909
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -110,7 +110,29 @@ case class PreAggregateTableHelper(
 // Datamap table name and columns are automatically added prefix with 
parent table name
 // in carbon. For convenient, users can type column names same as the 
ones in select statement
 // when config dmproperties, and here we update column names with 
prefix.
-val longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
--- End diff --

emm, actually this line of code was added to solve the same problem 
mentioned in the PR description. Previously we thought it was the user's 
responsibility to explicitly specify the long_string_columns in the preagg 
datamap.

Please @kevinjmh also check this.


---


[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Include bloom...

2019-01-06 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r245509625
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ---
@@ -184,6 +184,9 @@ object CarbonEnv {
   .addListener(classOf[LoadTablePostExecutionEvent], new 
MergeIndexEventListener)
   .addListener(classOf[AlterTableCompactionPostEvent], new 
MergeIndexEventListener)
   .addListener(classOf[AlterTableMergeIndexEvent], new 
MergeIndexEventListener)
+  .addListener(classOf[LoadTablePreStatusUpdateEvent], new 
MergeBloomIndexEventListener)
--- End diff --

yeah, that is how it is supposed to be.

Besides, I think it's the framework's responsibility to include this procedure 
in the data-loading transaction, so the best practice is to optimize the 
framework's behavior.

@jackylk What do you think about this? Should we do this in this PR or later?


---


[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Include bloom...

2019-01-06 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r245509427
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeBloomIndexEventListener.scala
 ---
@@ -24,59 +24,96 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 
 import org.apache.carbondata.common.logging.LogServiceFactory
-import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.datamap.{DataMapStoreManager, 
TableDataMap}
 import 
org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.datamap.CarbonMergeBloomIndexFilesRDD
 import org.apache.carbondata.events._
+import 
org.apache.carbondata.processing.loading.events.LoadEvents.LoadTablePreStatusUpdateEvent
+import org.apache.carbondata.processing.merger.CarbonDataMergerUtil
 
 class MergeBloomIndexEventListener extends OperationEventListener with 
Logging {
   val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
   override def onEvent(event: Event, operationContext: OperationContext): 
Unit = {
+val sparkSession = SparkSession.getActiveSession.get
 event match {
+  case loadPreStatusUpdateEvent: LoadTablePreStatusUpdateEvent =>
+LOGGER.info("LoadTablePreStatusUpdateEvent called for bloom index 
merging")
+// For loading process, segment can not be accessed at this time
+val loadModel = loadPreStatusUpdateEvent.getCarbonLoadModel
+val carbonTable = loadModel.getCarbonDataLoadSchema.getCarbonTable
+val segmentId = loadModel.getSegmentId
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
+  case compactPreStatusUpdateEvent: 
AlterTableCompactionPreStatusUpdateEvent =>
+LOGGER.info("AlterTableCompactionPreStatusUpdateEvent called for 
bloom index merging")
+// For compact process, segment can not be accessed at this time
+val carbonTable = compactPreStatusUpdateEvent.carbonTable
--- End diff --

fine, just keep it as it is


---


[GitHub] carbondata issue #3023: [CARBONDATA-3197][BloomDataMap] Merge bloom index be...

2019-01-04 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3023
  
Besides, I think the title of the PR can be optimized to 'Include the 
merging bloomindex procedure in data loading transaction' -- just for your 
reference


---


[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Merge bloom i...

2019-01-04 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r245469143
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeBloomIndexEventListener.scala
 ---
@@ -24,59 +24,96 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 
 import org.apache.carbondata.common.logging.LogServiceFactory
-import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.datamap.{DataMapStoreManager, 
TableDataMap}
 import 
org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.datamap.CarbonMergeBloomIndexFilesRDD
 import org.apache.carbondata.events._
+import 
org.apache.carbondata.processing.loading.events.LoadEvents.LoadTablePreStatusUpdateEvent
+import org.apache.carbondata.processing.merger.CarbonDataMergerUtil
 
 class MergeBloomIndexEventListener extends OperationEventListener with 
Logging {
   val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
   override def onEvent(event: Event, operationContext: OperationContext): 
Unit = {
+val sparkSession = SparkSession.getActiveSession.get
 event match {
+  case loadPreStatusUpdateEvent: LoadTablePreStatusUpdateEvent =>
+LOGGER.info("LoadTablePreStatusUpdateEvent called for bloom index 
merging")
+// For loading process, segment can not be accessed at this time
+val loadModel = loadPreStatusUpdateEvent.getCarbonLoadModel
+val carbonTable = loadModel.getCarbonDataLoadSchema.getCarbonTable
+val segmentId = loadModel.getSegmentId
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
+  case compactPreStatusUpdateEvent: 
AlterTableCompactionPreStatusUpdateEvent =>
+LOGGER.info("AlterTableCompactionPreStatusUpdateEvent called for 
bloom index merging")
+// For compact process, segment can not be accessed at this time
+val carbonTable = compactPreStatusUpdateEvent.carbonTable
--- End diff --

It seems the following code block duplicates lines #44~#54; please 
consider optimizing that.


---


[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Merge bloom i...

2019-01-04 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r245469279
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeBloomIndexEventListener.scala
 ---
@@ -24,59 +24,96 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 
 import org.apache.carbondata.common.logging.LogServiceFactory
-import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.datamap.{DataMapStoreManager, 
TableDataMap}
 import 
org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.datamap.CarbonMergeBloomIndexFilesRDD
 import org.apache.carbondata.events._
+import 
org.apache.carbondata.processing.loading.events.LoadEvents.LoadTablePreStatusUpdateEvent
+import org.apache.carbondata.processing.merger.CarbonDataMergerUtil
 
 class MergeBloomIndexEventListener extends OperationEventListener with 
Logging {
   val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
   override def onEvent(event: Event, operationContext: OperationContext): 
Unit = {
+val sparkSession = SparkSession.getActiveSession.get
 event match {
+  case loadPreStatusUpdateEvent: LoadTablePreStatusUpdateEvent =>
+LOGGER.info("LoadTablePreStatusUpdateEvent called for bloom index 
merging")
+// For loading process, segment can not be accessed at this time
+val loadModel = loadPreStatusUpdateEvent.getCarbonLoadModel
+val carbonTable = loadModel.getCarbonDataLoadSchema.getCarbonTable
+val segmentId = loadModel.getSegmentId
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
+  case compactPreStatusUpdateEvent: 
AlterTableCompactionPreStatusUpdateEvent =>
+LOGGER.info("AlterTableCompactionPreStatusUpdateEvent called for 
bloom index merging")
+// For compact process, segment can not be accessed at this time
+val carbonTable = compactPreStatusUpdateEvent.carbonTable
+val mergedLoadName = compactPreStatusUpdateEvent.mergedLoadName
+val segmentId = 
CarbonDataMergerUtil.getLoadNumberFromLoadName(mergedLoadName)
+
+// filter out bloom datamap, skip lazy datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
+DataMapClassProvider.BLOOMFILTER.getShortName))
+  .filter(!_.getDataMapSchema.isLazy).toList
+
+mergeBloomIndex(sparkSession, carbonTable, bloomDatamaps, 
Seq(segmentId))
+
   case datamapPostEvent: BuildDataMapPostExecutionEvent =>
-LOGGER.info("Load post status event-listener called for merge 
bloom index")
+LOGGER.info("BuildDataMapPostExecutionEvent called for bloom index 
merging")
+// For rebuild datamap process, datamap is disabled when rebuilding
+if (!datamapPostEvent.isFromRebuild || null == 
datamapPostEvent.dmName) {
+  // ignore datamapPostEvent from loading and compaction for bloom 
index merging
+  // they use LoadTablePreStatusUpdateEvent and 
AlterTableCompactionPreStatusUpdateEvent
+  LOGGER.info("Ignore BuildDataMapPostExecutionEvent from loading 
and compaction")
+  return
+}
+
 val carbonTableIdentifier = datamapPostEvent.identifier
 val carbonTable = 
DataMapStoreManager.getInstance().getCarbonTable(carbonTableIdentifier)
-val tableDataMaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable)
-val sparkSession = SparkSession.getActiveSession.get
 
-// filter out bloom datamap
-var bloomDatamaps = tableDataMaps.asScala.filter(
-  _.getDataMapSchema.getProviderName.equalsIgnoreCase(
+// filter out current rebuilt bloom datamap
+val bloomDatamaps = 
DataMapStoreManager.getInstance().getAllDataMap(carbonTable).asScala
+  .filter(_.getDataMapSchema.getProviderName.equalsIgnoreCase(
 DataMapClassProvider.BLOOMFILTER.getShortName))
-
-if (datamapPostEvent.isFromRebuild) {
-  if 

[GitHub] carbondata pull request #3023: [CARBONDATA-3197][BloomDataMap] Merge bloom i...

2019-01-04 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3023#discussion_r245469201
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala ---
@@ -184,6 +184,9 @@ object CarbonEnv {
   .addListener(classOf[LoadTablePostExecutionEvent], new 
MergeIndexEventListener)
   .addListener(classOf[AlterTableCompactionPostEvent], new 
MergeIndexEventListener)
   .addListener(classOf[AlterTableMergeIndexEvent], new 
MergeIndexEventListener)
+  .addListener(classOf[LoadTablePreStatusUpdateEvent], new 
MergeBloomIndexEventListener)
--- End diff --

After adding this line, the segment will not be visible to the user until the 
mergeBloomIndex procedure finishes, right?


---


[GitHub] carbondata pull request #3031: [CARBONDATA-3212] Fixed NegativeArraySizeExce...

2019-01-01 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3031#discussion_r244639643
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java
 ---
@@ -140,6 +140,7 @@ public boolean isLocalDictGeneratedPage() {
 } else {
   actualDataColumnPage.putBytes(rowId, bytes);
 }
+pageSize = rowId + 1;
--- End diff --

So we need to update the pageSize each time we put a row into the page? That 
wastes a lot of computation.
Please reconsider the implementation.
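One possible alternative, as a tiny standalone sketch (a hypothetical class, not the
actual LocalDictColumnPage code): track only the highest rowId written and derive the
page size on demand, rather than reassigning pageSize on every putBytes call:

```scala
// Sketch: remember the highest rowId seen; expose pageSize as highestRowId + 1
// only when it is actually needed (e.g. when the page is encoded/flushed).
class PageSizeTracker {
  private var highestRowId: Int = -1

  def onPut(rowId: Int): Unit = {
    if (rowId > highestRowId) highestRowId = rowId
  }

  def pageSize: Int = highestRowId + 1
}
```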


---


[GitHub] carbondata pull request #3036: [CARBONDATA-3208] Remove unused parameters, i...

2018-12-30 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3036#discussion_r244535093
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentProperties.java
 ---
@@ -287,31 +287,31 @@ private void 
fillDimensionAndMeasureDetails(List<ColumnSchema> columnsInTable,
   // if it is a columnar dimension participated in mdkey then added
   // key ordinal and dimension ordinal
   carbonDimension =
-  new CarbonDimension(columnSchema, dimensonOrdinal++, 
keyOrdinal++, -1);
+  new CarbonDimension(columnSchema, dimensionOrdinal++, 
keyOrdinal++, -1);
 }
 // as complex type will be stored at last so once complex type 
started all the dimension
 // will be added to complex type
 else if (isComplexDimensionStarted || 
columnSchema.getDataType().isComplexType()) {
   cardinalityIndexForComplexDimensionColumn.add(tableOrdinal);
   carbonDimension =
-  new CarbonDimension(columnSchema, dimensonOrdinal++, -1, 
++complexTypeOrdinal);
+  new CarbonDimension(columnSchema, dimensionOrdinal++, -1, 
++complexTypeOrdinal);
   
carbonDimension.initializeChildDimensionsList(columnSchema.getNumberOfChild());
   complexDimensions.add(carbonDimension);
   isComplexDimensionStarted = true;
-  int previouseOrdinal = dimensonOrdinal;
-  dimensonOrdinal =
-  readAllComplexTypeChildren(dimensonOrdinal, 
columnSchema.getNumberOfChild(),
+  int previouseOrdinal = dimensionOrdinal;
--- End diff --

um, part of the variable name 'previouse' is also a typo; it should be 
'previous'...
Please correct this as well; the rest LGTM.


---


[GitHub] carbondata pull request #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap e...

2018-12-28 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2963#discussion_r244344781
  
--- Diff: 
datamap/example/src/main/java/org/apache/carbondata/datamap/minmax/MinMaxDataMapFactory.java
 ---
@@ -0,0 +1,353 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.minmax;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import 
org.apache.carbondata.common.exceptions.sql.MalformedDataMapCommandException;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.datamap.DataMapDistributable;
+import org.apache.carbondata.core.datamap.DataMapLevel;
+import org.apache.carbondata.core.datamap.DataMapMeta;
+import org.apache.carbondata.core.datamap.DataMapStoreManager;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datamap.TableDataMap;
+import org.apache.carbondata.core.datamap.dev.DataMapBuilder;
+import org.apache.carbondata.core.datamap.dev.DataMapWriter;
+import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap;
+import 
org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMapFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFileFilter;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.features.TableOperation;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.DataMapSchema;
+import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.events.Event;
+
+import org.apache.log4j.Logger;
+
+/**
+ * Min Max DataMap Factory
+ */
+@InterfaceAudience.Internal
+public class MinMaxDataMapFactory extends CoarseGrainDataMapFactory {
+  private static final Logger LOGGER =
+  
LogServiceFactory.getLogService(MinMaxDataMapFactory.class.getName());
+  private DataMapMeta dataMapMeta;
+  private String dataMapName;
+  // segmentId -> list of index files
+  private Map> segmentMap = new ConcurrentHashMap<>();
+  private Cache cache;
+
+  public MinMaxDataMapFactory(CarbonTable carbonTable, DataMapSchema 
dataMapSchema)
+  throws MalformedDataMapCommandException {
+super(carbonTable, dataMapSchema);
+
+// this is an example for datamap, we can choose the columns and 
operations that
+// will be supported by this datamap. Furthermore, we can add 
cache-support for this datamap.
+
+this.dataMapName = dataMapSchema.getDataMapName();
+List indexedColumns = 
carbonTable.getIndexedColumns(dataMapSchema);
+
+// operations that will be supported on the indexed columns
+List optOperations = new ArrayList<>();
+optOperations.add(ExpressionType.NOT);
+optOperations.add(ExpressionType.EQUALS);
+optOperations.add(ExpressionType.NOT_EQUALS);
+optOperations.add(ExpressionType.GREATERTHAN);
+optOperations.add(ExpressionType.GREATERTHAN_EQUALTO);
+opt

[GitHub] carbondata pull request #2970: [CARBONDATA-3142]Add timestamp with thread na...

2018-12-26 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2970#discussion_r244015783
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonThreadFactory.java ---
@@ -34,14 +34,26 @@
*/
   private String name;
 
+  private boolean withTime = false;
+
   public CarbonThreadFactory(String name) {
 this.defaultFactory = Executors.defaultThreadFactory();
 this.name = name;
   }
 
+  public CarbonThreadFactory(String name, boolean withTime) {
+this(name);
+this.withTime = withTime;
+  }
+
   @Override public Thread newThread(Runnable r) {
 final Thread thread = defaultFactory.newThread(r);
-thread.setName(name);
+if (withTime) {
+  thread.setName(name + "_" + System.currentTimeMillis());
--- End diff --

Why not use nanoTime, since the caller below previously used nanoTime?
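
A small self-contained sketch of what this suggests, mirroring the quoted 
factory (the class name is only illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Sketch: same shape as the quoted factory, but with a nanoTime-based suffix.
public class NanoTimeThreadFactory implements ThreadFactory {
  private final ThreadFactory defaultFactory = Executors.defaultThreadFactory();
  private final String name;
  private final boolean withTime;

  public NanoTimeThreadFactory(String name, boolean withTime) {
    this.name = name;
    this.withTime = withTime;
  }

  @Override
  public Thread newThread(Runnable r) {
    final Thread thread = defaultFactory.newThread(r);
    // nanoTime keeps the suffix consistent with callers that already use nanoTime
    thread.setName(withTime ? name + "_" + System.nanoTime() : name);
    return thread;
  }
}
```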


---


[jira] [Resolved] (CARBONDATA-3181) IllegalAccessError for BloomFilter.bits when bloom_compress is false

2018-12-20 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-3181.

   Resolution: Fixed
Fix Version/s: 1.5.2

> IllegalAccessError for BloomFilter.bits when bloom_compress is false
> 
>
> Key: CARBONDATA-3181
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3181
> Project: CarbonData
>  Issue Type: Bug
>Reporter: jiangmanhua
>Priority: Major
> Fix For: 1.5.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ```
> 18/12/19 11:16:07 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> java.lang.IllegalAccessError: tried to access field 
> org.apache.hadoop.util.bloom.BloomFilter.bits from class 
> org.apache.hadoop.util.bloom.CarbonBloomFilter
>  at 
> org.apache.hadoop.util.bloom.CarbonBloomFilter.membershipTest(CarbonBloomFilter.java:70)
>  at 
> org.apache.carbondata.datamap.bloom.BloomCoarseGrainDataMap.prune(BloomCoarseGrainDataMap.java:202)
>  at 
> org.apache.carbondata.core.datamap.TableDataMap.pruneWithFilter(TableDataMap.java:185)
>  at 
> org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:160)
>  at 
> org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapperImpl.prune(DataMapExprWrapperImpl.java:53)
>  at 
> org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:517)
>  at 
> org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:412)
>  at 
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:529)
>  at 
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:220)
>  at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:127)
>  at 
> org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:66)
>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
> ```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #3000: [CARBONDATA-3181][BloomDataMap] Fix access field err...

2018-12-19 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3000
  
LGTM


---


[GitHub] carbondata issue #2999: [HOTFIX] replace apache common log with carbondata l...

2018-12-19 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2999
  
LGTM


---


[GitHub] carbondata pull request #2988: [CARBONDATA-3174] Fix trailing space issue wi...

2018-12-17 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2988#discussion_r242377558
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
 ---
@@ -2490,6 +2490,54 @@ class TestNonTransactionalCarbonTable extends 
QueryTest with BeforeAndAfterAll {
 FileUtils.deleteDirectory(new File(writerPath))
   }
 
+  test("check varchar with trailing space") {
--- End diff --

besides, this is for varchar columns, why not update the code there?


---


[GitHub] carbondata issue #2992: [CARBONDATA-3176] Optimize quick-start-guide documen...

2018-12-17 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2992
  
@xubo245 
"and plan to support alluxio path too."
---
I think there is no need to add this currently. We should only describe the 
feature implemented.


---


[GitHub] carbondata issue #2992: [CARBONDATA-3176] Optimize quick-start-guide documen...

2018-12-16 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2992
  
LGTM


---


[jira] [Resolved] (CARBONDATA-3166) Changes in Document and Displaying Carbon Column Compressor used in Describe Formatted Command

2018-12-14 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-3166.

   Resolution: Fixed
Fix Version/s: 1.5.2

> Changes in Document and Displaying Carbon Column Compressor used in Describe 
> Formatted Command
> --
>
> Key: CARBONDATA-3166
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3166
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Shardul Singh
>Assignee: Shardul Singh
>Priority: Minor
> Fix For: 1.5.2
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Changes in Document and Displaying Carbon Column Compressor used in Describe 
> Formatted Command



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2986: [CARBONDATA-3166]Updated Document and added Column C...

2018-12-14 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2986
  
LGTM


---


[GitHub] carbondata pull request #2984: [CARBONDATA-3165]Protection of Bloom Null Exc...

2018-12-13 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2984#discussion_r241616785
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java
 ---
@@ -227,6 +231,12 @@ private String getAncestorTablePath(CarbonTable 
currentTable) {
 }
   }
 }
+if (hitBlocklets == null) {
+  LOGGER.warn(String.format("HitBlocklets is empty in bloom filter 
prune method. " +
--- End diff --

Is this a potential problem? If not, why output a 'warn' message?


---


[GitHub] carbondata issue #2986: [CARBONDATA-3166]Updated Document and added Column C...

2018-12-13 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2986
  
I agree with @KanakaKumar 's comments. We'd better not show the system default 
value in the desc command for columncompressor, since the value can vary each 
time we change it. Maybe we can seek advice from the other guys.


---


[GitHub] carbondata issue #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap example

2018-12-12 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2963
  
retest this please


---


[GitHub] carbondata issue #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap example

2018-12-11 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2963
  
retest this please


---


[GitHub] carbondata issue #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap example

2018-12-11 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2963
  
retest this please


---


[GitHub] carbondata pull request #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap e...

2018-12-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2963#discussion_r240581684
  
--- Diff: 
integration/spark2/src/test/scala/org/apache/carbondata/datamap/minmax/MinMaxDataMapFunctionSuite.scala
 ---
@@ -0,0 +1,415 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.minmax
+
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class MinMaxDataMapFunctionSuite extends QueryTest with BeforeAndAfterAll {
+  private val minmaxDataMapFactoryName = 
"org.apache.carbondata.datamap.minmax.MinMaxDataMapFactory"
+  var originalStatEnabled = CarbonProperties.getInstance().getProperty(
+CarbonCommonConstants.ENABLE_QUERY_STATISTICS,
+CarbonCommonConstants.ENABLE_QUERY_STATISTICS_DEFAULT)
+
+  override protected def beforeAll(): Unit = {
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.ENABLE_QUERY_STATISTICS, "true")
+
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT,
+  "-MM-dd")
+
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
+  "-MM-dd HH:mm:ss")
--- End diff --

I think this modification is OK.
We explicitly specify the format here to indicate that this is just the 
format of our input data. (I'm afraid the default behavior may change later.)


---


[GitHub] carbondata pull request #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap e...

2018-12-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2963#discussion_r240580992
  
--- Diff: 
datamap/example/src/main/java/org/apache/carbondata/datamap/minmax/MinMaxDataMapFactory.java
 ---
@@ -0,0 +1,365 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.minmax;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import 
org.apache.carbondata.common.exceptions.sql.MalformedDataMapCommandException;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.datamap.DataMapDistributable;
+import org.apache.carbondata.core.datamap.DataMapLevel;
+import org.apache.carbondata.core.datamap.DataMapMeta;
+import org.apache.carbondata.core.datamap.DataMapStoreManager;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datamap.TableDataMap;
+import org.apache.carbondata.core.datamap.dev.DataMapBuilder;
+import org.apache.carbondata.core.datamap.dev.DataMapWriter;
+import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap;
+import 
org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMapFactory;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFileFilter;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.features.TableOperation;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.DataMapSchema;
+import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.events.Event;
+
+import org.apache.log4j.Logger;
+
+/**
+ * Min Max DataMap Factory
+ */
+@InterfaceAudience.Internal
+public class MinMaxDataMapFactory extends CoarseGrainDataMapFactory {
+  private static final Logger LOGGER =
+  
LogServiceFactory.getLogService(MinMaxDataMapFactory.class.getName());
+  private DataMapMeta dataMapMeta;
+  private String dataMapName;
+  // segmentId -> list of index files
+  private Map> segmentMap = new ConcurrentHashMap<>();
+  private Cache cache;
+
+  public MinMaxDataMapFactory(CarbonTable carbonTable, DataMapSchema 
dataMapSchema)
+  throws MalformedDataMapCommandException {
+super(carbonTable, dataMapSchema);
+
+// this is an example for datamap, we can choose the columns and 
operations that
+// will be supported by this datamap. Furthermore, we can add 
cache-support for this datamap.
+
+this.dataMapName = dataMapSchema.getDataMapName();
+List indexedColumns = 
carbonTable.getIndexedColumns(dataMapSchema);
+
+// operations that will be supported on the indexed columns
+List optOperations = new ArrayList<>();
+optOperations.add(ExpressionType.NOT);
+optOperations.add(ExpressionType.EQUALS);
+optOperations.add(ExpressionType.NOT_EQUALS);
+optOperations.add(ExpressionType.GREATERTHAN);
+optOperations.add(ExpressionType.GREATERTHAN_EQUALTO);
+opt

[GitHub] carbondata pull request #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap e...

2018-12-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2963#discussion_r240579947
  
--- Diff: 
datamap/example/src/main/java/org/apache/carbondata/datamap/minmax/AbstractMinMaxDataMapWriter.java
 ---
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.minmax;
+
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datamap.dev.DataMapWriter;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.datastore.page.ColumnPage;
+import 
org.apache.carbondata.core.datastore.page.encoding.bool.BooleanConvert;
+import 
org.apache.carbondata.core.datastore.page.statistics.ColumnPageStatsCollector;
+import 
org.apache.carbondata.core.datastore.page.statistics.KeyPageStatsCollector;
+import 
org.apache.carbondata.core.datastore.page.statistics.PrimitivePageStatsCollector;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.encoder.Encoding;
+import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.DataTypeUtil;
+
+import org.apache.log4j.Logger;
+
+/**
+ * We will record the min & max value for each index column in each 
blocklet.
+ * Since the size of index is quite small, we will combine the index for 
all index columns
+ * in one file.
+ */
+public abstract class AbstractMinMaxDataMapWriter extends DataMapWriter {
+  private static final Logger LOGGER = LogServiceFactory.getLogService(
+  AbstractMinMaxDataMapWriter.class.getName());
+
+  private ColumnPageStatsCollector[] indexColumnMinMaxCollectors;
+  protected int currentBlockletId;
+  private String currentIndexFile;
+  private DataOutputStream currentIndexFileOutStream;
+
+  public AbstractMinMaxDataMapWriter(String tablePath, String dataMapName,
+  List indexColumns, Segment segment, String shardName) 
throws IOException {
+super(tablePath, dataMapName, indexColumns, segment, shardName);
+initStatsCollector();
+initDataMapFile();
+  }
+
+  private void initStatsCollector() {
+indexColumnMinMaxCollectors = new 
ColumnPageStatsCollector[indexColumns.size()];
+CarbonColumn indexCol;
+for (int i = 0; i < indexColumns.size(); i++) {
+  indexCol = indexColumns.get(i);
+  if (indexCol.isMeasure()
+  || (indexCol.isDimension()
+  && DataTypeUtil.isPrimitiveColumn(indexCol.getDataType())
+  && !indexCol.hasEncoding(Encoding.DICTIONARY)
+  && !indexCol.hasEncoding(Encoding.DIRECT_DICTIONARY))) {
+indexColumnMinMaxCollectors[i] = 
PrimitivePageStatsCollector.newInstance(
+indexColumns.get(i).getDataType());
+  } else {
+indexColumnMinMaxCollectors[i] = 
KeyPageStatsCollector.newInstance(DataTypes.BYTE_ARRAY);
+  }
+}
+  }
+
+  private void initDataMapFile() throws IOException {
+if (!FileFactory.isFileExist(dataMapPath) &&
+!FileFactory.mkdirs(dataMapPath, 
FileFactory.getFileType(dataMapPath))) {
+  throw new IOException("Failed to create directory " + dataMapPath);
+}
+
+try {
+  currentIndexFile = MinMaxIndexDataMap.getIndexFile(dataMapPath,
+  MinMaxIndexHolder.MINMAX_INDEX_PREFFIX + indexColumns.size());
+  FileFactory.createNewFile(currentIndexFile, 
FileFactory

[GitHub] carbondata pull request #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap e...

2018-12-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2963#discussion_r240579236
  
--- Diff: 
datamap/example/src/main/java/org/apache/carbondata/datamap/minmax/AbstractMinMaxDataMapWriter.java
 ---
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.minmax;
+
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datamap.dev.DataMapWriter;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.datastore.page.ColumnPage;
+import 
org.apache.carbondata.core.datastore.page.encoding.bool.BooleanConvert;
+import 
org.apache.carbondata.core.datastore.page.statistics.ColumnPageStatsCollector;
+import 
org.apache.carbondata.core.datastore.page.statistics.KeyPageStatsCollector;
+import 
org.apache.carbondata.core.datastore.page.statistics.PrimitivePageStatsCollector;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.encoder.Encoding;
+import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.DataTypeUtil;
+
+import org.apache.log4j.Logger;
+
+/**
+ * We will record the min & max value for each index column in each 
blocklet.
+ * Since the size of index is quite small, we will combine the index for 
all index columns
+ * in one file.
+ */
+public abstract class AbstractMinMaxDataMapWriter extends DataMapWriter {
+  private static final Logger LOGGER = LogServiceFactory.getLogService(
+  AbstractMinMaxDataMapWriter.class.getName());
+
+  private ColumnPageStatsCollector[] indexColumnMinMaxCollectors;
+  protected int currentBlockletId;
+  private String currentIndexFile;
+  private DataOutputStream currentIndexFileOutStream;
+
+  public AbstractMinMaxDataMapWriter(String tablePath, String dataMapName,
+  List indexColumns, Segment segment, String shardName) 
throws IOException {
+super(tablePath, dataMapName, indexColumns, segment, shardName);
+initStatsCollector();
+initDataMapFile();
+  }
+
+  private void initStatsCollector() {
+indexColumnMinMaxCollectors = new 
ColumnPageStatsCollector[indexColumns.size()];
+CarbonColumn indexCol;
+for (int i = 0; i < indexColumns.size(); i++) {
+  indexCol = indexColumns.get(i);
+  if (indexCol.isMeasure()
+  || (indexCol.isDimension()
+  && DataTypeUtil.isPrimitiveColumn(indexCol.getDataType())
+  && !indexCol.hasEncoding(Encoding.DICTIONARY)
+  && !indexCol.hasEncoding(Encoding.DIRECT_DICTIONARY))) {
+indexColumnMinMaxCollectors[i] = 
PrimitivePageStatsCollector.newInstance(
+indexColumns.get(i).getDataType());
+  } else {
+indexColumnMinMaxCollectors[i] = 
KeyPageStatsCollector.newInstance(DataTypes.BYTE_ARRAY);
+  }
+}
+  }
+
+  private void initDataMapFile() throws IOException {
+if (!FileFactory.isFileExist(dataMapPath) &&
+!FileFactory.mkdirs(dataMapPath, 
FileFactory.getFileType(dataMapPath))) {
+  throw new IOException("Failed to create directory " + dataMapPath);
+}
+
+try {
+  currentIndexFile = MinMaxIndexDataMap.getIndexFile(dataMapPath,
+  MinMaxIndexHolder.MINMAX_INDEX_PREFFIX + indexColumns.size());
+  FileFactory.createNewFile(currentIndexFile, 
FileFactory

[GitHub] carbondata pull request #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap e...

2018-12-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2963#discussion_r240578382
  
--- Diff: 
datamap/example/src/main/java/org/apache/carbondata/datamap/minmax/AbstractMinMaxDataMapWriter.java
 ---
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.datamap.minmax;
+
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datamap.dev.DataMapWriter;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.datastore.page.ColumnPage;
+import 
org.apache.carbondata.core.datastore.page.encoding.bool.BooleanConvert;
+import 
org.apache.carbondata.core.datastore.page.statistics.ColumnPageStatsCollector;
+import 
org.apache.carbondata.core.datastore.page.statistics.KeyPageStatsCollector;
+import 
org.apache.carbondata.core.datastore.page.statistics.PrimitivePageStatsCollector;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.encoder.Encoding;
+import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.DataTypeUtil;
+
+import org.apache.log4j.Logger;
+
+/**
+ * We will record the min & max value for each index column in each 
blocklet.
+ * Since the size of index is quite small, we will combine the index for 
all index columns
+ * in one file.
+ */
+public abstract class AbstractMinMaxDataMapWriter extends DataMapWriter {
+  private static final Logger LOGGER = LogServiceFactory.getLogService(
+  AbstractMinMaxDataMapWriter.class.getName());
+
+  private ColumnPageStatsCollector[] indexColumnMinMaxCollectors;
+  protected int currentBlockletId;
+  private String currentIndexFile;
+  private DataOutputStream currentIndexFileOutStream;
+
+  public AbstractMinMaxDataMapWriter(String tablePath, String dataMapName,
+  List indexColumns, Segment segment, String shardName) 
throws IOException {
+super(tablePath, dataMapName, indexColumns, segment, shardName);
+initStatsCollector();
+initDataMapFile();
+  }
+
+  private void initStatsCollector() {
+indexColumnMinMaxCollectors = new 
ColumnPageStatsCollector[indexColumns.size()];
+CarbonColumn indexCol;
+for (int i = 0; i < indexColumns.size(); i++) {
+  indexCol = indexColumns.get(i);
+  if (indexCol.isMeasure()
+  || (indexCol.isDimension()
+  && DataTypeUtil.isPrimitiveColumn(indexCol.getDataType())
+  && !indexCol.hasEncoding(Encoding.DICTIONARY)
+  && !indexCol.hasEncoding(Encoding.DIRECT_DICTIONARY))) {
+indexColumnMinMaxCollectors[i] = 
PrimitivePageStatsCollector.newInstance(
+indexColumns.get(i).getDataType());
+  } else {
+indexColumnMinMaxCollectors[i] = 
KeyPageStatsCollector.newInstance(DataTypes.BYTE_ARRAY);
+  }
+}
+  }
+
+  private void initDataMapFile() throws IOException {
+if (!FileFactory.isFileExist(dataMapPath) &&
+!FileFactory.mkdirs(dataMapPath, 
FileFactory.getFileType(dataMapPath))) {
+  throw new IOException("Failed to create directory " + dataMapPath);
+}
+
+try {
+  currentIndexFile = MinMaxIndexDataMap.getIndexFile(dataMapPath,
+  MinMaxIndexHolder.MINMAX_INDEX_PREFFIX + indexColumns.size());
+  FileFactory.createNewFile(currentIndexFile, 
FileFactory

[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception

2018-12-11 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2969
  
LGTM


---


[GitHub] carbondata issue #2732: [CARBONDATA-3020] support lz4 as column compressor

2018-12-09 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2732
  
Better to have this PR tested using more data and queries to check whether it 
has any advantages over other compressors.


---


[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception

2018-12-05 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2969
  
@SteNicholas Nice to see your work on the existing problems. It also seems the 
previous code has some problems that are carried over by your code, so I suggest 
you fix them at the same time. Please check the above comments.


---


[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...

2018-12-05 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2969#discussion_r239070652
  
--- Diff: 
integration/hive/src/test/java/org/apache/carbondata/hive/TestCarbonSerDe.java 
---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.hive;
+
+import junit.framework.TestCase;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.io.DoubleWritable;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.io.ShortWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.*;
+import org.junit.Test;
+
+import java.util.Properties;
+
+public class TestCarbonSerDe extends TestCase {
+@Test
+public void testCarbonHiveSerDe() throws Throwable {
+try {
+// Create the SerDe
+System.out.println("test: testCarbonHiveSerDe");
+
+final CarbonHiveSerDe serDe = new CarbonHiveSerDe();
+final Configuration conf = new Configuration();
+final Properties tbl = createProperties();
+SerDeUtils.initializeSerDe(serDe, conf, tbl, null);
+
+// Data
+final Writable[] arr = new Writable[7];
+
+//primitive types
+arr[0] = new ShortWritable((short) 456);
+arr[1] = new IntWritable(789);
+arr[2] = new LongWritable(1000l);
+arr[3] = new DoubleWritable(5.3);
+arr[4] = new HiveDecimalWritable(HiveDecimal.create(1));
+arr[5] = new Text("CarbonSerDe Binary".getBytes("UTF-8"));
+
+final Writable[] arrayContainer = new Writable[1];
+final Writable[] array = new Writable[5];
+for (int i = 0; i < 5; ++i) {
+array[i] = new IntWritable(i);
+}
+arrayContainer[0] = new ArrayWritable(Writable.class, array);
+arr[6] = new ArrayWritable(Writable.class, arrayContainer);
+
+final ArrayWritable arrWritable = new 
ArrayWritable(Writable.class, arr);
+// Test
+deserializeAndSerializeLazySimple(serDe, arrWritable);
+System.out.println("test: testCarbonHiveSerDe - OK");
+
+} catch (final Throwable e) {
+e.printStackTrace();
--- End diff --

use a Logger instead of printing the stack trace in test code
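
For example, following the LogServiceFactory pattern quoted elsewhere in this 
thread (the field and message below are only illustrative):

```java
import org.apache.carbondata.common.logging.LogServiceFactory;
import org.apache.log4j.Logger;

public class TestCarbonSerDe {
  private static final Logger LOGGER =
      LogServiceFactory.getLogService(TestCarbonSerDe.class.getName());

  // in the catch block, instead of e.printStackTrace():
  //   LOGGER.error("testCarbonHiveSerDe failed", e);
}
```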


---


[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...

2018-12-05 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2969#discussion_r239070613
  
--- Diff: 
integration/hive/src/test/java/org/apache/carbondata/hive/TestCarbonSerDe.java 
---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.hive;
+
+import junit.framework.TestCase;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.io.DoubleWritable;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.io.ShortWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.*;
+import org.junit.Test;
+
+import java.util.Properties;
+
+public class TestCarbonSerDe extends TestCase {
+@Test
+public void testCarbonHiveSerDe() throws Throwable {
+try {
+// Create the SerDe
+System.out.println("test: testCarbonHiveSerDe");
+
+final CarbonHiveSerDe serDe = new CarbonHiveSerDe();
+final Configuration conf = new Configuration();
+final Properties tbl = createProperties();
+SerDeUtils.initializeSerDe(serDe, conf, tbl, null);
+
+// Data
+final Writable[] arr = new Writable[7];
+
+//primitive types
+arr[0] = new ShortWritable((short) 456);
+arr[1] = new IntWritable(789);
+arr[2] = new LongWritable(1000l);
+arr[3] = new DoubleWritable(5.3);
+arr[4] = new HiveDecimalWritable(HiveDecimal.create(1));
+arr[5] = new Text("CarbonSerDe Binary".getBytes("UTF-8"));
+
+final Writable[] arrayContainer = new Writable[1];
+final Writable[] array = new Writable[5];
+for (int i = 0; i < 5; ++i) {
+array[i] = new IntWritable(i);
+}
+arrayContainer[0] = new ArrayWritable(Writable.class, array);
+arr[6] = new ArrayWritable(Writable.class, arrayContainer);
+
+final ArrayWritable arrWritable = new 
ArrayWritable(Writable.class, arr);
+// Test
+deserializeAndSerializeLazySimple(serDe, arrWritable);
+System.out.println("test: testCarbonHiveSerDe - OK");
+
+} catch (final Throwable e) {
+e.printStackTrace();
--- End diff --

use a Logger instead of printing the stack trace in test code


---


[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...

2018-12-05 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2969#discussion_r239071026
  
--- Diff: 
integration/hive/src/test/java/org/apache/carbondata/hive/TestCarbonSerDe.java 
---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.hive;
+
+import junit.framework.TestCase;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.io.DoubleWritable;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.io.ShortWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.*;
+import org.junit.Test;
+
+import java.util.Properties;
+
+public class TestCarbonSerDe extends TestCase {
+@Test
+public void testCarbonHiveSerDe() throws Throwable {
+try {
+// Create the SerDe
+System.out.println("test: testCarbonHiveSerDe");
+
+final CarbonHiveSerDe serDe = new CarbonHiveSerDe();
+final Configuration conf = new Configuration();
+final Properties tbl = createProperties();
+SerDeUtils.initializeSerDe(serDe, conf, tbl, null);
+
+// Data
+final Writable[] arr = new Writable[7];
+
+//primitive types
+arr[0] = new ShortWritable((short) 456);
+arr[1] = new IntWritable(789);
+arr[2] = new LongWritable(1000l);
+arr[3] = new DoubleWritable(5.3);
+arr[4] = new HiveDecimalWritable(HiveDecimal.create(1));
+arr[5] = new Text("CarbonSerDe Binary".getBytes("UTF-8"));
+
+final Writable[] arrayContainer = new Writable[1];
+final Writable[] array = new Writable[5];
+for (int i = 0; i < 5; ++i) {
+array[i] = new IntWritable(i);
+}
+arrayContainer[0] = new ArrayWritable(Writable.class, array);
+arr[6] = new ArrayWritable(Writable.class, arrayContainer);
+
+final ArrayWritable arrWritable = new 
ArrayWritable(Writable.class, arr);
+// Test
+deserializeAndSerializeLazySimple(serDe, arrWritable);
+System.out.println("test: testCarbonHiveSerDe - OK");
+
+} catch (final Throwable e) {
--- End diff --

After looking at the following procedure, I think there is no need to catch and 
rethrow the exception here; the test framework will handle it.


---


[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...

2018-12-05 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2969#discussion_r239070238
  
--- Diff: 
integration/hive/src/test/java/org/apache/carbondata/hive/TestCarbonSerDe.java 
---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.hive;
+
+import junit.framework.TestCase;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.io.DoubleWritable;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.io.ShortWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.*;
+import org.junit.Test;
+
+import java.util.Properties;
+
+public class TestCarbonSerDe extends TestCase {
+@Test
+public void testCarbonHiveSerDe() throws Throwable {
+try {
+// Create the SerDe
+System.out.println("test: testCarbonHiveSerDe");
--- End diff --

It's not recommended to use stdout in test code. Currently we only use stdout 
in `example` code, not `test` code. If you want to print something, use the 
carbon Logger instead.


---


[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...

2018-12-05 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2969#discussion_r239069684
  
--- Diff: 
integration/hive/src/test/java/org/apache/carbondata/hive/TestCarbonSerDe.java 
---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.hive;
+
+import junit.framework.TestCase;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.io.DoubleWritable;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.io.ShortWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.*;
+import org.junit.Test;
+
+import java.util.Properties;
+
+public class TestCarbonSerDe extends TestCase {
--- End diff --

Please refer to the other tests in carbon, such as the carbon-core module. 
Actually we do not need to extend TestCase here.
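
A minimal sketch of the suggested shape (the class and test names are only 
illustrative): with JUnit 4 annotations there is no need to extend 
junit.framework.TestCase, and any exception that escapes the test method is 
reported by the framework itself.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Sketch: a plain JUnit 4 test class, no TestCase inheritance required.
public class CarbonSerDeSketchTest {

  @Test
  public void testSomething() throws Exception {
    // an exception thrown here is reported as a failure by the framework,
    // so there is no need to catch and print the stack trace manually
    assertEquals(4, 2 + 2);
  }
}
```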


---


[GitHub] carbondata issue #2878: [CARBONDATA-3107] Optimize error/exception coding fo...

2018-12-02 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2878
  
LGTM
Please fix the conflicts


---


[GitHub] carbondata issue #2961: [CARBONDATA-3119] Fixing the getOrCreateCarbonSessio...

2018-11-30 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2961
  
LGTM


---


[GitHub] carbondata issue #2961: [CARBONDATA-3119] Fixing the getOrCreateCarbonSessio...

2018-11-30 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2961
  
LGTM


---


[GitHub] carbondata issue #2961: [CARBONDATA-3119] Fixing the getOrCreateCarbonSessio...

2018-11-30 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2961
  
LGTM


---


[GitHub] carbondata issue #2914: [CARBONDATA-3093] Provide property builder for carbo...

2018-11-30 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2914
  
Have you rebased with the latest master code and rechecked again? 18 days have 
passed since your last commit.


---


[GitHub] carbondata issue #2961: [CARBONDATA-3119] Fixing the getOrCreateCarbonSessio...

2018-11-29 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2961
  
+1 for @zzcclp 's comments


---


[GitHub] carbondata issue #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap example

2018-11-28 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2963
  
> Can consider writing an example: how to use MinMaxDataMap to build an index 
for a CSV file.

@chenliang613 This will require carbondata to support external file formats 
(such as CSV) in loading and reading. Then we can simply use this minmax 
datamap, as well as the bloomfilter datamap, as a file-level datamap for these formats.


---


[GitHub] carbondata pull request #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap

2018-11-28 Thread xuchuanyin
GitHub user xuchuanyin opened a pull request:

https://github.com/apache/carbondata/pull/2963

[CARBONDATA-3139] Fix bugs in MinMaxDataMap

Make the minmax datamap usable and add more tests for it.
The MinMax DataMap may be useful if we want to implement datamaps for external 
file formats like CSV/Parquet, etc.

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata 181125_bug_minmax_dm

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2963.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2963


commit cad25a9a8994a19d306ed3c53b7fd0aaf58a5811
Author: xuchuanyin 
Date:   2018-11-25T13:36:17Z

Fix bugs in MinMaxDataMap

make minmax datamap usable and add more tests for it




---


[jira] [Created] (CARBONDATA-3139) Fix bugs in datamap example

2018-11-28 Thread xuchuanyin (JIRA)
xuchuanyin created CARBONDATA-3139:
--

 Summary: Fix bugs in datamap example
 Key: CARBONDATA-3139
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3139
 Project: CarbonData
  Issue Type: Bug
Reporter: xuchuanyin
Assignee: xuchuanyin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-3133) Update carbondata build document

2018-11-27 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-3133.

Resolution: Fixed

> Update carbondata build document
> 
>
> Key: CARBONDATA-3133
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3133
> Project: CarbonData
>  Issue Type: Improvement
>  Components: build
>Affects Versions: NONE
>Reporter: Jonathan.Wei
>Assignee: Jonathan.Wei
>Priority: Major
> Fix For: 1.5.2
>
>   Original Estimate: 1h
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Update the document to add spark 2.3.2 and add datamap mv compiling method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2955: [CARBONDATA-3133] Update the document to add spark 2...

2018-11-27 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2955
  
Merged. Thanks for your contribution 👍


---


[GitHub] carbondata issue #2955: [CARBONDATA-3133] Update the document to add spark 2...

2018-11-27 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2955
  
LGTM


---


[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

2018-11-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2949#discussion_r236907065
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
@@ -205,26 +195,53 @@ public BlockletDetailsFetcher 
getBlockletDetailsFetcher() {
   final FilterResolverIntf filterExp, final List 
partitions,
   List blocklets, final Map> 
dataMaps,
   int totalFiles) {
+/*
+ 
*
+ * Below is the example of how this part of code works.
+ * consider a scenario of having 5 segments, 10 datamaps in each 
segment,
--- End diff --

What do you mean by saying '10 datamaps in each segment'?
Do you mean 10 index files, merged index files, blocklets, or something else?


---


[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

2018-11-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2949#discussion_r236907320
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
@@ -205,26 +195,53 @@ public BlockletDetailsFetcher 
getBlockletDetailsFetcher() {
   final FilterResolverIntf filterExp, final List 
partitions,
   List blocklets, final Map> 
dataMaps,
   int totalFiles) {
+/*
+ 
*
+ * Below is the example of how this part of code works.
+ * consider a scenario of having 5 segments, 10 datamaps in each 
segment,
--- End diff --

Also, what does 'record' mean below?


---


[GitHub] carbondata pull request #2949: [WIP] support parallel block pruning for non-...

2018-11-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2949#discussion_r236571984
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
@@ -70,4 +70,6 @@ void init(DataMapModel dataMapModel)
*/
   void finish();
 
+  // can return , number of records information that are stored in datamap.
--- End diff --

"can return"?
What does this mean?


---


[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

2018-11-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2936#discussion_r236568719
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -487,6 +487,8 @@ private int getBlockCount(List 
blocklets) {
 // First prune using default datamap on driver side.
 TableDataMap defaultDataMap = 
DataMapStoreManager.getInstance().getDefaultDataMap(carbonTable);
 List prunedBlocklets = null;
+// This is to log the event, so user will know what is happening by 
seeing logs.
+LOG.info("Started block pruning ...");
--- End diff --

Instead of adding these logs, I think we'd better add the time consumed by 
pruning to the statistics.
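
A rough sketch of the idea (the recorder below is purely hypothetical, not the 
actual CarbonData statistics API): time the pruning call once and record the 
duration instead of emitting start/end info logs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: wrap the pruning call and keep its duration as a statistic.
public class PruningStatsSketch {
  private final Map<String, Long> statistics = new HashMap<>();

  public <T> T timePruning(Supplier<T> pruneCall) {
    long start = System.currentTimeMillis();
    T prunedBlocklets = pruneCall.get();
    // record the elapsed time rather than logging "Started/Finished pruning"
    statistics.put("block_pruning_time_ms", System.currentTimeMillis() - start);
    return prunedBlocklets;
  }

  public Map<String, Long> getStatistics() {
    return statistics;
  }
}
```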


---


[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

2018-11-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2936#discussion_r236565153
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1399,6 +1399,17 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = 
"false";
 
+  /**
+   * max driver threads used for block pruning [1 to 4 threads]
+   */
+  @CarbonProperty public static final String 
CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING =
+  "carbon.max.driver.threads.for.block.pruning";
--- End diff --

I think it's better to use the name
`carbon.query.pruning.parallelism.driver`


---


[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

2018-11-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2936#discussion_r236565449
  
--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -1399,6 +1399,17 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false";
 
+  /**
+   * max driver threads used for block pruning [1 to 4 threads]
+   */
+  @CarbonProperty public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING =
+  "carbon.max.driver.threads.for.block.pruning";
+
+  public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING_DEFAULT = "4";
+
+  // block prune in multi-thread if files size more than 100K files.
+  public static final int CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT = 10;
--- End diff --

Why add this constraint?


---


[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

2018-11-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2936#discussion_r236564769
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
@@ -63,6 +75,8 @@
 
   private SegmentPropertiesFetcher segmentPropertiesFetcher;
 
+  private static final Log LOG = LogFactory.getLog(TableDataMap.class);
--- End diff --

We do not use apache-commons-logging in the carbondata project! Please take care of this.
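For reference, a minimal sketch of the declaration under that assumption (it uses `LogServiceFactory`, which I believe is the factory used elsewhere in the project, instead of commons-logging):

```java
import org.apache.carbondata.common.logging.LogService;
import org.apache.carbondata.common.logging.LogServiceFactory;

public final class TableDataMap {
  // obtain the logger from the project's LogServiceFactory rather than
  // commons-logging's LogFactory.getLog(...)
  private static final LogService LOG =
      LogServiceFactory.getLogService(TableDataMap.class.getName());
}
```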


---


[GitHub] carbondata pull request #2955: [CARBONDATA-3133] update build document

2018-11-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2955#discussion_r236563139
  
--- Diff: build/README.md ---
@@ -29,10 +29,40 @@ Build with different supported versions of Spark, by default using Spark 2.2.1 t
 ```
 mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package
 mvn -DskipTests -Pspark-2.2 -Dspark.version=2.2.1 clean package
+mvn -DskipTests -Pspark-2.3 -Dspark.version=2.3.2 clean package
 ```
 
 Note: If you are working in Windows environment, remember to add 
`-Pwindows` while building the project.
 
+## MV Feature Build
+Add mv module and sourceDirectory to the spark profile corresponding to 
the parent pom.xml file and recompile.
+The compile command is the same as the command in the previous section
+```
+
--- End diff --

Do we really need this?
Currently we can use `-Pmv` to include the MV feature while compiling.
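For example, the MV feature can be included with the existing build commands just by adding the profile (a hedged sketch reusing the Spark 2.2 command quoted above):

```
mvn -DskipTests -Pspark-2.2 -Dspark.version=2.2.1 -Pmv clean package
```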


---


[GitHub] carbondata issue #2943: [CARBONDATA-3120]Fixed the parent version error in M...

2018-11-23 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2943
  
This PR uses version 1.5.2-snapshot, but the main pom uses 1.6.0-snapshot.
Is this intentional?


---


[GitHub] carbondata issue #2926: [HOTFIX] Reduce blocklet minimum configurable size

2018-11-16 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2926
  
Will this be a table property, or will it just be left as a system property?


---


[jira] [Resolved] (CARBONDATA-3031) Find wrong description in the document for 'carbon.number.of.cores.while.loading'

2018-11-16 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-3031.

   Resolution: Fixed
Fix Version/s: 1.5.1

> Find wrong description in the document for 
> 'carbon.number.of.cores.while.loading'
> -
>
> Key: CARBONDATA-3031
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3031
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: xuchuanyin
>Assignee: lianganping
>Priority: Major
> Fix For: 1.5.1
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> The document says that the default value of 
> 'carbon.number.of.cores.while.loading' is 2. But actually, during data 
> loading, carbondata uses the value of 'spark.executor.cores', which means 
> that the description in the document is incorrect.
> But this doesn't mean that the default value of 
> 'carbon.number.of.cores.while.loading' is useless -- in compaction and the SDK, 
> carbondata still uses this default value.
> In short, we need to fix the implementation as well as the document; maybe 
> some refactoring is needed.
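For illustration only (the values below are placeholders), the two settings involved are:

```
# carbon.properties: documented default is 2; still honored by compaction and the SDK
carbon.number.of.cores.while.loading=4

# spark-defaults.conf: during data loading this is the value that actually takes effect
spark.executor.cores=4
```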





[GitHub] carbondata issue #2907: [CARBONDATA-3031] refining usage of numberofcores in...

2018-11-16 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2907
  
LGTM


---


[GitHub] carbondata issue #2920: [HOTFIX] Improve log message in CarbonWriterBuilder

2018-11-15 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2920
  
LGTM
Waiting for the builds


---


[GitHub] carbondata pull request #2920: [HOTFIX] Improve log message in CarbonWriterB...

2018-11-15 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2920#discussion_r234098865
  
--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
@@ -438,13 +438,13 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException {
 Objects.requireNonNull(path, "path should not be null");
 if (this.writerType == null) {
   throw new IOException(
-  "Writer type is not set, use withCsvInput() or withAvroInput() or withJsonInput()  "
+  "'writerType' must be set, use withCsvInput() or withAvroInput() or withJsonInput()  "
   + "API based on input");
 }
 if (this.writtenByApp == null || this.writtenByApp.isEmpty()) {
   throw new RuntimeException(
-  "AppName is not set, please use writtenBy() API to set the App Name"
-  + "which is using SDK");
+  "'writtenBy' must be set when writting carbon files, use writtenBy() API to "
--- End diff --

```suggestion
  "'writtenBy' must be set when writing carbon files, use writtenBy() API to "
```


---


[jira] [Resolved] (CARBONDATA-3087) Prettify DESC FORMATTED output

2018-11-15 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-3087.

   Resolution: Fixed
 Assignee: Jacky Li
Fix Version/s: 1.5.1

> Prettify DESC FORMATTED output
> --
>
> Key: CARBONDATA-3087
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3087
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Major
> Fix For: 1.5.1
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Change output of DESC FORMATTED to:
> {noformat}
> +--------------------+----------------------------------------------+-------+
> |col_name            |data_type                                     |comment|
> +--------------------+----------------------------------------------+-------+
> |shortfield          |smallint                                      |null   |
> |intfield            |int                                           |null   |
> |bigintfield         |bigint                                        |null   |
> |doublefield         |double                                        |null   |
> |stringfield         |string                                        |null   |
> |timestampfield      |timestamp                                     |null   |
> |decimalfield        |decimal(18,2)                                 |null   |
> |datefield           |date                                          |null   |
> |charfield           |string                                        |null   |
> |floatfield          |double                                        |null   |
> |                    |                                              |       |
> |## Table Basic Information|                                        |       |
> |Comment             |                                              |       |
> |Path                |/Users/jacky/code/carbondata/examples/spark2/target/store/default/carbonsession_table|       |
> |Table Block Size    |1024 MB                                       |       |
> |Table Blocklet Size |64 MB                                         |       |
> |Streaming           |false                                         |       |
> |Flat Folder         |false                                         |       |
> |Bad Record Path     |                                              |       |
> |Min Input Per Node  |0.0B                                          |       |
> |                    |                                              |       |
> |## Index Information|                                              |       |
> |Sort Scope          |LOCAL_SORT                                    |       |
> |Sort Columns        |stringfield,timestampfield,datefield,charfield|       |
> |Index Cache Level   |BLOCK                                         |       |
> |Cached Index Columns|All columns                                   |       |
> |                    |                                              |       |
> |## Encoding Information |

[GitHub] carbondata issue #2908: [CARBONDATA-3087] Improve DESC FORMATTED output

2018-11-15 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2908
  
LGTM


---


[GitHub] carbondata pull request #2920: [HOTFIX] Improve log message in CarbonWriterB...

2018-11-14 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2920#discussion_r233458367
  
--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
@@ -438,13 +438,13 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException {
 Objects.requireNonNull(path, "path should not be null");
 if (this.writerType == null) {
   throw new IOException(
-  "Writer type is not set, use withCsvInput() or withAvroInput() or withJsonInput()  "
+  "'writerType' must be set, use withCsvInput() or withAvroInput() or withJsonInput()  "
   + "API based on input");
 }
 if (this.writtenByApp == null || this.writtenByApp.isEmpty()) {
   throw new RuntimeException(
-  "AppName is not set, please use writtenBy() API to set the App Name"
-  + "which is using SDK");
+  "'writtenBy' must be set when writting carbon files, use writtenBy() API to "
--- End diff --

writting --> writing


---


[GitHub] carbondata pull request #2920: [HOTFIX] Improve log message in CarbonWriterB...

2018-11-14 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2920#discussion_r233458646
  
--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
@@ -438,13 +438,13 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException {
 Objects.requireNonNull(path, "path should not be null");
 if (this.writerType == null) {
   throw new IOException(
-  "Writer type is not set, use withCsvInput() or withAvroInput() or withJsonInput()  "
+  "'writerType' must be set, use withCsvInput() or withAvroInput() or withJsonInput()  "
   + "API based on input");
 }
 if (this.writtenByApp == null || this.writtenByApp.isEmpty()) {
   throw new RuntimeException(
-  "AppName is not set, please use writtenBy() API to set the App Name"
-  + "which is using SDK");
+  "'writtenBy' must be set when writting carbon files, use writtenBy() API to "
+  + "set it, it can be the application name which using the SDK");
--- End diff --

it can be the name of the application which uses the SDK.
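For context, a minimal sketch of an SDK write that sets both of the fields discussed here (the schema, output path, and application name are placeholders, not values from the PR):

```java
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;

public final class SdkWriteSketch {
  public static void main(String[] args) throws Exception {
    // Sketch only: withCsvInput() fixes the writer type and writtenBy() records
    // the name of the application using the SDK -- the two checks in build().
    Schema schema = new Schema(new Field[] { new Field("name", "string") });
    CarbonWriter writer = CarbonWriter.builder()
        .outputPath("/tmp/carbon_sdk_out")   // placeholder output path
        .withCsvInput(schema)
        .writtenBy("MySdkApp")               // e.g. the application name
        .build();
    writer.write(new String[] { "alice" });
    writer.close();
  }
}
```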


---


[GitHub] carbondata issue #2909: [CARBONDATA-3089] Change task distribution for NO_SO...

2018-11-14 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2909
  
Besides, can we gain any benefits from this PR?


---


[GitHub] carbondata issue #2909: [CARBONDATA-3089] Change task distribution for NO_SO...

2018-11-14 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2909
  
LGTM


---


[GitHub] carbondata issue #2904: [HOTFIX] Remove search mode module

2018-11-13 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2904
  
LGTM


---


[GitHub] carbondata issue #2911: [HOTFIX] change log level for data loading

2018-11-09 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2911
  
LGTM


---

