[jira] [Comment Edited] (CARBONDATA-3327) Errors lies in query with small blocklet size
[ https://issues.apache.org/jira/browse/CARBONDATA-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799944#comment-16799944 ]

xuchuanyin edited comment on CARBONDATA-3327 at 3/24/19 8:34 AM:
-----------------------------------------------------------------

Besides, I noticed that the problem will also appear if we do not filter on the sort_columns. The content of the diff can also be accessed [here|https://gist.github.com/xuchuanyin/e5ffa3cca7c0ad62128fbf8dc1844a10].

> Errors lies in query with small blocklet size
> ---------------------------------------------
>
>                 Key: CARBONDATA-3327
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3327
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: xuchuanyin
>            Priority: Major
>
> While applying the following patch:
> ```diff
> diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> index 69374ad..c6b63a4 100644
> --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
> @@ -54,7 +54,7 @@ public final class CarbonCommonConstants {
>    /**
>     * min blocklet size
>     */
> -  public static final int BLOCKLET_SIZE_MIN_VAL = 2000;
> +  public static final int BLOCKLET_SIZE_MIN_VAL = 1;
>
>    /**
>     * max blocklet size
> diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> index df97d0f..ace9fd5 100644
> --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
> @@ -29,6 +29,7 @@ import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandExcepti
>  class TestSortColumns extends QueryTest with BeforeAndAfterAll {
>
>    override def beforeAll {
> +    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.BLOCKLET_SIZE, "2")
>      CarbonProperties.getInstance().addProperty(
>        CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "dd-MM-")
> ```
> I find that some of the tests in `TestSortColumns` failed with NPE, and the error logs show:
> ```
> 19/03/23 20:54:30 ERROR Executor: Exception in task 0.0 in stage 104.0 (TID 173)
> java.lang.NullPointerException
>     at org.apache.parquet.io.api.Binary$ByteArrayBackedBinary.getBytes(Binary.java:294)
>     at org.apache.spark.sql.execution.vectorized.ColumnVector.getUTF8String(ColumnVector.java:646)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     at org.apache.spark.scheduler.Task.run(Task.scala:108)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 19/03/23 20:54:30 ERROR TaskSetManager: Task 0 in stage 104.0 failed 1 times; aborting job
> 19/03/23 20:54:30 INFO TestSortColumns: ===== FINISHED org.apache.carbondata.spark.testsuite.sortcolumns.TestSortColumns: 'filter on sort_columns include no-dictionary, direct-dictionary and dictioanry' =====
> 19/03/23 20:54:30 INFO TestSortColumns: ===== TEST OUTPUT FOR org.apache.carbondata.spark.testsuite.sortcolumns.TestSortColumns: 'unsorted table creation, query data loading with heap and safe sort config' =====
> Job aborted due to stage failure: Task 0 in stage 104.0 failed 1 times, most recent failure: Lost task 0.0 in stage 104.0 (TID 173, localhost, executor driver): java.lang.NullPointerException
>     at org.apache.parquet.io.api.Binary$ByteArrayBackedBinary.getBytes(Binary.java:294)
>     at org.apache.spark.sql.execution.vectorized.ColumnVector.getUTF8String(ColumnVector.java:646)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
> ```
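The patch above relaxes the lower bound that normally prevents such tiny blocklet sizes from being accepted at all. A minimal, hypothetical sketch of that kind of guard — the class, method, and the max/default values here are illustrative and are not CarbonData's actual validation code; only the original `BLOCKLET_SIZE_MIN_VAL = 2000` comes from the issue:

```java
// Illustrative sketch: validate a configured blocklet size against
// min/max bounds and fall back to a default when it is out of range.
// With the original MIN_VAL of 2000, a configured value of "2" would
// never reach the reader; the patch lowers MIN_VAL to 1 and exposes the NPE.
public class BlockletSizeValidator {
    static final int MIN_VAL = 2000;       // original lower bound from the patch context
    static final int MAX_VAL = 12000000;   // assumed upper bound for this sketch
    static final int DEFAULT_VAL = 120000; // assumed default for this sketch

    /** Parse the configured value; fall back to the default when invalid or out of range. */
    static int resolve(String configured) {
        try {
            int v = Integer.parseInt(configured);
            return (v < MIN_VAL || v > MAX_VAL) ? DEFAULT_VAL : v;
        } catch (NumberFormatException e) {
            return DEFAULT_VAL;
        }
    }
}
```

Under this guard, `resolve("2")` silently yields the default, which is why the bug only surfaces once the minimum is dropped to 1.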
[jira] [Created] (CARBONDATA-3327) Errors lies in query with small blocklet size
xuchuanyin created CARBONDATA-3327:
--------------------------------------

             Summary: Errors lies in query with small blocklet size
                 Key: CARBONDATA-3327
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3327
             Project: CarbonData
          Issue Type: Bug
            Reporter: xuchuanyin

The description (the patch lowering BLOCKLET_SIZE_MIN_VAL from 2000 to 1, the BLOCKLET_SIZE = "2" test setting, and the resulting NullPointerException logs) is the same as the description quoted in the comment notifications for this issue above.
[jira] [Resolved] (CARBONDATA-3281) Limit the LRU cache size
[ https://issues.apache.org/jira/browse/CARBONDATA-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-3281.
------------------------------------
    Resolution: Fixed

> Limit the LRU cache size
> ------------------------
>
>                 Key: CARBONDATA-3281
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3281
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: TaoLi
>            Priority: Minor
>          Time Spent: 12h
>  Remaining Estimate: 0h
>
> If the configured LRU cache size is bigger than the JVM -Xmx size, use CARBON_MAX_LRU_CACHE_SIZE_DEFAULT instead. With an LRU size larger than -Xmx, querying a big table with many carbon files may cause "Error: java.io.IOException: Problem in loading segment blocks: GC overhead limit exceeded (state=,code=0)" and the JDBC server will restart.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
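The fix described above boils down to capping the configured cache size at the heap limit. A minimal, hypothetical sketch of that clamp — this is not CarbonData's actual code, just an illustration of the idea using the JVM's reported max heap:

```java
// Illustrative sketch: clamp a configured LRU cache size (in MB) to the
// JVM max heap (-Xmx), so an over-sized setting cannot drive the process
// into "GC overhead limit exceeded" as described in the issue.
public class LruSizeGuard {
    /** Returns the configured size in MB, capped at the JVM max heap size. */
    static long effectiveCacheSizeMb(long configuredMb) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        return Math.min(configuredMb, maxHeapMb);
    }
}
```

In the real system the cap would more likely be a fraction of the heap (the cache is not the only consumer of memory), but the clamp-at-bound shape is the same.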
[jira] [Assigned] (CARBONDATA-3281) Limit the LRU cache size
[ https://issues.apache.org/jira/browse/CARBONDATA-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin reassigned CARBONDATA-3281:
--------------------------------------
    Assignee: (was: xuchuanyin)

> Limit the LRU cache size
> ------------------------

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-3281) Limit the LRU cache size
[ https://issues.apache.org/jira/browse/CARBONDATA-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin reassigned CARBONDATA-3281:
--------------------------------------
    Assignee: xuchuanyin

> Limit the LRU cache size
> ------------------------

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2447) Range Partition Table。When the update operation is performed, the data will be lost.
[ https://issues.apache.org/jira/browse/CARBONDATA-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-2447.
------------------------------------
       Resolution: Fixed
    Fix Version/s: (was: NONE)

> Range Partition Table。When the update operation is performed, the data will be lost.
> ------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2447
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2447
>             Project: CarbonData
>          Issue Type: Bug
>          Components: spark-integration
>    Affects Versions: 1.3.1
>         Environment: centos6.5, java8, Spark2.1.0, CarbonData1.3.1
>            Reporter: duweike
>            Priority: Blocker
>         Attachments: 微信图片_20180507113738.jpg, 微信图片_20180507113748.jpg
>
>   Original Estimate: 72h
>          Time Spent: 7h 10m
>  Remaining Estimate: 64h 50m
>
> Range partition table: when an update operation is performed, the data will be lost, as shown in the attached pictures. The data loss reproduces reliably.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3107) Optimize error/exception coding for better debugging
[ https://issues.apache.org/jira/browse/CARBONDATA-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-3107.
------------------------------------
    Resolution: Fixed

> Optimize error/exception coding for better debugging
> ----------------------------------------------------
>
>                 Key: CARBONDATA-3107
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3107
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: jiangmanhua
>            Priority: Major
>          Time Spent: 4h
>  Remaining Estimate: 0h

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3278) Remove duplicate code to get filter string of date/timestamp
[ https://issues.apache.org/jira/browse/CARBONDATA-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-3278.
------------------------------------
    Resolution: Fixed

> Remove duplicate code to get filter string of date/timestamp
> ------------------------------------------------------------
>
>                 Key: CARBONDATA-3278
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3278
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: jiangmanhua
>            Assignee: jiangmanhua
>            Priority: Major
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3181) IllegalAccessError for BloomFilter.bits when bloom_compress is false
[ https://issues.apache.org/jira/browse/CARBONDATA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-3181.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.2

> IllegalAccessError for BloomFilter.bits when bloom_compress is false
> --------------------------------------------------------------------
>
>                 Key: CARBONDATA-3181
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3181
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: jiangmanhua
>            Priority: Major
>             Fix For: 1.5.2
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ```
> 18/12/19 11:16:07 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> java.lang.IllegalAccessError: tried to access field org.apache.hadoop.util.bloom.BloomFilter.bits from class org.apache.hadoop.util.bloom.CarbonBloomFilter
>     at org.apache.hadoop.util.bloom.CarbonBloomFilter.membershipTest(CarbonBloomFilter.java:70)
>     at org.apache.carbondata.datamap.bloom.BloomCoarseGrainDataMap.prune(BloomCoarseGrainDataMap.java:202)
>     at org.apache.carbondata.core.datamap.TableDataMap.pruneWithFilter(TableDataMap.java:185)
>     at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:160)
>     at org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapperImpl.prune(DataMapExprWrapperImpl.java:53)
>     at org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:517)
>     at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:412)
>     at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:529)
>     at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:220)
>     at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:127)
>     at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:66)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
> ```

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
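For context on the error above: CarbonBloomFilter is declared in the `org.apache.hadoop.util.bloom` package so it can read the package-private `BloomFilter.bits` field directly, and that kind of same-package access typically breaks with an IllegalAccessError when the JVM's runtime access check is not satisfied (for instance, when the two classes come from different class loaders). One defensive alternative — purely an illustrative sketch, not CarbonData's actual fix — is to read the field reflectively:

```java
// Illustrative sketch: read a non-public field via reflection instead of
// relying on compile-time package-private access across jars.
public class FieldPeek {
    // Tiny stand-in class with a private field, for demonstration only.
    static class Box {
        private final int v;
        Box(int v) { this.v = v; }
    }

    /** Read a non-public field of `target` declared on `declaring`. */
    static Object read(Object target, Class<?> declaring, String name) {
        try {
            java.lang.reflect.Field f = declaring.getDeclaredField(name);
            f.setAccessible(true);  // lifts the private/package-private check at runtime
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read field " + name, e);
        }
    }
}
```

Reflection trades a compile-time dependency on field visibility for a runtime lookup, which sidesteps the cross-jar access check at the cost of speed and type safety.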
[jira] [Resolved] (CARBONDATA-3166) Changes in Document and Displaying Carbon Column Compressor used in Describe Formatted Command
[ https://issues.apache.org/jira/browse/CARBONDATA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-3166.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.2

> Changes in Document and Displaying Carbon Column Compressor used in Describe Formatted Command
> ----------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3166
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3166
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: Shardul Singh
>            Assignee: Shardul Singh
>            Priority: Minor
>             Fix For: 1.5.2
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Changes in Document and Displaying Carbon Column Compressor used in Describe Formatted Command

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3139) Fix bugs in datamap example
xuchuanyin created CARBONDATA-3139:
--------------------------------------

             Summary: Fix bugs in datamap example
                 Key: CARBONDATA-3139
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3139
             Project: CarbonData
          Issue Type: Bug
            Reporter: xuchuanyin
            Assignee: xuchuanyin

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3133) Update carbondata build document
[ https://issues.apache.org/jira/browse/CARBONDATA-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-3133.
------------------------------------
    Resolution: Fixed

> Update carbondata build document
> --------------------------------
>
>                 Key: CARBONDATA-3133
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3133
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: build
>    Affects Versions: NONE
>            Reporter: Jonathan.Wei
>            Assignee: Jonathan.Wei
>            Priority: Major
>             Fix For: 1.5.2
>
>   Original Estimate: 1h
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Update the document to add Spark 2.3.2 and the datamap MV compiling method.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3031) Find wrong description in the document for 'carbon.number.of.cores.while.loading'
[ https://issues.apache.org/jira/browse/CARBONDATA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-3031.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.1

> Find wrong description in the document for 'carbon.number.of.cores.while.loading'
> ---------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3031
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3031
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: xuchuanyin
>            Assignee: lianganping
>            Priority: Major
>             Fix For: 1.5.1
>
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> The document says that the default value of 'carbon.number.of.cores.while.loading' is 2, but during data loading CarbonData actually uses the value of 'spark.executor.cores', so the description in the document is incorrect.
> This does not mean the default value of 'carbon.number.of.cores.while.loading' is useless: in compaction and the SDK, CarbonData still uses it.
> In a word, we need to fix the implementation as well as the document; maybe some refactoring is needed.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
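The precedence described above — the executor-core setting wins during loading, with the Carbon property's default of 2 applying only where Spark's value is absent — can be sketched as follows. The property names come from the issue; the resolver class itself is invented for this illustration and is not CarbonData's actual code:

```java
import java.util.Map;

// Illustrative sketch of the precedence described in the issue:
// during data loading, 'spark.executor.cores' takes effect when set;
// otherwise 'carbon.number.of.cores.while.loading' applies, defaulting to 2.
public class LoadingCoresResolver {
    static final int CARBON_DEFAULT_CORES = 2;

    static int coresForLoading(Map<String, String> conf) {
        String sparkCores = conf.get("spark.executor.cores");
        if (sparkCores != null) {
            return Integer.parseInt(sparkCores);  // Spark setting wins during loading
        }
        String carbonCores = conf.get("carbon.number.of.cores.while.loading");
        return carbonCores != null ? Integer.parseInt(carbonCores) : CARBON_DEFAULT_CORES;
    }
}
```

This makes the documentation bug concrete: a user who sets only the Carbon property would expect it to govern loading, but it only takes effect when the Spark setting is absent.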
[jira] [Resolved] (CARBONDATA-3087) Prettify DESC FORMATTED output
[ https://issues.apache.org/jira/browse/CARBONDATA-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-3087.
------------------------------------
       Resolution: Fixed
         Assignee: Jacky Li
    Fix Version/s: 1.5.1

> Prettify DESC FORMATTED output
> ------------------------------
>
>                 Key: CARBONDATA-3087
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3087
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: Jacky Li
>            Assignee: Jacky Li
>            Priority: Major
>             Fix For: 1.5.1
>
>          Time Spent: 6h
>  Remaining Estimate: 0h
>
> Change output of DESC FORMATTED to:
> {noformat}
> |col_name                    |data_type                                      |comment|
> |shortfield                  |smallint                                       |null   |
> |intfield                    |int                                            |null   |
> |bigintfield                 |bigint                                         |null   |
> |doublefield                 |double                                         |null   |
> |stringfield                 |string                                         |null   |
> |timestampfield              |timestamp                                      |null   |
> |decimalfield                |decimal(18,2)                                  |null   |
> |datefield                   |date                                           |null   |
> |charfield                   |string                                         |null   |
> |floatfield                  |double                                         |null   |
> |                            |                                               |       |
> |## Table Basic Information  |                                               |       |
> |Comment                     |                                               |       |
> |Path                        |/Users/jacky/code/carbondata/examples/spark2/target/store/default/carbonsession_table| |
> |Table Block Size            |1024 MB                                        |       |
> |Table Blocklet Size         |64 MB                                          |       |
> |Streaming                   |false                                          |       |
> |Flat Folder                 |false                                          |       |
> |Bad Record Path             |                                               |       |
> |Min Input Per Node          |0.0B                                           |       |
> |                            |                                               |       |
> |## Index Information        |                                               |       |
> |Sort Scope                  |LOCAL_SORT                                     |       |
> |Sort Columns                |stringfield,timestampfield,datefield,charfield |       |
> |Index Cache Level           |BLOCK                                          |       |
> |Cached Index Columns        |All columns                                    |       |
> |                            |                                               |       |
> |## Encoding Information     |                                               |       |
> |Local Dictionary Enabled    |true                                           |       |
> |Local Dictionary Threshold  |1
[jira] [Updated] (CARBONDATA-3088) enhance compaction performance by using prefetch
[ https://issues.apache.org/jira/browse/CARBONDATA-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin updated CARBONDATA-3088:
-----------------------------------
    Issue Type: Improvement  (was: Bug)

> enhance compaction performance by using prefetch
> ------------------------------------------------
>
>                 Key: CARBONDATA-3088
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3088
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>            Priority: Major

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3088) enhance compaction performance by using prefetch
xuchuanyin created CARBONDATA-3088:
--------------------------------------

             Summary: enhance compaction performance by using prefetch
                 Key: CARBONDATA-3088
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3088
             Project: CarbonData
          Issue Type: Bug
            Reporter: xuchuanyin
            Assignee: xuchuanyin

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-3086) Unable build project due maven error
[ https://issues.apache.org/jira/browse/CARBONDATA-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677796#comment-16677796 ] xuchuanyin commented on CARBONDATA-3086: Not sure about this error; can you try Maven 3.5.0 instead? > Unable build project due maven error > > > Key: CARBONDATA-3086 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3086 > Project: CarbonData > Issue Type: Bug > Reporter: Almaz Murzabekov > Priority: Major > Attachments: packaging.log > > > Hi, guys! > Can you help me, please! > I am trying to build the project after cloning it from GitHub, with the command > {code:java} > mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package -X >> > packaging_log.log > {code} > but I got an error on the Core module (No such file or directory; see the attached file for the full log) > Environment: > OS: CentOS (Docker image on Windows 10) > Java: OpenJDK 1.8.0_191 > Maven: 3.0.5 (Red Hat 3.0.5-17) > Git: 1.8.3.1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3078) Exception caused by explain command for count star query without filter
[ https://issues.apache.org/jira/browse/CARBONDATA-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-3078. Resolution: Fixed Assignee: jiangmanhua Fix Version/s: 1.5.1
> Exception caused by explain command for count star query without filter
> -----------------------------------------------------------------------
>
> Key: CARBONDATA-3078
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3078
> Project: CarbonData
> Issue Type: Bug
> Reporter: jiangmanhua
> Assignee: jiangmanhua
> Priority: Major
> Fix For: 1.5.1
>
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> Procedure to reproduce the problem:
> - create table test_tbl;
> - load some data into the table;
> - run the query "explain select count(*) from test_tbl"
>
> ```
> Exception in thread "main" java.lang.IllegalStateException
>   at org.apache.carbondata.core.profiler.ExplainCollector.getCurrentTablePruningInfo(ExplainCollector.java:162)
>   at org.apache.carbondata.core.profiler.ExplainCollector.setShowPruningInfo(ExplainCollector.java:106)
>   at org.apache.carbondata.core.indexstore.blockletindex.BlockDataMap.prune(BlockDataMap.java:696)
>   at org.apache.carbondata.core.indexstore.blockletindex.BlockDataMap.prune(BlockDataMap.java:743)
>   at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getAllBlocklets(BlockletDataMapFactory.java:391)
>   at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:132)
>   at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getBlockRowCount(CarbonTableInputFormat.java:618)
>   at org.apache.spark.sql.CarbonCountStar.doExecute(CarbonCountStar.scala:59)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
>   at org.apache.spark.sql.execution.command.table.CarbonExplainCommand.collectProfiler(CarbonExplainCommand.scala:54)
>   at org.apache.spark.sql.execution.command.table.CarbonExplainCommand.processMetadata(CarbonExplainCommand.scala:45)
>   at org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
>   at org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:106)
>   at org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:95)
>   at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:154)
>   at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:93)
>   at org.apache.carbondata.examples.SQL_Prune$.main(Test.scala:101)
>   at org.apache.carbondata.examples.SQL_Prune.main(Test.scala)
> Process finished with exit code 1
> ```
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3074) Change default sort temp compressor to SNAPPY
[ https://issues.apache.org/jira/browse/CARBONDATA-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-3074. Resolution: Fixed Fix Version/s: 1.5.1 > Change default sort temp compressor to SNAPPY > -- > > Key: CARBONDATA-3074 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3074 > Project: CarbonData > Issue Type: Improvement >Reporter: jiangmanhua >Assignee: jiangmanhua >Priority: Major > Fix For: 1.5.1 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3069) fix bugs in setting cores for compaction
xuchuanyin created CARBONDATA-3069: -- Summary: fix bugs in setting cores for compaction Key: CARBONDATA-3069 URL: https://issues.apache.org/jira/browse/CARBONDATA-3069 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3067) Add check for debug to avoid string concat
xuchuanyin created CARBONDATA-3067: -- Summary: Add check for debug to avoid string concat Key: CARBONDATA-3067 URL: https://issues.apache.org/jira/browse/CARBONDATA-3067 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin Assignee: xuchuanyin For debug logs, we should check whether debug logging is enabled before calling the log method, to avoid unnecessary string concatenation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
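The guard described above is the standard level-check pattern. A minimal self-contained sketch using `java.util.logging` (CarbonData's own logging API differs, so names here are illustrative):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugGuard {
    private static final Logger LOGGER = Logger.getLogger(DebugGuard.class.getName());

    // Expensive message building that the guard lets us skip entirely.
    static String describe(int[] rows) {
        StringBuilder sb = new StringBuilder();
        for (int r : rows) {
            sb.append(r).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] rows = {1, 2, 3};
        // Check the level first: the concatenation (and describe()) only
        // runs when debug-level logging is actually enabled.
        if (LOGGER.isLoggable(Level.FINE)) {
            LOGGER.fine("processed rows: " + describe(rows));
        }
    }
}
```

Without the `isLoggable` check, the string concatenation runs on every call even when the message is ultimately discarded.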
[jira] [Created] (CARBONDATA-3053) Un-closed file stream found in cli
xuchuanyin created CARBONDATA-3053: -- Summary: Un-closed file stream found in cli Key: CARBONDATA-3053 URL: https://issues.apache.org/jira/browse/CARBONDATA-3053 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3041) Optimize load minimum size strategy for data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-3041. Resolution: Fixed > Optimize load minimum size strategy for data loading > > > Key: CARBONDATA-3041 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3041 > Project: CarbonData > Issue Type: Improvement > Components: data-load > Affects Versions: 1.5.0 > Reporter: wangsen > Assignee: wangsen > Priority: Minor > Fix For: 1.5.1 > > Time Spent: 6h > Remaining Estimate: 0h > > 1. Remove the system property carbon.load.min.size.enabled and change load_min_size_inmb to a table property; this property can also be specified in the load option. > 2. Support ALTER TABLE xxx SET TBLPROPERTIES('load_min_size_inmb'='256'). > 3. If a table is created with the load_min_size_inmb property, display it via the DESC FORMATTED command. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3051) unclosed streams cause tests failure in windows env
xuchuanyin created CARBONDATA-3051: -- Summary: unclosed streams cause tests failure in windows env Key: CARBONDATA-3051 URL: https://issues.apache.org/jira/browse/CARBONDATA-3051 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3050) Remove unused parameter doc
[ https://issues.apache.org/jira/browse/CARBONDATA-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-3050. Resolution: Fixed Fix Version/s: 1.5.1 > Remove unused parameter doc > --- > > Key: CARBONDATA-3050 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3050 > Project: CarbonData > Issue Type: Improvement >Reporter: jiangmanhua >Assignee: jiangmanhua >Priority: Major > Fix For: 1.5.1 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3046) remove outdated configurations in template properties
xuchuanyin created CARBONDATA-3046: -- Summary: remove outdated configurations in template properties Key: CARBONDATA-3046 URL: https://issues.apache.org/jira/browse/CARBONDATA-3046 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3040) Fix bug for merging bloom index
[ https://issues.apache.org/jira/browse/CARBONDATA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-3040. Resolution: Fixed Fix Version/s: 1.5.1 > Fix bug for merging bloom index > --- > > Key: CARBONDATA-3040 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3040 > Project: CarbonData > Issue Type: Bug >Reporter: jiangmanhua >Assignee: jiangmanhua >Priority: Major > Fix For: 1.5.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3035) Optimize parameters for unsafe working and sort memory
xuchuanyin created CARBONDATA-3035: -- Summary: Optimize parameters for unsafe working and sort memory Key: CARBONDATA-3035 URL: https://issues.apache.org/jira/browse/CARBONDATA-3035 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3033) Fix errors for parameter description in documents
xuchuanyin created CARBONDATA-3033: -- Summary: Fix errors for parameter description in documents Key: CARBONDATA-3033 URL: https://issues.apache.org/jira/browse/CARBONDATA-3033 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-3031) Find wrong description in the document for 'carbon.number.of.cores.while.loading'
[ https://issues.apache.org/jira/browse/CARBONDATA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin reassigned CARBONDATA-3031: -- Assignee: lianganping (was: xuchuanyin) > Find wrong description in the document for > 'carbon.number.of.cores.while.loading' > - > > Key: CARBONDATA-3031 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3031 > Project: CarbonData > Issue Type: Improvement > Reporter: xuchuanyin > Assignee: lianganping > Priority: Major > > The document says that the default value of > 'carbon.number.of.cores.while.loading' is 2. But actually, during data > loading, CarbonData uses the value of 'spark.executor.cores', which means > that the description in the document is incorrect. > But this doesn't mean that the default value of > 'carbon.number.of.cores.while.loading' is useless -- in compaction and the SDK, > CarbonData still uses this default value. > In short, we need to fix the implementation as well as the document; maybe > some refactoring is needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3031) Find wrong description in the document for 'carbon.number.of.cores.while.loading'
xuchuanyin created CARBONDATA-3031: -- Summary: Find wrong description in the document for 'carbon.number.of.cores.while.loading' Key: CARBONDATA-3031 URL: https://issues.apache.org/jira/browse/CARBONDATA-3031 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin Assignee: xuchuanyin The document says that the default value of 'carbon.number.of.cores.while.loading' is 2. But actually, during data loading, CarbonData uses the value of 'spark.executor.cores', which means that the description in the document is incorrect. But this doesn't mean that the default value of 'carbon.number.of.cores.while.loading' is useless -- in compaction and the SDK, CarbonData still uses this default value. In short, we need to fix the implementation as well as the document; maybe some refactoring is needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
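The resolution order the issue describes can be sketched as follows. This is a hedged illustration of the described behavior, not CarbonData's actual code; `resolveLoadingCores` and its parameters are hypothetical names:

```java
public class LoadingCores {
    // Hypothetical helper: during data loading, spark.executor.cores (when
    // available) overrides the carbon.number.of.cores.while.loading default;
    // in compaction and the SDK there is no Spark setting, so the carbon
    // default still applies.
    static int resolveLoadingCores(Integer sparkExecutorCores, int carbonDefault) {
        return sparkExecutorCores != null ? sparkExecutorCores : carbonDefault;
    }

    public static void main(String[] args) {
        System.out.println(resolveLoadingCores(8, 2));    // loading under Spark: 8
        System.out.println(resolveLoadingCores(null, 2)); // compaction / SDK: 2
    }
}
```

This is why the documented default of 2 is misleading for loading but still meaningful elsewhere.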
[jira] [Updated] (CARBONDATA-3024) Use Log4j directly
[ https://issues.apache.org/jira/browse/CARBONDATA-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-3024: --- Fix Version/s: 1.5.1 > Use Log4j directly > -- > > Key: CARBONDATA-3024 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3024 > Project: CarbonData > Issue Type: Improvement >Reporter: Jacky Li >Assignee: Jacky Li >Priority: Major > Fix For: 1.5.1 > > Time Spent: 2h > Remaining Estimate: 0h > > Currently CarbonData's log is printing the line number in StandardLogService, > it is not good for maintainability, a better way is to use log4j Logger > directly so that it will print line number of where we are logging. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3002) Fix some spell error and remove the data after test case finished running
[ https://issues.apache.org/jira/browse/CARBONDATA-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-3002. Resolution: Fixed Fix Version/s: 1.5.1 > Fix some spell error and remove the data after test case finished running > - > > Key: CARBONDATA-3002 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3002 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Fix For: 1.5.1 > > Time Spent: 1h > Remaining Estimate: 0h > > Fix some spell error and remove the data after test case finished running > retrive -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3029) Failed to run spark data source test cases in windows env
xuchuanyin created CARBONDATA-3029: -- Summary: Failed to run spark data source test cases in windows env Key: CARBONDATA-3029 URL: https://issues.apache.org/jira/browse/CARBONDATA-3029 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3028) failed query spark file format table when there are blanks in long_string_columns
xuchuanyin created CARBONDATA-3028: -- Summary: failed query spark file format table when there are blanks in long_string_columns Key: CARBONDATA-3028 URL: https://issues.apache.org/jira/browse/CARBONDATA-3028 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3026) clear expired property that may cause GC problem
xuchuanyin created CARBONDATA-3026: -- Summary: clear expired property that may cause GC problem Key: CARBONDATA-3026 URL: https://issues.apache.org/jira/browse/CARBONDATA-3026 Project: CarbonData Issue Type: Bug Components: data-load Reporter: xuchuanyin Assignee: xuchuanyin During data loading, we write some temp files (sort temp files and temp fact data files) to certain locations. In the current implementation, we add these locations to CarbonProperties, associated with a special key that refers to the data load. After the load, the temp locations are cleared, but the added property remains in CarbonProperties and is never removed. This makes the CarbonProperties object grow ever larger and leads to OOM problems if the thrift-server is a long-running service. A local test shows that OOM happens after adding different properties 11 billion times. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
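A minimal sketch of the fix direction, assuming a per-load key scheme (the key format and method names here are hypothetical, not CarbonProperties' real API): remove the per-load entry in a finally block so the shared property map cannot grow without bound.

```java
import java.util.Properties;

public class TempPropertyCleanup {
    // Stand-in for the long-lived, shared CarbonProperties singleton.
    static final Properties PROPS = new Properties();

    static void doLoad(String loadId) {
        String key = "carbon.tempstore.location." + loadId; // hypothetical key
        PROPS.setProperty(key, "/tmp/" + loadId);
        try {
            // ... run the data load using the registered temp locations ...
        } finally {
            // Clear the per-load entry together with the temp files themselves,
            // so entries never accumulate across loads.
            PROPS.remove(key);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            doLoad("load-" + i);
        }
        System.out.println(PROPS.size()); // 0: no stale entries remain
    }
}
```

Without the `finally` cleanup, a long-running service adds one entry per load and eventually exhausts the heap, which is the OOM the issue reports.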
[jira] [Resolved] (CARBONDATA-3024) Use Log4j directly
[ https://issues.apache.org/jira/browse/CARBONDATA-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-3024. Resolution: Fixed merged into 1.5.1 > Use Log4j directly > -- > > Key: CARBONDATA-3024 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3024 > Project: CarbonData > Issue Type: Improvement >Reporter: Jacky Li >Assignee: Jacky Li >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Currently CarbonData's log is printing the line number in StandardLogService, > it is not good for maintainability, a better way is to use log4j Logger > directly so that it will print line number of where we are logging. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3009) Optimize the entry point of code for MergeIndex
xuchuanyin created CARBONDATA-3009: -- Summary: Optimize the entry point of code for MergeIndex Key: CARBONDATA-3009 URL: https://issues.apache.org/jira/browse/CARBONDATA-3009 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3008) make yarn-local and multiple dir for temp data enable by default
[ https://issues.apache.org/jira/browse/CARBONDATA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-3008: --- Priority: Minor (was: Major) > make yarn-local and multiple dir for temp data enable by default > > > Key: CARBONDATA-3008 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3008 > Project: CarbonData > Issue Type: Improvement > Reporter: xuchuanyin > Priority: Minor > > About a year ago, we introduced 'multiple dirs for temp data during data > loading' to solve the disk hotspot problem. After about a year's usage in > production environments, this feature has proven effective and correct. So > here I propose to enable the related parameters by default. The related > parameters are: > `carbon.use.local.dir`: currently `false` by default; we will change it > to `true` by default; > `carbon.user.multiple.dir`: currently `false` by default; we will change > it to `true` by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3008) make yarn-local and multiple dir for temp data enable by default
xuchuanyin created CARBONDATA-3008: -- Summary: make yarn-local and multiple dir for temp data enable by default Key: CARBONDATA-3008 URL: https://issues.apache.org/jira/browse/CARBONDATA-3008 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin About a year ago, we introduced 'multiple dirs for temp data during data loading' to solve the disk hotspot problem. After about a year's usage in production environments, this feature has proven effective and correct. So here I propose to enable the related parameters by default. The related parameters are: `carbon.use.local.dir`: currently `false` by default; we will change it to `true` by default; `carbon.user.multiple.dir`: currently `false` by default; we will change it to `true` by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
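The proposed default flip can be illustrated with plain `java.util.Properties` (CarbonProperties' real accessors differ); the second argument to `getProperty` is the default returned when the user has not set the key:

```java
import java.util.Properties;

public class TempDirDefaults {
    public static void main(String[] args) {
        Properties userConfig = new Properties(); // user has configured nothing
        // Proposed defaults flip from "false" to "true":
        String useLocalDir = userConfig.getProperty("carbon.use.local.dir", "true");
        String useMultipleDir = userConfig.getProperty("carbon.user.multiple.dir", "true");
        System.out.println(useLocalDir + " " + useMultipleDir); // true true
    }
}
```

Users who prefer the old behavior would still be able to set either key to `false` explicitly.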
[jira] [Assigned] (CARBONDATA-3007) Fix error in document
[ https://issues.apache.org/jira/browse/CARBONDATA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin reassigned CARBONDATA-3007: -- Assignee: xuchuanyin > Fix error in document > - > > Key: CARBONDATA-3007 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3007 > Project: CarbonData > Issue Type: Bug >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3007) Fix error in document
xuchuanyin created CARBONDATA-3007: -- Summary: Fix error in document Key: CARBONDATA-3007 URL: https://issues.apache.org/jira/browse/CARBONDATA-3007 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-2988) use unsafe for query model based on system property
[ https://issues.apache.org/jira/browse/CARBONDATA-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin closed CARBONDATA-2988. -- Resolution: Not A Problem > use unsafe for query model based on system property > --- > > Key: CARBONDATA-2988 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2988 > Project: CarbonData > Issue Type: Bug >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3004) Fix bug in writing dataframe to carbon table while the field order is different
[ https://issues.apache.org/jira/browse/CARBONDATA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-3004: --- Issue Type: Sub-task (was: Bug) Parent: CARBONDATA-2420 > Fix bug in writing dataframe to carbon table while the field order is > different > --- > > Key: CARBONDATA-3004 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3004 > Project: CarbonData > Issue Type: Sub-task >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > More information about this issue can be found in this link: > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Issue-Long-string-columns-config-for-big-strings-not-work-td64876.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3004) Fix bug in writing dataframe to carbon table while the field order is different
xuchuanyin created CARBONDATA-3004: -- Summary: Fix bug in writing dataframe to carbon table while the field order is different Key: CARBONDATA-3004 URL: https://issues.apache.org/jira/browse/CARBONDATA-3004 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin Assignee: xuchuanyin More information about this issue can be found in this link: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Issue-Long-string-columns-config-for-big-strings-not-work-td64876.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2988) use unsafe for query model based on system property
xuchuanyin created CARBONDATA-2988: -- Summary: use unsafe for query model based on system property Key: CARBONDATA-2988 URL: https://issues.apache.org/jira/browse/CARBONDATA-2988 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2974) Bloomfilter not working when created bloom on multiple columns and queried
[ https://issues.apache.org/jira/browse/CARBONDATA-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2974. Resolution: Fixed Fix Version/s: 1.5.0 > Bloomfilter not working when created bloom on multiple columns and queried > -- > > Key: CARBONDATA-2974 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2974 > Project: CarbonData > Issue Type: Bug >Reporter: Ravindra Pesala >Priority: Major > Fix For: 1.5.0 > > Time Spent: 4h > Remaining Estimate: 0h > > Please check the link for more information > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Issue-Bloomfilter-datamap-td63254.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2980) clear bloomindex cache when dropping datamap
xuchuanyin created CARBONDATA-2980: -- Summary: clear bloomindex cache when dropping datamap Key: CARBONDATA-2980 URL: https://issues.apache.org/jira/browse/CARBONDATA-2980 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin Assignee: xuchuanyin We should clear the bloom index cache when dropping a datamap; otherwise, if we drop and recreate a brand-new table and datamap while the stale cache still exists, queries will fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
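A sketch of the eviction idea under an assumed `table.datamap.shard` key layout (the real bloom index cache key format differs): on drop, remove every cached entry belonging to the dropped datamap so a recreated table never sees stale index data.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BloomIndexCache {
    // Hypothetical cache keyed by "table.datamap.shard".
    static final Map<String, byte[]> CACHE = new ConcurrentHashMap<>();

    static void onDropDataMap(String table, String datamap) {
        String prefix = table + "." + datamap + ".";
        // Evict all shard entries of the dropped datamap; leaving them in
        // place would poison queries against a recreated table/datamap.
        CACHE.keySet().removeIf(k -> k.startsWith(prefix));
    }

    public static void main(String[] args) {
        CACHE.put("t1.dm1.shard0", new byte[]{1});
        CACHE.put("t1.dm1.shard1", new byte[]{2});
        CACHE.put("t1.dm2.shard0", new byte[]{3});
        onDropDataMap("t1", "dm1");
        System.out.println(CACHE.size()); // 1: only t1.dm2.shard0 survives
    }
}
```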
[jira] [Resolved] (CARBONDATA-2971) Add shard info of blocklet for debugging
[ https://issues.apache.org/jira/browse/CARBONDATA-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2971. Resolution: Fixed > Add shard info of blocklet for debugging > > > Key: CARBONDATA-2971 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2971 > Project: CarbonData > Issue Type: Improvement >Reporter: jiangmanhua >Assignee: jiangmanhua >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2965) Support scan performance benchmark tool
[ https://issues.apache.org/jira/browse/CARBONDATA-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2965. Resolution: Fixed Fix Version/s: 1.5.0 > Support scan performance benchmark tool > --- > > Key: CARBONDATA-2965 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2965 > Project: CarbonData > Issue Type: New Feature >Reporter: Jacky Li >Assignee: Jacky Li >Priority: Major > Fix For: 1.5.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2957) update document about zstd support in carbondata
xuchuanyin created CARBONDATA-2957: -- Summary: update document about zstd support in carbondata Key: CARBONDATA-2957 URL: https://issues.apache.org/jira/browse/CARBONDATA-2957 Project: CarbonData Issue Type: Sub-task Reporter: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2955) bug for legacy store and compaction with zstd compressor and adaptiveDeltaIntegralCodec
[ https://issues.apache.org/jira/browse/CARBONDATA-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin reassigned CARBONDATA-2955: -- Assignee: xuchuanyin Description: if a table is configured with the zstd compressor, compaction will fail if we use adaptiveDeltaIntegralCodec; Summary: bug for legacy store and compaction with zstd compressor and adaptiveDeltaIntegralCodec (was: bug for legacy store andwith zstd compressor) > bug for legacy store and compaction with zstd compressor and > adaptiveDeltaIntegralCodec > --- > > Key: CARBONDATA-2955 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2955 > Project: CarbonData > Issue Type: Bug > Reporter: xuchuanyin > Assignee: xuchuanyin > Priority: Major > > if a table is configured with the zstd compressor, compaction will fail if we use > adaptiveDeltaIntegralCodec; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2955) bug for legacy store andwith zstd compressor
xuchuanyin created CARBONDATA-2955: -- Summary: bug for legacy store andwith zstd compressor Key: CARBONDATA-2955 URL: https://issues.apache.org/jira/browse/CARBONDATA-2955 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2944) optimize compress/decompress related procedure
xuchuanyin created CARBONDATA-2944: -- Summary: optimize compress/decompress related procedure Key: CARBONDATA-2944 URL: https://issues.apache.org/jira/browse/CARBONDATA-2944 Project: CarbonData Issue Type: Sub-task Reporter: xuchuanyin Assignee: xuchuanyin While implementing a customized compressor, I found that the carbon compressor deals with primitive objects while compressing/decompressing, which I think hurts efficiency because: 1. many compressors do not provide compress/decompress interfaces for primitive objects, so we need to handle them ourselves, which may cause unnecessary conversions from primitives to bytes and from bytes back to primitives; 2. for querying, we need to decompress the content; it is better to keep it as bytes and convert to primitives only when needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
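The point about byte-level interfaces can be seen with zlib as a stand-in compressor (this is not CarbonData's actual codec path): the primitive int column must first be flattened to bytes because general-purpose compressors only expose `byte[]` interfaces, and decompression can stay in `byte[]` form until the values are actually needed.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class PrimitiveVsBytes {
    // Flatten a primitive int column to bytes: the conversion step the
    // issue wants to minimize.
    static byte[] intsToBytes(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[input.length + 64];
        int n = deflater.deflate(out);
        deflater.end();
        return Arrays.copyOf(out, n);
    }

    // Decompression stays in byte[] form; callers convert back to
    // primitives only when the query actually needs the values.
    static byte[] decompress(byte[] input, int originalLength) {
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(input);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (!inflater.finished() && off < out.length) {
                off += inflater.inflate(out, off, out.length - off);
            }
            inflater.end();
            return out;
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int[] column = {7, 7, 7, 7, 8, 8};
        byte[] raw = intsToBytes(column);
        byte[] restored = decompress(compress(raw), raw.length);
        System.out.println(Arrays.equals(raw, restored)); // true
    }
}
```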
[jira] [Created] (CARBONDATA-2933) Fix errors in spelling
xuchuanyin created CARBONDATA-2933: -- Summary: Fix errors in spelling Key: CARBONDATA-2933 URL: https://issues.apache.org/jira/browse/CARBONDATA-2933 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2930) Support customize column compressor
xuchuanyin created CARBONDATA-2930: -- Summary: Support customize column compressor Key: CARBONDATA-2930 URL: https://issues.apache.org/jira/browse/CARBONDATA-2930 Project: CarbonData Issue Type: Sub-task Reporter: xuchuanyin Assignee: xuchuanyin Support a customized compressor for the final store. Users can create their own compressor and specify it when creating a table or loading data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
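A hedged sketch of what a pluggable compressor could look like; the interface and registry names are illustrative, not CarbonData's actual SPI. The table/load option would carry the compressor's name, and the registry resolves it at read/write time:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CompressorRegistry {
    // Hypothetical SPI; the real interface would have more methods.
    interface Compressor {
        byte[] compress(byte[] input);
        byte[] decompress(byte[] input);
    }

    static final Map<String, Compressor> REGISTRY = new ConcurrentHashMap<>();

    static void register(String name, Compressor c) {
        REGISTRY.put(name, c);
    }

    // Look up the compressor named in the table properties / load option.
    static Compressor forName(String name) {
        Compressor c = REGISTRY.get(name);
        if (c == null) {
            throw new IllegalArgumentException("unknown compressor: " + name);
        }
        return c;
    }

    public static void main(String[] args) {
        // A trivial identity "compressor" registered under a custom name,
        // as a user-supplied implementation would be.
        register("identity", new Compressor() {
            public byte[] compress(byte[] in) { return in; }
            public byte[] decompress(byte[] in) { return in; }
        });
        byte[] data = {1, 2, 3};
        Compressor c = forName("identity");
        System.out.println(c.decompress(c.compress(data)).length); // 3
    }
}
```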
[jira] [Updated] (CARBONDATA-2850) Support configurable column compressor for final store
[ https://issues.apache.org/jira/browse/CARBONDATA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2850: --- Summary: Support configurable column compressor for final store (was: Support zstd as column compressor in final store) > Support configurable column compressor for final store > -- > > Key: CARBONDATA-2850 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2850 > Project: CarbonData > Issue Type: Improvement > Reporter: xuchuanyin > Assignee: xuchuanyin > Priority: Major > Attachments: Tests on Zstd as column compressor.pdf > > > ZSTD has a better compression ratio than snappy, and its compress/decompress > speed is acceptable compared with snappy. > After we introduce zstd as the column compressor, the size of the carbondata > final store will be reduced. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2852) support zstd on legacy store
[ https://issues.apache.org/jira/browse/CARBONDATA-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2852. Resolution: Fixed Fix Version/s: 1.5.0 > support zstd on legacy store > > > Key: CARBONDATA-2852 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2852 > Project: CarbonData > Issue Type: Sub-task > Reporter: xuchuanyin > Assignee: xuchuanyin > Priority: Major > Fix For: 1.5.0 > > > Currently CarbonData reads the column compressor from a system property. This > causes problems on a legacy store if the compressor has been changed. > It should read that information from the metadata in the data files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2850) Support zstd as column compressor in final store
[ https://issues.apache.org/jira/browse/CARBONDATA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2850: --- Attachment: Tests on Zstd as column compressor.pdf > Support zstd as column compressor in final store > > > Key: CARBONDATA-2850 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2850 > Project: CarbonData > Issue Type: Improvement >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > Attachments: Tests on Zstd as column compressor.pdf > > > ZSTD has a better compression ratio than snappy, and its compress/decompress > speed is acceptable compared with snappy. > After we introduce zstd as the column compressor, the size of the carbondata > final store will be reduced.
[jira] [Created] (CARBONDATA-2904) Support minmax datamap for external format
xuchuanyin created CARBONDATA-2904: -- Summary: Support minmax datamap for external format Key: CARBONDATA-2904 URL: https://issues.apache.org/jira/browse/CARBONDATA-2904 Project: CarbonData Issue Type: Sub-task Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2890) Use CarbonLoadModelBuilder instead of new CarbonLoadModel instance
xuchuanyin created CARBONDATA-2890: -- Summary: Use CarbonLoadModelBuilder instead of new CarbonLoadModel instance Key: CARBONDATA-2890 URL: https://issues.apache.org/jira/browse/CARBONDATA-2890 Project: CarbonData Issue Type: Sub-task Reporter: xuchuanyin Currently, to get an instance of CarbonLoadModel, we can: 1. directly new an instance and set the members one by one; or 2. use the CarbonLoadModelBuilder to build an instance. However, some of the members of CarbonLoadModel (such as ColumnCompressor, tableName) are required by the following procedures. With the 1st method, these members may be left uninitialized. With the 2nd method, we can validate these members in the build method to ensure they are initialized. So I propose to use only the CarbonLoadModelBuilder to instantiate a CarbonLoadModel.
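The validation argument above can be illustrated with a minimal builder sketch. The names (`LoadModel`, `Builder`) are hypothetical, not the actual CarbonLoadModelBuilder API: required members are checked once in `build()`, so they cannot be silently forgotten the way bare setters allow.

```java
// Illustrative sketch of why a validating builder beats bare setters:
// required members are checked once, centrally, instead of being null later.
final class LoadModel {
  final String tableName;
  final String columnCompressor;

  private LoadModel(String tableName, String columnCompressor) {
    this.tableName = tableName;
    this.columnCompressor = columnCompressor;
  }

  static class Builder {
    private String tableName;
    private String columnCompressor = "snappy"; // illustrative default

    Builder tableName(String name) { this.tableName = name; return this; }
    Builder columnCompressor(String c) { this.columnCompressor = c; return this; }

    LoadModel build() {
      // Central validation: a directly new-ed instance could skip this check.
      if (tableName == null || tableName.isEmpty()) {
        throw new IllegalStateException("tableName is required");
      }
      return new LoadModel(tableName, columnCompressor);
    }
  }
}

public class BuilderDemo {
  public static void main(String[] args) {
    LoadModel m = new LoadModel.Builder().tableName("t1").build();
    System.out.println(m.tableName + "/" + m.columnCompressor);
    try {
      new LoadModel.Builder().build(); // missing required member
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```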
[jira] [Reopened] (CARBONDATA-2420) Support string longer than 32000 characters
[ https://issues.apache.org/jira/browse/CARBONDATA-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin reopened CARBONDATA-2420: Assignee: (was: xuchuanyin) > Support string longer than 32000 characters > --- > > Key: CARBONDATA-2420 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2420 > Project: CarbonData > Issue Type: Improvement >Reporter: xuchuanyin >Priority: Major > Fix For: 1.4.1 > > Time Spent: 19h 40m > Remaining Estimate: 0h > > Add a table-creation property 'long_string_columns' to support string > columns that contain more than 32000 characters. > Inside carbondata, an integer instead of a short is used to store the length of > the bytes content.
[jira] [Created] (CARBONDATA-2881) Tests in TestStreamingOperation should be independent but actually have interrelationship
xuchuanyin created CARBONDATA-2881: -- Summary: Tests in TestStreamingOperation should be independent but actually have interrelationship Key: CARBONDATA-2881 URL: https://issues.apache.org/jira/browse/CARBONDATA-2881 Project: CarbonData Issue Type: Sub-task Reporter: xuchuanyin All the test cases in TestStreamingOperation use exactly the same table and perform data loading. Once one test fails, the following test cases will fail too.
[jira] [Created] (CARBONDATA-2878) Umbrella issue for minor modifications
xuchuanyin created CARBONDATA-2878: -- Summary: Umbrella issue for minor modifications Key: CARBONDATA-2878 URL: https://issues.apache.org/jira/browse/CARBONDATA-2878 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin This umbrella issue covers minor defect fixes and optimizations for docs, tests and code. The sub-issues are simple and labeled as 'newbie'.
[jira] [Created] (CARBONDATA-2873) Support query through index datamap for external CSV format
xuchuanyin created CARBONDATA-2873: -- Summary: Support query through index datamap for external CSV format Key: CARBONDATA-2873 URL: https://issues.apache.org/jira/browse/CARBONDATA-2873 Project: CarbonData Issue Type: Sub-task Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2562) Support create and build index datamaps on external CSV format
[ https://issues.apache.org/jira/browse/CARBONDATA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2562: --- Summary: Support create and build index datamaps on external CSV format (was: Support index datamaps on external CSV format) > Support create and build index datamaps on external CSV format > -- > > Key: CARBONDATA-2562 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2562 > Project: CarbonData > Issue Type: Sub-task >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > > Support creating indexed datamap on external CSV datasource. > Support rebuilding the indexed datamap for the external CSV datasource. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2562) Support index datamaps on external CSV format
[ https://issues.apache.org/jira/browse/CARBONDATA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2562: --- Description: Support creating indexed datamap on external CSV datasource. Support rebuilding the indexed datamap for the external CSV datasource. was: Support creating indexed datamap on external CSV datasource. Support rebuilding the indexed datamap for the external CSV datasource. Query on external datasource make use of datamap if it is available. > Support index datamaps on external CSV format > - > > Key: CARBONDATA-2562 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2562 > Project: CarbonData > Issue Type: Sub-task >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > > Support creating indexed datamap on external CSV datasource. > Support rebuilding the indexed datamap for the external CSV datasource. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2562) Support index datamaps on external CSV format
[ https://issues.apache.org/jira/browse/CARBONDATA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2562: --- Summary: Support index datamaps on external CSV format (was: Support datamaps on external CSV format) > Support index datamaps on external CSV format > - > > Key: CARBONDATA-2562 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2562 > Project: CarbonData > Issue Type: Sub-task >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > > Support creating indexed datamap on external CSV datasource. > Support rebuilding the indexed datamap for the external CSV datasource. > Queries on the external datasource make use of the datamap if it is available.
[jira] [Updated] (CARBONDATA-2768) Fix error test for external format
[ https://issues.apache.org/jira/browse/CARBONDATA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2768: --- Issue Type: Sub-task (was: Bug) Parent: CARBONDATA-2561 > Fix error test for external format > -- > > Key: CARBONDATA-2768 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2768 > Project: CarbonData > Issue Type: Sub-task >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2859) add sdv test case for bloomfilter datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2859: --- Issue Type: Sub-task (was: Bug) Parent: CARBONDATA-2632 > add sdv test case for bloomfilter datamap > - > > Key: CARBONDATA-2859 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2859 > Project: CarbonData > Issue Type: Sub-task >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > add sdv test case for bloomfilter datamap -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2859) add sdv test case for bloomfilter datamap
xuchuanyin created CARBONDATA-2859: -- Summary: add sdv test case for bloomfilter datamap Key: CARBONDATA-2859 URL: https://issues.apache.org/jira/browse/CARBONDATA-2859 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin Assignee: xuchuanyin add sdv test case for bloomfilter datamap -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2856) Fix bug in bloom index on multiple dictionary columns
xuchuanyin created CARBONDATA-2856: -- Summary: Fix bug in bloom index on multiple dictionary columns Key: CARBONDATA-2856 URL: https://issues.apache.org/jira/browse/CARBONDATA-2856 Project: CarbonData Issue Type: Sub-task Affects Versions: 1.4.1 Reporter: xuchuanyin Assignee: xuchuanyin Fix For: 1.4.2 Create a bloom index on a table which has date and string columns, where the string column uses global dictionary. The data loading procedure will fail.
[jira] [Created] (CARBONDATA-2852) support zstd on legacy store
xuchuanyin created CARBONDATA-2852: -- Summary: support zstd on legacy store Key: CARBONDATA-2852 URL: https://issues.apache.org/jira/browse/CARBONDATA-2852 Project: CarbonData Issue Type: Sub-task Reporter: xuchuanyin Assignee: xuchuanyin Currently carbondata reads the column compressor from a system property. This causes problems on a legacy store if the compressor has been changed. It should read that information from the metadata in the data files.
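The metadata-first lookup described above can be sketched as follows. The names are hypothetical; the sketch assumes the file metadata may or may not carry a recorded compressor name:

```java
// Sketch of the idea behind the fix: prefer the compressor name recorded in
// the data file's metadata, and fall back to the system property only for
// legacy files that carry no recorded name.
public class CompressorResolution {
  static String resolveCompressor(String nameFromFileMetadata, String systemProperty) {
    if (nameFromFileMetadata != null && !nameFromFileMetadata.isEmpty()) {
      // Legacy-safe: decompress with whatever the file was actually written with.
      return nameFromFileMetadata;
    }
    // Pre-upgrade files recorded no compressor name.
    return systemProperty;
  }

  public static void main(String[] args) {
    // A file written with snappy stays readable after the cluster switches to zstd:
    System.out.println(resolveCompressor("snappy", "zstd"));
    // An old file with no recorded name falls back to the configured default:
    System.out.println(resolveCompressor(null, "snappy"));
  }
}
```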
[jira] [Updated] (CARBONDATA-2850) Support zstd as column compressor in final store
[ https://issues.apache.org/jira/browse/CARBONDATA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2850: --- Description: ZSTD has a better compression ratio than snappy, and its compress/decompress speed is acceptable compared with snappy. After we introduce zstd as the column compressor, the size of the carbondata final store will be reduced. > Support zstd as column compressor in final store > > > Key: CARBONDATA-2850 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2850 > Project: CarbonData > Issue Type: Improvement >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > > ZSTD has a better compression ratio than snappy, and its compress/decompress > speed is acceptable compared with snappy. > After we introduce zstd as the column compressor, the size of the carbondata > final store will be reduced.
[jira] [Created] (CARBONDATA-2851) support zstd as column compressor
xuchuanyin created CARBONDATA-2851: -- Summary: support zstd as column compressor Key: CARBONDATA-2851 URL: https://issues.apache.org/jira/browse/CARBONDATA-2851 Project: CarbonData Issue Type: Sub-task Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2850) Support zstd as column compressor in final store
xuchuanyin created CARBONDATA-2850: -- Summary: Support zstd as column compressor in final store Key: CARBONDATA-2850 URL: https://issues.apache.org/jira/browse/CARBONDATA-2850 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin Assignee: xuchuanyin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2835) Block MV datamap on streaming table
[ https://issues.apache.org/jira/browse/CARBONDATA-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2835: --- Issue Type: Sub-task (was: Bug) Parent: CARBONDATA-2628 > Block MV datamap on streaming table > --- > > Key: CARBONDATA-2835 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2835 > Project: CarbonData > Issue Type: Sub-task >Reporter: xuchuanyin >Assignee: wangsen >Priority: Major > > We should block creating MV datamap on streaming table; > Also we should block setting streaming property for table which has MV > datamap. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2835) Block MV datamap on streaming table
xuchuanyin created CARBONDATA-2835: -- Summary: Block MV datamap on streaming table Key: CARBONDATA-2835 URL: https://issues.apache.org/jira/browse/CARBONDATA-2835 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin Assignee: wangsen We should block creating MV datamap on streaming table; Also we should block setting streaming property for table which has MV datamap. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-2809) Manually rebuilding non-lazy datamap cause error
[ https://issues.apache.org/jira/browse/CARBONDATA-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin closed CARBONDATA-2809. -- Resolution: Duplicate Duplicate of CARBONDATA-2821 > Manually rebuilding non-lazy datamap cause error > > > Key: CARBONDATA-2809 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2809 > Project: CarbonData > Issue Type: Bug >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Minor > Time Spent: 3h 50m > Remaining Estimate: 0h > > Steps to reproduce: > 1. create base table > 2. load data to base table > 3. create index datamap (such as bloomfilter datamap) on base table > 4. rebuild the datamap. This will give an error. > In step 3, the datamap data has already been generated; if we trigger a > rebuild, the procedure does not clean the files properly, thus causing the > error. > Actually, the rebuild is not required. We can fix this issue by skipping the > rebuild procedure.
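The proposed skip can be sketched as follows, with illustrative names only (not CarbonData's API):

```java
// Minimal sketch of the proposed fix: a manual rebuild is skipped for a
// non-lazy datamap, whose data was already generated at creation/load time.
public class RebuildDemo {
  enum DatamapKind { LAZY, NON_LAZY }

  static String rebuild(DatamapKind kind) {
    if (kind == DatamapKind.NON_LAZY) {
      // Re-running generation without cleaning old files caused the error,
      // and the data already exists, so simply skip.
      return "skipped";
    }
    return "rebuilt"; // a lazy datamap is only built on explicit rebuild
  }

  public static void main(String[] args) {
    System.out.println(rebuild(DatamapKind.NON_LAZY));
    System.out.println(rebuild(DatamapKind.LAZY));
  }
}
```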
[jira] [Reopened] (CARBONDATA-2820) Block rebuilding for preagg, bloom and lucene datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin reopened CARBONDATA-2820: > Block rebuilding for preagg, bloom and lucene datamap > - > > Key: CARBONDATA-2820 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2820 > Project: CarbonData > Issue Type: Improvement >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > > Currently we will block rebuilding these datamaps.
[jira] [Closed] (CARBONDATA-2820) Block rebuilding for preagg, bloom and lucene datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin closed CARBONDATA-2820. -- Resolution: Duplicate > Block rebuilding for preagg, bloom and lucene datamap > - > > Key: CARBONDATA-2820 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2820 > Project: CarbonData > Issue Type: Improvement >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > > Currently we will block rebuilding these datamaps.
[jira] [Comment Edited] (CARBONDATA-2820) Block rebuilding for preagg, bloom and lucene datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571049#comment-16571049 ] xuchuanyin edited comment on CARBONDATA-2820 at 8/7/18 2:40 AM: duplicated with CARBONDATA-2821 was (Author: xuchuanyin): duplicated with CARBONDATA-2823 > Block rebuilding for preagg, bloom and lucene datamap > - > > Key: CARBONDATA-2820 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2820 > Project: CarbonData > Issue Type: Improvement >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > > Currently we will block rebuilding these datamaps.
[jira] [Closed] (CARBONDATA-2820) Block rebuilding for preagg, bloom and lucene datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin closed CARBONDATA-2820. -- Resolution: Duplicate duplicated with CARBONDATA-2823 > Block rebuilding for preagg, bloom and lucene datamap > - > > Key: CARBONDATA-2820 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2820 > Project: CarbonData > Issue Type: Improvement >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > > Currently we will block rebuilding these datamaps.
[jira] [Commented] (CARBONDATA-2833) NPE when we do a insert over a insert failure operation
[ https://issues.apache.org/jira/browse/CARBONDATA-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571028#comment-16571028 ] xuchuanyin commented on CARBONDATA-2833: The steps in the issue description cannot reproduce the problem. I've also tried other steps, but still cannot reproduce it:
```
test("test") {
  CarbonProperties.getInstance().addProperty("bad_records_logger_enable", "true")
  CarbonProperties.getInstance()
    .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "FAIL")
  sql("CREATE DATABASE test1")
  sql("use test1")
  sql("DROP TABLE IF EXISTS ab")
  sql("CREATE TABLE ab (a integer, b string) stored by 'carbondata'")
  sql("CREATE DATAMAP dm ON TABLE ab using 'bloomfilter' DMPROPERTIES('index_columns'='a,b')")
  try {
    sql("insert into ab select 'berb', 'abc', 'ggg', '1'")
  } catch {
    case e: Exception => LOGGER.error(e)
  }
  LOGGER.error("XU second run")
  try {
    sql("insert into ab select 'berb', 'abc', 'ggg', '1'")
  } catch {
    case e: Exception => LOGGER.error(e)
  }
  sql("select * from ab").show(false)
  sql("DROP TABLE IF EXISTS ab")
  sql("DROP DATABASE IF EXISTS test1")
  sql("use default")
  CarbonProperties.getInstance().addProperty("bad_records_logger_enable",
    CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORDS_LOGGER_ENABLE_DEFAULT)
  CarbonProperties.getInstance()
    .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "FAIL")
}
```
The load statement complains about the bad-record error; no NPE is reported.
> NPE when we do a insert over a insert failure operation > --- > > Key: CARBONDATA-2833 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2833 > Project: CarbonData > Issue Type: Bug >Reporter: Brijoo Bopanna >Priority: Major > > jdbc:hive2://10.18.5.188:23040/default> CREATE TABLE > 0: jdbc:hive2://10.18.5.188:23040/default> IF NOT EXISTS test_table( > 0: jdbc:hive2://10.18.5.188:23040/default> id string, > 0: jdbc:hive2://10.18.5.188:23040/default> name string, > 0: jdbc:hive2://10.18.5.188:23040/default> city string, > 0: jdbc:hive2://10.18.5.188:23040/default> age Int) > 0: jdbc:hive2://10.18.5.188:23040/default> STORED BY 'carbondata'; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.191 seconds) > 0: jdbc:hive2://10.18.5.188:23040/default> > 0: jdbc:hive2://10.18.5.188:23040/default> > 0: jdbc:hive2://10.18.5.188:23040/default> > 0: jdbc:hive2://10.18.5.188:23040/default> desc test_table > 0: jdbc:hive2://10.18.5.188:23040/default> ; > +---++--+--+ > | col_name | data_type | comment | > +---++--+--+ > | id | string | NULL | > | name | string | NULL | > | city | string | NULL | > | age | int | NULL | > +---++--+--+ > 4 rows selected (0.081 seconds) > 0: jdbc:hive2://10.18.5.188:23040/default> insert into ab select > 'berb','abc','ggg','1'; > Error: java.lang.Exception: Data load failed due to bad record: The value > with column name a and column data type INT is not a valid INT type.Please > enable bad record logger to know the detail reason. 
(state=,code=0) > 0: jdbc:hive2://10.18.5.188:23040/default> insert into ab select > 'berb','abc','ggg','1'; > *Error: java.lang.NullPointerException (state=,code=0)* > 0: jdbc:hive2://10.18.5.188:23040/default> insert into test_table select > 'berb','abc','ggg',1; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (1.127 seconds) > 0: jdbc:hive2://10.18.5.188:23040/default> show tables > 0: jdbc:hive2://10.18.5.188:23040/default> ; > +---+-+--+--+ > | database | tableName | isTemporary | > +---+-+--+--+ > | praveen | a | false | > | praveen | ab | false | > | praveen | bbc | false | > | praveen | test_table | false | > +---+-+--+--+ > 4 rows selected (0.041 seconds) > 0: jdbc:hive2://10.18.5.188:23040/default> > 0: jdbc:hive2://10.18.5.188:23040/default> desc ab > 0: jdbc:hive2://10.18.5.188:23040/default> ; > +---++--+--+ > | col_name | data_type | comment | > +---++--+--+ > | a | int | NULL | > | b | string | NULL | > +---++--+--+ > 2 rows selected (0.074 seconds) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2763) Create table with partition and no_inverted_index on long_string column is not blocked
[ https://issues.apache.org/jira/browse/CARBONDATA-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2763. Resolution: Fixed Fix Version/s: 1.4.1 > Create table with partition and no_inverted_index on long_string column is > not blocked > -- > > Key: CARBONDATA-2763 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2763 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.4.1 > Environment: Spark 2.1, 2.2 >Reporter: Chetan Bhat >Priority: Minor > Fix For: 1.4.1 > > > Steps : > # Create table with no_inverted_index on long_string column > CREATE TABLE local_no_inverted_index(id int, name string, description > string,address string, note string) STORED BY 'org.apache.carbondata.format' > tblproperties('no_inverted_index'='note','long_string_columns'='note'); > 2. Create table with partition on long_string column > CREATE TABLE local1_partition(id int,name string, description > string,address string) partitioned by (note string) STORED BY > 'org.apache.carbondata.format' tblproperties('long_string_columns'='note'); > > Actual Output : The Create table with partition and no_inverted_index on > long_string column is successful.
> 0: jdbc:hive2://10.18.98.101:22550/default> CREATE TABLE > local_no_inverted_index(id int, name string, description string,address > string, note string) STORED BY 'org.apache.carbondata.format' > tblproperties('no_inverted_index'='note','long_string_columns'='note'); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (2.604 seconds) > 0: jdbc:hive2://10.18.98.101:22550/default> CREATE TABLE local1_partition(id > int,name string, description string,address string) partitioned by (note > string) STORED BY 'org.apache.carbondata.format' > tblproperties('long_string_columns'='note'); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (1.989 seconds) > Expected Output - The Create table with partition and no_inverted_index on > long_string column should be blocked. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2762) Long string column displayed as string in describe formatted
[ https://issues.apache.org/jira/browse/CARBONDATA-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2762. Resolution: Fixed Fix Version/s: 1.4.1 > Long string column displayed as string in describe formatted > > > Key: CARBONDATA-2762 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2762 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.4.1 >Reporter: Chetan Bhat >Priority: Minor > Fix For: 1.4.1 > > > Steps : > User creates a table with long string column and executes the describe > formatted table command. > 0: jdbc:hive2://10.18.98.101:22550/default> create table t2(c1 string, c2 > string) stored by 'carbondata' tblproperties('long_string_columns' = 'c2'); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (3.034 seconds) > 0: jdbc:hive2://10.18.98.101:22550/default> desc formatted t2; > Actual Output : The describe formatted displays the c2 column as string > instead of long string. 
> 0: jdbc:hive2://10.18.98.101:22550/default> desc formatted t2; > +---+---+---+--+ > | col_name | data_type | comment | > +---+---+---+--+ > | c1 | string | KEY COLUMN,null | > *| c2 | string | KEY COLUMN,null |* > | | | | > | ##Detailed Table Information | | | > | Database Name | default | | > | Table Name | t2 | | > | CARBON Store Path | > hdfs://hacluster/user/hive/warehouse/carbon.store/default/t2 | | > | Comment | | | > | Table Block Size | 1024 MB | | > | Table Data Size | 0 | | > | Table Index Size | 0 | | > | Last Update Time | 0 | | > | SORT_SCOPE | LOCAL_SORT | LOCAL_SORT | > | CACHE_LEVEL | BLOCK | | > | Streaming | false | | > | Local Dictionary Enabled | true | | > | Local Dictionary Threshold | 1 | | > | Local Dictionary Include | c1,c2 | | > | | | | > | ##Detailed Column property | | | > | ADAPTIVE | | | > | SORT_COLUMNS | c1 | | > +---+---+---+--+ > 22 rows selected (2.847 seconds) > > Expected Output : The describe formatted should display the c2 column as long > string. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2796) Fix data loading problem when table has complex column and long string column
[ https://issues.apache.org/jira/browse/CARBONDATA-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2796. Resolution: Fixed Fix Version/s: 1.4.1 > Fix data loading problem when table has complex column and long string column > -- > > Key: CARBONDATA-2796 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2796 > Project: CarbonData > Issue Type: Sub-task >Reporter: jiangmanhua >Assignee: jiangmanhua >Priority: Major > Fix For: 1.4.1 > > Time Spent: 3h > Remaining Estimate: 0h > > Currently both the varchar column and the complex column believe they are the last > member of the noDictionary group when converting a carbon row from raw format > to 3-parted format. Since they need to be processed in different ways, an > exception will occur if we handle a column in the wrong way. > To fix this, we mark the info of complex columns explicitly like varchar > columns, and keep the order of the noDictionary group as: normal Dim & varchar & > complex
[jira] [Commented] (CARBONDATA-2823) Alter table set local dictionary include after bloom creation and merge index on old V3 store fails throwing incorrect error
[ https://issues.apache.org/jira/browse/CARBONDATA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569825#comment-16569825 ] xuchuanyin commented on CARBONDATA-2823: since we get the splits from streaming segment and columnar segments respectively, we can support streaming with index datamap > Alter table set local dictionary include after bloom creation and merge index > on old V3 store fails throwing incorrect error > > > Key: CARBONDATA-2823 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2823 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.4.1 > Environment: Spark 2.1 >Reporter: Chetan Bhat >Assignee: xuchuanyin >Priority: Minor > > Steps : > In old version V3 store create table and load data. > CREATE TABLE uniqdata_load (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format'; > LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table > uniqdata_load OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > In 1.4.1 version refresh the table of old V3 store. > refresh table uniqdata_load; > Create bloom filter and merge index. > CREATE DATAMAP dm_uniqdata1_tmstmp ON TABLE uniqdata_load USING 'bloomfilter' > DMPROPERTIES ('INDEX_COLUMNS' = 'DOJ', 'BLOOM_SIZE'='64', > 'BLOOM_FPP'='0.1'); > Alter table set local dictionary include. > alter table uniqdata_load set > tblproperties('local_dictionary_include'='CUST_NAME'); > > Issue : Alter table set local dictionary include fails with incorrect error. 
> 0: jdbc:hive2://10.18.98.101:22550/default> alter table uniqdata_load set > tblproperties('local_dictionary_include'='CUST_NAME'); > *Error: > org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: > streaming is not supported for index datamap (state=,code=0)* > > Expected : Operation should be success. If the operation is unsupported it > should throw correct error message. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CARBONDATA-2823) Alter table set local dictionary include after bloom creation and merge index on old V3 store fails throwing incorrect error
[ https://issues.apache.org/jira/browse/CARBONDATA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569823#comment-16569823 ] xuchuanyin edited comment on CARBONDATA-2823 at 8/6/18 7:22 AM: As for CARBONDATA-2823, it can simply be reproduced by 1. create table 2. create bloom/lucene datamap 3. load data 4. alter table set tblProperties was (Author: xuchuanyin): As for CARBONDATA-2823, it can simply reproduced by 1. create table 2. create bloom/lucene datamap 3. load data 4. alter table set tblProperties > Alter table set local dictionary include after bloom creation and merge index > on old V3 store fails throwing incorrect error > > > Key: CARBONDATA-2823 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2823 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.4.1 > Environment: Spark 2.1 >Reporter: Chetan Bhat >Priority: Minor > > Steps : > In old version V3 store create table and load data. > CREATE TABLE uniqdata_load (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format'; > LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table > uniqdata_load OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > In 1.4.1 version refresh the table of old V3 store. > refresh table uniqdata_load; > Create bloom filter and merge index. > CREATE DATAMAP dm_uniqdata1_tmstmp ON TABLE uniqdata_load USING 'bloomfilter' > DMPROPERTIES ('INDEX_COLUMNS' = 'DOJ', 'BLOOM_SIZE'='64', > 'BLOOM_FPP'='0.1'); > Alter table set local dictionary include. 
> alter table uniqdata_load set > tblproperties('local_dictionary_include'='CUST_NAME'); > > Issue : Alter table set local dictionary include fails with incorrect error. > 0: jdbc:hive2://10.18.98.101:22550/default> alter table uniqdata_load set > tblproperties('local_dictionary_include'='CUST_NAME'); > *Error: > org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: > streaming is not supported for index datamap (state=,code=0)* > > Expected : Operation should be success. If the operation is unsupported it > should throw correct error message. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2823) Alter table set local dictionary include after bloom creation and merge index on old V3 store fails throwing incorrect error
[ https://issues.apache.org/jira/browse/CARBONDATA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin reassigned CARBONDATA-2823: -- Assignee: xuchuanyin > Alter table set local dictionary include after bloom creation and merge index > on old V3 store fails throwing incorrect error > > > Key: CARBONDATA-2823 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2823 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.4.1 > Environment: Spark 2.1 >Reporter: Chetan Bhat >Assignee: xuchuanyin >Priority: Minor
[jira] [Commented] (CARBONDATA-2823) Alter table set local dictionary include after bloom creation and merge index on old V3 store fails throwing incorrect error
[ https://issues.apache.org/jira/browse/CARBONDATA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569823#comment-16569823 ] xuchuanyin commented on CARBONDATA-2823: As for CARBONDATA-2823, it can simply be reproduced by 1. create table 2. create bloom/lucene datamap 3. load data 4. alter table set tblProperties > Alter table set local dictionary include after bloom creation and merge index > on old V3 store fails throwing incorrect error > > > Key: CARBONDATA-2823 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2823 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.4.1 > Environment: Spark 2.1 >Reporter: Chetan Bhat >Priority: Minor
[jira] [Closed] (CARBONDATA-2420) Support strings longer than 32000 characters
[ https://issues.apache.org/jira/browse/CARBONDATA-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin closed CARBONDATA-2420. -- Resolution: Resolved Fix Version/s: 1.4.1 1.4.1 introduced the 32k feature (alpha) to support this > Support strings longer than 32000 characters > --- > > Key: CARBONDATA-2420 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2420 > Project: CarbonData > Issue Type: Improvement >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > Fix For: 1.4.1 > > Time Spent: 19h 40m > Remaining Estimate: 0h > > Add a table-creation property 'long_string_columns' to support string > columns that contain more than 32000 characters. > Internally, CarbonData uses an integer instead of a short to store the length of > the byte content.
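The 32k limit discussed above comes from the length field used for variable-length values. The following is a minimal, self-contained sketch (not CarbonData's actual code; class and method names are illustrative) of why a 2-byte (short) length prefix caps a string near 32K bytes and why storing the length as an int lifts the limit:

```java
import java.nio.ByteBuffer;

public class LengthPrefixSketch {
    // Encode a payload with a 2-byte length prefix; the cast silently
    // overflows once the payload exceeds Short.MAX_VALUE (32767) bytes.
    static ByteBuffer encodeWithShortLength(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(2 + payload.length);
        buf.putShort((short) payload.length);
        buf.put(payload);
        buf.flip();
        return buf;
    }

    // Encode with a 4-byte (int) length prefix; safe for long strings.
    static ByteBuffer encodeWithIntLength(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length);
        buf.put(payload);
        buf.flip();
        return buf;
    }
}
```

A 40000-byte payload read back through the short prefix comes out negative, while the int prefix round-trips correctly.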
[jira] [Closed] (CARBONDATA-2340) Load data exceeds 32000 bytes
[ https://issues.apache.org/jira/browse/CARBONDATA-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin closed CARBONDATA-2340. -- Resolution: Fixed Fix Version/s: 1.4.1 1.4.1 introduced the 32k feature (alpha) to support this > Load data exceeds 32000 bytes > - > > Key: CARBONDATA-2340 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2340 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 >Reporter: niaoshu >Assignee: xuchuanyin >Priority: Blocker > Fix For: 1.4.1 > > Original Estimate: 12h > Remaining Estimate: 12h > > INFO storage.BlockManagerMasterEndpoint: Registering block manager > spark1:12603 with 5.2 GB RAM, BlockManagerId(1, spark1, 12603, None) > 18/04/11 14:24:23 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in > memory on spark1:12603 (size: 34.9 KB, free: 5.2 GB) > 18/04/11 14:24:34 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 > (TID 0, spark1, executor 1): > org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: > Dataload failed, String size cannot exceed 32000 bytes > at > org.apache.carbondata.processing.loading.converter.impl.NonDictionaryFieldConverterImpl.convert(NonDictionaryFieldConverterImpl.java:75) > at > org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:162) > at > org.apache.carbondata.processing.loading.steps.DataConverterProcessorStepImpl.processRowBatch(DataConverterProcessorStepImpl.java:104) > at > org.apache.carbondata.processing.loading.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:91) > at > org.apache.carbondata.processing.loading.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:77) > at > org.apache.carbondata.processing.loading.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.run(ParallelReadMergeSorterImpl.java:214) > at >
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) >
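The "String size cannot exceed 32000 bytes" failure in the stack trace above is raised by a load-time length check on the converted field. A hypothetical sketch of that kind of check follows; the class and exception names here are illustrative, not CarbonData's actual `NonDictionaryFieldConverterImpl`:

```java
import java.nio.charset.StandardCharsets;

public class StringFieldValidator {
    static final int MAX_BYTES = 32000;

    // Illustrative stand-in for the converter's rejection path.
    static class FieldTooLongException extends RuntimeException {
        FieldTooLongException(String msg) { super(msg); }
    }

    // Converts a field value to bytes, rejecting over-long strings at load time.
    static byte[] convert(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > MAX_BYTES) {
            throw new FieldTooLongException(
                "Dataload failed, String size cannot exceed " + MAX_BYTES + " bytes");
        }
        return bytes;
    }
}
```

Note the limit applies to the UTF-8 byte length, not the character count, so multi-byte characters hit it sooner.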
[jira] [Closed] (CARBONDATA-2339) Array index out of bounds
[ https://issues.apache.org/jira/browse/CARBONDATA-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin closed CARBONDATA-2339. -- Resolution: Fixed Fix Version/s: (was: NONE) 1.4.1 1.4.1 introduced the 32k feature (alpha) to support this > Array index out of bounds > > > Key: CARBONDATA-2339 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2339 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 >Reporter: niaoshu >Assignee: xuchuanyin >Priority: Blocker > Fix For: 1.4.1 > > Original Estimate: 96h > Remaining Estimate: 96h > > java.lang.ArrayIndexOutOfBoundsException
[jira] [Closed] (CARBONDATA-2202) Introduce local dictionary encoding for dimensions
[ https://issues.apache.org/jira/browse/CARBONDATA-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin closed CARBONDATA-2202. -- Resolution: Fixed Fix Version/s: 1.4.1 1.4.1 introduced the local dictionary for this > Introduce local dictionary encoding for dimensions > -- > > Key: CARBONDATA-2202 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2202 > Project: CarbonData > Issue Type: Improvement >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > Fix For: 1.4.1 > > > Currently CarbonData generates a global dictionary for columns with the > 'dictionary_include' attribute. > A dimension column without that attribute is only stored after some > simple compression. These columns can also be dictionary-encoded at the file > level (called 'local dictionary') to reduce data size.
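The local-dictionary idea described in the issue above can be sketched in a few lines: distinct values within a page are mapped to small integer codes, so repeated strings are stored only once per file. This is a conceptual sketch only; CarbonData's actual encoder (fallback thresholds, binary layout) differs:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalDictSketch {
    final List<String> dictionary = new ArrayList<>();
    private final Map<String, Integer> index = new HashMap<>();

    // Encode a column page into dictionary codes, growing the dictionary on demand.
    int[] encode(List<String> page) {
        int[] codes = new int[page.size()];
        for (int i = 0; i < page.size(); i++) {
            String v = page.get(i);
            Integer code = index.get(v);
            if (code == null) {
                code = dictionary.size();
                dictionary.add(v);
                index.put(v, code);
            }
            codes[i] = code;
        }
        return codes;
    }

    // Decode codes back to the original values using the local dictionary.
    List<String> decode(int[] codes) {
        List<String> out = new ArrayList<>();
        for (int c : codes) out.add(dictionary.get(c));
        return out;
    }
}
```

Because the dictionary is scoped to a single file, it avoids the cross-node coordination cost of the global dictionary while still compressing low-cardinality dimensions.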
[jira] [Closed] (CARBONDATA-2166) Default value of cutoff timestamp is wrong
[ https://issues.apache.org/jira/browse/CARBONDATA-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin closed CARBONDATA-2166. -- Resolution: Not A Problem > Default value of cutoff timestamp is wrong > -- > > Key: CARBONDATA-2166 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2166 > Project: CarbonData > Issue Type: Bug >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > In the configuration-parameters.md, it says that the default value of > `carbon.cutoffTimestamp` is `1970-01-01 05:30:00`. But actually, > `TimeStampDirectDictionaryGenerator` uses an empty value as the default. > > As a result, some tests in the `SDVTests` module failed on my local machine. > For example, the test case `BadRecord_Dataload_006` failed in Maven but > ran successfully in the IDE. > > Besides, the TimeZone should also be set accordingly for the tests to pass.
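To make the `carbon.cutoffTimestamp` discussion concrete: a cutoff timestamp lets timestamps be direct-dictionary-encoded as an offset from a configurable instant instead of from the raw epoch. The sketch below is a hedged illustration, not `TimeStampDirectDictionaryGenerator` itself; the second-level granularity and method names are assumptions:

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class CutoffSurrogateSketch {
    // Mirrors "empty as default": with no cutoff configured, fall back to the
    // Unix epoch (an illustrative assumption).
    static final LocalDateTime DEFAULT_CUTOFF = LocalDateTime.of(1970, 1, 1, 0, 0, 0);

    // Surrogate value = whole seconds elapsed since the cutoff.
    static long surrogate(LocalDateTime ts, LocalDateTime cutoff) {
        return ChronoUnit.SECONDS.between(cutoff, ts);
    }

    // Invert the encoding: add the surrogate offset back onto the cutoff.
    static LocalDateTime restore(long surrogate, LocalDateTime cutoff) {
        return cutoff.plusSeconds(surrogate);
    }
}
```

This also shows why tests are sensitive to the configured cutoff and time zone: the same wall-clock timestamp yields different surrogate values under different cutoffs.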
[jira] [Created] (CARBONDATA-2820) Block rebuilding for preagg, bloom and lucene datamap
xuchuanyin created CARBONDATA-2820: -- Summary: Block rebuilding for preagg, bloom and lucene datamap Key: CARBONDATA-2820 URL: https://issues.apache.org/jira/browse/CARBONDATA-2820 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin Assignee: xuchuanyin Currently we will block rebuilding these datamaps.