[jira] [Updated] (CARBONDATA-2758) Selection on local dictionary fails when a column has more null values than the default batch size.
[ https://issues.apache.org/jira/browse/CARBONDATA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin updated CARBONDATA-2758:
--
Description:
An ArrayIndexOutOfBoundsException is thrown by the following commands:
1. create table t1(s1 int,s2 string,s3 string) stored by 'carbondata' TBLPROPERTIES('SORT_SCOPE'='BATCH_SORT')
2. load from a csv having at least 4097 rows of all null values, or insert into t1 select cast(null as int),cast(null as string),cast(null as string) 5000 times
3. select * from t1;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 207, BLR114267, executor 1): java.lang.ArrayIndexOutOfBoundsException: 4096
at org.apache.carbondata.spark.vectorreader.ColumnarVectorWrapper.putNull(ColumnarVectorWrapper.java:181)
at org.apache.carbondata.core.datastore.chunk.store.impl.LocalDictDimensionDataChunkStore.fillRow(LocalDictDimensionDataChunkStore.java:63)
at org.apache.carbondata.core.datastore.chunk.impl.VariableLengthDimensionColumnPage.fillVector(VariableLengthDimensionColumnPage.java:117)
at org.apache.carbondata.core.scan.result.BlockletScannedResult.fillColumnarNoDictionaryBatch(BlockletScannedResult.java:260)
at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.fillResultToColumnarBatch(DictionaryBasedVectorResultCollector.java:166)
at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.collectResultInColumnarBatch(DictionaryBasedVectorResultCollector.java:157)
at org.apache.carbondata.core.scan.processor.DataBlockIterator.processNextBatch(DataBlockIterator.java:245)
at org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:48)
at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:307)
at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:182)
at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:497)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:381)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
was:
ArrayIndexOutOfBound throws on following command.
1. create table t1(s1 int,s2 string,s3 string) stored by 'carbondata' TBLPROPERTIES('SORT_SCOPE'='BATCH_SORT')
2. load from a csv having all null values or insert into t1 select cast(null as int),cast(null as string),cast(null as string) 5000 times
3. select * from t1;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 207, BLR114267, executor 1): java.lang.ArrayIndexOutOfBoundsException: 4096
at org.apache.carbondata.spark.vectorreader.ColumnarVectorWrapper.putNull(ColumnarVectorWrapper.java:181)
at org.apache.carbondata.core.datastore.chunk.store.impl.LocalDictDimensionDataChunkStore.fillRow(LocalDictDimensionDataChunkStore.java:63)
at org.apache.carbondata.core.datastore.chunk.impl.VariableLengthDimensionColumnPage.fillVector(VariableLengthDimensionColumnPage.java:117)
at org.apache.carbondata.core.scan.result.BlockletScannedResult.fillColumnarNoDictionaryBatch(BlockletScannedResult.java:260)
at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.fillResultToColumnarBatch(DictionaryBasedVectorResultCollector.java:166)
at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.collectResultInColumnarBatch(DictionaryBasedVectorResultCollector.java:157)
at org.apache.carbondata.core.scan.processor.DataBlockIterator.processNextBatch(DataBlockIterator.java:245)
at org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:48)
at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:307)
at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:182)
at
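The failing index 4096 matches Spark's default columnar batch size, which suggests the all-null rows of a page are being written into a single batch without rebasing row ids on the batch boundary. The following Python sketch is a hypothetical simulation of that failure mode; the class and function names are illustrative stand-ins, not CarbonData's actual code.

```python
# Hypothetical simulation: a fixed-size columnar batch (default 4096 rows)
# being filled with one null per row of a page that holds more rows than
# one batch can accept.

BATCH_SIZE = 4096  # default columnar batch size, matching the failing index

class ColumnVector:
    """Stand-in for a vector that, like ColumnarVectorWrapper, holds one batch."""
    def __init__(self, size):
        self.nulls = [False] * size

    def put_null(self, row_id):
        self.nulls[row_id] = True  # raises IndexError once row_id >= size

def fill_row_buggy(page_rows, vector):
    # Buggy fill: iterates over every row of the page, ignoring the batch
    # boundary -- analogous to filling all null rows of a page into one batch.
    for row_id in range(page_rows):
        vector.put_null(row_id)

def fill_row_fixed(page_rows, vector, offset, batch_size):
    # Fixed fill: only the rows belonging to the current batch are filled,
    # and each row id is rebased to a batch-local index.
    for row_id in range(offset, min(offset + batch_size, page_rows)):
        vector.put_null(row_id - offset)

page_rows = 5000  # e.g. 5000 all-null rows inserted into the table
try:
    fill_row_buggy(page_rows, ColumnVector(BATCH_SIZE))
except IndexError as e:
    print("buggy fill fails once row 4096 is reached:", e)

# The fixed variant processes the page batch by batch without overflow.
for offset in range(0, page_rows, BATCH_SIZE):
    fill_row_fixed(page_rows, ColumnVector(BATCH_SIZE), offset, BATCH_SIZE)
print("fixed fill handled", page_rows, "null rows")
```

This explains why the issue only appears with more than 4096 all-null rows: the first 4096 fills succeed, and the batch-size-plus-one row overflows the vector.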
[jira] [Created] (CARBONDATA-2758) Selection on local dictionary fails when a column has more null values than the default batch size.
Jatin created CARBONDATA-2758:
-
Summary: Selection on local dictionary fails when a column has more null values than the default batch size.
Key: CARBONDATA-2758
URL: https://issues.apache.org/jira/browse/CARBONDATA-2758
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 1.5.0
Environment: 3-node cluster with Spark 2.2
Reporter: Jatin
Assignee: Jatin
Fix For: 1.5.0

An ArrayIndexOutOfBoundsException is thrown by the following commands:
1. create table t1(s1 int,s2 string,s3 string) stored by 'carbondata' TBLPROPERTIES('SORT_SCOPE'='BATCH_SORT')
2. load from a csv having all null values, or insert into t1 select cast(null as int),cast(null as string),cast(null as string) 5000 times
3. select * from t1;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 207, BLR114267, executor 1): java.lang.ArrayIndexOutOfBoundsException: 4096
at org.apache.carbondata.spark.vectorreader.ColumnarVectorWrapper.putNull(ColumnarVectorWrapper.java:181)
at org.apache.carbondata.core.datastore.chunk.store.impl.LocalDictDimensionDataChunkStore.fillRow(LocalDictDimensionDataChunkStore.java:63)
at org.apache.carbondata.core.datastore.chunk.impl.VariableLengthDimensionColumnPage.fillVector(VariableLengthDimensionColumnPage.java:117)
at org.apache.carbondata.core.scan.result.BlockletScannedResult.fillColumnarNoDictionaryBatch(BlockletScannedResult.java:260)
at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.fillResultToColumnarBatch(DictionaryBasedVectorResultCollector.java:166)
at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.collectResultInColumnarBatch(DictionaryBasedVectorResultCollector.java:157)
at org.apache.carbondata.core.scan.processor.DataBlockIterator.processNextBatch(DataBlockIterator.java:245)
at org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:48)
at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:307)
at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:182)
at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:497)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:381)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2741) Exception occurs after altering a table to add a few columns and selecting in random order
Jatin created CARBONDATA-2741:
-
Summary: Exception occurs after altering a table to add a few columns and selecting in random order
Key: CARBONDATA-2741
URL: https://issues.apache.org/jira/browse/CARBONDATA-2741
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 1.5.0
Environment: 3-node cluster with Spark 2.2
Reporter: Jatin
Assignee: Jatin
Fix For: 1.5.0

create table tb1 (imei string,AMSize string,channelsId string,ActiveCountry string, Activecity string,gamePointId double,deviceInformationId double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='1','COLUMN_META_CACHE'='AMSize');
LOAD DATA INPATH 'hdfs://hacluster/csv/vardhandaterestruct.csv' INTO TABLE tb1 OPTIONS('DELIMITER'=',', 'QUOTECHAR'= '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');
alter table tb1 add columns(age int, name string);
select * from tb1 where name is NULL or channelsId =4;
The following exception occurs:
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6508.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6508.0 (TID 140476, linux-49, executor 3): java.lang.RuntimeException: internal error:
org.apache.carbondata.core.datastore.page.encoding.adaptive.AdaptiveFloatingCodec[src type: DOUBLE, target type: INT, stats(min: 1.0, max: 100.0, decimal: 1 )]
at org.apache.carbondata.core.datastore.page.encoding.adaptive.AdaptiveFloatingCodec$3.decodeLong(AdaptiveFloatingCodec.java:185)
at org.apache.carbondata.core.datastore.page.LazyColumnPage.getLong(LazyColumnPage.java:64)
at org.apache.carbondata.core.scan.result.vector.MeasureDataVectorProcessor$IntegralMeasureVectorFiller.fillMeasureVector(MeasureDataVectorProcessor.java:73)
at org.apache.carbondata.core.scan.result.impl.FilterQueryScannedResult.fillColumnarMeasureBatch(FilterQueryScannedResult.java:129)
at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.fillResultToColumnarBatch(DictionaryBasedVectorResultCollector.java:167)
at org.apache.carbondata.core.scan.collector.impl.RestructureBasedVectorResultCollector.collectResultInColumnarBatch(RestructureBasedVectorResultCollector.java:127)
at org.apache.carbondata.core.scan.processor.DataBlockIterator.processNextBatch(DataBlockIterator.java:245)
at org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:48)
at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:290)
at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:180)
at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:497)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:381)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:381)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:828)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:828)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace: (state=,code=0)
[jira] [Updated] (CARBONDATA-2610) DataMap creation fails on null values
[ https://issues.apache.org/jira/browse/CARBONDATA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin updated CARBONDATA-2610:
--
Description:
1. Create a table.
2. Load data having null values into the table.
3. Create a datamap on the table.
Exception Details
18/06/13 23:23:52 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
java.lang.NullPointerException
at org.apache.carbondata.datamap.OriginalReadSupport$$anonfun$readRow$1.apply(IndexDataMapRebuildRDD.scala:130)
at org.apache.carbondata.datamap.OriginalReadSupport$$anonfun$readRow$1.apply(IndexDataMapRebuildRDD.scala:128)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.carbondata.datamap.OriginalReadSupport.readRow(IndexDataMapRebuildRDD.scala:128)
at org.apache.carbondata.datamap.OriginalReadSupport.readRow(IndexDataMapRebuildRDD.scala:122)
at org.apache.carbondata.hadoop.CarbonRecordReader.getCurrentValue(CarbonRecordReader.java:108)
at org.apache.carbondata.datamap.IndexDataMapRebuildRDD.internalCompute(IndexDataMapRebuildRDD.scala:194)
at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:76)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/06/13 23:23:52 ERROR TaskSetManager: Task 0 in stage 4.0 failed 1 times; aborting job
was:
# Create a table
# Create datamap on table
# load data in table having null values.
Exception Details
18/06/13 23:23:52 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
java.lang.NullPointerException
at org.apache.carbondata.datamap.OriginalReadSupport$$anonfun$readRow$1.apply(IndexDataMapRebuildRDD.scala:130)
at org.apache.carbondata.datamap.OriginalReadSupport$$anonfun$readRow$1.apply(IndexDataMapRebuildRDD.scala:128)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.carbondata.datamap.OriginalReadSupport.readRow(IndexDataMapRebuildRDD.scala:128)
at org.apache.carbondata.datamap.OriginalReadSupport.readRow(IndexDataMapRebuildRDD.scala:122)
at org.apache.carbondata.hadoop.CarbonRecordReader.getCurrentValue(CarbonRecordReader.java:108)
at org.apache.carbondata.datamap.IndexDataMapRebuildRDD.internalCompute(IndexDataMapRebuildRDD.scala:194)
at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:76)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/06/13 23:23:52 ERROR TaskSetManager: Task 0 in stage 4.0 failed 1 times; aborting job
> DataMap creation fails on null values
> --
>
> Key: CARBONDATA-2610
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2610
> Project: CarbonData
> Issue Type: Bug
> Components: spark-integration
> Affects Versions: 1.4.0
> Reporter: Jatin
> Assignee: Jatin
> Priority: Minor
> Fix For: 1.5.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> # Create a table
> # load data in table having null values.
> # Create datamap on table.
> Exception Details
> 18/06/13 23:23:52 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
> java.lang.NullPointerException
> at org.apache.carbondata.datamap.OriginalReadSupport$$anonfun$readRow$1.apply(IndexDataMapRebuildRDD.scala:130)
> at org.apache.carbondata.datamap.OriginalReadSupport$$anonfun$readRow$1.apply(IndexDataMapRebuildRDD.scala:128)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at org.apache.carbondata.datamap.OriginalReadSupport.readRow(IndexDataMapRebuildRDD.scala:128)
> at org.apache.carbondata.datamap.OriginalReadSupport.readRow(IndexDataMapRebuildRDD.scala:122)
> at
[jira] [Created] (CARBONDATA-2610) DataMap creation fails on null values
Jatin created CARBONDATA-2610:
-
Summary: DataMap creation fails on null values
Key: CARBONDATA-2610
URL: https://issues.apache.org/jira/browse/CARBONDATA-2610
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 1.4.0
Reporter: Jatin
Assignee: Jatin
Fix For: 1.5.0

1. Create a table.
2. Create a datamap on the table.
3. Load data having null values into the table.
Exception Details
18/06/13 23:23:52 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
java.lang.NullPointerException
at org.apache.carbondata.datamap.OriginalReadSupport$$anonfun$readRow$1.apply(IndexDataMapRebuildRDD.scala:130)
at org.apache.carbondata.datamap.OriginalReadSupport$$anonfun$readRow$1.apply(IndexDataMapRebuildRDD.scala:128)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.carbondata.datamap.OriginalReadSupport.readRow(IndexDataMapRebuildRDD.scala:128)
at org.apache.carbondata.datamap.OriginalReadSupport.readRow(IndexDataMapRebuildRDD.scala:122)
at org.apache.carbondata.hadoop.CarbonRecordReader.getCurrentValue(CarbonRecordReader.java:108)
at org.apache.carbondata.datamap.IndexDataMapRebuildRDD.internalCompute(IndexDataMapRebuildRDD.scala:194)
at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:76)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/06/13 23:23:52 ERROR TaskSetManager: Task 0 in stage 4.0 failed 1 times; aborting job
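The NullPointerException comes from readRow applying a per-column conversion to every field of the row, including fields that are null. A hypothetical Python sketch of the pattern and a null-safe variant (the names are illustrative, not the actual CarbonData fix):

```python
# Hypothetical sketch of the NPE in a readRow-style loop: a converter is
# applied to every field of a row read back from storage, and a null (None)
# field blows up unless it is passed through explicitly.

def convert(value, converter):
    # stand-in for the per-data-type conversion applied to each read value
    return converter(value)

def read_row_buggy(row, converters):
    # applies the converter unconditionally, like the failing closure
    return [convert(v, c) for v, c in zip(row, converters)]

def read_row_fixed(row, converters):
    # null-safe: a None value is kept as None instead of being converted
    return [None if v is None else convert(v, c) for v, c in zip(row, converters)]

row = ("abc", None, "42")          # a loaded row containing a null value
converters = (str, int, int)

print(read_row_fixed(row, converters))
try:
    read_row_buggy(row, converters)
except TypeError as e:             # Python's analogue of the NullPointerException
    print("datamap rebuild fails on null:", e)
```

This also matches the repro steps: the failure surfaces only when the datamap rebuild reads back rows that actually contain nulls.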
[jira] [Created] (CARBONDATA-2321) Selection after a Concurrent Load Failing for Partition columns
Jatin created CARBONDATA-2321:
-
Summary: Selection after a Concurrent Load Failing for Partition columns
Key: CARBONDATA-2321
URL: https://issues.apache.org/jira/browse/CARBONDATA-2321
Project: CarbonData
Issue Type: Bug
Components: core
Affects Versions: 1.4.0
Environment: Spark-2.1
Reporter: Jatin
Assignee: Jatin
Fix For: 1.4.0

Selection after a concurrent load fails randomly for partition columns.
[jira] [Assigned] (CARBONDATA-2277) Filter on default values are not working
[ https://issues.apache.org/jira/browse/CARBONDATA-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-2277: - Assignee: Jatin > Filter on default values are not working > > > Key: CARBONDATA-2277 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2277 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 1.4.0 > Environment: Spark-2.2 >Reporter: Jatin >Assignee: Jatin >Priority: Major > Fix For: 1.4.0 > > > 0: jdbc:hive2://localhost:1> create table testFilter(data int) stored by > 'carbondata'; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (1.231 seconds) > 0: jdbc:hive2://localhost:1> insert into testFilter values(22); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (3.726 seconds) > 0: jdbc:hive2://localhost:1> alter table testFilter add columns(c1 int) > TBLPROPERTIES('DEFAULT.VALUE.c1' = '25'); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (1.761 seconds) > 0: jdbc:hive2://localhost:1> select * from testFilter; > +---+-+--+ > | data | c1 | > +---+-+--+ > | 22 | 25 | > +---+-+--+ > 1 row selected (0.85 seconds) > 0: jdbc:hive2://localhost:1> select * from testFilter where c1=25; > Error: java.nio.BufferUnderflowException (state=,code=0) > Stack Trace : > 18/03/25 13:34:08 INFO CarbonLateDecodeRule: pool-20-thread-8 skip > CarbonOptimizer > 18/03/25 13:34:08 INFO CarbonLateDecodeRule: pool-20-thread-8 Skip > CarbonOptimizer > 18/03/25 13:34:08 INFO TableInfo: pool-20-thread-8 Table block size not > specified for default_testfilter. 
Therefore considering the default value > 1024 MB > 18/03/25 13:34:08 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:506) > at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:412) > at > org.apache.carbondata.core.util.DataTypeUtil.getMeasureObjectFromDataType(DataTypeUtil.java:117) > at > org.apache.carbondata.core.scan.filter.executer.RestructureEvaluatorImpl.isMeasureDefaultValuePresentInFilterValues(RestructureEvaluatorImpl.java:113) > at > org.apache.carbondata.core.scan.filter.executer.RestructureExcludeFilterExecutorImpl.<init>(RestructureExcludeFilterExecutorImpl.java:43) > at > org.apache.carbondata.core.scan.filter.FilterUtil.getExcludeFilterExecuter(FilterUtil.java:281) > at > org.apache.carbondata.core.scan.filter.FilterUtil.createFilterExecuterTree(FilterUtil.java:147) > at > org.apache.carbondata.core.scan.filter.FilterUtil.createFilterExecuterTree(FilterUtil.java:158) > at > org.apache.carbondata.core.scan.filter.FilterUtil.getFilterExecuterTree(FilterUtil.java:1296) > at > org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:644) > at > org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:685) > at > org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:74) > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:739) > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:666) > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:426) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:96) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) > at
scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958) > at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at >
[jira] [Created] (CARBONDATA-2277) Filter on default values are not working
Jatin created CARBONDATA-2277:
-
Summary: Filter on default values are not working
Key: CARBONDATA-2277
URL: https://issues.apache.org/jira/browse/CARBONDATA-2277
Project: CarbonData
Issue Type: Bug
Components: core
Affects Versions: 1.4.0
Environment: Spark-2.2
Reporter: Jatin
Fix For: 1.4.0

0: jdbc:hive2://localhost:1> create table testFilter(data int) stored by 'carbondata';
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (1.231 seconds)
0: jdbc:hive2://localhost:1> insert into testFilter values(22);
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (3.726 seconds)
0: jdbc:hive2://localhost:1> alter table testFilter add columns(c1 int) TBLPROPERTIES('DEFAULT.VALUE.c1' = '25');
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (1.761 seconds)
0: jdbc:hive2://localhost:1> select * from testFilter;
+---+-+--+
| data | c1 |
+---+-+--+
| 22 | 25 |
+---+-+--+
1 row selected (0.85 seconds)
0: jdbc:hive2://localhost:1> select * from testFilter where c1=25;
Error: java.nio.BufferUnderflowException (state=,code=0)
Stack Trace :
18/03/25 13:34:08 INFO CarbonLateDecodeRule: pool-20-thread-8 skip CarbonOptimizer
18/03/25 13:34:08 INFO CarbonLateDecodeRule: pool-20-thread-8 Skip CarbonOptimizer
18/03/25 13:34:08 INFO TableInfo: pool-20-thread-8 Table block size not specified for default_testfilter.
Therefore considering the default value 1024 MB
18/03/25 13:34:08 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:506)
at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:412)
at org.apache.carbondata.core.util.DataTypeUtil.getMeasureObjectFromDataType(DataTypeUtil.java:117)
at org.apache.carbondata.core.scan.filter.executer.RestructureEvaluatorImpl.isMeasureDefaultValuePresentInFilterValues(RestructureEvaluatorImpl.java:113)
at org.apache.carbondata.core.scan.filter.executer.RestructureExcludeFilterExecutorImpl.<init>(RestructureExcludeFilterExecutorImpl.java:43)
at org.apache.carbondata.core.scan.filter.FilterUtil.getExcludeFilterExecuter(FilterUtil.java:281)
at org.apache.carbondata.core.scan.filter.FilterUtil.createFilterExecuterTree(FilterUtil.java:147)
at org.apache.carbondata.core.scan.filter.FilterUtil.createFilterExecuterTree(FilterUtil.java:158)
at org.apache.carbondata.core.scan.filter.FilterUtil.getFilterExecuterTree(FilterUtil.java:1296)
at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:644)
at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:685)
at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:74)
at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:739)
at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:666)
at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:426)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:96)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at
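A plausible reading of the trace: the default value supplied via TBLPROPERTIES('DEFAULT.VALUE.c1'='25') is stored as the bytes of the string "25" (2 bytes), while the filter path reinterprets those bytes as a fixed-width 8-byte long, so ByteBuffer.getLong underflows. A hypothetical Python sketch of that mismatch, with struct.error playing the role of BufferUnderflowException (the storage format detail is an assumption inferred from the trace, not confirmed):

```python
# Hypothetical sketch of the underflow: a measure default value stored as
# the UTF-8 bytes of its string form ("25" -> 2 bytes) being read back as
# a fixed-width 8-byte big-endian long.

import struct

default_bytes = "25".encode("utf-8")   # assumed storage form of the default
print(len(default_bytes))              # 2 bytes, not the 8 a long requires

try:
    # analogue of ByteBuffer.getLong() inside getMeasureObjectFromDataType
    (value,) = struct.unpack(">q", default_bytes)
except struct.error as e:              # Python's BufferUnderflow analogue
    print("filter evaluation fails:", e)

# One possible repair: parse the stored string representation instead of
# reinterpreting its raw bytes as a fixed-width integer.
value = int(default_bytes.decode("utf-8"))
print(value)
```

The sketch shows why the plain select works (the default is rendered from its string form) while the filter fails (only the filter path performs the fixed-width byte read).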
[jira] [Updated] (CARBONDATA-2251) Refactored sdv failures running on different environment
[ https://issues.apache.org/jira/browse/CARBONDATA-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin updated CARBONDATA-2251:
--
Description:
1. The MergeIndex testcase in sdv fails if executed with a different number of executors or in standalone Spark.
2. Testcases using Hive UDAFs like histogram_numeric have unexpected behaviour, so the recommended way is to write such testcases using aggregation.
was:
# MergeIndex testcase in sdv fails if executed with different number of executors or in standalone spark.
# Changes testcase having Hive UDAF like histogram_numeric having unexpected behaviour. so recommended way to write testcase using aggregation.
> Refactored sdv failures running on different environment
> --
>
> Key: CARBONDATA-2251
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2251
> Project: CarbonData
> Issue Type: Improvement
> Components: test
> Affects Versions: 1.3.1
> Reporter: Jatin
> Assignee: Jatin
> Priority: Minor
> Attachments: h2.PNG, hi.PNG
>
> # MergeIndex testcase in sdv fails if executed with different number of executors or in standalone spark.
> # Changes testcase having Hive UDAF like histogram_numeric having unexpected behaviour. so recommended way to write testcase using aggregation.
[jira] [Updated] (CARBONDATA-2251) Refactored sdv failures running on different environment
[ https://issues.apache.org/jira/browse/CARBONDATA-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin updated CARBONDATA-2251:
--
Attachment: hi.PNG
h2.PNG
> Refactored sdv failures running on different environment
> --
>
> Key: CARBONDATA-2251
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2251
> Project: CarbonData
> Issue Type: Improvement
> Components: test
> Affects Versions: 1.3.1
> Reporter: Jatin
> Assignee: Jatin
> Priority: Minor
> Attachments: h2.PNG, hi.PNG
>
> # MergeIndex testcase in sdv fails if executed with different number of executors or in standalone spark.
> # Changes testcase having Hive UDAF like histogram_numeric having unexpected behaviour. so recommended way to write testcase using aggregation.
[jira] [Updated] (CARBONDATA-2251) Refactored sdv failures running on different environment
[ https://issues.apache.org/jira/browse/CARBONDATA-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin updated CARBONDATA-2251: -- Description: # MergeIndex testcase in sdv fails if executed with different number of executors or in standalone spark. # Changes testcase having Hive UDAF like histogram_numeric having unexpected behaviour. so recommended way to write testcase using aggregation. was: # MergeIndex testcase in sdv fails if executed with different number of executors or in standalone spark. # Changes Hive UDAF like histogram_numeric having unexpected behaviour. so recommended way to write testcase using aggregation. > Refactored sdv failures running on different environment > > > Key: CARBONDATA-2251 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2251 > Project: CarbonData > Issue Type: Improvement > Components: test >Affects Versions: 1.3.1 >Reporter: Jatin >Assignee: Jatin >Priority: Minor > > # MergeIndex testcase in sdv fails if executed with different number of > executors or in standalone spark. > # Changes testcase having Hive UDAF like histogram_numeric having > unexpected behaviour. so recommended way to write testcase using aggregation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2251) Refactored sdv failures running on different environment
[ https://issues.apache.org/jira/browse/CARBONDATA-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin updated CARBONDATA-2251: -- Priority: Minor (was: Trivial) > Refactored sdv failures running on different environment > > > Key: CARBONDATA-2251 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2251 > Project: CarbonData > Issue Type: Improvement > Components: test >Affects Versions: 1.3.1 >Reporter: Jatin >Assignee: Jatin >Priority: Minor > > # MergeIndex testcase in sdv fails if executed with different number of > executors or in standalone spark. > # Changes Hive UDAF like histogram_numeric having unexpected behaviour. so > recommended way to write testcase using aggregation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2251) Refactored sdv failures running on different environment
[ https://issues.apache.org/jira/browse/CARBONDATA-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin updated CARBONDATA-2251: -- Priority: Trivial (was: Major) > Refactored sdv failures running on different environment > > > Key: CARBONDATA-2251 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2251 > Project: CarbonData > Issue Type: Improvement > Components: test >Affects Versions: 1.3.1 >Reporter: Jatin >Assignee: Jatin >Priority: Trivial > > # MergeIndex testcase in sdv fails if executed with different number of > executors or in standalone spark. > # Changes Hive UDAF like histogram_numeric having unexpected behaviour. so > recommended way to write testcase using aggregation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2251) Refactored sdv failures running on different environment
Jatin created CARBONDATA-2251: - Summary: Refactored sdv failures running on different environment Key: CARBONDATA-2251 URL: https://issues.apache.org/jira/browse/CARBONDATA-2251 Project: CarbonData Issue Type: Improvement Components: test Affects Versions: 1.3.1 Reporter: Jatin Assignee: Jatin # The MergeIndex test case in sdv fails if executed with a different number of executors or in standalone Spark. # Test cases using Hive UDAFs such as histogram_numeric show unexpected behaviour, so the recommended way is to write them using plain aggregations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
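Why a histogram_numeric assertion is flaky across environments can be sketched with a toy model (a hypothetical illustration, not Hive's actual implementation): a greedy streaming histogram depends on the order rows arrive in, which in Spark depends on partitioning and executor count, whereas a plain aggregate like sum does not.

```python
# Toy streaming histogram (NOT Hive's histogram_numeric implementation):
# each row is assigned to the nearest of two running bucket means, so the
# final buckets depend on row arrival order -- which in Spark varies with
# the number of executors/partitions. Plain aggregates are order-independent.
def greedy_two_buckets(rows):
    means = [float(rows[0]), float(rows[1])]
    counts = [1, 1]
    for x in rows[2:]:
        j = 0 if abs(x - means[0]) <= abs(x - means[1]) else 1
        counts[j] += 1
        means[j] += (x - means[j]) / counts[j]  # incremental mean update
    return means

data = [1, 2, 3, 100]
print(greedy_two_buckets(data))        # [1.0, 35.0]
print(greedy_two_buckets(data[::-1]))  # [100.0, 2.0] -- same data, different buckets
print(sum(data) == sum(data[::-1]))    # True -- sum does not depend on row order
```

This is why rewriting the sdv assertions in terms of deterministic aggregations makes them stable across cluster layouts.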
[jira] [Created] (CARBONDATA-2207) TestCase Fails using Hive Metastore
Jatin created CARBONDATA-2207: - Summary: TestCase Fails using Hive Metastore Key: CARBONDATA-2207 URL: https://issues.apache.org/jira/browse/CARBONDATA-2207 Project: CarbonData Issue Type: Bug Affects Versions: 1.4.0 Reporter: Jatin Assignee: Jatin Fix For: 1.4.0 Running all the Carbon test cases with the Hive metastore, some test cases fail because the carbon table cannot be retrieved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2199) Exception occurs when change the datatype of measure having sort_column
[ https://issues.apache.org/jira/browse/CARBONDATA-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin updated CARBONDATA-2199: -- Priority: Minor (was: Major) > Exception occurs when change the datatype of measure having sort_column > --- > > Key: CARBONDATA-2199 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2199 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 1.4.0 > Environment: spark 2.1 >Reporter: Jatin >Assignee: Jatin >Priority: Minor > Fix For: 1.4.0 > > > Use a measure columns in sort_column and change the datatype of that columns > Steps to replicate > CREATE TABLE non_partitiontable7(id Int,vin String,phonenumber Long,area > String,salary Int, country String,logdate date)STORED BY > 'org.apache.carbondata.format'TBLPROPERTIES('SORT_COLUMNS'='id,vin','sort_scope'='global_sort'); > insert into non_partitiontable7 select > 1,'A42151477823',125371344,'OutSpace',1,'China','2017-02-12'; > insert into non_partitiontable7 select > 1,'Y42151477823',125371344,'midasia',1,'China','2017-02-13'; > insert into non_partitiontable7 select > 1,'B42151477823',125371346,'OutSpace',1,'US','2018-02-12'; > insert into non_partitiontable7 select > 1,'C42151477823',125371348,'InnerSpace',10001,'UK','2019-02-12'; > select * from non_partitiontable7; > alter table non_partitiontable7 add columns (c1 int); > select * from non_partitiontable7; > alter table non_partitiontable7 change id id bigint; > select * from non_partitiontable7; > Exception StackTrace > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 1 in stage 16.0 failed 4 times, most recent failure: Lost task 1.3 in > stage 16.0 (TID 80, BLR123654, executor 3): > java.lang.IllegalArgumentException: Wrong length: 4, expected 8 > at > org.apache.carbondata.core.util.ByteUtil.explainWrongLengthOrOffset(ByteUtil.java:581) > at org.apache.carbondata.core.util.ByteUtil.toLong(ByteUtil.java:553) > at > 
org.apache.carbondata.core.util.DataTypeUtil.getDataBasedOnRestructuredDataType(DataTypeUtil.java:847) > at > org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeVariableLengthDimesionDataChunkStore.fillRow(UnsafeVariableLengthDimesionDataChunkStore.java:181) > at > org.apache.carbondata.core.datastore.chunk.impl.VariableLengthDimensionDataChunk.fillConvertedChunkData(VariableLengthDimensionDataChunk.java:112) > at > org.apache.carbondata.core.scan.result.AbstractScannedResult.fillColumnarNoDictionaryBatch(AbstractScannedResult.java:256) > at > org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.scanAndFillResult(DictionaryBasedVectorResultCollector.java:163) > at > org.apache.carbondata.core.scan.collector.impl.RestructureBasedVectorResultCollector.collectVectorBatch(RestructureBasedVectorResultCollector.java:128) > at > org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.processNextBatch(DataBlockIteratorImpl.java:65) > at > org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:46) > at > org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:283) > at > org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:171) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:402) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:108) > at
[jira] [Created] (CARBONDATA-2199) Exception occurs when change the datatype of measure having sort_column
Jatin created CARBONDATA-2199: - Summary: Exception occurs when change the datatype of measure having sort_column Key: CARBONDATA-2199 URL: https://issues.apache.org/jira/browse/CARBONDATA-2199 Project: CarbonData Issue Type: Bug Components: spark-integration Affects Versions: 1.4.0 Environment: spark 2.1 Reporter: Jatin Assignee: Jatin Fix For: 1.4.0 Use a measure column in sort_columns and change the datatype of that column. Steps to replicate:
CREATE TABLE non_partitiontable7(id Int,vin String,phonenumber Long,area String,salary Int, country String,logdate date) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('SORT_COLUMNS'='id,vin','sort_scope'='global_sort');
insert into non_partitiontable7 select 1,'A42151477823',125371344,'OutSpace',1,'China','2017-02-12';
insert into non_partitiontable7 select 1,'Y42151477823',125371344,'midasia',1,'China','2017-02-13';
insert into non_partitiontable7 select 1,'B42151477823',125371346,'OutSpace',1,'US','2018-02-12';
insert into non_partitiontable7 select 1,'C42151477823',125371348,'InnerSpace',10001,'UK','2019-02-12';
select * from non_partitiontable7;
alter table non_partitiontable7 add columns (c1 int);
select * from non_partitiontable7;
alter table non_partitiontable7 change id id bigint;
select * from non_partitiontable7;
Exception StackTrace Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 16.0 failed 4 times, most recent failure: Lost task 1.3 in stage 16.0 (TID 80, BLR123654, executor 3): java.lang.IllegalArgumentException: Wrong length: 4, expected 8 at org.apache.carbondata.core.util.ByteUtil.explainWrongLengthOrOffset(ByteUtil.java:581) at org.apache.carbondata.core.util.ByteUtil.toLong(ByteUtil.java:553) at org.apache.carbondata.core.util.DataTypeUtil.getDataBasedOnRestructuredDataType(DataTypeUtil.java:847) at 
org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeVariableLengthDimesionDataChunkStore.fillRow(UnsafeVariableLengthDimesionDataChunkStore.java:181) at org.apache.carbondata.core.datastore.chunk.impl.VariableLengthDimensionDataChunk.fillConvertedChunkData(VariableLengthDimensionDataChunk.java:112) at org.apache.carbondata.core.scan.result.AbstractScannedResult.fillColumnarNoDictionaryBatch(AbstractScannedResult.java:256) at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.scanAndFillResult(DictionaryBasedVectorResultCollector.java:163) at org.apache.carbondata.core.scan.collector.impl.RestructureBasedVectorResultCollector.collectVectorBatch(RestructureBasedVectorResultCollector.java:128) at org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.processNextBatch(DataBlockIteratorImpl.java:65) at org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:46) at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:283) at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:171) at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:402) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228) at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
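The "Wrong length: 4, expected 8" pattern can be sketched outside CarbonData (a hypothetical Python illustration, not the CarbonData code path): old segments store the column as a 4-byte int, but after `alter table ... change id id bigint` the reader asks for an 8-byte long, so it must decode with the stored width and widen rather than reinterpret the raw bytes.

```python
import struct

# Old segments hold the column as a 4-byte big-endian INT; after the
# `change id id bigint` DDL the reader expects an 8-byte LONG.
old_bytes = struct.pack(">i", 42)      # what the old segment actually stores
assert len(old_bytes) == 4

try:
    struct.unpack(">q", old_bytes)     # naive read as LONG: length 4, expected 8
except struct.error as exc:
    print(exc)                         # buffer-size error, like ByteUtil.toLong

# A restructure-aware read decodes with the stored width, then widens:
(value,) = struct.unpack(">i", old_bytes)
widened = struct.pack(">q", value)     # re-encode as an 8-byte LONG
print(struct.unpack(">q", widened)[0]) # 42
```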
[jira] [Assigned] (CARBONDATA-2136) Exception displays while loading data with BAD_RECORDS_ACTION = REDIRECT
[ https://issues.apache.org/jira/browse/CARBONDATA-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-2136: - Assignee: Jatin > Exception displays while loading data with BAD_RECORDS_ACTION = REDIRECT > > > Key: CARBONDATA-2136 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2136 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: spark 2.1 >Reporter: Vandana Yadav >Assignee: Jatin >Priority: Major > Attachments: 2000_UniqData.csv > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Exception displays while loading data with BAD_RECORDS_ACTION = REDIRECT > Steps to reproduce: > 1) create the table: > CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1',"TABLE_BLOCKSIZE"= > "256 > MB",'SORT_SCOPE'='NO_SORT','NO_INVERTED_INDEX'='CUST_ID,CUST_NAME,Double_COLUMN1,DECIMAL_COLUMN2'); > 2) Load Data: > LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' > into table uniqdata OPTIONS('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='REDIRECT','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > Expected Result: data should be loaded successfully. 
> Actual Result: > Error: java.lang.Exception: DataLoad failure: There is an unexpected error: > unable to generate the mdkey (state=,code=0) > > 3) ThriftServer logs: > 18/02/06 16:38:11 INFO SparkExecuteStatementOperation: Running query 'LOAD > DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' into > table uniqdata OPTIONS('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='REDIRECT','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')' > with 87eb4af5-e485-4a0b-bcae-6589f1252291 > 18/02/06 16:38:11 INFO CarbonSparkSqlParser: Parsing command: LOAD DATA > INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' into table > uniqdata OPTIONS('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='REDIRECT','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1') > 18/02/06 16:38:11 INFO CarbonLateDecodeRule: pool-23-thread-41 skip > CarbonOptimizer > 18/02/06 16:38:11 INFO CarbonLateDecodeRule: pool-23-thread-41 Skip > CarbonOptimizer > 18/02/06 16:38:11 INFO HiveMetaStore: 42: get_table : db=bug tbl=uniqdata > 18/02/06 16:38:11 INFO audit: ugi=hduser ip=unknown-ip-addr cmd=get_table : > db=bug tbl=uniqdata > 18/02/06 16:38:11 INFO HiveMetaStore: 42: Opening raw store with > implemenation class:org.apache.hadoop.hive.metastore.ObjectStore > 18/02/06 16:38:11 INFO ObjectStore: ObjectStore, initialize called > 18/02/06 16:38:11 INFO Query: Reading in results for query > "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is > closing > 18/02/06 16:38:11 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is > DERBY > 18/02/06 16:38:11 INFO ObjectStore: Initialized ObjectStore > 18/02/06 16:38:11 INFO CatalystSqlParser: Parsing command: array > 18/02/06 16:38:11 INFO 
CarbonLoadDataCommand: pool-23-thread-41 Deleting > stale folders if present for table bug.uniqdata > 18/02/06 16:38:11 INFO CarbonLoadDataCommand: pool-23-thread-41 Initiating > Direct Load for the Table : (bug.uniqdata) > 18/02/06 16:38:12 INFO HdfsFileLock: pool-23-thread-41 HDFS lock > path:hdfs://localhost:54310/opt/prestocarbonStore/bug/uniqdata/tablestatus.lock > 18/02/06 16:38:12 INFO HdfsFileLock: pool-23-thread-41 HDFS lock > path:hdfs://localhost:54310/opt/prestocarbonStore/bug/uniqdata/Segment_1.lock > 18/02/06 16:38:12 INFO DeleteLoadFolders: pool-23-thread-41 Info: Deleted the > load 1 > 18/02/06 16:38:12 INFO DeleteLoadFolders: pool-23-thread-41 Info: Segment > lock on segment:1 is released > 18/02/06 16:38:12 INFO DataLoadingUtil$: pool-23-thread-41 Table status lock > has been successfully acquired. > 18/02/06 16:38:12 INFO HdfsFileLock: pool-23-thread-41 Deleted the lock file >
[jira] [Created] (CARBONDATA-2122) Redirect Bad Record Path Should Throw Exception on Empty Location
Jatin created CARBONDATA-2122: - Summary: Redirect Bad Record Path Should Throw Exception on Empty Location Key: CARBONDATA-2122 URL: https://issues.apache.org/jira/browse/CARBONDATA-2122 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 1.3.0 Environment: Spark-2.1 Reporter: Jatin Assignee: Jatin Fix For: 1.3.0 A data load with bad-record action REDIRECT and an empty bad-record location should throw an Invalid Path exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
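A minimal sketch of the fix this issue asks for (the function name and message are illustrative, not CarbonData APIs): validate the bad-record location up front when the action is REDIRECT, so the load fails fast instead of dying mid-way.

```python
# Hypothetical validation sketch -- names are illustrative, not CarbonData code.
def validate_bad_record_path(action: str, path: str) -> None:
    """Fail fast with an 'Invalid Path' error for REDIRECT with a blank location."""
    if action.upper() == "REDIRECT" and not path.strip():
        raise ValueError("Invalid bad_record_path: location must not be empty")

validate_bad_record_path("FORCE", "")            # OK: location is unused for FORCE
try:
    validate_bad_record_path("REDIRECT", "   ")  # blank location must be rejected
except ValueError as exc:
    print(exc)
```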
[jira] [Updated] (CARBONDATA-2080) Hadoop Conf not propagated from driver to executor in S3
[ https://issues.apache.org/jira/browse/CARBONDATA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin updated CARBONDATA-2080: -- Environment: Spark 2.1, Hadoop 2.7.2 with 3 node cluster using Mesos > Hadoop Conf not propagated from driver to executor in S3 > > > Key: CARBONDATA-2080 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2080 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 1.4.0 > Environment: Spark 2.1, Hadoop 2.7.2 with 3 node cluster using Mesos >Reporter: Jatin >Assignee: Jatin >Priority: Minor > Fix For: 1.4.0 > > > On loading data in distributed environment using S3 as location. The load > fails because of not getting hadoop conf on executors. > Logs Info : > 18/01/24 07:38:20 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 7, > hadoop-slave-1, executor 1): com.amazonaws.AmazonClientException: Unable to > load AWS credentials from any provider in the chain > at > com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117) > at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521) > at > com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031) > at > com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994) > at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) > at > org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.(AbstractDFSCarbonFile.java:67) > at > 
org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.(AbstractDFSCarbonFile.java:59) > at > org.apache.carbondata.core.datastore.filesystem.HDFSCarbonFile.(HDFSCarbonFile.java:42) > at > org.apache.carbondata.core.datastore.impl.DefaultFileTypeProvider.getCarbonFile(DefaultFileTypeProvider.java:47) > at > org.apache.carbondata.core.datastore.impl.FileFactory.getCarbonFile(FileFactory.java:86) > at > org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore.getCarbonIndexFiles(SegmentIndexFileStore.java:204) > at > org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.mergeCarbonIndexFilesOfSegment(CarbonIndexFileMergeWriter.java:52) > at > org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.mergeCarbonIndexFilesOfSegment(CarbonIndexFileMergeWriter.java:119) > at > org.apache.carbondata.spark.rdd.CarbonMergeFilesRDD$$anon$1.(CarbonMergeFilesRDD.scala:58) > at > org.apache.carbondata.spark.rdd.CarbonMergeFilesRDD.internalCompute(CarbonMergeFilesRDD.scala:53) > at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:60) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:99) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2080) Hadoop Conf not propagated from driver to executor in S3
Jatin created CARBONDATA-2080: - Summary: Hadoop Conf not propagated from driver to executor in S3 Key: CARBONDATA-2080 URL: https://issues.apache.org/jira/browse/CARBONDATA-2080 Project: CarbonData Issue Type: Bug Components: spark-integration Affects Versions: 1.4.0 Reporter: Jatin Assignee: Jatin Fix For: 1.4.0 When loading data in a distributed environment with S3 as the store location, the load fails because the Hadoop configuration is not propagated to the executors. Logs Info : 18/01/24 07:38:20 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 7, hadoop-slave-1, executor 1): com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521) at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031) at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994) at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.(AbstractDFSCarbonFile.java:67) at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.(AbstractDFSCarbonFile.java:59) at org.apache.carbondata.core.datastore.filesystem.HDFSCarbonFile.(HDFSCarbonFile.java:42) at org.apache.carbondata.core.datastore.impl.DefaultFileTypeProvider.getCarbonFile(DefaultFileTypeProvider.java:47) at org.apache.carbondata.core.datastore.impl.FileFactory.getCarbonFile(FileFactory.java:86) at 
org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore.getCarbonIndexFiles(SegmentIndexFileStore.java:204) at org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.mergeCarbonIndexFilesOfSegment(CarbonIndexFileMergeWriter.java:52) at org.apache.carbondata.core.writer.CarbonIndexFileMergeWriter.mergeCarbonIndexFilesOfSegment(CarbonIndexFileMergeWriter.java:119) at org.apache.carbondata.spark.rdd.CarbonMergeFilesRDD$$anon$1.(CarbonMergeFilesRDD.scala:58) at org.apache.carbondata.spark.rdd.CarbonMergeFilesRDD.internalCompute(CarbonMergeFilesRDD.scala:53) at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:60) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
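The usual way such credentials reach executors is Spark's "spark.hadoop.*" convention: any SparkConf entry with that prefix is copied (prefix stripped) into the Hadoop Configuration that Spark ships with tasks. Below is a plain-Python illustration of that mapping (the credential values are placeholders, and this models the convention rather than Spark's internals):

```python
# Illustration of the "spark.hadoop.*" convention: entries with this prefix
# become Hadoop Configuration keys that travel from driver to executors,
# so S3 credentials set this way are visible to every task.
PREFIX = "spark.hadoop."

def to_hadoop_conf(spark_conf):
    # Strip the prefix; ignore non-Hadoop Spark settings.
    return {k[len(PREFIX):]: v for k, v in spark_conf.items() if k.startswith(PREFIX)}

conf = {
    "spark.master": "mesos://host:5050",                  # not a Hadoop key
    "spark.hadoop.fs.s3a.access.key": "EXAMPLE_KEY",      # placeholder value
    "spark.hadoop.fs.s3a.secret.key": "EXAMPLE_SECRET",   # placeholder value
}
hadoop = to_hadoop_conf(conf)
print(sorted(hadoop))  # ['fs.s3a.access.key', 'fs.s3a.secret.key']
```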
[jira] [Commented] (CARBONDATA-1735) Carbon1.3.0 Load: Segment created during load is not marked for delete if beeline session is closed while load is still in progress
[ https://issues.apache.org/jira/browse/CARBONDATA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317808#comment-16317808 ] Jatin commented on CARBONDATA-1735: --- I am unable to replicate the issue: whenever I close beeline, it shows an InterruptedException, global dictionary generation fails, and the status of the segment is marked for delete. > Carbon1.3.0 Load: Segment created during load is not marked for delete if > beeline session is closed while load is still in progress > > > Key: CARBONDATA-1735 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1735 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: 3 Node ant cluster >Reporter: Ajeet Rai >Priority: Minor > Labels: DFX > > Load: Segment created during load is not marked for delete if beeline session > is closed while load is still in progress. > Steps: > 1: Create a table with dictionary include > 2: Start a load job > 3: close the beeline session when global dictionary generation job is still > in progress. > 4: Observe that global dictionary generation job is completed but next job is > not triggered. > 5: Also observe that table status file is not updated and status of job is > still in progress. > 6: show segment will show this segment with status as in progress. > Expected behaviour: Either job should be completed or load should fail and > segment should be marked for delete. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-2003) Streaming table is not updated on second streaming load
[ https://issues.apache.org/jira/browse/CARBONDATA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-2003: - Assignee: Jatin > Streaming table is not updated on second streaming load > --- > > Key: CARBONDATA-2003 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2003 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: spark2.1 >Reporter: Geetika Gupta >Assignee: Jatin > Fix For: 1.3.0 > > Attachments: 2000_UniqData.csv > > > I tried the following scenario on spark shell: > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > import org.apache.carbondata.core.util.CarbonProperties > import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} > val carbon = SparkSession.builder().config(sc.getConf) > .getOrCreateCarbonSession("hdfs://localhost:54311/newCarbonStore","/tmp") > import org.apache.carbondata.core.constants.CarbonCommonConstants > import org.apache.carbondata.core.util.CarbonProperties > CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, > "FORCE") > carbon.sql("CREATE TABLE uniqdata_stream_8(CUST_ID int,CUST_NAME > String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, > BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), > DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 > double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES ('TABLE_BLOCKSIZE'= '256 MB', 'streaming'='true')") > import carbon.sqlContext.implicits._ > val uniqdataSch = StructType( > Array(StructField("CUST_ID", IntegerType),StructField("CUST_NAME", > StringType),StructField("ACTIVE_EMUI_VERSION", StringType),StructField("DOB", > TimestampType), StructField("DOJ", TimestampType), > StructField("BIGINT_COLUMN1", LongType), StructField("BIGINT_COLUMN2", > LongType), StructField("DECIMAL_COLUMN1", > 
org.apache.spark.sql.types.DecimalType(30, 10)), > StructField("DECIMAL_COLUMN2", > org.apache.spark.sql.types.DecimalType(36,10)), StructField("Double_COLUMN1", > DoubleType), StructField("Double_COLUMN2", DoubleType), > StructField("INTEGER_COLUMN1", IntegerType))) > val streamDf = carbon.readStream > .schema(uniqdataSch) > .option("sep", ",") > .csv("file:///home/geetika/Downloads/uniqdata") > val dfToWrite = streamDf.map{x => x.get(0) + "," + x.get(1) + "," + x.get(2)+ > "," + x.get(3)+ "," + x.get(4)+ "," + x.get(5)+ "," + x.get(6)+ "," + > x.get(7)+ "," + x.get(8)+ "," + x.get(9)+ "," + x.get(10)+ "," + x.get(11)} > val qry = > dfToWrite.writeStream.format("carbondata").trigger(ProcessingTime("5 > seconds")) > .option("checkpointLocation","/stream/uniq8") > .option("dbName", "default") > .option("tableName", "uniqdata_stream_8") > .start() > qry.awaitTermination() > Now close this shell and check the record count on the table using : > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val carbon = SparkSession.builder().config(sc.getConf) > .getOrCreateCarbonSession("hdfs://localhost:54311/newCarbonStore","/tmp") > carbon.sql("select count(*) from uniqdata_stream_8").show > OUTPUT: > scala> carbon.sql("select count(*) from uniqdata_stream_8").show > 18/01/08 15:51:53 ERROR CarbonProperties: Executor task launch worker-0 > Configured value for property carbon.number.of.cores.while.loading is wrong. > Falling back to the default value 2 > ++ > |count(1)| > ++ > |2013| > ++ > Again try the above scenario and check the count. It remains same after the > second streaming load. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1032) NumberFormatException and NegativeArraySizeException for select with in clause filter limit for unsafe true configuration
[ https://issues.apache.org/jira/browse/CARBONDATA-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316254#comment-16316254 ] Jatin commented on CARBONDATA-1032: --- Please provide commands for create & load table along with the input data csv. > NumberFormatException and NegativeArraySizeException for select with in > clause filter limit for unsafe true configuration > - > > Key: CARBONDATA-1032 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1032 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.1.0 > Environment: 3 node cluster SUSE 11 SP4 >Reporter: Chetan Bhat > Original Estimate: 504h > Remaining Estimate: 504h > > Carbon .properties are configured as below: > carbon.allowed.compaction.days = 2 > carbon.enable.auto.load.merge = false > carbon.compaction.level.threshold = 3,2 > carbon.timestamp.format = -MM-dd > carbon.badRecords.location = /tmp/carbon > carbon.numberof.preserve.segments = 2 > carbon.sort.file.buffer.size = 20 > max.query.execution.time = 60 > carbon.number.of.cores.while.loading = 8 > carbon.storelocation =hdfs://hacluster/opt/CarbonStore > enable.data.loading.statistics = true > enable.unsafe.sort = true > offheap.sort.chunk.size.inmb = 128 > sort.inmemory.size.inmb = 30720 > carbon.enable.vector.reader=true > enable.unsafe.in.query.processing=true > enable.query.statistics=true > carbon.blockletgroup.size.in.mb=128 > high.cardinality.identify.enable=TRUE > high.cardinality.threshold=1 > high.cardinality.value=1000 > high.cardinality.row.count.percentage=40 > carbon.data.file.version=2 > carbon.major.compaction.size=2 > carbon.enable.auto.load.merge=FALSE > carbon.numberof.preserve.segments=1 > carbon.allowed.compaction.days=1 > User creates table, loads 1535088 records data and executes the select with > in clause filter limit. > Actual Result : > NumberFormatException and NegativeArraySizeException for select with in > clause filter limit for unsafe true configuration. 
> 0: jdbc:hive2://172.168.100.199:23040> select * from flow_carbon_test4 where > opp_bk in ('149199158','149199116','149199022','149199031') > and dt>='20140101' and dt <= '20160101' order by bal asc limit 1000; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 1 in stage 2109.0 failed 4 times, most recent failure: Lost task 1.3 in > stage 2109.0 (TID 75120, linux-49, executor 2): > java.lang.NegativeArraySizeException > at > org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeBigDecimalMeasureChunkStore.getBigDecimal(UnsafeBigDecimalMeasureChunkStore.java:132) > at > org.apache.carbondata.core.datastore.compression.decimal.CompressByteArray.getBigDecimalValue(CompressByteArray.java:94) > at > org.apache.carbondata.core.datastore.dataholder.CarbonReadDataHolder.getReadableBigDecimalValueByIndex(CarbonReadDataHolder.java:38) > at > org.apache.carbondata.core.scan.result.vector.MeasureDataVectorProcessor$DecimalMeasureVectorFiller.fillMeasureVectorForFilter(MeasureDataVectorProcessor.java:253) > at > org.apache.carbondata.core.scan.result.impl.FilterQueryScannedResult.fillColumnarMeasureBatch(FilterQueryScannedResult.java:119) > at > org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.scanAndFillResult(DictionaryBasedVectorResultCollector.java:145) > at > org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.collectVectorBatch(DictionaryBasedVectorResultCollector.java:137) > at > org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.processNextBatch(DataBlockIteratorImpl.java:65) > at > org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:46) > at > org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:251) > at > 
org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:141) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:221) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) >
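The NegativeArraySizeException surfacing in UnsafeBigDecimalMeasureChunkStore.getBigDecimal typically means the value length read from the off-heap chunk is negative (a corrupted or mis-offset length prefix), and allocating a byte array from it blows up. A minimal sketch of that failure mode with a defensive check — hypothetical class and method names, not CarbonData's actual code:

```java
import java.nio.ByteBuffer;

public class LengthPrefixedRead {
    // Reads a length-prefixed byte[] the way an off-heap decimal store might:
    // a 4-byte length followed by that many bytes. A corrupted or misaligned
    // offset yields a negative length, and `new byte[len]` then throws
    // NegativeArraySizeException -- the symptom in the stack trace above.
    static byte[] readValue(ByteBuffer memory, int offset) {
        int len = memory.getInt(offset);
        if (len < 0) {
            // Defensive check: fail with a diagnosable message instead.
            throw new IllegalStateException("corrupt length " + len + " at offset " + offset);
        }
        byte[] out = new byte[len];
        memory.position(offset + 4);
        memory.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putInt(0, 3);                 // valid length prefix
        System.out.println(readValue(buf, 0).length);

        buf.putInt(8, -42);               // simulated corruption
        try {
            readValue(buf, 8);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```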
[jira] [Assigned] (CARBONDATA-1703) Difference in result set count of carbon and hive after applying select query.
[ https://issues.apache.org/jira/browse/CARBONDATA-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1703: - Assignee: Jatin > Difference in result set count of carbon and hive after applying select query. > -- > > Key: CARBONDATA-1703 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1703 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: spark 2.1 >Reporter: Vandana Yadav >Assignee: Jatin >Priority: Minor > Attachments: 2000_UniqData.csv > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Incorrect result displays after applying select query. > Steps to reproduce: > 1) Create table stored by carbondata and load data in it: > a) CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > b) LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' > into table uniqdata OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > 2) Create hive table: > a) CREATE TABLE uniqdata_h (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; > b) load data local inpath > '/home/knoldus/Desktop/csv/TestData/Data/uniqdata/2000_UniqData.csv' into > table uniqdata_h; > 3) Execute Query: > a) SELECT > 
CUST_ID,CUST_NAME,DOB,BIGINT_COLUMN1,DECIMAL_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN2 from (select * from uniqdata) SUB_QRY WHERE (CUST_ID in (10020,10030,10032,10035,10040,10060,NULL) or INTEGER_COLUMN1 not in (1021,1031,1032,1033,NULL)) and (Double_COLUMN1 not in (1.12345674897976E10,NULL) or DECIMAL_COLUMN2 in (22345679921.123400,NULL));
> b) SELECT CUST_ID,CUST_NAME,DOB,BIGINT_COLUMN1,DECIMAL_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN2 from (select * from uniqdata_h) SUB_QRY WHERE (CUST_ID in (10020,10030,10032,10035,10040,10060,NULL) or INTEGER_COLUMN1 not in (1021,1031,1032,1033,NULL)) and (Double_COLUMN1 not in (1.12345674897976E10,NULL) or DECIMAL_COLUMN2 in (22345679921.123400,NULL));
> 4) Expected Result: both results should be the same.
> 5) Actual Result:
> a) carbondata table result:
> | CUST_ID | CUST_NAME | DOB | BIGINT_COLUMN1 | DECIMAL_COLUMN1 | Double_COLUMN2 | INTEGER_COLUMN1 | DECIMAL_COLUMN2 | Double_COLUMN2 |
> | NULL | | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
> | NULL | | NULL | 1233720368578 | NULL | NULL | NULL | NULL | NULL |
> | NULL | | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
> | NULL | | NULL | NULL | 12345678901.123400 | NULL | NULL | NULL | NULL |
> | NULL | | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
> | NULL | | NULL | NULL | NULL | -1.12345674897976E10 | NULL | NULL | -1.12345674897976E10 |
> | NULL | | NULL | NULL | NULL | NULL | 0 | NULL |
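A likely source of the count difference between the two engines is SQL NULL handling in the IN/NOT IN lists: under standard three-valued logic, `x NOT IN (..., NULL)` can never evaluate to TRUE (any comparison with NULL yields UNKNOWN), so a conforming engine filters out every row for that branch, while an engine that silently drops the NULL behaves differently. A small sketch of three-valued evaluation — this models standard SQL semantics, not either engine's actual code:

```java
import java.util.Arrays;
import java.util.List;

public class ThreeValuedIn {
    // SQL three-valued logic: TRUE, FALSE, UNKNOWN (UNKNOWN modeled as null Boolean).
    static Boolean in(Integer x, List<Integer> list) {
        if (x == null) return null;                 // NULL IN (...) -> UNKNOWN
        boolean sawNull = false;
        for (Integer v : list) {
            if (v == null) { sawNull = true; continue; }
            if (v.equals(x)) return Boolean.TRUE;   // definite match
        }
        return sawNull ? null : Boolean.FALSE;      // UNKNOWN if the list had a NULL
    }

    static Boolean notIn(Integer x, List<Integer> list) {
        Boolean r = in(x, list);
        return r == null ? null : !r;               // NOT UNKNOWN -> UNKNOWN
    }

    public static void main(String[] args) {
        List<Integer> withNull = Arrays.asList(1021, 1031, 1032, 1033, null);
        // A non-listed value: NOT IN -> UNKNOWN, so WHERE filters the row out.
        System.out.println(notIn(9999, withNull));
        // A listed value: NOT IN -> FALSE (definite).
        System.out.println(notIn(1021, withNull));
    }
}
```

With the list containing NULL, `INTEGER_COLUMN1 not in (1021,1031,1032,1033,NULL)` never passes the filter, which is why the NULL literals in the predicates matter for comparing the two result sets.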
[jira] [Assigned] (CARBONDATA-1963) Support S3 table with dictionary
[ https://issues.apache.org/jira/browse/CARBONDATA-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1963: - Assignee: Jatin > Support S3 table with dictionary > > > Key: CARBONDATA-1963 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1963 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1960) Add example for creating a local table and load CSV data which is stored in S3.
[ https://issues.apache.org/jira/browse/CARBONDATA-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1960: - Assignee: Jatin > Add example for creating a local table and load CSV data which is stored in > S3. > --- > > Key: CARBONDATA-1960 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1960 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Trivial > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1962) Support alter table add columns/drop columns on S3 table
[ https://issues.apache.org/jira/browse/CARBONDATA-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1962: - Assignee: Jatin > Support alter table add columns/drop columns on S3 table > > > Key: CARBONDATA-1962 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1962 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1961) Support data update/delete on S3 table
[ https://issues.apache.org/jira/browse/CARBONDATA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1961: - Assignee: Jatin > Support data update/delete on S3 table > -- > > Key: CARBONDATA-1961 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1961 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1959) Support compaction on S3 table
[ https://issues.apache.org/jira/browse/CARBONDATA-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1959: - Assignee: Jatin > Support compaction on S3 table > -- > > Key: CARBONDATA-1959 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1959 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1754) Carbon1.3.0 Concurrent Insert overwrite-Compaction: Compaction job fails at run time if insert overwrite job is running concurrently
[ https://issues.apache.org/jira/browse/CARBONDATA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1754: - Assignee: Jatin > Carbon1.3.0 Concurrent Insert overwrite-Compaction: Compaction job fails at run time if insert overwrite job is running concurrently > > Key: CARBONDATA-1754 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1754 > Project: CarbonData > Issue Type: Bug > Components: data-load > Affects Versions: 1.3.0 > Environment: 3 Node ant cluster > Reporter: Ajeet Rai > Assignee: Jatin > Labels: dfx > > Carbon1.3.0 Concurrent Insert overwrite-Compaction: Compaction job fails at run time if insert overwrite job is running concurrently. > Steps: > 1: Create a table > 2: Start three loads one by one > 3: After the loads complete, start insert overwrite and minor compaction concurrently from two different sessions > 4: Observe that both jobs are running > 5: Observe that the insert overwrite job succeeds but the subsequent compaction fails with the below exception: > | ERROR | [pool-23-thread-49] | Error running hive query: | org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:167) > org.apache.hive.service.cli.HiveSQLException: java.lang.RuntimeException: Compaction failed. Please check logs for more info. Exception in compaction java.lang.Exception: Compaction failed to update metadata for table ajeet.flow_carbon_new999 > 6: Ideally the compaction job should fail at the start with a message that an insert overwrite is in progress. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
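The report's final step asks compaction to fail fast when an insert overwrite is in progress, rather than erroring mid-run. That kind of guard can be sketched as a table-level check taken before compaction starts — hypothetical names here; CarbonData's real implementation uses segment/table status locks:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TableOperationGuard {
    // Tracks at most one in-flight exclusive operation per table,
    // e.g. "INSERT_OVERWRITE" or "COMPACTION".
    private final ConcurrentMap<String, String> inFlight = new ConcurrentHashMap<>();

    // Returns true if the operation was admitted; false means another
    // exclusive operation holds the table and the caller should reject
    // immediately with a clear message instead of failing at run time.
    boolean tryStart(String table, String op) {
        return inFlight.putIfAbsent(table, op) == null;
    }

    void finish(String table) {
        inFlight.remove(table);
    }

    String current(String table) {
        return inFlight.get(table);
    }

    public static void main(String[] args) {
        TableOperationGuard guard = new TableOperationGuard();
        guard.tryStart("flow_carbon_new999", "INSERT_OVERWRITE");
        if (!guard.tryStart("flow_carbon_new999", "COMPACTION")) {
            // Fail fast, as the report suggests, instead of aborting mid-compaction.
            System.out.println("Compaction rejected: "
                + guard.current("flow_carbon_new999") + " in progress");
        }
    }
}
```

`putIfAbsent` makes the admit-or-reject decision atomic, so two concurrent sessions cannot both be admitted for the same table.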
[jira] [Assigned] (CARBONDATA-1755) Carbon1.3.0 Concurrent Insert overwrite-update: User is able to run insert overwrite and update job concurrently.
[ https://issues.apache.org/jira/browse/CARBONDATA-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1755: - Assignee: Jatin > Carbon1.3.0 Concurrent Insert overwrite-update: User is able to run insert > overwrite and update job concurrently. > - > > Key: CARBONDATA-1755 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1755 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: 3 Node ant cluster >Reporter: Ajeet Rai >Assignee: Jatin >Priority: Minor > Labels: dfx > > Carbon1.3.0 Concurrent Insert overwrite-update: User is able to run insert > overwrite and update job concurrently. > updated data will be overwritten by insert overwrite job. So there is no > meaning of running update job if insert overwrite is in progress. > Steps: > 1: Create a table > 2: Do a data load > 3: run insert overwrite job. > 4: run a update job while overwrite job is still running. > 5: Observe that update job is finished and after that overwrite job is also > finished. > 6: All previous segments are marked for delete and there is no impact of > update job. Update job will use the resources unnecessary. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (CARBONDATA-1775) (Carbon1.3.0 - Streaming) Select query fails with java.io.EOFException when data streaming is in progress
[ https://issues.apache.org/jira/browse/CARBONDATA-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299617#comment-16299617 ] Jatin edited comment on CARBONDATA-1775 at 12/21/17 6:32 AM: - [~chetdb] Not able to replicate with the latest jar. This issue is fixed with PR : https://github.com/apache/carbondata/pull/1621 was (Author: jatin demla): [~chetdb] Not able to replicate with the latest jar. > (Carbon1.3.0 - Streaming) Select query fails with java.io.EOFException when > data streaming is in progress > -- > > Key: CARBONDATA-1775 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1775 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: 3 node ant cluster >Reporter: Chetan Bhat > Labels: DFX > > Steps : > User starts the thrift server using the command - bin/spark-submit --master > yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G > --num-executors 3 --class > org.apache.carbondata.spark.thriftserver.CarbonThriftServer > /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar > "hdfs://hacluster/user/hive/warehouse/carbon.store" > User connects to spark shell using the command - bin/spark-shell --master > yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G > --num-executors 3 --jars > /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar > In spark shell User creates a table and does streaming load in the table as > per the below socket streaming script. 
> import java.io.{File, PrintWriter} > import java.net.ServerSocket > import org.apache.spark.sql.{CarbonEnv, SparkSession} > import org.apache.spark.sql.hive.CarbonRelation > import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} > import org.apache.carbondata.core.constants.CarbonCommonConstants > import org.apache.carbondata.core.util.CarbonProperties > import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} > CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, > "/MM/dd") > import org.apache.spark.sql.CarbonSession._ > val carbonSession = SparkSession. > builder(). > appName("StreamExample"). > getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/david") > > carbonSession.sparkContext.setLogLevel("INFO") > def sql(sql: String) = carbonSession.sql(sql) > def writeSocket(serverSocket: ServerSocket): Thread = { > val thread = new Thread() { > override def run(): Unit = { > // wait for client to connection request and accept > val clientSocket = serverSocket.accept() > val socketWriter = new PrintWriter(clientSocket.getOutputStream()) > var index = 0 > for (_ <- 1 to 1000) { > // write 5 records per iteration > for (_ <- 0 to 100) { > index = index + 1 > socketWriter.println(index.toString + ",name_" + index >+ ",city_" + index + "," + (index * > 1.00).toString + >",school_" + index + ":school_" + index + > index + "$" + index) > } > socketWriter.flush() > Thread.sleep(2000) > } > socketWriter.close() > System.out.println("Socket closed") > } > } > thread.start() > thread > } > > def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, > tableName: String, port: Int): Thread = { > val thread = new Thread() { > override def run(): Unit = { > var qry: StreamingQuery = null > try { > val readSocketDF = spark.readStream > .format("socket") > .option("host", "10.18.98.34") > .option("port", port) > .load() > qry = readSocketDF.writeStream > .format("carbondata") > 
.trigger(ProcessingTime("5 seconds")) > .option("checkpointLocation", tablePath.getStreamingCheckpointDir) > .option("tablePath", tablePath.getPath).option("tableName", > tableName) > .start() > qry.awaitTermination() > } catch { > case ex: Throwable => > ex.printStackTrace() > println("Done reading and writing streaming data") > } finally { > qry.stop() > } > } > } > thread.start() > thread > } > val streamTableName = "stream_table" > sql(s"CREATE TABLE $streamTableName (id INT,name STRING,city STRING,salary > FLOAT) STORED BY 'carbondata' TBLPROPERTIES('streaming'='true', > 'sort_columns'='name')") > sql(s"LOAD DATA LOCAL INPATH
[jira] [Commented] (CARBONDATA-1775) (Carbon1.3.0 - Streaming) Select query fails with java.io.EOFException when data streaming is in progress
[ https://issues.apache.org/jira/browse/CARBONDATA-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299617#comment-16299617 ] Jatin commented on CARBONDATA-1775: --- [~chetdb] Not able to replicate with the latest jar. > (Carbon1.3.0 - Streaming) Select query fails with java.io.EOFException when > data streaming is in progress > -- > > Key: CARBONDATA-1775 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1775 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: 3 node ant cluster >Reporter: Chetan Bhat > Labels: DFX > > Steps : > User starts the thrift server using the command - bin/spark-submit --master > yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G > --num-executors 3 --class > org.apache.carbondata.spark.thriftserver.CarbonThriftServer > /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar > "hdfs://hacluster/user/hive/warehouse/carbon.store" > User connects to spark shell using the command - bin/spark-shell --master > yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G > --num-executors 3 --jars > /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar > In spark shell User creates a table and does streaming load in the table as > per the below socket streaming script. 
> import java.io.{File, PrintWriter} > import java.net.ServerSocket > import org.apache.spark.sql.{CarbonEnv, SparkSession} > import org.apache.spark.sql.hive.CarbonRelation > import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} > import org.apache.carbondata.core.constants.CarbonCommonConstants > import org.apache.carbondata.core.util.CarbonProperties > import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} > CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, > "/MM/dd") > import org.apache.spark.sql.CarbonSession._ > val carbonSession = SparkSession. > builder(). > appName("StreamExample"). > getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/david") > > carbonSession.sparkContext.setLogLevel("INFO") > def sql(sql: String) = carbonSession.sql(sql) > def writeSocket(serverSocket: ServerSocket): Thread = { > val thread = new Thread() { > override def run(): Unit = { > // wait for client to connection request and accept > val clientSocket = serverSocket.accept() > val socketWriter = new PrintWriter(clientSocket.getOutputStream()) > var index = 0 > for (_ <- 1 to 1000) { > // write 5 records per iteration > for (_ <- 0 to 100) { > index = index + 1 > socketWriter.println(index.toString + ",name_" + index >+ ",city_" + index + "," + (index * > 1.00).toString + >",school_" + index + ":school_" + index + > index + "$" + index) > } > socketWriter.flush() > Thread.sleep(2000) > } > socketWriter.close() > System.out.println("Socket closed") > } > } > thread.start() > thread > } > > def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, > tableName: String, port: Int): Thread = { > val thread = new Thread() { > override def run(): Unit = { > var qry: StreamingQuery = null > try { > val readSocketDF = spark.readStream > .format("socket") > .option("host", "10.18.98.34") > .option("port", port) > .load() > qry = readSocketDF.writeStream > .format("carbondata") > 
.trigger(ProcessingTime("5 seconds")) > .option("checkpointLocation", tablePath.getStreamingCheckpointDir) > .option("tablePath", tablePath.getPath).option("tableName", > tableName) > .start() > qry.awaitTermination() > } catch { > case ex: Throwable => > ex.printStackTrace() > println("Done reading and writing streaming data") > } finally { > qry.stop() > } > } > } > thread.start() > thread > } > val streamTableName = "stream_table" > sql(s"CREATE TABLE $streamTableName (id INT,name STRING,city STRING,salary > FLOAT) STORED BY 'carbondata' TBLPROPERTIES('streaming'='true', > 'sort_columns'='name')") > sql(s"LOAD DATA LOCAL INPATH 'hdfs://hacluster/tmp/streamSample.csv' INTO > TABLE $streamTableName OPTIONS('HEADER'='true')") > sql(s"select * from $streamTableName").show > val carbonTable =
[jira] [Assigned] (CARBONDATA-1719) Carbon1.3.0-Pre-AggregateTable - Empty segment is created when pre-aggr table created in parallel with table load, aggregate query returns no data
[ https://issues.apache.org/jira/browse/CARBONDATA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1719: - Assignee: Jatin > Carbon1.3.0-Pre-AggregateTable - Empty segment is created when pre-aggr table > created in parallel with table load, aggregate query returns no data > -- > > Key: CARBONDATA-1719 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1719 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Jatin > Labels: DFX > Fix For: 1.3.0 > > > 1. Create a table > create table if not exists lineitem3(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > 2. Run load queries and create pre-agg table queries in diff console: > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem3 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > create datamap agr_lineitem3 ON TABLE lineitem3 USING > "org.apache.carbondata.datamap.AggregateDataMapHandler" as select > L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem3 > group by L_RETURNFLAG, L_LINESTATUS; > 3. 
Check table content using aggregate query: > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem3 group by l_returnflag, l_linestatus; > 0: jdbc:hive2://10.18.98.34:23040> select > l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem3 > group by l_returnflag, l_linestatus; > +---+---+--+---+--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--+---+--+ > +---+---+--+---+--+ > No rows selected (1.258 seconds) > HDFS data: > BLR114307:/srv/spark2.2Bigdata/install/hadoop/datanode # bin/hadoop fs > -ls /carbonstore/default/lineitem3_agr_lineitem3/Fact/Part0/Segment_0 > BLR114307:/srv/spark2.2Bigdata/install/hadoop/datanode # bin/hadoop fs > -ls /carbonstore/default/lineitem3/Fact/Part0/Segment_0 > Found 27 items > -rw-r--r-- 2 root users 22148 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/1510740293106.carbonindexmerge > -rw-r--r-- 2 root users 58353052 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-0_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58351680 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-0_batchno1-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58364823 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-1_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58356303 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-2_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58342246 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-0_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58353186 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-0_batchno1-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58352964 2017-11-15 18:05 > 
/carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-1_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58357183 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-2_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58345739 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-2-0_batchno0-0-1510740300247.carbondata > Yarn job stages: > 29 > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem3 >
[jira] [Assigned] (CARBONDATA-1674) Carbon 1.3.0-Partitioning:Describe Formatted Should show the type of partition as well.
[ https://issues.apache.org/jira/browse/CARBONDATA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1674: - Assignee: Jatin > Carbon 1.3.0-Partitioning:Describe Formatted Should show the type of > partition as well. > --- > > Key: CARBONDATA-1674 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1674 > Project: CarbonData > Issue Type: Improvement > Components: sql >Affects Versions: 1.3.0 >Reporter: Ayushi Sharma >Assignee: Jatin >Priority: Minor > Attachments: Jira_req_part1.PNG, jira_req_part2.PNG > > Time Spent: 20m > Remaining Estimate: 0h > > Describe Formatted should show type of partitions as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1790) (Carbon1.3.0 - Streaming) Data load in Stream Segment fails if batch load is performed in between the streaming
[ https://issues.apache.org/jira/browse/CARBONDATA-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297974#comment-16297974 ] Jatin commented on CARBONDATA-1790: --- [~Ram@huawei] Please provide steps for replication. > (Carbon1.3.0 - Streaming) Data load in Stream Segment fails if batch load is performed in between the streaming > > Key: CARBONDATA-1790 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1790 > Project: CarbonData > Issue Type: Bug > Components: data-query > Affects Versions: 1.3.0 > Environment: 3 node ant cluster > Reporter: Ramakrishna S > Labels: DFX > > Steps: > 1. Create a streaming table and do a batch load > 2. Set up the streaming so that it streams chunks of 1000 records, 20 times > 3. Do another batch load on the table > 4. Do one more round of streaming
> | Segment Id | Status | Load Start Time | Load End Time | File Format | Merged To |
> | 2 | Success | 2017-11-21 21:42:36.77 | 2017-11-21 21:42:40.396 | COLUMNAR_V3 | NA |
> | 1 | Streaming | 2017-11-21 21:40:46.2 | NULL | ROW_V1 | NA |
> | 0 | Success | 2017-11-21 21:40:39.782 | 2017-11-21 21:40:43.168 | COLUMNAR_V3 | NA |
> *+Expected:+* Data should be loaded
> *+Actual+*: Data load fails
> 1.
One additional offset file is created (marked in bold):
> -rw-r--r-- 2 root users 62 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/0
> -rw-r--r-- 2 root users 63 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/1
> -rw-r--r-- 2 root users 63 2017-11-21 21:42 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/10
> -rw-r--r-- 2 root users 63 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/2
> -rw-r--r-- 2 root users 63 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/3
> -rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/4
> -rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/5
> -rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/6
> -rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/7
> -rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/8
> *-rw-r--r-- 2 root users 63 2017-11-21 21:42 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/9*
> 2.
Following error thrown: > === Streaming Query === > Identifier: [id = 3a5334bc-d471-4676-b6ce-f21105d491d1, runId = > b2be9f97-8141-46be-89db-9a0f98d13369] > Current Offsets: > {org.apache.spark.sql.execution.streaming.TextSocketSource@14c45193: 1000} > Current State: ACTIVE > Thread State: RUNNABLE > Logical Plan: > org.apache.spark.sql.execution.streaming.TextSocketSource@14c45193 > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:284) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:177) > Caused by: java.lang.RuntimeException: Offsets committed out of order: 20019 > followed by 1000 > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.execution.streaming.TextSocketSource.commit(socket.scala:151) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2$$anonfun$apply$mcV$sp$4.apply(StreamExecution.scala:421) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2$$anonfun$apply$mcV$sp$4.apply(StreamExecution.scala:420) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) >
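"Offsets committed out of order: 20019 followed by 1000" is the socket source's monotonicity check firing: each commit must carry an offset no smaller than the last committed one, and restarting the stream against a source whose counter has reset violates that invariant. The check reduces to the following — a sketch of the invariant, not Spark's exact TextSocketSource code:

```java
public class OffsetCommitCheck {
    private long lastCommitted = -1;

    // Mirrors the invariant the socket source enforces on commit():
    // offsets must arrive in non-decreasing order.
    void commit(long offset) {
        if (offset < lastCommitted) {
            throw new IllegalStateException("Offsets committed out of order: "
                + lastCommitted + " followed by " + offset);
        }
        lastCommitted = offset;
    }

    public static void main(String[] args) {
        OffsetCommitCheck c = new OffsetCommitCheck();
        c.commit(1000);
        c.commit(20019);
        try {
            c.commit(1000);   // the pattern from the log after the restart
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This also explains why the failure correlates with the batch load in between: the query restarts from the checkpoint at offset 20019, while the re-created socket source begins counting again from 1000.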
[jira] [Assigned] (CARBONDATA-1678) Carbon 1.3.0-Partitioning:Show partition throws index out of bounds exception
[ https://issues.apache.org/jira/browse/CARBONDATA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1678: - Assignee: Jatin > Carbon 1.3.0-Partitioning:Show partition throws index out of bounds exception > - > > Key: CARBONDATA-1678 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1678 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 1.3.0 >Reporter: Ayushi Sharma >Assignee: Jatin > Attachments: Show_part.PNG, Show_part.txt > > > create table part_nation_3 (N_NATIONKEY BIGINT,N_REGIONKEY BIGINT,N_COMMENT > STRING) partitioned by (N_NAME STRING) stored by 'carbondata' > tblproperties('partition_type'='list','list_info'='ALGERIA,ARGENTINA,BRAZIL,CANADA,(EGYPT,ETHIOPIA,FRANCE),JAPAN'); > ALTER TABLE part_nation_3 ADD PARTITION('SAUDI ARABIA,(VIETNAM,RUSSIA,UNITED > KINGDOM,UNITED STATES)'); > show partitions part_nation_3; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1806) Carbon1.3.0 Load with global sort: Load fails If a table is created with sort scope as global sort
[ https://issues.apache.org/jira/browse/CARBONDATA-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290666#comment-16290666 ] Jatin commented on CARBONDATA-1806: --- Please provide more details for this bug as I am not able to replicate this issue, neither on my local system nor on 3 node cluster. > Carbon1.3.0 Load with global sort: Load fails If a table is created with sort > scope as global sort > -- > > Key: CARBONDATA-1806 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1806 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.3.0 > Environment: 3 node cluster >Reporter: Ajeet Rai > Labels: dfx > > Carbon1.3.0 Load with global sort: Load fails If a table is created with sort > scope as global sort. > Steps: > 1: create table dt1 (c1 string, c2 int) STORED BY > 'org.apache.carbondata.format' tblproperties('sort_scope'='Global_sort'); > 2: LOAD DATA INPATH 'hdfs://hacluster/user/test/dt1.txt' INTO TABLE dt1 > OPTIONS('DELIMITER'=',', 'QUOTECHAR'= '\"'); > 3: Observe that load fails with below error: > Error: java.lang.Exception: DataLoad failure (state=,code=0) > 4: Check log: > org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: > There is an unexpected error: > org.apache.carbondata.core.datastore.exception.CarbonDataWriterException > at > org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$.writeFunc(DataLoadProcessorStepOnSpark.scala:198) > at > org.apache.carbondata.spark.load.DataLoadProcessBuilderOnSpark$$anonfun$loadDataUsingGlobalSort$1.apply(DataLoadProcessBuilderOnSpark.scala:130) > at > org.apache.carbondata.spark.load.DataLoadProcessBuilderOnSpark$$anonfun$loadDataUsingGlobalSort$1.apply(DataLoadProcessBuilderOnSpark.scala:129) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Suppressed: org.apache.spark.util.TaskCompletionListenerException: > There is an unexpected error: > org.apache.carbondata.core.datastore.exception.CarbonDataWriterException > Previous exception in task: There is an unexpected error: > org.apache.carbondata.core.datastore.exception.CarbonDataWriterException > > org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$.writeFunc(DataLoadProcessorStepOnSpark.scala:198) > > org.apache.carbondata.spark.load.DataLoadProcessBuilderOnSpark$$anonfun$loadDataUsingGlobalSort$1.apply(DataLoadProcessBuilderOnSpark.scala:130) > > org.apache.carbondata.spark.load.DataLoadProcessBuilderOnSpark$$anonfun$loadDataUsingGlobalSort$1.apply(DataLoadProcessBuilderOnSpark.scala:129) > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > org.apache.spark.scheduler.Task.run(Task.scala:99) > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > at > org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138) > at > org.apache.spark.TaskContextImpl.markTaskFailed(TaskContextImpl.scala:106) > at org.apache.spark.scheduler.Task.run(Task.scala:104) > ... 
4 more > Caused by: > org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: > org.apache.carbondata.core.datastore.exception.CarbonDataWriterException > at > org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.processingComplete(DataWriterProcessorStepImpl.java:163) > at > org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.finish(DataWriterProcessorStepImpl.java:149) > at > org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$.writeFunc(DataLoadProcessorStepOnSpark.scala:189) > ... 8 more > Caused by: > org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: > org.apache.carbondata.core.datastore.exception.CarbonDataWriterException > at > org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.processWriteTaskSubmitList(CarbonFactDataHandlerColumnar.java:326) > at >
[jira] [Commented] (CARBONDATA-1782) (Carbon1.3.0 - Streaming) Select regexp_extract from table with where clause having is null throws indexoutofbounds exception
[ https://issues.apache.org/jira/browse/CARBONDATA-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290658#comment-16290658 ] Jatin commented on CARBONDATA-1782: --- regexp_extract takes the second parameter as the regex pattern and the third parameter as the capture-group index within that pattern. So, in regexp_extract(CUST_NAME,'a',1) the pattern 'a' defines no group at index 1 and throws IndexOutOfBounds. I have tried: select regexp_extract(CUST_NAME,'CUST(.*)',1) from uniqdata where regexp_extract(CUST_NAME,'CUST(.*)',1) IS NULL or regexp_extract(DOB,'b',2) is NULL; The regex groups the match like (CUST)(_NAME_..), so at index 1 it selects (_NAME_..) > (Carbon1.3.0 - Streaming) Select regexp_extract from table with where clause > having is null throws indexoutofbounds exception > - > > Key: CARBONDATA-1782 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1782 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: 3 node ant cluster >Reporter: Chetan Bhat > Labels: DFX > > Steps : > Thrift server is started using the command - bin/spark-submit --master > yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G > --num-executors 3 --class > org.apache.carbondata.spark.thriftserver.CarbonThriftServer > /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar > "hdfs://hacluster/user/sparkhive/warehouse" > Spark shell is launched using the command - bin/spark-shell --master > yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G > --num-executors 3 --jars > /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar > From Spark shell the streaming table is created and data is loaded to the > streaming table. 
> import java.io.{File, PrintWriter} > import java.net.ServerSocket > import org.apache.spark.sql.{CarbonEnv, SparkSession} > import org.apache.spark.sql.hive.CarbonRelation > import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} > import org.apache.carbondata.core.constants.CarbonCommonConstants > import org.apache.carbondata.core.util.CarbonProperties > import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} > CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, > "/MM/dd") > import org.apache.spark.sql.CarbonSession._ > val carbonSession = SparkSession. > builder(). > appName("StreamExample"). > > getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store") > > carbonSession.sparkContext.setLogLevel("INFO") > def sql(sql: String) = carbonSession.sql(sql) > def writeSocket(serverSocket: ServerSocket): Thread = { > val thread = new Thread() { > override def run(): Unit = { > // wait for client to connection request and accept > val clientSocket = serverSocket.accept() > val socketWriter = new PrintWriter(clientSocket.getOutputStream()) > var index = 0 > for (_ <- 1 to 1000) { > // write 5 records per iteration > for (_ <- 0 to 100) { > index = index + 1 > socketWriter.println(index.toString + ",name_" + index >+ ",city_" + index + "," + (index * > 1.00).toString + >",school_" + index + ":school_" + index + > index + "$" + index) > } > socketWriter.flush() > Thread.sleep(2000) > } > socketWriter.close() > System.out.println("Socket closed") > } > } > thread.start() > thread > } > > def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, > tableName: String, port: Int): Thread = { > val thread = new Thread() { > override def run(): Unit = { > var qry: StreamingQuery = null > try { > val readSocketDF = spark.readStream > .format("socket") > .option("host", "10.18.98.34") > .option("port", port) > .load() > qry = readSocketDF.writeStream > .format("carbondata") > 
.trigger(ProcessingTime("5 seconds")) > .option("checkpointLocation", tablePath.getStreamingCheckpointDir) > .option("tablePath", tablePath.getPath).option("tableName", > tableName) > .start() > qry.awaitTermination() > } catch { > case ex: Throwable => > ex.printStackTrace() > println("Done reading and writing streaming data") > } finally { > qry.stop() > } > } > } > thread.start() > thread > } >
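The group-index behaviour Jatin describes above can be reproduced outside CarbonData with plain java.util.regex (a standalone sketch, not project code; the sample column value is made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGroupDemo {
    public static void main(String[] args) {
        String value = "CUST_NAME_1"; // hypothetical CUST_NAME cell

        // The pattern "N" matches, but it defines zero capture groups,
        // so asking for group 1 throws IndexOutOfBoundsException --
        // the same condition regexp_extract(CUST_NAME,'a',1) hits.
        Matcher noGroups = Pattern.compile("N").matcher(value);
        if (noGroups.find()) {
            try {
                noGroups.group(1); // invalid: groupCount() is 0
            } catch (IndexOutOfBoundsException e) {
                System.out.println("no such group: " + e.getMessage());
            }
        }

        // "CUST(.*)" defines one group; group 1 captures everything
        // after "CUST", which is what the alias query relies on.
        Matcher oneGroup = Pattern.compile("CUST(.*)").matcher(value);
        if (oneGroup.find()) {
            System.out.println(oneGroup.group(1)); // prints "_NAME_1"
        }
    }
}
```

Checking `Matcher.groupCount()` before calling `group(idx)` is how a caller can avoid the exception entirely.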
[jira] [Updated] (CARBONDATA-1782) (Carbon1.3.0 - Streaming) Select regexp_extract from table with where clause having is null throws indexoutofbounds exception
[ https://issues.apache.org/jira/browse/CARBONDATA-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin updated CARBONDATA-1782: -- Description: Steps : Thrift server is started using the command - bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/sparkhive/warehouse" Spark shell is launched using the command - bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar >From Spark shell the streaming table is created and data is loaded to the >streaming table. import java.io.{File, PrintWriter} import java.net.ServerSocket import org.apache.spark.sql.{CarbonEnv, SparkSession} import org.apache.spark.sql.hive.CarbonRelation import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") import org.apache.spark.sql.CarbonSession._ val carbonSession = SparkSession. builder(). appName("StreamExample"). 
getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store") carbonSession.sparkContext.setLogLevel("INFO") def sql(sql: String) = carbonSession.sql(sql) def writeSocket(serverSocket: ServerSocket): Thread = { val thread = new Thread() { override def run(): Unit = { // wait for client to connection request and accept val clientSocket = serverSocket.accept() val socketWriter = new PrintWriter(clientSocket.getOutputStream()) var index = 0 for (_ <- 1 to 1000) { // write 5 records per iteration for (_ <- 0 to 100) { index = index + 1 socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index) } socketWriter.flush() Thread.sleep(2000) } socketWriter.close() System.out.println("Socket closed") } } thread.start() thread } def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = { val thread = new Thread() { override def run(): Unit = { var qry: StreamingQuery = null try { val readSocketDF = spark.readStream .format("socket") .option("host", "10.18.98.34") .option("port", port) .load() qry = readSocketDF.writeStream .format("carbondata") .trigger(ProcessingTime("5 seconds")) .option("checkpointLocation", tablePath.getStreamingCheckpointDir) .option("tablePath", tablePath.getPath).option("tableName", tableName) .start() qry.awaitTermination() } catch { case ex: Throwable => ex.printStackTrace() println("Done reading and writing streaming data") } finally { qry.stop() } } } thread.start() thread } val streamTableName = "uniqdata" sql(s"CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('streaming'='true')") 
sql(s"LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata OPTIONS( 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')") val carbonTable = CarbonEnv.getInstance(carbonSession).carbonMetastore. lookupRelation(Some("default"), streamTableName)(carbonSession).asInstanceOf[CarbonRelation].carbonTable val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier) val port = 8006 val serverSocket = new ServerSocket(port) val socketThread = writeSocket(serverSocket) val streamingThread = startStreaming(carbonSession, tablePath, streamTableName, port) >From Beeline user executes the query select regexp_extract(CUST_NAME,'a',1)from uniqdata where
[jira] [Assigned] (CARBONDATA-1680) Carbon 1.3.0-Partitioning:Show Partition for Hash Partition doesn't display the partition id
[ https://issues.apache.org/jira/browse/CARBONDATA-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1680: - Assignee: Jatin > Carbon 1.3.0-Partitioning:Show Partition for Hash Partition doesn't display > the partition id > > > Key: CARBONDATA-1680 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1680 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 1.3.0 >Reporter: Ayushi Sharma >Assignee: Jatin >Priority: Minor > Attachments: Show_part_1_doc.PNG, show_part_1.PNG > > > CREATE TABLE IF NOT EXISTS t9( > id Int, > logdate Timestamp, > phonenumber Int, > country String, > area String > ) > PARTITIONED BY (vin String) > STORED BY 'carbondata' > TBLPROPERTIES('PARTITION_TYPE'='HASH','NUM_PARTITIONS'='5'); > show partitions t9; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1827) Add Support to provide S3 Functionality in Carbondata
[ https://issues.apache.org/jira/browse/CARBONDATA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1827: - Assignee: Jatin > Add Support to provide S3 Functionality in Carbondata > - > > Key: CARBONDATA-1827 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1827 > Project: CarbonData > Issue Type: New Feature > Components: core >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > Time Spent: 3h 10m > Remaining Estimate: 0h > > Added Support to provide S3 Functionality in Carbondata. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1788) Insert is not working as expected when loaded with more than 32000 column length.
[ https://issues.apache.org/jira/browse/CARBONDATA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283443#comment-16283443 ] Jatin commented on CARBONDATA-1788: --- Please provide the CSV for the table_name table so that I can replicate the issue. > Insert is not working as expected when loaded with more than 32000 column > length. > - > > Key: CARBONDATA-1788 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1788 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 1.3.0 > Environment: 3 node ant cluster >Reporter: pakanati revathi >Priority: Minor > Attachments: Insert.PNG > > > Insert should accept only 32000 length column. But when trying to load more > than 32000 column length data, insert is successful. > Expected result: When inserted more than 32000 column length, the insert > should throw error. > Actual result: When inserted more than 32000 column length, the insert is > successful. > Note: Update also should throw error while updating more than 32000 column > length. Please implement for update also. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1714) Carbon1.3.0-Alter Table - Select columns with is null and limit throws ArrayIndexOutOfBoundsException after multiple alter
[ https://issues.apache.org/jira/browse/CARBONDATA-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1714: - Assignee: Jatin > Carbon1.3.0-Alter Table - Select columns with is null and limit throws > ArrayIndexOutOfBoundsException after multiple alter > -- > > Key: CARBONDATA-1714 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1714 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: 3 node ant cluster- SUSE 11 SP4 >Reporter: Chetan Bhat >Assignee: Jatin > Labels: DFX > > Steps - > Execute the below queries in sequence. > create database test; > use test; > CREATE TABLE uniqdata111785 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('DICTIONARY_INCLUDE'='INTEGER_COLUMN1,CUST_ID'); > LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table > uniqdata111785 OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > alter table test.uniqdata111785 RENAME TO uniqdata1117856; > select * from test.uniqdata1117856 limit 100; > ALTER TABLE test.uniqdata1117856 ADD COLUMNS (cust_name1 int); > select * from test.uniqdata1117856 where cust_name1 is null limit 100; > ALTER TABLE test.uniqdata1117856 DROP COLUMNS (cust_name1); > select * from test.uniqdata1117856 where cust_name1 is null limit 100; > ALTER TABLE test.uniqdata1117856 CHANGE CUST_ID CUST_ID BIGINT; > select * from test.uniqdata1117856 where CUST_ID in (10013,10011,1,10019) > limit 10; > ALTER TABLE test.uniqdata1117856 ADD COLUMNS 
(a1 INT, b1 STRING) > TBLPROPERTIES('DICTIONARY_EXCLUDE'='b1'); > select a1,b1 from test.uniqdata1117856 where a1 is null and b1 is null limit > 100; > Actual Issue : Select columns with is null and limit throws > ArrayIndexOutOfBoundsException after multiple alter operations. > 0: jdbc:hive2://10.18.98.34:23040> select a1,b1 from test.uniqdata1117856 > where a1 is null and b1 is null limit 100; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 9.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 9.0 (TID 14, BLR114269, executor 2): > java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.carbondata.core.scan.model.QueryModel.setDimAndMsrColumnNode(QueryModel.java:223) > at > org.apache.carbondata.core.scan.model.QueryModel.processFilterExpression(QueryModel.java:172) > at > org.apache.carbondata.core.scan.model.QueryModel.processFilterExpression(QueryModel.java:181) > at > org.apache.carbondata.hadoop.util.CarbonInputFormatUtil.processFilterExpression(CarbonInputFormatUtil.java:118) > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:791) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:250) > at > org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:60) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at 
org.apache.spark.scheduler.Task.run(Task.scala:99) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: (state=,code=0) > Expected : The select query should be successful after multiple alter > operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1195) Rectification in configuration-parameters.md
[ https://issues.apache.org/jira/browse/CARBONDATA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin closed CARBONDATA-1195. - Resolution: Won't Fix Fix Version/s: NONE > Rectification in configuration-parameters.md > > > Key: CARBONDATA-1195 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1195 > Project: CarbonData > Issue Type: Bug > Components: docs >Reporter: Jatin >Assignee: Jatin >Priority: Minor > Fix For: NONE > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1476) Add Unit TestCases For Presto Integration
Jatin created CARBONDATA-1476: - Summary: Add Unit TestCases For Presto Integration Key: CARBONDATA-1476 URL: https://issues.apache.org/jira/browse/CARBONDATA-1476 Project: CarbonData Issue Type: Task Reporter: Jatin Assignee: Jatin Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1442) Reformat Partition-Guide.md File
Jatin created CARBONDATA-1442: - Summary: Reformat Partition-Guide.md File Key: CARBONDATA-1442 URL: https://issues.apache.org/jira/browse/CARBONDATA-1442 Project: CarbonData Issue Type: Improvement Reporter: Jatin Assignee: Jatin Priority: Minor Change Some markdown tags to maintain consistency. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1266) Select from NonExisting table returns null
[ https://issues.apache.org/jira/browse/CARBONDATA-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1266: - Assignee: Jatin > Select from NonExisting table returns null > --- > > Key: CARBONDATA-1266 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1266 > Project: CarbonData > Issue Type: Bug > Components: presto-integration >Affects Versions: 1.2.0 > Environment: spark 2.1, presto 0.170 >Reporter: Jatin >Assignee: Jatin >Priority: Minor > > Selecting data from non existing table in presto shows error metadata is null > instead it should show table doesn't exists. > select * from abc; > Query 20170705_114255_0_72hqk failed: metadata is null -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1266) Select from NonExisting table returns null
[ https://issues.apache.org/jira/browse/CARBONDATA-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1266: - Assignee: (was: Jatin) > Select from NonExisting table returns null > --- > > Key: CARBONDATA-1266 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1266 > Project: CarbonData > Issue Type: Bug > Components: presto-integration >Affects Versions: 1.2.0 > Environment: spark 2.1, presto 0.170 >Reporter: Jatin >Priority: Minor > > Selecting data from non existing table in presto shows error metadata is null > instead it should show table doesn't exists. > select * from abc; > Query 20170705_114255_0_72hqk failed: metadata is null -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1266) Select from NonExisting table returns null
Jatin created CARBONDATA-1266: - Summary: Select from NonExisting table returns null Key: CARBONDATA-1266 URL: https://issues.apache.org/jira/browse/CARBONDATA-1266 Project: CarbonData Issue Type: Bug Components: presto-integration Affects Versions: 1.2.0 Environment: spark 2.1, presto 0.170 Reporter: Jatin Assignee: Jatin Priority: Minor Selecting data from non existing table in presto shows error metadata is null instead it should show table doesn't exists. select * from abc; Query 20170705_114255_0_72hqk failed: metadata is null -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1141) Data load is partially successful but delete error
[ https://issues.apache.org/jira/browse/CARBONDATA-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073607#comment-16073607 ] Jatin commented on CARBONDATA-1141: --- I have tried the same scenario with the latest code, but I was not able to reproduce it. Please provide more details. > Data load is partially successful but delete error > --- > > Key: CARBONDATA-1141 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1141 > Project: CarbonData > Issue Type: Bug > Components: spark-integration, sql >Affects Versions: 1.2.0 > Environment: spark on > yarn,carbondata1.2.0,hadoop2.7,spark2.1.0,hive2.1.0 >Reporter: zhuzhibin > Fix For: 1.2.0 > > Attachments: error1.png, error.png > > > when I tried to load data into table (data size is about 300 million),the log > showed me that "Data load is partially successful for table", > but when I executed delete table operation,some errors appeared,the error > message is "java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.carbondata.core.mutate.CarbonUpdateUtil.getRequiredFieldFromTID(CarbonUpdateUtil.java:67)". > when I executed another delete table operation with where condition,it was > successful,but executed select operation then appeared > "java.lang.ArrayIndexOutOfBoundsException Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)" > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-980) Result does not displays while using not null operator in presto integration.
[ https://issues.apache.org/jira/browse/CARBONDATA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-980: Assignee: Jatin > Result does not displays while using not null operator in presto integration. > - > > Key: CARBONDATA-980 > URL: https://issues.apache.org/jira/browse/CARBONDATA-980 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.1.0 > Environment: spark 2.1, presto 0.166 >Reporter: Vandana Yadav >Assignee: Jatin >Priority: Minor > Attachments: 2000_UniqData.csv > > Time Spent: 0.5h > Remaining Estimate: 0h > > Result does not displays while using not null operator in presto integration. > Steps to reproduce : > 1. In CarbonData: > a) Create table: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > b) Load data : > LOAD DATA INPATH 'hdfs://localhost:54310/2000_UniqData.csv' into table > uniqdata OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > 2. In presto > a) Execute the query: > select CUST_ID from uniqdata where CUST_ID IS NOT NULL order by CUST_ID > Expected result:it should display all not null values from the table. > Actual Result: > In CarbonData: > "| 10994| > | 10995| > | 10996| > | 10997| > | 10998| > +--+--+ > | CUST_ID | > +--+--+ > | 10999| > +--+--+ > 2,001 rows selected (0.701 seconds) > " > In presto: > "Query 20170420_073851_00038_hd7jy failed: null > " -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1195) Rectification in configuration-parameters.md
[ https://issues.apache.org/jira/browse/CARBONDATA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin reassigned CARBONDATA-1195: - Assignee: Jatin > Rectification in configuration-parameters.md > > > Key: CARBONDATA-1195 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1195 > Project: CarbonData > Issue Type: Bug > Components: docs >Reporter: Jatin >Assignee: Jatin >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1187) Fix Documentation links pointing to wrong urls in useful-tips-on-carbondata and faq
Jatin created CARBONDATA-1187: - Summary: Fix Documentation links pointing to wrong urls in useful-tips-on-carbondata and faq Key: CARBONDATA-1187 URL: https://issues.apache.org/jira/browse/CARBONDATA-1187 Project: CarbonData Issue Type: Bug Components: docs Reporter: Jatin Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (CARBONDATA-997) Correct result does not display in presto integration as compare to CarbonData
[ https://issues.apache.org/jira/browse/CARBONDATA-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044370#comment-16044370 ] Jatin edited comment on CARBONDATA-997 at 6/9/17 12:42 PM: --- This bug is invalid, as Presto does not support ORDER BY on a column name when the same column is selected twice or more; that is why it throws "column is ambiguous". The same issue occurs when selecting it from Hive. Instead, we can create an alias for that column: select BIGINT_COLUMN1,BIGINT_COLUMN1 as newBigInt from UNIQDATA where DECIMAL_COLUMN1<=BIGINT_COLUMN1 order by BIGINT_COLUMN1; was (Author: jatin demla): This bug is Invalid as presto doesnot support order by with column name having selection of same column twice or more. i.e why it throws column is ambiguous. Instead we can create an alias for that column as select BIGINT_COLUMN1,BIGINT_COLUMN1 as newbig from UNIQDATA where DECIMAL_COLUMN1<=BIGINT_COLUMN1 order by BIGINT_COLUMN1; > Correct result does not display in presto integration as compare to CarbonData > -- > > Key: CARBONDATA-997 > URL: https://issues.apache.org/jira/browse/CARBONDATA-997 > Project: CarbonData > Issue Type: Bug > Components: data-query, presto-integration >Affects Versions: 1.1.0 > Environment: spark 2.1, presto 0.166 >Reporter: Vandana Yadav >Priority: Minor > Attachments: 2000_UniqData.csv > > > Correct result does not display in presto integration as compare to CarbonData > Steps to reproduce : > 1. 
In CarbonData: > a) Create table: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > b) Load data : > LOAD DATA INPATH 'hdfs://localhost:54310/2000_UniqData.csv' into table > uniqdata OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > 2. In presto > a) Execute the query: > select BIGINT_COLUMN1, BIGINT_COLUMN1 from UNIQDATA where > DECIMAL_COLUMN1<=BIGINT_COLUMN1 order by BIGINT_COLUMN1 > Actual result : > In CarbonData: > "| 123372038849| 123372038849| > | 123372038850| 123372038850| > | 123372038851| 123372038851| > | 123372038852| 123372038852| > | 123372038853| 123372038853| > +-+-+--+ > 2,000 rows selected (1.087 seconds) > " > In presto: > "Query 20170420_091614_00065_hd7jy failed: line 1:100: Column > 'bigint_column1' is ambiguous > select BIGINT_COLUMN1, BIGINT_COLUMN1 from UNIQDATA where > DECIMAL_COLUMN1<=BIGINT_COLUMN1 order by BIGINT_COLUMN1" > Expected result: it should display the same result as showing in CarbonData. -- This message was sent by Atlassian JIRA (v6.3.15#6346)