[jira] [Created] (CARBONDATA-4083) Refactor Update and Support Update Atomicity
Xingjun Hao created CARBONDATA-4083: --- Summary: Refactor Update and Support Update Atomicity Key: CARBONDATA-4083 URL: https://issues.apache.org/jira/browse/CARBONDATA-4083 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao Currently, we modify the tablestatus file several times in the update flow. In total, 4 tablestatus write operations break atomicity to a certain extent, which may produce dirty data under update failure scenarios. The first time we update tablestatus is when writing delta files: first we update updatedeltastarttime and updatedeltaendtime in the tablestatus, then we delete some segments, which brings 2 tablestatus write operations. The second time we update tablestatus is when inserting the new data; just like the first time, it brings 2 tablestatus write operations. Also, auto compaction doesn't work for UPDATE: UPDATE won't trigger MINOR compaction even when we turn on carbon.merge.auto.compaction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
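A minimal sketch of what a single-commit update could look like, assuming a hypothetical transaction-style helper (all names below are illustrative, not CarbonData APIs): mutations to segment entries are buffered in memory and the tablestatus file is written exactly once at commit.
{code:scala}
import scala.collection.mutable

// Illustrative stand-in for one tablestatus entry.
case class SegmentStatus(id: String, updateDeltaStartTime: Long = 0L,
    updateDeltaEndTime: Long = 0L, deleted: Boolean = false)

// Hypothetical helper: buffer every mutation, persist once.
class TableStatusTransaction(initial: Seq[SegmentStatus]) {
  private val pending = mutable.Map(initial.map(s => s.id -> s): _*)

  def setDeltaTimes(id: String, start: Long, end: Long): Unit =
    pending(id) = pending(id).copy(updateDeltaStartTime = start, updateDeltaEndTime = end)

  def markDeleted(id: String): Unit =
    pending(id) = pending(id).copy(deleted = true)

  // One atomic write at the end instead of four scattered ones.
  def commit(write: Seq[SegmentStatus] => Unit): Unit = write(pending.values.toSeq)
}
{code}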
[jira] [Created] (CARBONDATA-4063) Refactor getBlockId and getShortBlockId function
Xingjun Hao created CARBONDATA-4063: --- Summary: Refactor getBlockId and getShortBlockId function Key: CARBONDATA-4063 URL: https://issues.apache.org/jira/browse/CARBONDATA-4063 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao Currently, the getBlockId and getShortBlockId functions are too complex and unreadable; they need to be made simpler and more readable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-4045) Add TPCDS TestCase
Xingjun Hao created CARBONDATA-4045: --- Summary: Add TPCDS TestCase Key: CARBONDATA-4045 URL: https://issues.apache.org/jira/browse/CARBONDATA-4045 Project: CarbonData Issue Type: Test Reporter: Xingjun Hao There is no TPC-DS test case in the current source code, which makes it difficult to debug TPC-DS on a small dataset. A TPC-DS test case would also help to uncover potential issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
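One way such a test could be wired up, as a sketch: the query files and their location under test resources are assumptions, and `sql` stands for the test suite's SQL runner.
{code:scala}
import scala.io.Source

// Run each TPC-DS query against small generated tables; any analysis or
// execution error fails the smoke test. Paths and query IDs are hypothetical.
def runTpcdsSmoke(sql: String => Unit, queryIds: Seq[String]): Unit = {
  queryIds.foreach { id =>
    val source = Source.fromFile(s"src/test/resources/tpcds/$id.sql")
    try sql(source.mkString) finally source.close()
  }
}
{code}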
[jira] [Created] (CARBONDATA-4044) Fix dirty data in index file during IUD with stale data in segment folder
Xingjun Hao created CARBONDATA-4044: --- Summary: Fix dirty data in index file during IUD with stale data in segment folder Key: CARBONDATA-4044 URL: https://issues.apache.org/jira/browse/CARBONDATA-4044 Project: CarbonData Issue Type: Bug Reporter: Xingjun Hao XX.mergecarbonindex and XX.segment record the index-file list of a segment. Currently, we generate xx.mergeindexfile and xx.segment by picking up all index files (both carbonindex and mergecarbonindex) in the segment folder, which leads to dirty data when there is stale data in the segment folder. For example, suppose there is a stale index file "0_1603763776.carbonindex" in the segment_0 folder. While loading, a new carbonindex "0_16037752342.carbonindex" is written; when merging carbonindex files we expect to merge only 0_16037752342.carbonindex, but if we pick up every carbonindex in the segment folder, both "0_1603763776.carbonindex" and "0_16037752342.carbonindex" will be merged and recorded into the segment file. Updating has the same problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
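The direction the report implies, as a sketch: take the index-file names from the current load's commit metadata rather than from a directory listing, so stale files can never be swept in (names below are illustrative).
{code:scala}
// Only merge index files that this load actually produced.
def filesToMerge(segmentFolderListing: Seq[String],
    writtenByThisLoad: Set[String]): Seq[String] =
  segmentFolderListing.filter(writtenByThisLoad.contains)
{code}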
[jira] [Created] (CARBONDATA-4032) Drop partition command cleans other partition directories
Xingjun Hao created CARBONDATA-4032: --- Summary: Drop partition command cleans other partition directories Key: CARBONDATA-4032 URL: https://issues.apache.org/jira/browse/CARBONDATA-4032 Project: CarbonData Issue Type: Bug Components: sql Affects Versions: 2.0.1 Reporter: Xingjun Hao Fix For: 2.1.0
1. CREATE TABLE droppartition (id STRING, sales STRING) PARTITIONED BY (dtm STRING) STORED AS carbondata
2. insert into droppartition values ('01', '0', '20200907'),('03', '0', '20200908')
3. insert overwrite table droppartition partition (dtm=20200908) select * from droppartition where dtm = 20200907; insert overwrite table droppartition partition (dtm=20200909) select * from droppartition where dtm = 20200907;
4. alter table droppartition drop partition (dtm=20200909)
After step 4, the directory "20200908" was deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-4028) Fail to unlock during update
Xingjun Hao created CARBONDATA-4028: --- Summary: Fail to unlock during update Key: CARBONDATA-4028 URL: https://issues.apache.org/jira/browse/CARBONDATA-4028 Project: CarbonData Issue Type: Bug Reporter: Xingjun Hao In the update flow, we unpersist the dataset before unlocking; if the dataset unpersist is interrupted, the unlock is never executed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
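The standard remedy, sketched with stand-in closures: put the unlock in a finally block so it runs even when the unpersist is interrupted.
{code:scala}
// `unpersist` and `unlock` stand in for the real dataset/lock calls.
def unpersistAndUnlock(unpersist: () => Unit, unlock: () => Unit): Unit = {
  try unpersist()
  finally unlock() // runs even if unpersist throws or is interrupted
}
{code}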
[jira] [Created] (CARBONDATA-4027) Fix the wrong modifiedtime of loading files in insert stage
Xingjun Hao created CARBONDATA-4027: --- Summary: Fix the wrong modifiedtime of loading files in insert stage Key: CARBONDATA-4027 URL: https://issues.apache.org/jira/browse/CARBONDATA-4027 Project: CarbonData Issue Type: Bug Reporter: Xingjun Hao In the insert stage flow, an empty file with the suffix '.loading' marks a stage as being processed. We update the modified time of the '.loading' file to record the insert stage start time, which can be used to calculate timeouts and helps retry and recovery. Previously, we used the setModifiedTime function to update the modified time, which has a serious bug: for S3 files the setModifiedTime operation does not take effect, leading to an incorrect insert stage start time for the '.loading' file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
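One way to refresh the marker's timestamp on stores that ignore setModifiedTime, sketched against the Hadoop FileSystem API (the approach is an assumption, not necessarily CarbonData's actual fix): recreate the zero-byte object so it gets a fresh modification time.
{code:scala}
import org.apache.hadoop.fs.{FileSystem, Path}

// Recreating the marker gives the S3 object a new timestamp, since S3 does
// not honour explicit modified-time updates.
def touchLoadingMarker(fs: FileSystem, marker: Path): Unit = {
  if (fs.exists(marker)) fs.delete(marker, false)
  fs.create(marker, true).close() // zero-byte '.loading' marker, fresh mtime
}
{code}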
[jira] [Created] (CARBONDATA-4026) Thread leakage while Loading
Xingjun Hao created CARBONDATA-4026: --- Summary: Thread leakage while Loading Key: CARBONDATA-4026 URL: https://issues.apache.org/jira/browse/CARBONDATA-4026 Project: CarbonData Issue Type: Bug Components: spark-integration Affects Versions: 2.0.1 Reporter: Xingjun Hao Fix For: 2.1.0 Several code paths in Inserting/Loading/InsertStage/IndexServer do not shut down their ExecutorService, which leads to thread leakage and degrades the performance of the driver and executors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
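A minimal sketch of the leak-proof pattern: create the pool, hand it to the work, and shut it down in a finally block.
{code:scala}
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}

def withFixedPool[T](threads: Int)(body: ExecutorService => T): T = {
  val pool = Executors.newFixedThreadPool(threads)
  try body(pool)
  finally {
    pool.shutdownNow()                        // no threads left behind on failure
    pool.awaitTermination(30, TimeUnit.SECONDS)
  }
}
{code}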
[jira] [Created] (CARBONDATA-4016) NPE and FileNotFound in Show Segments and Insert Stage
Xingjun Hao created CARBONDATA-4016: --- Summary: NPE and FileNotFound in Show Segments and Insert Stage Key: CARBONDATA-4016 URL: https://issues.apache.org/jira/browse/CARBONDATA-4016 Project: CarbonData Issue Type: Bug Components: flink-integration, spark-integration Affects Versions: 2.0.1 Reporter: Xingjun Hao Fix For: 2.1.0
# In insert stage, while Spark reads stages that Flink is writing at the same time, a JSON-format exception is thrown.
# Show Segments with STAGE throws a JSON-format exception when reading stages that are being written by Flink or deleted by Spark.
# Show Segments loads partition info even for non-partition tables, which should be avoided.
# In getLastModifiedTime of TableStatus, an NPE is thrown if loadendtime is empty.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
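For the last item, the guard is simple; a sketch with illustrative field names: fall back to the load start time when the end time has not been written yet.
{code:scala}
// Avoid the NPE: loadendtime may be null/empty while a load is in flight.
def lastModifiedTime(loadStartTime: String, loadEndTime: String): Long =
  Option(loadEndTime).filter(_.nonEmpty).getOrElse(loadStartTime).toLong
{code}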
[jira] [Created] (CARBONDATA-4014) Support Change Column Comment
Xingjun Hao created CARBONDATA-4014: --- Summary: Support Change Column Comment Key: CARBONDATA-4014 URL: https://issues.apache.org/jira/browse/CARBONDATA-4014 Project: CarbonData Issue Type: New Feature Components: sql Affects Versions: 2.0.1 Reporter: Xingjun Hao Fix For: 2.1.0 Now we support adding a comment in CREATE TABLE and ADD COLUMN, but we do not support altering the comment of a specified column. We shall support altering comments with the Hive syntax "ALTER TABLE table_name CHANGE [COLUMN] col_name col_name data_type [COMMENT col_comment]" -- This message was sent by Atlassian Jira (v8.3.4#803005)
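A usage example of the requested syntax, in the sql(...) style used elsewhere in these reports (table and column names are made up):
{code:scala}
sql("ALTER TABLE sales CHANGE COLUMN price price decimal(10,2) COMMENT 'unit price in USD'")
{code}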
[jira] [Created] (CARBONDATA-3945) NPE while Data Loading
Xingjun Hao created CARBONDATA-3945: --- Summary: NPE while Data Loading Key: CARBONDATA-3945 URL: https://issues.apache.org/jira/browse/CARBONDATA-3945 Project: CarbonData Issue Type: Bug Reporter: Xingjun Hao
# getLastModifiedTime of LoadMetadataDetails fails because updateDeltaEndTimestamp is an empty string.
# In the getCommittedIndexFile function, an NPE happens because segmentfile is null in unusual cases.
# Cleaning temp files fails because partitionInfo is null in unusual cases.
# When calculating sizeInBytes of CarbonRelation, in unusual cases the directory size must be collected, but the directory path only works for non-partition tables; for partition tables a FileNotFoundException is thrown.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3944) Deleting stage files is interrupted when an IOException happens
Xingjun Hao created CARBONDATA-3944: --- Summary: Deleting stage files is interrupted when an IOException happens Key: CARBONDATA-3944 URL: https://issues.apache.org/jira/browse/CARBONDATA-3944 Project: CarbonData Issue Type: Bug Reporter: Xingjun Hao In the insert stage flow, stage files are deleted with a retry mechanism, but when an IOException happens (due to a network abnormality, etc.) the whole delete-stage flow is interrupted, which is unexpected. -- This message was sent by Atlassian Jira (v8.3.4#803005)
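A sketch of the intended behaviour with a stand-in delete function: retry each file and keep going past IOExceptions, reporting what could not be removed instead of aborting the loop.
{code:scala}
import java.io.IOException

// Returns the files that still failed after all retries; the loop itself
// is never interrupted by a single bad file.
def deleteWithRetry(files: Seq[String], delete: String => Unit,
    maxRetries: Int = 3): Seq[String] =
  files.filterNot { f =>
    (1 to maxRetries).exists { _ =>
      try { delete(f); true }
      catch { case _: IOException => false } // transient error: retry
    }
  }
{code}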
[jira] [Updated] (CARBONDATA-3940) Fail to commit the output of task due to Rename IOException in the Loading processing
[ https://issues.apache.org/jira/browse/CARBONDATA-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3940: Summary: Fail to commit the output of task due to Rename IOException in the Loading processing (was: Fail to commit the output of task due to Rename IOException in the Data Loading) > Fail to commit the output of task due to Rename IOException in the Loading > processing > - > > Key: CARBONDATA-3940 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3940 > Project: CarbonData > Issue Type: Bug >Reporter: Xingjun Hao >Priority: Major > > During the load process, commitTask fails with high probability. The > exception stack shows that it was thrown by HadoopMapReduceCommitProtocol, > not CarbonSQLHadoopMapReduceCommitProtocol, implying that there is a > class-type error in the initialization of the "Committer", which should have been > initialized as CarbonSQLHadoopMapReduceCommitProtocol but was incorrectly > initialized to HadoopMapReduceCommitProtocol. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3940) Fail to commit the output of task due to Rename IOException in the Data Loading
Xingjun Hao created CARBONDATA-3940: --- Summary: Fail to commit the output of task due to Rename IOException in the Data Loading Key: CARBONDATA-3940 URL: https://issues.apache.org/jira/browse/CARBONDATA-3940 Project: CarbonData Issue Type: Bug Reporter: Xingjun Hao During the load process, commitTask fails with high probability. The exception stack shows that it was thrown by HadoopMapReduceCommitProtocol, not CarbonSQLHadoopMapReduceCommitProtocol, implying that there is a class-type error in the initialization of the "Committer", which should have been initialized as CarbonSQLHadoopMapReduceCommitProtocol but was incorrectly initialized to HadoopMapReduceCommitProtocol. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3940) CommitTask fails due to Rename IOException in the Loading processing
[ https://issues.apache.org/jira/browse/CARBONDATA-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3940: Summary: CommitTask fails due to Rename IOException in the Loading processing (was: Fail to commit the output of task due to Rename IOException in the Loading processing) > CommitTask fails due to Rename IOException in the Loading processing > > > Key: CARBONDATA-3940 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3940 > Project: CarbonData > Issue Type: Bug >Reporter: Xingjun Hao >Priority: Major > > During the load process, commitTask fails with high probability. The > exception stack shows that it was thrown by HadoopMapReduceCommitProtocol, > not CarbonSQLHadoopMapReduceCommitProtocol, implying that there is a > class-type error in the initialization of the "Committer", which should have been > initialized as CarbonSQLHadoopMapReduceCommitProtocol but was incorrectly > initialized to HadoopMapReduceCommitProtocol. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3898) Support Option 'carbon.enable.querywithmv'
Xingjun Hao created CARBONDATA-3898: --- Summary: Support Option 'carbon.enable.querywithmv' Key: CARBONDATA-3898 URL: https://issues.apache.org/jira/browse/CARBONDATA-3898 Project: CarbonData Issue Type: New Feature Reporter: Xingjun Hao When MV is enabled, SQL rewrite takes a lot of time. A new option 'carbon.enable.querywithmv' shall be supported, which turns off SQL rewrite when the configured value is false. -- This message was sent by Atlassian Jira (v8.3.4#803005)
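Assuming the option is read through CarbonProperties like other carbon.* settings, disabling the rewrite would look like this (the property's effect is the proposal above; the addProperty API is existing CarbonData):
{code:scala}
import org.apache.carbondata.core.util.CarbonProperties

// Turn off MV-based SQL rewrite for queries.
CarbonProperties.getInstance().addProperty("carbon.enable.querywithmv", "false")
{code}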
[jira] [Created] (CARBONDATA-3879) Filtering Segments Optimization
Xingjun Hao created CARBONDATA-3879: --- Summary: Filtering Segments Optimization Key: CARBONDATA-3879 URL: https://issues.apache.org/jira/browse/CARBONDATA-3879 Project: CarbonData Issue Type: Improvement Components: data-query Affects Versions: 2.0.0 Reporter: Xingjun Hao Fix For: 2.0.2 During the filter-segments flow there are a lot of LIST.CONTAINS calls, which carry a heavy time overhead when there are tens of thousands of segments: LIST.CONTAINS is triggered for each segment, and the list itself has about as many elements, so the time complexity is O(n * n). -- This message was sent by Atlassian Jira (v8.3.4#803005)
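The usual fix, sketched: build a HashSet once so each membership test is O(1) and the whole pass is O(n).
{code:scala}
def filterValidSegments(allSegments: Seq[String], candidates: Seq[String]): Seq[String] = {
  val lookup = candidates.toSet        // built once, O(n)
  allSegments.filter(lookup.contains)  // O(1) per segment instead of List.contains
}
{code}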
[jira] [Closed] (CARBONDATA-3877) Reduce read tablestatus overhead during inserting into partition table
[ https://issues.apache.org/jira/browse/CARBONDATA-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao closed CARBONDATA-3877. --- Resolution: Fixed > Reduce read tablestatus overhead during inserting into partition table > -- > > Key: CARBONDATA-3877 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3877 > Project: CarbonData > Issue Type: Improvement > Components: spark-integration >Affects Versions: 2.0.0 >Reporter: Xingjun Hao >Priority: Major > Fix For: 2.0.2 > > Time Spent: 40m > Remaining Estimate: 0h > > Currently, during insert into a partition table there are a lot of > tablestatus read operations; when the table status file is stored in an > object store, reading it may fail (with IOException or JsonSyntaxException) > while it is being modified, which leads to a high failure rate for > concurrent inserts into a partition table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3877) Reduce read tablestatus overhead during inserting into partition table
Xingjun Hao created CARBONDATA-3877: --- Summary: Reduce read tablestatus overhead during inserting into partition table Key: CARBONDATA-3877 URL: https://issues.apache.org/jira/browse/CARBONDATA-3877 Project: CarbonData Issue Type: Improvement Components: spark-integration Affects Versions: 2.0.0 Reporter: Xingjun Hao Fix For: 2.0.2 Currently, during insert into a partition table there are a lot of tablestatus read operations; when the table status file is stored in an object store, reading it may fail (with IOException or JsonSyntaxException) while it is being modified, which leads to a high failure rate for concurrent inserts into a partition table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3875) Support show segments include stage
Xingjun Hao created CARBONDATA-3875: --- Summary: Support show segments include stage Key: CARBONDATA-3875 URL: https://issues.apache.org/jira/browse/CARBONDATA-3875 Project: CarbonData Issue Type: New Feature Components: spark-integration Affects Versions: 2.0.0, 2.0.1 Reporter: Xingjun Hao Fix For: 2.0.2 The current system lacks monitoring of stage information; a 'Show segments include stage' command shall be supported, which will provide monitoring information such as createTime, partition info, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3858) Check CDC deltafiles count in the testcase
[ https://issues.apache.org/jira/browse/CARBONDATA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3858: Description: Currently there is no delta-file count check in the test case, which shall be supplemented. (was: In the CDC flow, the parallelism of delta-file processing is the same as the executor count, which reduces the parallelism heavily. The insufficient parallelism limits CPU utilization and hampers CDC performance.) Summary: Check CDC deltafiles count in the testcase (was: Increase the parallelism of CDC intermediate files processing) > Check CDC deltafiles count in the testcase > -- > > Key: CARBONDATA-3858 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3858 > Project: CarbonData > Issue Type: Improvement >Reporter: Xingjun Hao >Priority: Minor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently there is no delta-file count check in the test case, which shall be > supplemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3862) Insert stage performance optimization
Xingjun Hao created CARBONDATA-3862: --- Summary: Insert stage performance optimization Key: CARBONDATA-3862 URL: https://issues.apache.org/jira/browse/CARBONDATA-3862 Project: CarbonData Issue Type: New Feature Reporter: Xingjun Hao There are two major performance bottlenecks in insert stage: 1) getting the LastModifyTime of the stage files requires a lot of accesses to OBS; 2) parallelism is not supported. Both shall be optimized. -- This message was sent by Atlassian Jira (v8.3.4#803005)
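A sketch of attacking both points at once, with a stand-in mtime function and an assumed pool size: fetch the stage files' modification times concurrently instead of one blocking object-store call at a time.
{code:scala}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

def lastModifiedTimes(files: Seq[String], mtime: String => Long): Map[String, Long] = {
  val pool = Executors.newFixedThreadPool(8) // pool size is an assumption
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try Await.result(Future.sequence(files.map(f => Future(f -> mtime(f)))), 10.minutes).toMap
  finally pool.shutdown()
}
{code}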
[jira] [Created] (CARBONDATA-3860) Fix IndexServer repeatedly reloading some segment indexes
Xingjun Hao created CARBONDATA-3860: --- Summary: Fix IndexServer repeatedly reloading some segment indexes Key: CARBONDATA-3860 URL: https://issues.apache.org/jira/browse/CARBONDATA-3860 Project: CarbonData Issue Type: Bug Reporter: Xingjun Hao In the current getTableBlockIndexUniqueIdentifiers function, if segmentBlockIndexInfo.getSegmentMetaDataInfo() is null, the IndexServer keeps reloading the index of that segment repeatedly. We should not let this affect query performance, considering that the MetaDataInfo does not matter for query processing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3859) Lock and retry to read tablestatus before throwing EOFException or JsonSyntaxException
[ https://issues.apache.org/jira/browse/CARBONDATA-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3859: Summary: Lock and retry to read tablestatus before throwing EOFException or JsonSyntaxException (was: Enhance lock and retry of Reading tablestatus files while loading) > Lock and retry to read tablestatus before throwing EOFException or > JsonSyntaxException > -- > > Key: CARBONDATA-3859 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3859 > Project: CarbonData > Issue Type: Improvement >Reporter: Xingjun Hao >Priority: Major > > When the table status file is stored in an object store, reading it may fail > (with EOFException or JsonSyntaxException) while the file is being modified. > We shall retry multiple times and take the lock before throwing EOFException > or JsonSyntaxException. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3859) Enhance lock and retry of Reading tablestatus files while loading
Xingjun Hao created CARBONDATA-3859: --- Summary: Enhance lock and retry of Reading tablestatus files while loading Key: CARBONDATA-3859 URL: https://issues.apache.org/jira/browse/CARBONDATA-3859 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao When the table status file is stored in an object store, reading it may fail (with EOFException or JsonSyntaxException) while the file is being modified. We shall retry multiple times and take the lock before throwing EOFException or JsonSyntaxException. -- This message was sent by Atlassian Jira (v8.3.4#803005)
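A sketch of the retry side, with a stand-in read function (Gson's JsonSyntaxException is what tablestatus parsing raises): back off and reread before surfacing the error; the locked final attempt is left out for brevity.
{code:scala}
import java.io.EOFException
import com.google.gson.JsonSyntaxException

def readWithRetry[T](read: () => T, maxRetries: Int = 5, backoffMs: Long = 500): T = {
  var last: Throwable = null
  for (_ <- 1 to maxRetries) {
    try return read()
    catch { case e @ (_: EOFException | _: JsonSyntaxException) =>
      last = e
      Thread.sleep(backoffMs) // file is likely mid-rewrite; wait and reread
    }
  }
  throw last // only after every retry (and, per the issue, a locked read) fails
}
{code}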
[jira] [Updated] (CARBONDATA-3858) Increase the parallelism of CDC intermediate files processing
[ https://issues.apache.org/jira/browse/CARBONDATA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3858: Summary: Increase the parallelism of CDC intermediate files processing (was: Increase the parallelism of CDC deltafiles processing) > Increase the parallelism of CDC intermediate files processing > - > > Key: CARBONDATA-3858 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3858 > Project: CarbonData > Issue Type: Improvement >Reporter: Xingjun Hao >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > In the CDC flow, the parallelism of delta-file processing is the same as the > executor count, which reduces the parallelism heavily. The insufficient > parallelism limits CPU utilization and hampers CDC performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3858) Increase the parallelism of CDC deltafiles processing
Xingjun Hao created CARBONDATA-3858: --- Summary: Increase the parallelism of CDC deltafiles processing Key: CARBONDATA-3858 URL: https://issues.apache.org/jira/browse/CARBONDATA-3858 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao In the CDC flow, the parallelism of delta-file processing is the same as the executor count, which reduces the parallelism heavily. The insufficient parallelism limits CPU utilization and hampers CDC performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
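A sketch of decoupling that parallelism from the executor count using plain Spark APIs (the surrounding variable names are illustrative): repartition the delta-file RDD to the cluster's default parallelism before processing.
{code:scala}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

def widenParallelism[T](spark: SparkSession, deltaFiles: RDD[T]): RDD[T] =
  deltaFiles.repartition(spark.sparkContext.defaultParallelism)
{code}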
[jira] [Created] (CARBONDATA-3856) Support the LIMIT operator for show segments command
Xingjun Hao created CARBONDATA-3856: --- Summary: Support the LIMIT operator for show segments command Key: CARBONDATA-3856 URL: https://issues.apache.org/jira/browse/CARBONDATA-3856 Project: CarbonData Issue Type: New Feature Components: spark-integration Affects Versions: 2.0.0 Reporter: Xingjun Hao Fix For: 2.0.2 Now, in the 2.0.0 release, CarbonData doesn't support the LIMIT operator in the SHOW SEGMENTS command, so the time cost is high when there are too many segments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
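Intended usage, assuming the LIMIT clause simply attaches to the existing SHOW SEGMENTS syntax:
{code:scala}
sql("SHOW SEGMENTS FOR TABLE sales LIMIT 10")
{code}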
[jira] [Updated] (CARBONDATA-3820) Fix CDC failure when sort columns present in source dataframe
[ https://issues.apache.org/jira/browse/CARBONDATA-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3820: Summary: Fix CDC failure when sort columns present in source dataframe (was: Support GlobalSort in the CDC) > Fix CDC failure when sort columns present in source dataframe > - > > Key: CARBONDATA-3820 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3820 > Project: CarbonData > Issue Type: New Feature >Reporter: Xingjun Hao >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > If a GlobalSort table is involved in the CDC flow, the following exception > will be thrown: > Exception in thread "main" java.lang.RuntimeException: column: id specified > in sort columns does not exist in schema > at > org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildTableSchema(CarbonWriterBuilder.java:828) > at > org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildCarbonTable(CarbonWriterBuilder.java:794) > at > org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildLoadModel(CarbonWriterBuilder.java:720) > at > org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil$.prepareLoadModel(CarbonSparkDataSourceUtil.scala:281) > at > org.apache.spark.sql.carbondata.execution.datasources.SparkCarbonFileFormat.prepareWrite(SparkCarbonFileFormat.scala:141) > at > org.apache.spark.sql.execution.command.mutation.merge.CarbonMergeDataSetCommand.processIUD(CarbonMergeDataSetCommand.scala:269) > at > org.apache.spark.sql.execution.command.mutation.merge.CarbonMergeDataSetCommand.processData(CarbonMergeDataSetCommand.scala:152) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3820) Support GlobalSort in the CDC
Xingjun Hao created CARBONDATA-3820: --- Summary: Support GlobalSort in the CDC Key: CARBONDATA-3820 URL: https://issues.apache.org/jira/browse/CARBONDATA-3820 Project: CarbonData Issue Type: New Feature Reporter: Xingjun Hao If a GlobalSort table is involved in the CDC flow, the following exception will be thrown: Exception in thread "main" java.lang.RuntimeException: column: id specified in sort columns does not exist in schema at org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildTableSchema(CarbonWriterBuilder.java:828) at org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildCarbonTable(CarbonWriterBuilder.java:794) at org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildLoadModel(CarbonWriterBuilder.java:720) at org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil$.prepareLoadModel(CarbonSparkDataSourceUtil.scala:281) at org.apache.spark.sql.carbondata.execution.datasources.SparkCarbonFileFormat.prepareWrite(SparkCarbonFileFormat.scala:141) at org.apache.spark.sql.execution.command.mutation.merge.CarbonMergeDataSetCommand.processIUD(CarbonMergeDataSetCommand.scala:269) at org.apache.spark.sql.execution.command.mutation.merge.CarbonMergeDataSetCommand.processData(CarbonMergeDataSetCommand.scala:152) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3816) Support Float and Decimal in the Merge Flow
Xingjun Hao created CARBONDATA-3816: --- Summary: Support Float and Decimal in the Merge Flow Key: CARBONDATA-3816 URL: https://issues.apache.org/jira/browse/CARBONDATA-3816 Project: CarbonData Issue Type: New Feature Components: data-load Affects Versions: 2.0.0 Reporter: Xingjun Hao We don't support the FLOAT and DECIMAL data types in the CDC flow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3722) Create filterExecuter for each segment instead of blocklet, to improve prune performance
Xingjun Hao created CARBONDATA-3722: --- Summary: Create filterExecuter for each segment instead of blocklet, to improve prune performance Key: CARBONDATA-3722 URL: https://issues.apache.org/jira/browse/CARBONDATA-3722 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao During pruning, a filterExecuter is created for each blocklet, which causes a huge performance degradation when there are several million blocklets. We shall create a filterExecuter per segment instead of per blocklet. -- This message was sent by Atlassian Jira (v8.3.4#803005)
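The caching shape this suggests, sketched with simplified types: build the executor once per segment and reuse it across that segment's blocklets.
{code:scala}
import scala.collection.mutable

// `build` stands in for the real filter-executor factory.
class SegmentScopedExecuters[E](build: String => E) {
  private val bySegment = mutable.Map.empty[String, E]
  def forSegment(segmentId: String): E =
    bySegment.getOrElseUpdate(segmentId, build(segmentId)) // once per segment
}
{code}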
[jira] [Created] (CARBONDATA-3712) Support insert stage in parallel
Xingjun Hao created CARBONDATA-3712: --- Summary: Support insert stage in parallel Key: CARBONDATA-3712 URL: https://issues.apache.org/jira/browse/CARBONDATA-3712 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3703) Support insert stage in parallel
Xingjun Hao created CARBONDATA-3703: --- Summary: Support insert stage in parallel Key: CARBONDATA-3703 URL: https://issues.apache.org/jira/browse/CARBONDATA-3703 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3702) Clean temp index files in parallel in merge index flow
Xingjun Hao created CARBONDATA-3702: --- Summary: Clean temp index files in parallel in merge index flow Key: CARBONDATA-3702 URL: https://issues.apache.org/jira/browse/CARBONDATA-3702 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao Now, cleaning temp index files in the merge index flow takes a lot of time, sometimes 2-3 minutes, which should be optimized. -- This message was sent by Atlassian Jira (v8.3.4#803005)
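A sketch of the parallel cleanup, with a stand-in delete function and an assumed pool size:
{code:scala}
import java.util.concurrent.{Executors, TimeUnit}

def cleanTempIndexFiles(paths: Seq[String], delete: String => Unit): Unit = {
  val pool = Executors.newFixedThreadPool(math.min(math.max(paths.size, 1), 10))
  try {
    paths.foreach(p => pool.submit(new Runnable { def run(): Unit = delete(p) }))
  } finally {
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.MINUTES) // wait for deletions to finish
  }
}
{code}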
[jira] [Created] (CARBONDATA-3700) Optimize prune performance when pruning with multi-threads
Xingjun Hao created CARBONDATA-3700: --- Summary: Optimize prune performance when pruning with multi-threads Key: CARBONDATA-3700 URL: https://issues.apache.org/jira/browse/CARBONDATA-3700 Project: CarbonData Issue Type: Bug Components: data-query Affects Versions: 2.0.0 Reporter: Xingjun Hao When pruning with multi-threads, there is a bug that hampers pruning performance heavily. When datamap pruning leaves no blocklet after applying the filter, the getExtendblocklet function, which builds the extended blocklet metadata, should return an empty extended blocklet list directly for an empty input list; instead, a bug causes a HashSet add-operation overhead. Meanwhile, when pruning with multi-threads, the getExtendblocklet function is triggered for each blocklet; this should be avoided by triggering it once per segment. -- This message was sent by Atlassian Jira (v8.3.4#803005)
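The first fix is just an early return; a sketch with generic stand-in types:
{code:scala}
// Skip all per-blocklet bookkeeping when pruning matched nothing.
def toExtendedBlocklets[B, E](blocklets: Seq[B], extend: Seq[B] => Seq[E]): Seq[E] =
  if (blocklets.isEmpty) Seq.empty else extend(blocklets)
{code}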
[jira] [Created] (CARBONDATA-3683) Support compress offheap data directly in the columnpage in IndexStorageCodec
Xingjun Hao created CARBONDATA-3683: --- Summary: Support compress offheap data directly in the columnpage in IndexStorageCodec Key: CARBONDATA-3683 URL: https://issues.apache.org/jira/browse/CARBONDATA-3683 Project: CarbonData Issue Type: Sub-task Reporter: Xingjun Hao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3682) Support compress offheap data directly in the columnpage if the datatype is primitive
Xingjun Hao created CARBONDATA-3682: --- Summary: Support compress offheap data directly in the columnpage if the datatype is primitive Key: CARBONDATA-3682 URL: https://issues.apache.org/jira/browse/CARBONDATA-3682 Project: CarbonData Issue Type: Sub-task Reporter: Xingjun Hao If the datatype is primitive, like BOOLEAN/BYTE/SHORT/SHORT_INT/INT/LONG/FLOAT/DOUBLE/DECIMAL, the column page should be compressed directly from the direct ByteBuffer on the offheap, to avoid a copy from offheap to heap and thus reduce the GC overhead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3671) Support compress direct bytebuffer in the SNAPPY/ZSTD/GZIP compressor
Xingjun Hao created CARBONDATA-3671: --- Summary: Support compress direct bytebuffer in the SNAPPY/ZSTD/GZIP compressor Key: CARBONDATA-3671 URL: https://issues.apache.org/jira/browse/CARBONDATA-3671 Project: CarbonData Issue Type: Sub-task Reporter: Xingjun Hao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3670) Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressing.
Xingjun Hao created CARBONDATA-3670: --- Summary: Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressing. Key: CARBONDATA-3670 URL: https://issues.apache.org/jira/browse/CARBONDATA-3670 Project: CarbonData Issue Type: Wish Components: core Affects Versions: 2.0.0 Reporter: Xingjun Hao Fix For: 2.0.0 When writing data, the column pages are stored offheap, and the pages are compressed to save storage cost. Now, in the compression processing, the data is copied from offheap to heap before being compressed, which leads to heavier GC overhead compared with compressing offheap directly. To sum up, we shall support compressing the offheap column page directly, avoiding a copy of data from offheap to heap during compression. -- This message was sent by Atlassian Jira (v8.3.4#803005)
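What the direct path can look like with snappy-java, which accepts direct ByteBuffers on both sides (the buffer handling here is a sketch; CarbonData's own compressor abstraction is not shown):
{code:scala}
import java.nio.ByteBuffer
import org.xerial.snappy.Snappy

// Compress a direct (offheap) page into another direct buffer: no
// intermediate byte[] is allocated on the heap.
def compressOffHeapPage(page: ByteBuffer): ByteBuffer = {
  val out = ByteBuffer.allocateDirect(Snappy.maxCompressedLength(page.remaining()))
  val n = Snappy.compress(page, out) // native path for direct buffers
  out.limit(n)
  out
}
{code}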
[jira] [Created] (CARBONDATA-3669) Delete Physical Partition When Drop Partition
Xingjun Hao created CARBONDATA-3669: --- Summary: Delete Physical Partition When Drop Partition Key: CARBONDATA-3669 URL: https://issues.apache.org/jira/browse/CARBONDATA-3669 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao When a partition is dropped, its data is not cleaned, which differs from Hive and confuses customers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3667) Insert stage recovery processing of the partition table throws the exception “the unexpected 0 segment found”
Xingjun Hao created CARBONDATA-3667: --- Summary: Insert stage recovery processing of the partition table throws the exception “the unexpected 0 segment found” Key: CARBONDATA-3667 URL: https://issues.apache.org/jira/browse/CARBONDATA-3667 Project: CarbonData Issue Type: Bug Components: core Affects Versions: 2.0.0 Reporter: Xingjun Hao Insert stage recovery processing of the partition table throws the exception “the unexpected 0 segment found”. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3648) Support Alter Table Compaction Level Threshold
Xingjun Hao created CARBONDATA-3648: --- Summary: Support Alter Table Compaction Level Threshold Key: CARBONDATA-3648 URL: https://issues.apache.org/jira/browse/CARBONDATA-3648 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao ALTER TABLE should support changing the compaction level threshold. Also, the current upper limit of 100 is too small to meet scenarios with massive numbers of small files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
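Assuming the threshold stays exposed as the existing COMPACTION_LEVEL_THRESHOLD table property, the requested command would look like:
{code:scala}
sql("ALTER TABLE sales SET TBLPROPERTIES('COMPACTION_LEVEL_THRESHOLD'='8,6')")
{code}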
[jira] [Created] (CARBONDATA-3644) Support Configuration of Complex Delimiters in Carbon Properties
Xingjun Hao created CARBONDATA-3644: --- Summary: Support Configuration of Complex Delimiters in Carbon Properties Key: CARBONDATA-3644 URL: https://issues.apache.org/jira/browse/CARBONDATA-3644 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao When inserting into a carbon table from a SELECT on a Parquet table, if a binary column contains '\001' (e.g. 'col1\001col2'), the content before '\001' is truncated because '\001' is the complex delimiter. The problem is that the complex delimiter can't be configured in the insert flow, which needs to be improved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsistent with Parquet
[ https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3643: Description: sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet") sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))") sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata") sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
{code:java}
sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
checkAnswer(
  sql("SELECT * FROM datatype_struct_carbondata"),
  sql("SELECT * FROM datatype_struct_parquet"))
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![[WrappedArray()]]         [[WrappedArray(null)]]
{code}
was: sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet") sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))") sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata") sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet") {code:java} checkAnswer( sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * FROM datatype_struct_parquet")) !== Correct Answer - 1 == == Spark Answer - 1 == ![[WrappedArray()]] [[WrappedArray(null)]] {code} > Insert array('')/array() into Struct column will result in > array(null), which is inconsistent with Parquet > -- > > Key: CARBONDATA-3643 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3643 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.6.1, 2.0.0 >Reporter: Xingjun Hao >Priority: Minor > Fix For: 2.0.0 > > > sql("create table datatype_struct_parquet(price struct<b:array<string>>) > stored as parquet") > sql("insert into table datatype_struct_parquet values(named_struct('b', > array('')))") > sql("create table datatype_struct_carbondata(price struct<b:array<string>>) > stored as carbondata") sql("insert into datatype_struct_carbondata select * > from datatype_struct_parquet") > > {code:java} > sql("create table datatype_struct_parquet(price struct<b:array<string>>) > stored as parquet") > sql("insert into table datatype_struct_parquet values(named_struct('b', > array('')))") > sql("create table datatype_struct_carbondata(price struct<b:array<string>>) > stored as carbondata") sql("insert into datatype_struct_carbondata select * > from datatype_struct_parquet") > checkAnswer( sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * > FROM datatype_struct_parquet")) > !== Correct Answer - 1 == == Spark Answer - 1 == > ![[WrappedArray()]] [[WrappedArray(null)]] > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsistent with Parquet
[ https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3643: Description: sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet") sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))") sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata") sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
{code:java}
checkAnswer(
  sql("SELECT * FROM datatype_struct_carbondata"),
  sql("SELECT * FROM datatype_struct_parquet"))
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![[WrappedArray()]]         [[WrappedArray(null)]]
{code}
was: sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet") sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))") sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata") sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet") {code:java} !== Correct Answer - 1 == == Spark Answer - 1 == ![[WrappedArray()]] [[WrappedArray(null)]] {code} > Insert array('')/array() into Struct column will result in > array(null), which is inconsistent with Parquet > -- > > Key: CARBONDATA-3643 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3643 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.6.1, 2.0.0 >Reporter: Xingjun Hao >Priority: Minor > Fix For: 2.0.0 > > > sql("create table datatype_struct_parquet(price struct<b:array<string>>) > stored as parquet") > sql("insert into table datatype_struct_parquet values(named_struct('b', > array('')))") > sql("create table datatype_struct_carbondata(price struct<b:array<string>>) > stored as carbondata") > sql("insert into datatype_struct_carbondata select * from > datatype_struct_parquet") > > {code:java} > checkAnswer( sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * > FROM datatype_struct_parquet")) > !== Correct Answer - 1 == == Spark Answer - 1 == > ![[WrappedArray()]] [[WrappedArray(null)]] > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsistent with Parquet
[ https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3643: Description:
{code:java}
sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
checkAnswer(
  sql("SELECT * FROM datatype_struct_carbondata"),
  sql("SELECT * FROM datatype_struct_parquet"))
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![[WrappedArray()]]         [[WrappedArray(null)]]
{code}
was: sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet") sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))") sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata") sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet") {code:java} sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet") sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))") sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata") sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet") checkAnswer( sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * FROM datatype_struct_parquet")) !== Correct Answer - 1 == == Spark Answer - 1 == ![[WrappedArray()]] [[WrappedArray(null)]] {code} > Insert array('')/array() into Struct column will result in > array(null), which is inconsistent with Parquet > -- > > Key: CARBONDATA-3643 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3643 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.6.1, 2.0.0 >Reporter: Xingjun Hao >Priority: Minor > Fix For: 2.0.0 > > > {code:java} > sql("create table datatype_struct_parquet(price struct<b:array<string>>) > stored as parquet") > sql("insert into table datatype_struct_parquet values(named_struct('b', > array('')))") > sql("create table datatype_struct_carbondata(price struct<b:array<string>>) > stored as carbondata") > sql("insert into datatype_struct_carbondata select * from > datatype_struct_parquet") > checkAnswer( sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * > FROM datatype_struct_parquet")) > !== Correct Answer - 1 == == Spark Answer - 1 == > ![[WrappedArray()]] [[WrappedArray(null)]] > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsistent with Parquet
[ https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3643: Fix Version/s: 2.0.0 Affects Version/s: 2.0.0 1.6.1 Description: sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet") sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))") sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata") sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
{code:java}
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![[WrappedArray()]]         [[WrappedArray(null)]]
{code}
> Insert array('')/array() into Struct column will result in > array(null), which is inconsistent with Parquet > -- > > Key: CARBONDATA-3643 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3643 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.6.1, 2.0.0 >Reporter: Xingjun Hao >Priority: Minor > Fix For: 2.0.0 > > > sql("create table datatype_struct_parquet(price struct<b:array<string>>) > stored as parquet") > sql("insert into table datatype_struct_parquet values(named_struct('b', > array('')))") > sql("create table datatype_struct_carbondata(price struct<b:array<string>>) > stored as carbondata") > sql("insert into datatype_struct_carbondata select * from > datatype_struct_parquet") > > {code:java} > !== Correct Answer - 1 == == Spark Answer - 1 == > ![[WrappedArray()]] [[WrappedArray(null)]] > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsistent with Parquet
Xingjun Hao created CARBONDATA-3643: --- Summary: Insert array('')/array() into Struct column will result in array(null), which is inconsistent with Parquet Key: CARBONDATA-3643 URL: https://issues.apache.org/jira/browse/CARBONDATA-3643 Project: CarbonData Issue Type: Bug Reporter: Xingjun Hao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3635) [Carbon-Flink] Reduce the time interval at which data is visible
Xingjun Hao created CARBONDATA-3635: --- Summary: [Carbon-Flink] Reduce the time interval at which data is visible Key: CARBONDATA-3635 URL: https://issues.apache.org/jira/browse/CARBONDATA-3635 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3634) Flink Loading support Complex\Array\Map\Binary
Xingjun Hao created CARBONDATA-3634: --- Summary: Flink Loading support Complex\Array\Map\Binary Key: CARBONDATA-3634 URL: https://issues.apache.org/jira/browse/CARBONDATA-3634 Project: CarbonData Issue Type: New Feature Reporter: Xingjun Hao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3633) Support custom CHARSET for encode and decode binary
Xingjun Hao created CARBONDATA-3633: --- Summary: Support custom CHARSET for encode and decode binary Key: CARBONDATA-3633 URL: https://issues.apache.org/jira/browse/CARBONDATA-3633 Project: CarbonData Issue Type: New Feature Reporter: Xingjun Hao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3632) Support configure ComplexDelimiters when INSERT
Xingjun Hao created CARBONDATA-3632: --- Summary: Support configure ComplexDelimiters when INSERT Key: CARBONDATA-3632 URL: https://issues.apache.org/jira/browse/CARBONDATA-3632 Project: CarbonData Issue Type: New Feature Reporter: Xingjun Hao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3631) StringIndexOutOfBoundsException When Inserting Select From a Parquet Table with Empty array/map
Xingjun Hao created CARBONDATA-3631: --- Summary: StringIndexOutOfBoundsException When Inserting Select From a Parquet Table with Empty array/map Key: CARBONDATA-3631 URL: https://issues.apache.org/jira/browse/CARBONDATA-3631 Project: CarbonData Issue Type: Bug Affects Versions: 1.6.1, 2.0.0 Reporter: Xingjun Hao Fix For: 2.0.0 sql("insert into datatype_array_parquet values(array())") sql("insert into datatype_array_carbondata select f from datatype_array_parquet") {code:java} java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:935) at java.lang.StringBuilder.substring(StringBuilder.java:76) at scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166) at org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:77) at org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3619) NoSuchMethodError(registerCurrentOperationLog) While Creating Table
[ https://issues.apache.org/jira/browse/CARBONDATA-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3619: Description: ExecuteStatementOperation.java exists in both the hive-service module and the spark-hive-thriftserver module, leading to "NoSuchMethodError: org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.registerCurrentOperationLog()V" {code:java} Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.registerCurrentOperationLog()V at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.protected$registerCurrentOperationLog(SparkExecuteStatementOperation.scala:173) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:173) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 3 more {code} was: ExecuteStatementOperation.java exists in both the hive-service module and the spark-hive-thriftserver module, leading to "NoSuchMethodError: org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.registerCurrentOperationLog()V" {code:java} 2019-12-17 11:18:00 WARN CLIService:396 - OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=29bea7eb-0638-47a9-b177-23d50cc5676a]: The background operation was aborted java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.registerCurrentOperationLog()V at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:206) at org.apache.hive.service.cli.CLIService.getOperationStatus(CLIService.java:387) at org.apache.hive.service.cli.thrift.ThriftCLIService.GetOperationStatus(ThriftCLIService.java:610) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1473) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1458) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.registerCurrentOperationLog()V at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.protected$registerCurrentOperationLog(SparkExecuteStatementOperation.scala:173) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:173) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 3 more {code} > NoSuchMethodError(registerCurrentOperationLog) While Creating Table > --- > > Key: CARBONDATA-3619 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3619 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 1.6.1, 2.0.0 >Reporter: Xingjun Hao >Priority: Minor > Fix For
[jira] [Updated] (CARBONDATA-3619) NoSuchMethodError(registerCurrentOperationLog) While Creating Table
[ https://issues.apache.org/jira/browse/CARBONDATA-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingjun Hao updated CARBONDATA-3619: Docs Text: (was: 2019-12-17 11:18:00 WARN CLIService:396 - OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=29bea7eb-0638-47a9-b177-23d50cc5676a]: The background operation was aborted java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.registerCurrentOperationLog()V at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:206) at org.apache.hive.service.cli.CLIService.getOperationStatus(CLIService.java:387) at org.apache.hive.service.cli.thrift.ThriftCLIService.GetOperationStatus(ThriftCLIService.java:610) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1473) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1458) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.registerCurrentOperationLog()V at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.protected$registerCurrentOperationLog(SparkExecuteStatementOperation.scala:173) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:173) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 
3 more) Description: ExecuteStatementOperation.java exists in both the hive-service module and the spark-hive-thriftserver module, leading to "NoSuchMethodError: org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.registerCurrentOperationLog()V" {code:java} 2019-12-17 11:18:00 WARN CLIService:396 - OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=29bea7eb-0638-47a9-b177-23d50cc5676a]: The background operation was aborted java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.registerCurrentOperationLog()V at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:206) at org.apache.hive.service.cli.CLIService.getOperationStatus(CLIService.java:387) at org.apache.hive.service.cli.thrift.ThriftCLIService.GetOperationStatus(ThriftCLIService.java:610) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1473) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1458) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.registerCurrentOperationLog()V at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.prot
[jira] [Created] (CARBONDATA-3619) NoSuchMethodError(registerCurrentOperationLog) While Creating Table
Xingjun Hao created CARBONDATA-3619: --- Summary: NoSuchMethodError(registerCurrentOperationLog) While Creating Table Key: CARBONDATA-3619 URL: https://issues.apache.org/jira/browse/CARBONDATA-3619 Project: CarbonData Issue Type: Bug Components: spark-integration Affects Versions: 1.6.1, 2.0.0 Reporter: Xingjun Hao Fix For: 2.0.0 ExecuteStatementOperation.java exists in both the hive-service module and the spark-hive-thriftserver module, leading to "NoSuchMethodError: org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.registerCurrentOperationLog()V" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3617) loadDataUsingGlobalSort should be based on SortColumns instead of the whole CarbonRow
Xingjun Hao created CARBONDATA-3617: --- Summary: loadDataUsingGlobalSort should be based on SortColumns instead of the whole CarbonRow Key: CARBONDATA-3617 URL: https://issues.apache.org/jira/browse/CARBONDATA-3617 Project: CarbonData Issue Type: Improvement Components: data-load Affects Versions: 1.6.1, 2.0.0 Reporter: Xingjun Hao Fix For: 1.6.1, 2.0.0 During data loading using global sort, the sort-by processing is based on the whole carbon row, and the GC overhead is huge when there are many columns. Theoretically, the sort-by processing can work just as well based only on the sort columns, which brings less time overhead and less GC overhead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
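A sketch of the idea in plain Spark terms (the row layout and key extraction are illustrative): shuffle-sort on a key projected from the sort columns instead of comparing whole rows.
{code:scala}
import org.apache.spark.rdd.RDD

// Sort on a compact key built from the sort-column positions only; the
// comparator never touches the remaining columns of the row.
def globalSortBySortColumns(rows: RDD[Array[AnyRef]],
    sortColIdx: Array[Int]): RDD[Array[AnyRef]] =
  rows.sortBy(row => sortColIdx.map(i => String.valueOf(row(i))).mkString("\u0001"))
{code}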