[jira] [Resolved] (CARBONDATA-3992) Drop Index is throwing null pointer exception.
[ https://issues.apache.org/jira/browse/CARBONDATA-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nihal kumar ojha resolved CARBONDATA-3992. -- Resolution: Fixed Fixed in PR: https://github.com/apache/carbondata/pull/3928 > Drop Index is throwing null pointer exception. > -- > > Key: CARBONDATA-3992 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3992 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > The index server is set to true but the index server is not running. > Creating an index as 'carbondata' and then trying to drop the index throws a > null pointer exception. > IndexStoreManager.java -> line 98 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] marchpure commented on a change in pull request #3965: [CARBONDATA-4016] NPE and FileNotFound in Show Segments and Insert Stage
marchpure commented on a change in pull request #3965: URL: https://github.com/apache/carbondata/pull/3965#discussion_r498335797

## File path: integration/spark/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
## @@ -96,20 +100,37 @@ object CarbonStore {
    * Read stage files and return input files
    */
   def readStageInput(
+      tableStagePath: String,
       stageFiles: Seq[CarbonFile],
       status: StageInput.StageStatus): Seq[StageInput] = {
     val gson = new Gson()
     val output = Collections.synchronizedList(new util.ArrayList[StageInput]())
-    stageFiles.map { stage =>
-      val filePath = stage.getAbsolutePath
-      val stream = FileFactory.getDataInputStream(filePath)
+    stageFiles.foreach { stage =>
+      val filePath = tableStagePath + CarbonCommonConstants.FILE_SEPARATOR + stage.getName
+      var stream: DataInputStream = null
       try {
-        val stageInput = gson.fromJson(new InputStreamReader(stream), classOf[StageInput])
-        stageInput.setCreateTime(stage.getLastModifiedTime)
-        stageInput.setStatus(status)
-        output.add(stageInput)
+        stream = FileFactory.getDataInputStream(filePath)
+        var retry = READ_FILE_RETRY_TIMES
+        breakable {
+          while (retry > 0) {
+            try {
+              val stageInput = gson.fromJson(new InputStreamReader(stream), classOf[StageInput])
+              stageInput.setCreateTime(stage.getLastModifiedTime)
+              stageInput.setStatus(status)
+              output.add(stageInput)
+              break()
+            } catch {
+              case _ : FileNotFoundException => break()
+                LOGGER.warn("The stage file: " + filePath + " does not exist");

Review comment: I have modified the code according to your suggestion

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
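The retry-and-skip pattern debated in this review can be sketched in isolation. Below is a minimal, stdlib-only Java sketch (not CarbonData's actual code; `READ_FILE_RETRY_TIMES` is borrowed from the diff, the rest of the names are hypothetical): a vanished stage file is treated as "already consumed" and skipped, while any other read failure is retried a bounded number of times before being surfaced.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;

public class StageFileReader {
    static final int READ_FILE_RETRY_TIMES = 3;

    // Returns the file content, or empty when the file is missing --
    // a missing stage file means it was already processed, not an error.
    static Optional<String> readWithRetry(Path file) throws IOException {
        int retry = READ_FILE_RETRY_TIMES;
        while (true) {
            try {
                return Optional.of(Files.readString(file));
            } catch (NoSuchFileException e) {
                // File vanished between listing and reading: skip it.
                return Optional.empty();
            } catch (IOException e) {
                retry--; // transient failure: retry a bounded number of times
                if (retry <= 0) {
                    throw e; // out of attempts: surface the failure
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stage", ".json");
        Files.writeString(tmp, "{\"status\":\"success\"}");
        System.out.println(readWithRetry(tmp).orElse("<missing>"));
        Files.delete(tmp);
        System.out.println(readWithRetry(tmp).orElse("<missing>"));
    }
}
```

The key design point, matching the review discussion, is that the `FileNotFoundException` branch exits the loop immediately instead of consuming retries, since retrying a deleted file can never succeed.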
[jira] [Closed] (CARBONDATA-3769) Upgrade hadoop version to 3.1.1 and add maven profile for 2.7.2
[ https://issues.apache.org/jira/browse/CARBONDATA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal closed CARBONDATA-3769. --- Resolution: Not A Problem > Upgrade hadoop version to 3.1.1 and add maven profile for 2.7.2 > --- > > Key: CARBONDATA-3769 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3769 > Project: CarbonData > Issue Type: Bug >Reporter: Akash R Nilugal >Assignee: Akash R Nilugal >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Upgrade hadoop version to 3.1.1 and add maven profile for 2.7.2
[jira] [Resolved] (CARBONDATA-3911) NullPointerException is thrown when clean files is executed after two updates
[ https://issues.apache.org/jira/browse/CARBONDATA-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal resolved CARBONDATA-3911. - Fix Version/s: 2.1.0 Resolution: Fixed > NullPointerException is thrown when clean files is executed after two updates > - > > Key: CARBONDATA-3911 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3911 > Project: CarbonData > Issue Type: Bug >Reporter: Akash R Nilugal >Assignee: Akash R Nilugal >Priority: Major > Fix For: 2.1.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > * create table > * load data > * load one more data > * update1 > * update2 > * clean files > fails with NullPointer
[jira] [Created] (CARBONDATA-4022) Getting the error - "PathName is not a valid DFS filename." with index server and after adding carbon SDK segments and then doing select/update/delete operations.
Prasanna Ravichandran created CARBONDATA-4022: - Summary: Getting the error - "PathName is not a valid DFS filename." with index server and after adding carbon SDK segments and then doing select/update/delete operations. Key: CARBONDATA-4022 URL: https://issues.apache.org/jira/browse/CARBONDATA-4022 Project: CarbonData Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanna Ravichandran Getting this error - "PathName is not a valid DFS filename." - during update/delete/select queries on a table with an added SDK segment. Also, the path shown in the error is malformed: the table path and the SDK segment path have been concatenated, which is the cause of the error. This is seen only when the index server is running and disable fallback is true. Queries and errors: > create table sdk_2level_1(name string, rec1 > struct>) stored as carbondata; +-+ | Result | +-+ +-+ No rows selected (0.425 seconds) > alter table sdk_2level_1 add segment > options('path'='hdfs://hacluster/sdkfiles/twolevelnestedrecwitharray','format'='carbondata'); +-+ | Result | +-+ +-+ No rows selected (0.77 seconds) > select * from sdk_2level_1; INFO : Execution ID: 1855 Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 600.0 failed 4 times, most recent failure: Lost task 0.3 in stage 600.0 (TID 21345, linux, executor 16): java.lang.IllegalArgumentException: Pathname /user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata from hdfs://hacluster/user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:249) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:955) at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:316) at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:293) at org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:198) at org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:188) at org.apache.carbondata.core.reader.ThriftReader.open(ThriftReader.java:100) at org.apache.carbondata.core.reader.CarbonHeaderReader.readHeader(CarbonHeaderReader.java:60) at org.apache.carbondata.core.util.DataFileFooterConverterV3.readDataFileFooter(DataFileFooterConverterV3.java:65) at org.apache.carbondata.core.util.CarbonUtil.getDataFileFooter(CarbonUtil.java:902) at org.apache.carbondata.core.util.CarbonUtil.readMetadataFile(CarbonUtil.java:874) at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getDataBlocks(AbstractQueryExecutor.java:216) at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:138) at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382) at org.apache.carbondata.core.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:47) at org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:117) at 
org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:584) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:301) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:293) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:857) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:857) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346) at org.apache.spark.rdd.RDD.iterator(RDD.scala:310) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at
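The invalid pathname in the stack trace above (".../sdk_2level_1hdfs:/hacluster/...") is the table location with the SDK segment's absolute URI appended directly onto it. A minimal Java sketch of the distinction (hypothetical helper names; not CarbonData's actual code): an external segment path that already carries a scheme must be used as-is, and only relative paths should be joined under the table location.

```java
import java.net.URI;

public class SegmentPathResolver {
    // Naive concatenation reproduces the invalid DFS filename seen above.
    static String naiveJoin(String tablePath, String segmentPath) {
        return tablePath + segmentPath;
    }

    // A segment added with an absolute 'path' option (e.g. hdfs://...)
    // must be taken as-is; only relative paths belong under the table path.
    static String resolve(String tablePath, String segmentPath) {
        return URI.create(segmentPath).isAbsolute()
            ? segmentPath
            : tablePath + "/" + segmentPath;
    }

    public static void main(String[] args) {
        String table = "/user/hive/warehouse/carbon.store/rps/sdk_2level_1";
        String segment = "hdfs://hacluster/sdkfiles/twolevelnestedrecwitharray";
        System.out.println(naiveJoin(table, segment)); // two paths fused together: invalid
        System.out.println(resolve(table, segment));   // the intended absolute URI
    }
}
```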
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3962: [CARBONDATA-4017]Fix the insert issue when the column name contains '\' and fix SI creation issue
CarbonDataQA1 commented on pull request #3962: URL: https://github.com/apache/carbondata/pull/3962#issuecomment-702138964 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2544/
[jira] [Created] (CARBONDATA-4021) With Index server running, Upon executing count* we are getting the below error, after adding the parquet and ORC segment.
Prasanna Ravichandran created CARBONDATA-4021: - Summary: With Index server running, Upon executing count* we are getting the below error, after adding the parquet and ORC segment. Key: CARBONDATA-4021 URL: https://issues.apache.org/jira/browse/CARBONDATA-4021 Project: CarbonData Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanna Ravichandran We are getting the below issue when the index server is enabled and index server fallback disable is configured as true. With count(*) we are getting the below error after adding the parquet and ORC segments. Queries and error: > use rps; +-+| Result | +-+ +-+ No rows selected (0.054 seconds) > drop table if exists uniqdata; +-+ |Result| +-+ +-+ No rows selected (0.229 seconds) > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; +-+ |Result| +-+ +-+ No rows selected (0.756 seconds) > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); INFO : Execution ID: 95 +-+| Result | +-+ +-+ No rows selected (2.789 seconds) > use default; +-+ |Result| +-+ +-+ No rows selected (0.052 seconds) > drop table if exists uniqdata; +-+ |Result| +-+ +-+ No rows selected (1.122 seconds) > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; +-+ | Result | +-+ +-+ No rows selected (0.508 seconds) > 
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); INFO : Execution ID: 108 +-+ |Result| +-+ +-+ No rows selected (1.316 seconds) > drop table if exists uniqdata_parquet; +-+ |Result| +-+ +-+ No rows selected (0.668 seconds) > CREATE TABLE uniqdata_parquet (cust_id int,cust_name > String,active_emui_version string, dob timestamp, doj timestamp, > bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), > decimal_column2 decimal(36,36),double_column1 double, double_column2 > double,integer_column1 int) stored as parquet; +-+ |Result| +-+ +-+ No rows selected (0.397 seconds) > insert into uniqdata_parquet select * from uniqdata; INFO : Execution ID: 116 +-+ |Result| +-+ +-+ No rows selected (4.805 seconds) > drop table if exists uniqdata_orc; +-+ |Result| +-+ +-+ No rows selected (0.553 seconds) > CREATE TABLE uniqdata_orc (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) using orc; +-+ |Result| +-+ +-+ No rows selected (0.396 seconds) > insert into uniqdata_orc select * from uniqdata; INFO : Execution ID: 122 +-+ |Result| +-+ +-+ No rows selected (3.403 seconds) > use rps; +-+ |Result| +-+ +-+ No rows selected (0.06 seconds) > Alter table uniqdata add segment options > ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_parquet','format'='parquet'); INFO : Execution ID: 126 +-+ |Result| +-+ +-+ No rows selected (1.511 seconds) > Alter table uniqdata add segment options > ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_orc','format'='orc'); +-+ |Result| +-+ +-+ No rows selected 
(0.716 seconds) > select count(*) from uniqdata; Error: java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.security.PrivilegedActionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 54.0 failed 4 times, most recent failure: Lost task 2.3
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3962: [CARBONDATA-4017]Fix the insert issue when the column name contains '\' and fix SI creation issue
CarbonDataQA1 commented on pull request #3962: URL: https://github.com/apache/carbondata/pull/3962#issuecomment-702138034 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4291/
[GitHub] [carbondata] marchpure commented on pull request #3965: [CARBONDATA-4016] NPE and FileNotFound in Show Segments and Insert Stage
marchpure commented on pull request #3965: URL: https://github.com/apache/carbondata/pull/3965#issuecomment-702129123 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3965: [CARBONDATA-4016] NPE and FileNotFound in Show Segments and Insert Stage
CarbonDataQA1 commented on pull request #3965: URL: https://github.com/apache/carbondata/pull/3965#issuecomment-702125570 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2545/
[jira] [Commented] (CARBONDATA-3914) We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline.
[ https://issues.apache.org/jira/browse/CARBONDATA-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205497#comment-17205497 ] Prasanna Ravichandran commented on CARBONDATA-3914: --- !image-2020-10-01-18-37-20-242.png! > We are getting the below error when executing select query on a carbon table > when no data is returned from hive beeline. > > > Key: CARBONDATA-3914 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3914 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 2.0.0 > Environment: 3 node One track ANT cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Fix For: 2.1.0 > > Attachments: Nodatareturnedfromcarbontable-IOexception.png > > Time Spent: 6h 20m > Remaining Estimate: 0h > > If no data is present in the table, then we are getting the below IOException > in carbon, while running select queries on that empty table. But in hive even > if the table holds no data, then it is working for select queries. > Expected results: Even the table holds no records it should return 0 or no > rows returned. It should not throw error/exception. > Actual result: It is throwing IO exception - Unable to read carbon schema. > > Attached the screenshot for your reference.
[jira] [Commented] (CARBONDATA-3914) We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline.
[ https://issues.apache.org/jira/browse/CARBONDATA-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205495#comment-17205495 ] Prasanna Ravichandran commented on CARBONDATA-3914: --- Attached the screenshot after the fix. > We are getting the below error when executing select query on a carbon table > when no data is returned from hive beeline. > > > Key: CARBONDATA-3914 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3914 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 2.0.0 > Environment: 3 node One track ANT cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Fix For: 2.1.0 > > Attachments: Nodatareturnedfromcarbontable-IOexception.png > > Time Spent: 6h 20m > Remaining Estimate: 0h > > If no data is present in the table, then we are getting the below IOException > in carbon, while running select queries on that empty table. But in hive even > if the table holds no data, then it is working for select queries. > Expected results: Even the table holds no records it should return 0 or no > rows returned. It should not throw error/exception. > Actual result: It is throwing IO exception - Unable to read carbon schema. > > Attached the screenshot for your reference.
[jira] [Closed] (CARBONDATA-3914) We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline.
[ https://issues.apache.org/jira/browse/CARBONDATA-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-3914. - This issue is fixed now. Now no errors are thrown, when no rows are present in the carbon table. > We are getting the below error when executing select query on a carbon table > when no data is returned from hive beeline. > > > Key: CARBONDATA-3914 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3914 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 2.0.0 > Environment: 3 node One track ANT cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Fix For: 2.1.0 > > Attachments: Nodatareturnedfromcarbontable-IOexception.png > > Time Spent: 6h 20m > Remaining Estimate: 0h > > If no data is present in the table, then we are getting the below IOException > in carbon, while running select queries on that empty table. But in hive even > if the table holds no data, then it is working for select queries. > Expected results: Even the table holds no records it should return 0 or no > rows returned. It should not throw error/exception. > Actual result: It is throwing IO exception - Unable to read carbon schema. > > Attached the screenshot for your reference.
[GitHub] [carbondata] marchpure commented on pull request #3965: [CARBONDATA-4016] NPE and FileNotFound in Show Segments and Insert Stage
marchpure commented on pull request #3965: URL: https://github.com/apache/carbondata/pull/3965#issuecomment-702095077 retest this please
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3965: [CARBONDATA-4016] NPE and FileNotFound in Show Segments and Insert Stage
Indhumathi27 commented on a change in pull request #3965: URL: https://github.com/apache/carbondata/pull/3965#discussion_r498182999

## File path: integration/spark/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
## @@ -96,20 +100,37 @@ object CarbonStore {
    * Read stage files and return input files
    */
   def readStageInput(
+      tableStagePath: String,
       stageFiles: Seq[CarbonFile],
       status: StageInput.StageStatus): Seq[StageInput] = {
     val gson = new Gson()
     val output = Collections.synchronizedList(new util.ArrayList[StageInput]())
-    stageFiles.map { stage =>
-      val filePath = stage.getAbsolutePath
-      val stream = FileFactory.getDataInputStream(filePath)
+    stageFiles.foreach { stage =>
+      val filePath = tableStagePath + CarbonCommonConstants.FILE_SEPARATOR + stage.getName
+      var stream: DataInputStream = null
       try {
-        val stageInput = gson.fromJson(new InputStreamReader(stream), classOf[StageInput])
-        stageInput.setCreateTime(stage.getLastModifiedTime)
-        stageInput.setStatus(status)
-        output.add(stageInput)
+        stream = FileFactory.getDataInputStream(filePath)
+        var retry = READ_FILE_RETRY_TIMES
+        breakable {
+          while (retry > 0) {
+            try {
+              val stageInput = gson.fromJson(new InputStreamReader(stream), classOf[StageInput])
+              stageInput.setCreateTime(stage.getLastModifiedTime)
+              stageInput.setStatus(status)
+              output.add(stageInput)
+              break()
+            } catch {
+              case _ : FileNotFoundException => break()
+                LOGGER.warn("The stage file: " + filePath + " does not exist");

Review comment: move this log before break
[GitHub] [carbondata] akashrn5 commented on pull request #3963: [CARBONDATA-4018]Fix CSV header validation not contains dimension columns
akashrn5 commented on pull request #3963: URL: https://github.com/apache/carbondata/pull/3963#issuecomment-702078477 i will add one test case
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3962: [CARBONDATA-4017]Fix the insert issue when the column name contains '\' and fix SI creation issue
Indhumathi27 commented on a change in pull request #3962: URL: https://github.com/apache/carbondata/pull/3962#discussion_r498178139

## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
## @@ -1881,7 +1881,9 @@ public static TableInfo convertGsonToTableInfo(Map properties) {
     gsonBuilder.registerTypeAdapter(DataType.class, new DataTypeAdapter());
     Gson gson = gsonBuilder.create();
-    TableInfo tableInfo = gson.fromJson(builder.toString(), TableInfo.class);
+    // if the column name contains backslash in the column name, then fromJson will remove that,
+    // so replace like below to keep the "\" in column name and write the proper name in the schema
+    TableInfo tableInfo = gson.fromJson(builder.toString().replace("\\", ""), TableInfo.class);

Review comment: ok
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3962: [CARBONDATA-4017]Fix the insert issue when the column name contains '\' and fix SI creation issue
Indhumathi27 commented on a change in pull request #3962: URL: https://github.com/apache/carbondata/pull/3962#discussion_r498176312

## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
## @@ -1881,7 +1881,9 @@ public static TableInfo convertGsonToTableInfo(Map properties) {
     gsonBuilder.registerTypeAdapter(DataType.class, new DataTypeAdapter());
     Gson gson = gsonBuilder.create();
-    TableInfo tableInfo = gson.fromJson(builder.toString(), TableInfo.class);
+    // if the column name contains backslash in the column name, then fromJson will remove that,
+    // so replace like below to keep the "\" in column name and write the proper name in the schema
+    TableInfo tableInfo = gson.fromJson(builder.toString().replace("\\", ""), TableInfo.class);

Review comment: you can add in CarbonSessionExample
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3962: [CARBONDATA-4017]Fix the insert issue when the column name contains '\' and fix SI creation issue
akashrn5 commented on a change in pull request #3962: URL: https://github.com/apache/carbondata/pull/3962#discussion_r498175600

## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
## @@ -1881,7 +1881,9 @@ public static TableInfo convertGsonToTableInfo(Map properties) {
     gsonBuilder.registerTypeAdapter(DataType.class, new DataTypeAdapter());
     Gson gson = gsonBuilder.create();
-    TableInfo tableInfo = gson.fromJson(builder.toString(), TableInfo.class);
+    // if the column name contains backslash in the column name, then fromJson will remove that,
+    // so replace like below to keep the "\" in column name and write the proper name in the schema
+    TableInfo tableInfo = gson.fromJson(builder.toString().replace("\\", ""), TableInfo.class);

Review comment: Actually, `QueryTest` has a Spark session without `carbonSession`, and we cannot create another `sparkSession` when one is running. We can't add it as a test case in an example, so since I have added the proper comment, it should be fine.

## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
## @@ -423,6 +423,17 @@ class TestSIWithSecondryIndex extends QueryTest with BeforeAndAfterAll {
     sql("drop table table2")
   }
+
+  test("test SI creation with special char column") {
+    sql("drop table if exists special_char")

Review comment: done
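The escaping subtlety behind this fix can be seen with plain JDK strings (illustrative values only; Gson itself is not involved here): a column name containing one backslash is escaped to two backslashes in the serialized JSON text, and since the Java source literal "\\" denotes a single backslash character, calling `replace("\\", "")` on the JSON text strips every literal backslash before parsing.

```java
public class BackslashEscapeDemo {
    public static void main(String[] args) {
        // Runtime value: {"column_name":"col\\1"} -- a JSON document whose
        // text contains two backslash characters (one escaped backslash,
        // which JSON decodes to the single "\" in the column name "col\1").
        String json = "{\"column_name\":\"col\\\\1\"}";

        // In Java source, "\\" is a one-character string holding a single
        // backslash, so this removes every literal backslash:
        String stripped = json.replace("\\", "");

        System.out.println(json);     // {"column_name":"col\\1"}
        System.out.println(stripped); // {"column_name":"col1"}
    }
}
```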
[GitHub] [carbondata] marchpure commented on pull request #3965: [CARBONDATA-4016] NPE and FileNotFound in Show Segments and Insert Stage
marchpure commented on pull request #3965: URL: https://github.com/apache/carbondata/pull/3965#issuecomment-702062616 retest this please
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3962: [CARBONDATA-4017]Fix the insert issue when the column name contains '\' and fix SI creation issue
Indhumathi27 commented on a change in pull request #3962: URL: https://github.com/apache/carbondata/pull/3962#discussion_r498158964

## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
## @@ -423,6 +423,17 @@ class TestSIWithSecondryIndex extends QueryTest with BeforeAndAfterAll {
     sql("drop table table2")
   }
+
+  test("test SI creation with special char column") {
+    sql("drop table if exists special_char")

Review comment: can add drop table to afterAll also

## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
## @@ -1881,7 +1881,9 @@ public static TableInfo convertGsonToTableInfo(Map properties) {
     gsonBuilder.registerTypeAdapter(DataType.class, new DataTypeAdapter());
     Gson gson = gsonBuilder.create();
-    TableInfo tableInfo = gson.fromJson(builder.toString(), TableInfo.class);
+    // if the column name contains backslash in the column name, then fromJson will remove that,
+    // so replace like below to keep the "\" in column name and write the proper name in the schema
+    TableInfo tableInfo = gson.fromJson(builder.toString().replace("\\", ""), TableInfo.class);

Review comment: can you add a test case with carbon session

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala
## @@ -340,7 +336,7 @@ private[sql] case class CarbonCreateSecondaryIndexCommand(
     try {
       sparkSession.sql(
         s"""CREATE TABLE $databaseName.$indexTableName
-           |(${ fields.mkString(",") })
+           |(${ rawSchema })

Review comment:
```suggestion
($rawSchema)
```
[GitHub] [carbondata] kunal642 commented on pull request #3952: [CARBONDATA-4006] Fix for currentUser as NULL in getcount method during index server fallback mode
kunal642 commented on pull request #3952: URL: https://github.com/apache/carbondata/pull/3952#issuecomment-702057242 LGTM
[GitHub] [carbondata] marchpure commented on a change in pull request #3965: [CARBONDATA-4016] NPE and FileNotFound in Show Segments and Insert Stage
marchpure commented on a change in pull request #3965: URL: https://github.com/apache/carbondata/pull/3965#discussion_r498155406

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala

@@ -477,13 +479,23 @@ case class CarbonInsertFromStageCommand(
       stageFiles.map { stage =>
         executorService.submit(new Runnable {
           override def run(): Unit = {
-            val filePath = stage._1.getAbsolutePath
-            val stream = FileFactory.getDataInputStream(filePath)
+            val filePath = tableStagePath + CarbonCommonConstants.FILE_SEPARATOR + stage._1.getName
+            var stream: DataInputStream = null
             try {
-              val stageInput = gson.fromJson(new InputStreamReader(stream), classOf[StageInput])
-              output.add(stageInput)
+              stream = FileFactory.getDataInputStream(filePath)
+              var retry = CarbonInsertFromStageCommand.DELETE_FILES_RETRY_TIMES
+              breakable (while (retry > 0) try {

Review comment: I have modified the code according to your suggestion.

## File path: integration/spark/src/main/scala/org/apache/carbondata/api/CarbonStore.scala

@@ -96,20 +100,31 @@ object CarbonStore {
   /**
    * Read stage files and return input files
    */
   def readStageInput(
+      tableStagePath: String,
       stageFiles: Seq[CarbonFile],
       status: StageInput.StageStatus): Seq[StageInput] = {
     val gson = new Gson()
     val output = Collections.synchronizedList(new util.ArrayList[StageInput]())
     stageFiles.map { stage =>
-      val filePath = stage.getAbsolutePath
-      val stream = FileFactory.getDataInputStream(filePath)
+      val filePath = tableStagePath + CarbonCommonConstants.FILE_SEPARATOR + stage.getName
+      var stream: DataInputStream = null
       try {
-        val stageInput = gson.fromJson(new InputStreamReader(stream), classOf[StageInput])
-        stageInput.setCreateTime(stage.getLastModifiedTime)
-        stageInput.setStatus(status)
-        output.add(stageInput)
+        stream = FileFactory.getDataInputStream(filePath)
+        var retry = READ_FILE_RETRY_TIMES
+        breakable { while (retry > 0) { try {
+          val stageInput = gson.fromJson(new InputStreamReader(stream), classOf[StageInput])
+          stageInput.setCreateTime(stage.getLastModifiedTime)
+          stageInput.setStatus(status)
+          output.add(stageInput)
+        } catch {
+          case _ : FileNotFoundException => breakable()

Review comment (on output.add(stageInput)): I have modified the code according to your suggestion.

Review comment (on case _ : FileNotFoundException => breakable(), same hunk): I have modified the code according to your suggestion.
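The pattern under review in this thread, retrying a stage-file read a bounded number of times inside breakable and leaving the loop early on success or on FileNotFoundException, can be sketched in isolation. This is an illustrative sketch, not CarbonData's code: READ_FILE_RETRY_TIMES and readOnce are assumed names, and the real code parses the stream with Gson into a StageInput. Note the exit call must be break(), not breakable() (the merged version of the PR, visible in the later review round, uses break() here).

```scala
import java.io.{FileNotFoundException, IOException}
import scala.util.control.Breaks.{break, breakable}

object RetrySketch {
  // Illustrative constant; CarbonStore defines its own READ_FILE_RETRY_TIMES.
  val READ_FILE_RETRY_TIMES = 3

  // Run `readOnce` up to READ_FILE_RETRY_TIMES times. `readOnce` is a
  // hypothetical stand-in for opening and parsing the stage file.
  def readWithRetry[T](readOnce: () => T): Option[T] = {
    var result: Option[T] = None
    var retry = READ_FILE_RETRY_TIMES
    breakable {
      while (retry > 0) {
        try {
          result = Some(readOnce())
          break() // success: exit the retry loop (break(), not breakable())
        } catch {
          case _: FileNotFoundException =>
            break() // file is gone (e.g. deleted concurrently): stop retrying
          case _: IOException =>
            retry -= 1 // transient read failure: try again
        }
      }
    }
    result
  }
}
```

break() works from inside the try because it throws a control throwable that the FileNotFoundException/IOException cases do not match, so it propagates out to the enclosing breakable; a catch-all `case _: Throwable` would swallow it and silently turn the break into a retry.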
[jira] [Created] (CARBONDATA-4020) Drop bloom index for single index of table having multiple index drops all indexes
Chetan Bhat created CARBONDATA-4020:
---
Summary: Drop bloom index for single index of table having multiple index drops all indexes
Key: CARBONDATA-4020
URL: https://issues.apache.org/jira/browse/CARBONDATA-4020
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 2.1.0
Environment: Spark 2.4.5
Reporter: Chetan Bhat

Create multiple bloom indexes on the table, then try to drop a single bloom index:

drop table if exists datamap_test_1;
CREATE TABLE datamap_test_1 (id int, name string, salary float, dob date) STORED as carbondata TBLPROPERTIES('SORT_COLUMNS'='id');
CREATE index dm_datamap_test_1_2 ON TABLE datamap_test_1(id) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true');
CREATE index dm_datamap_test3 ON TABLE datamap_test_1 (name) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true');
show indexes on table datamap_test_1;
drop index dm_datamap_test_1_2 on datamap_test_1;
show indexes on table datamap_test_1;

Issue: dropping a single bloom index on a table that has multiple indexes drops all of the table's indexes.

0: jdbc:hive2://linux-32:22550/> show indexes on table datamap_test_1;
| Name                 | Provider     | Indexed Columns  | Properties                                                           | Status | Sync In
| dm_datamap_test_1_2  | bloomfilter  | id               | 'INDEX_COLUMNS'='id','bloom_compress'='true','bloom_fpp'='0.1','blo
| dm_datamap_test3     | bloomfilter  | name             | 'INDEX_COLUMNS'='name','bloom_compress'='true','bloom_fpp'='0.1','b
2 rows selected (0.315 seconds)

0: jdbc:hive2://linux-32:22550/> drop index dm_datamap_test_1_2 on datamap_test_1;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.232 seconds)

0: jdbc:hive2://linux-32:22550/> show indexes on table datamap_test_1;
+-------+-----------+------------------+-------------+---------+------------+
| Name  | Provider  | Indexed Columns  | Properties  | Status  | Sync Info  |
+-------+-----------+------------------+-------------+---------+------------+
+-------+-----------+------------------+-------------+---------+------------+
No rows selected (0.21 seconds)

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3965: [CARBONDATA-4016] NPE and FileNotFound in Show Segments and Insert Stage
Indhumathi27 commented on a change in pull request #3965: URL: https://github.com/apache/carbondata/pull/3965#discussion_r498084784

## File path: integration/spark/src/main/scala/org/apache/carbondata/api/CarbonStore.scala

@@ -96,20 +100,31 @@ object CarbonStore {
   /**
    * Read stage files and return input files
    */
   def readStageInput(
+      tableStagePath: String,
       stageFiles: Seq[CarbonFile],
       status: StageInput.StageStatus): Seq[StageInput] = {
     val gson = new Gson()
     val output = Collections.synchronizedList(new util.ArrayList[StageInput]())
     stageFiles.map { stage =>
-      val filePath = stage.getAbsolutePath
-      val stream = FileFactory.getDataInputStream(filePath)
+      val filePath = tableStagePath + CarbonCommonConstants.FILE_SEPARATOR + stage.getName
+      var stream: DataInputStream = null
       try {
-        val stageInput = gson.fromJson(new InputStreamReader(stream), classOf[StageInput])
-        stageInput.setCreateTime(stage.getLastModifiedTime)
-        stageInput.setStatus(status)
-        output.add(stageInput)
+        stream = FileFactory.getDataInputStream(filePath)
+        var retry = READ_FILE_RETRY_TIMES
+        breakable { while (retry > 0) { try {
+          val stageInput = gson.fromJson(new InputStreamReader(stream), classOf[StageInput])
+          stageInput.setCreateTime(stage.getLastModifiedTime)
+          stageInput.setStatus(status)
+          output.add(stageInput)
+        } catch {
+          case _ : FileNotFoundException => breakable()

Review comment (on stageFiles.map { stage =>): Can use foreach instead of map.

Review comment (on case _ : FileNotFoundException => breakable()): Should a log be added if the file is not found?

Review comment (on output.add(stageInput)): Should break from the loop once the stage file has been read.

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala

@@ -477,13 +479,23 @@ case class CarbonInsertFromStageCommand(
       stageFiles.map { stage =>
         executorService.submit(new Runnable {
           override def run(): Unit = {
-            val filePath = stage._1.getAbsolutePath
-            val stream = FileFactory.getDataInputStream(filePath)
+            val filePath = tableStagePath + CarbonCommonConstants.FILE_SEPARATOR + stage._1.getName
+            var stream: DataInputStream = null
             try {
-              val stageInput = gson.fromJson(new InputStreamReader(stream), classOf[StageInput])
-              output.add(stageInput)
+              stream = FileFactory.getDataInputStream(filePath)
+              var retry = CarbonInsertFromStageCommand.DELETE_FILES_RETRY_TIMES
+              breakable (while (retry > 0) try {

Review comment: Please format the code.
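The first suggestion above, foreach instead of map, applies because the loop runs only for its side effect of adding to the synchronized output list: map would build and immediately discard a sequence of results. A minimal sketch under assumed names (ForeachSketch and collectNames are illustrative, mirroring only the shape of readStageInput):

```scala
import java.util
import java.util.Collections

object ForeachSketch {
  // Collect names into a thread-safe list, as readStageInput collects
  // StageInput objects. foreach is used because the result of each
  // iteration (the Boolean returned by List.add) is discarded.
  def collectNames(stageNames: Seq[String]): util.List[String] = {
    val output = Collections.synchronizedList(new util.ArrayList[String]())
    // map would also work, but would allocate a throwaway Seq[Boolean]
    stageNames.foreach { name => output.add(name) }
    output
  }
}
```

The synchronized wrapper matters in the original because the list is filled from multiple executor tasks; foreach vs map does not change thread safety, only the wasted allocation.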
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean files refactor and added support for a trash folder where all the carbondata files will be copied to after
CarbonDataQA1 commented on pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#issuecomment-701971400 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4288/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean files refactor and added support for a trash folder where all the carbondata files will be copied to after
CarbonDataQA1 commented on pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#issuecomment-701970889 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2541/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3695: [WIP] partition optimization
CarbonDataQA1 commented on pull request #3695: URL: https://github.com/apache/carbondata/pull/3695#issuecomment-701926728 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2540/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3695: [WIP] partition optimization
CarbonDataQA1 commented on pull request #3695: URL: https://github.com/apache/carbondata/pull/3695#issuecomment-701923605 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4287/