[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3802: [CARBONDATA-3885] [CARBONDATA-3884] Delete Stale Segment files from Metadata folders when SI segments are deleted and
vikramahuja1001 commented on a change in pull request #3802: URL: https://github.com/apache/carbondata/pull/3802#discussion_r451286366

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/load/CarbonInternalLoaderUtil.java

@@ -37,7 +38,6 @@
 import org.apache.carbondata.core.util.path.CarbonTablePath;
 import org.apache.carbondata.processing.loading.model.CarbonLoadModel;
 import org.apache.carbondata.processing.util.CarbonLoaderUtil;
-

Review comment: done

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.
VenuReddy2103 commented on a change in pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#discussion_r451275614

## File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java

@@ -302,6 +318,61 @@ private void commitJobForPartition(JobContext context, boolean overwriteSet,
     commitJobFinal(context, loadModel, operationContext, carbonTable, uniqueId);
   }

+  /**
+   * Method to create and write the segment file, removes the temporary directories from all the
+   * respective partition directories. This method is invoked only when {@link
+   * CarbonCommonConstants#CARBON_MERGE_INDEX_IN_SEGMENT} is disabled.
+   * @param context Job context
+   * @param loadModel Load model
+   * @param segmentFileName Segment file name to write
+   * @param partitionPath Serialized list of partition location
+   * @throws IOException
+   */
+  @SuppressWarnings("unchecked")
+  private void writeSegmentWithoutMergeIndex(JobContext context, CarbonLoadModel loadModel,
+      String segmentFileName, String partitionPath) throws IOException {
+    Map<String, String> indexFileNameMap = (Map<String, String>) ObjectSerializationUtil
+        .convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
+    List<String> partitionList =
+        (List<String>) ObjectSerializationUtil.convertStringToObject(partitionPath);
+    SegmentFileStore.SegmentFile finalSegmentFile = null;
+    boolean isRelativePath;
+    String partitionLoc;
+    for (String partition : partitionList) {
+      isRelativePath = false;
+      partitionLoc = partition;
+      if (partitionLoc.startsWith(loadModel.getTablePath())) {
+        partitionLoc = partitionLoc.substring(loadModel.getTablePath().length());
+        isRelativePath = true;
+      }
+      SegmentFileStore.SegmentFile segmentFile = new SegmentFileStore.SegmentFile();
+      SegmentFileStore.FolderDetails folderDetails = new SegmentFileStore.FolderDetails();
+      folderDetails.setFiles(Collections.singleton(indexFileNameMap.get(partition)));
+      folderDetails.setPartitions(
+          Collections.singletonList(partitionLoc.substring(partitionLoc.indexOf("/") + 1)));
+      folderDetails.setRelative(isRelativePath);
+      folderDetails.setStatus(SegmentStatus.SUCCESS.getMessage());
+      segmentFile.getLocationMap().put(partitionLoc, folderDetails);
+      if (finalSegmentFile != null) {
+        finalSegmentFile = finalSegmentFile.merge(segmentFile);
+      } else {
+        finalSegmentFile = segmentFile;
+      }
+    }
+    Objects.requireNonNull(finalSegmentFile);
+    String segmentFilesLocation =

Review comment: Agreed. Moved this code to SegmentFileStore.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.
VenuReddy2103 commented on a change in pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#discussion_r451275288

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala

@@ -640,6 +653,10 @@ class StandardPartitionTableLoadingTestCase extends QueryTest with BeforeAndAfte
     }
   }

+  override def afterEach(): Unit = {
+    CarbonProperties.getInstance()

Review comment: ok. Removed this afterEach.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-655086214 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3322/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.merge.index.
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-655062212 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3325/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.merge.index.
CarbonDataQA1 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-655060959 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1585/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-655059772 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1583/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3829: [WIP] Fix maintable load failure in concurrent load and compaction sceneario
CarbonDataQA1 commented on pull request #3829: URL: https://github.com/apache/carbondata/pull/3829#issuecomment-655023313 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1581/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3829: [WIP] Fix maintable load failure in concurrent load and compaction sceneario
CarbonDataQA1 commented on pull request #3829: URL: https://github.com/apache/carbondata/pull/3829#issuecomment-655022713 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3320/
[GitHub] [carbondata] nihal0107 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-654983835 retest this please
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.
VenuReddy2103 commented on a change in pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#discussion_r450765502

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala

@@ -253,6 +255,14 @@ case class CarbonSQLHadoopMapReduceCommitProtocol(jobId: String, path: String, i
     if (size.isDefined) {
       dataSize = dataSize + java.lang.Long.parseLong(size.get)
     }
+    val indexSize = map.get("carbon.indexsize")
+    if (indexSize.isDefined) {
+      indexLen = indexLen + java.lang.Long.parseLong(indexSize.get)
+    }
+    val indexFiles = map.get("carbon.index.files.name")
+    if (indexFiles.isDefined) {
+      indexFilesName = indexFiles.get

Review comment: It is a serialized map. Like "carbon.output.partitions.name" and "carbon.output.files.name", "carbon.index.files.name" is also serialized. We deserialize it before calling SegmentFileStore.writeSegmentFile.
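The pattern described in this comment — packing a map into one string-valued configuration property on the producer side and deserializing it on the consumer side — can be sketched with plain JDK serialization. This is a hedged stand-in, not CarbonData's ObjectSerializationUtil (whose encoding may differ); the class, keys, and file names below are illustrative only.

```java
import java.io.*;
import java.util.*;

// Minimal sketch of serializing a Map to a single String property and back.
// NOT CarbonData's ObjectSerializationUtil; method names merely mirror it.
public class SerializedMapSketch {
  static String convertObjectToString(Serializable obj) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(obj);  // Java-serialize the object graph
    }
    return Base64.getEncoder().encodeToString(bos.toByteArray());
  }

  @SuppressWarnings("unchecked")
  static <T> T convertStringToObject(String s) throws IOException, ClassNotFoundException {
    byte[] bytes = Base64.getDecoder().decode(s);
    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (T) ois.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    // Producer side: index file name per partition, stored as one string property.
    HashMap<String, String> indexFiles = new HashMap<>();
    indexFiles.put("country=IN", "0_1594000000000.carbonindex");
    String property = convertObjectToString(indexFiles);

    // Consumer side: deserialize before building the segment file.
    Map<String, String> restored = convertStringToObject(property);
    System.out.println(restored.get("country=IN"));
  }
}
```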
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-654963937 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1580/
[GitHub] [carbondata] ajantha-bhat opened a new pull request #3829: [WIP] Fix maintable load failure in concurrent load and compaction sceneario
ajantha-bhat opened a new pull request #3829: URL: https://github.com/apache/carbondata/pull/3829

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-654937795 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3319/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3828: [CARBONDATA-3889] Cleanup typo code for carbondata-core module
CarbonDataQA1 commented on pull request #3828: URL: https://github.com/apache/carbondata/pull/3828#issuecomment-654910077 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1579/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3828: [CARBONDATA-3889] Cleanup typo code for carbondata-core module
CarbonDataQA1 commented on pull request #3828: URL: https://github.com/apache/carbondata/pull/3828#issuecomment-654907957 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3318/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3786: [CARBONDATA-3842] Fix incorrect results on mv with limit (Missed code during mv refcatory)
CarbonDataQA1 commented on pull request #3786: URL: https://github.com/apache/carbondata/pull/3786#issuecomment-654872715 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3316/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency
CarbonDataQA1 commented on pull request #3811: URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654872523 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1577/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency
CarbonDataQA1 commented on pull request #3811: URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654872025 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3315/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3786: [CARBONDATA-3842] Fix incorrect results on mv with limit (Missed code during mv refcatory)
CarbonDataQA1 commented on pull request #3786: URL: https://github.com/apache/carbondata/pull/3786#issuecomment-654870645 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1578/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r450849269

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java

@@ -594,6 +613,332 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
     return this;
   }

+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.withCsvInput();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of files which have to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) {
+    this.fileList = fileList;
+    this.withCsvPath(filePath);
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.writerType = WRITER_TYPE.PARQUET;
+    this.buildParquetReader();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts parquet files directory and
+   * list of files which have to be loaded.
+   *
+   * @param filePath directory where the parquet file exists.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withParquetPath(filePath);
+    return this;
+  }
+
+  private void buildParquetReader() throws IOException {
+    AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+    ParquetReader<GenericRecord> parquetReader;
+    if (this.isDirectory) {
+      if (this.fileList == null || this.fileList.size() == 0) {
+        File[] dataFiles = new File(this.filePath).listFiles();
+        if (dataFiles == null || dataFiles.length == 0) {
+          throw new RuntimeException("No Parquet file found at given location. Please provide"
+              + "the correct folder location.");
+        }
+        parquetReader = ParquetReader.builder(avroReadSupport,

Review comment: Handled the scenario: before calling write() I am checking all the file formats. If they are not of the same type, an exception is thrown. These cases are handled for parquet, ORC, and avro files.
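The pre-write check described in this review comment — reject the load if the given files mix formats — can be illustrated with a minimal, self-contained helper. This is hypothetical code, not the PR's actual validation; the class and method names below are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in for an "all files must share one format" check,
// in the spirit of the validation the comment describes.
public class FormatCheckSketch {
  // Extracts the lowercase extension, failing on extension-less names.
  static String extensionOf(String fileName) {
    int dot = fileName.lastIndexOf('.');
    if (dot < 0 || dot == fileName.length() - 1) {
      throw new IllegalArgumentException("File has no extension: " + fileName);
    }
    return fileName.substring(dot + 1).toLowerCase();
  }

  // Returns the common format, or throws if the files mix formats.
  static String requireSingleFormat(List<String> files) {
    String first = extensionOf(files.get(0));
    for (String f : files) {
      if (!extensionOf(f).equals(first)) {
        throw new IllegalArgumentException(
            "All files must be of the same format, found: " + first + " and " + extensionOf(f));
      }
    }
    return first;
  }

  public static void main(String[] args) {
    System.out.println(requireSingleFormat(Arrays.asList("a.parquet", "b.parquet")));
    try {
      requireSingleFormat(Arrays.asList("a.parquet", "b.orc"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected mixed formats");
    }
  }
}
```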
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r450848843

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java

@@ -594,6 +613,332 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
     return this;
   }

+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.withCsvInput();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of files which have to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) {
+    this.fileList = fileList;
+    this.withCsvPath(filePath);
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.writerType = WRITER_TYPE.PARQUET;
+    this.buildParquetReader();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts parquet files directory and
+   * list of files which have to be loaded.
+   *
+   * @param filePath directory where the parquet file exists.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withParquetPath(filePath);
+    return this;
+  }
+
+  private void buildParquetReader() throws IOException {
+    AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+    ParquetReader<GenericRecord> parquetReader;
+    if (this.isDirectory) {
+      if (this.fileList == null || this.fileList.size() == 0) {
+        File[] dataFiles = new File(this.filePath).listFiles();
+        if (dataFiles == null || dataFiles.length == 0) {
+          throw new RuntimeException("No Parquet file found at given location. Please provide"
+              + "the correct folder location.");
+        }
+        parquetReader = ParquetReader.builder(avroReadSupport,
+            new Path(String.valueOf(dataFiles[0]))).build();
+      } else {
+        parquetReader = ParquetReader.builder(avroReadSupport,

Review comment: checked the schema at the time of the carbon writer builder. If not all the files are having the same schema then throwing the exception.
[GitHub] [carbondata] nihal0107 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-654840974

> @nihal0107 Please remove unused binary files from this PR

Those are not unused binary files; they are the parquet, ORC, and avro files which I am using inside the UTs.
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r450845011

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java

@@ -594,6 +613,332 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
     return this;
   }

+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.withCsvInput();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of files which have to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) {
+    this.fileList = fileList;
+    this.withCsvPath(filePath);
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.writerType = WRITER_TYPE.PARQUET;
+    this.buildParquetReader();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts parquet files directory and
+   * list of files which have to be loaded.
+   *
+   * @param filePath directory where the parquet file exists.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withParquetPath(filePath);
+    return this;
+  }
+
+  private void buildParquetReader() throws IOException {
+    AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+    ParquetReader<GenericRecord> parquetReader;
+    if (this.isDirectory) {
+      if (this.fileList == null || this.fileList.size() == 0) {
+        File[] dataFiles = new File(this.filePath).listFiles();
+        if (dataFiles == null || dataFiles.length == 0) {
+          throw new RuntimeException("No Parquet file found at given location. Please provide"
+              + "the correct folder location.");
+        }
+        parquetReader = ParquetReader.builder(avroReadSupport,
+            new Path(String.valueOf(dataFiles[0]))).build();
+      } else {
+        parquetReader = ParquetReader.builder(avroReadSupport,
+            new Path(this.filePath + "/" + this.fileList.get(0))).build();
+      }
+    } else {
+      parquetReader = ParquetReader.builder(avroReadSupport,
+          new Path(this.filePath)).build();
+    }
+    this.avroSchema = parquetReader.read().getSchema();
+    this.schema = AvroCarbonWriter.getCarbonSchemaFromAvroSchema(this.avroSchema);
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading ORC files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withOrcPath(String filePath) throws IOException {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.writerType = WRITER_TYPE.ORC;
+    Map<String, String> options = new HashMap<>();
+    options.put("complex_delimiter_level_1", "#");
+    options.put("complex_delimiter_level_2", "$");
+    options.put("complex_delimiter_level_3", "@");
+    this.withLoadOptions(options);
+    this.buildOrcReader();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts orc files directory and
+   * list of files which have to be loaded.
+   *
+   * @param filePath directory where the orc file exists.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withOrcPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withOrcPath(filePath);
+    return this;
+  }
+
+  // build orc reader and convert orc schema to carbon schema.
+  private void buildOrcReader() throws
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r450844563 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ParquetCarbonWriter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.File; +import java.io.IOException; +import java.util.Arrays; +import java.util.List; + +import org.apache.avro.generic.GenericRecord; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.parquet.avro.AvroReadSupport; +import org.apache.parquet.hadoop.ParquetReader; + +/** + * Implementation to write parquet rows in avro format to carbondata file. 
+ */ +public class ParquetCarbonWriter extends AvroCarbonWriter { + private AvroCarbonWriter avroCarbonWriter = null; + private String filePath = ""; + private boolean isDirectory = false; + private List<String> fileList; + + ParquetCarbonWriter(AvroCarbonWriter avroCarbonWriter) { +this.avroCarbonWriter = avroCarbonWriter; + } + + @Override + public void setFilePath(String filePath) { +this.filePath = filePath; + } + + @Override + public void setIsDirectory(boolean isDirectory) { +this.isDirectory = isDirectory; + } + + @Override + public void setFileList(List<String> fileList) { +this.fileList = fileList; + } + + /** + * Load data of all parquet files at the given location iteratively. + * + * @throws IOException + */ + @Override + public void write() throws IOException { +if (this.filePath.length() == 0) { + throw new RuntimeException("'withParquetPath()' " + "must be called to load parquet files"); +} +if (this.avroCarbonWriter == null) { + throw new RuntimeException("avro carbon writer cannot be null"); +} +if (this.isDirectory) { + if (this.fileList == null || this.fileList.size() == 0) { +File[] dataFiles = new File(this.filePath).listFiles(); +if (dataFiles == null || dataFiles.length == 0) { + throw new RuntimeException("No Parquet file found at given location. Please provide " + "the correct folder location."); +} +Arrays.sort(dataFiles); +for (File dataFile : dataFiles) { + this.loadSingleFile(dataFile); +} + } else { +for (String file : this.fileList) { + this.loadSingleFile(new File(this.filePath + "/" + file)); +} + } +} else { + this.loadSingleFile(new File(this.filePath)); +} + } + + private void loadSingleFile(File file) throws IOException { +AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>(); +ParquetReader<GenericRecord> parquetReader = ParquetReader.builder(avroReadSupport, +new Path(String.valueOf(file))).withConf(new Configuration()).build(); +GenericRecord genericRecord = null; Review comment: changed as per suggestion.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
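The `write()` dispatch quoted above handles three cases: a directory with no explicit file list (load every file in sorted order), an explicit file list (load only the named files), and a single file path. A minimal, self-contained sketch of that resolution logic follows; `resolveFiles` is a hypothetical helper for illustration, not part of the CarbonData SDK:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Resolves which files to load, mirroring the ParquetCarbonWriter.write()
// dispatch: a directory with no explicit list loads every file in sorted
// order; an explicit list loads only the named files; otherwise the path
// itself is treated as a single file.
public class FileResolutionDemo {
  static List<File> resolveFiles(String path, boolean isDirectory, List<String> fileList) {
    List<File> result = new ArrayList<>();
    if (isDirectory) {
      if (fileList == null || fileList.isEmpty()) {
        File[] dataFiles = new File(path).listFiles();
        if (dataFiles == null || dataFiles.length == 0) {
          throw new RuntimeException("No file found at given location: " + path);
        }
        // File implements Comparable by pathname, so this yields lexicographic order
        Arrays.sort(dataFiles);
        result.addAll(Arrays.asList(dataFiles));
      } else {
        for (String file : fileList) {
          result.add(new File(path + "/" + file));
        }
      }
    } else {
      result.add(new File(path));
    }
    return result;
  }

  public static void main(String[] args) throws Exception {
    File dir = java.nio.file.Files.createTempDirectory("parquet-demo").toFile();
    new File(dir, "b.parquet").createNewFile();
    new File(dir, "a.parquet").createNewFile();
    List<File> files = resolveFiles(dir.getPath(), true, null);
    System.out.println(files.get(0).getName());  // prints a.parquet (sorted order)
  }
}
```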
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r450844071 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.File; +import java.io.IOException; +import java.util.*; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.io.orc.OrcFile; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + +/** + * Implementation to write ORC rows in CSV format to carbondata file. 
+ */ +public class ORCCarbonWriter extends CSVCarbonWriter { + private CSVCarbonWriter csvCarbonWriter = null; + private String filePath = ""; + private Reader orcReader = null; + private boolean isDirectory = false; + private List<String> fileList; + + ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter) { +this.csvCarbonWriter = csvCarbonWriter; + } + + @Override + public void setFilePath(String filePath) { +this.filePath = filePath; + } + + @Override + public void setIsDirectory(boolean isDirectory) { +this.isDirectory = isDirectory; + } + + @Override + public void setFileList(List<String> fileList) { +this.fileList = fileList; + } + + /** + * Load ORC files in an iterative way. + */ + @Override + public void write() throws IOException { +if (this.filePath.length() == 0) { + throw new RuntimeException("'withOrcPath()' must be called to load ORC files"); +} +if (this.csvCarbonWriter == null) { + throw new RuntimeException("csv carbon writer cannot be null"); +} +if (this.isDirectory) { + if (this.fileList == null || this.fileList.size() == 0) { +File[] dataFiles = new File(this.filePath).listFiles(); +if (dataFiles == null || dataFiles.length == 0) { + throw new RuntimeException("No ORC file found at given location. Please provide " + "the correct folder location."); +} +for (File dataFile : dataFiles) { + this.loadSingleFile(dataFile); +} + } else { +for (String file : this.fileList) { + this.loadSingleFile(new File(this.filePath + "/" + file)); +} + } +} else { + this.loadSingleFile(new File(this.filePath)); +} + } + + private void loadSingleFile(File file) throws IOException { +orcReader = OrcFile.createReader(new Path(String.valueOf(file)), +OrcFile.readerOptions(new Configuration())); +ObjectInspector objectInspector = orcReader.getObjectInspector(); +RecordReader recordReader = orcReader.rows(); +if (objectInspector instanceof StructObjectInspector) { + StructObjectInspector structObjectInspector = + (StructObjectInspector) orcReader.getObjectInspector(); + while (recordReader.hasNext()) { +Object record = recordReader.next(null); // pass null to avoid reusing the previous row object. +List<Object> valueList = structObjectInspector.getStructFieldsDataAsList(record); +for (int i = 0; i < valueList.size(); i++) { + valueList.set(i, parseOrcObject(valueList.get(i), 0)); +} +this.csvCarbonWriter.write(valueList.toArray()); + } +} else { + while (recordReader.hasNext()) { +Object record = recordReader.next(null); // pass null to avoid reusing the previous row object. +this.csvCarbonWriter.write(new Object[]{parseOrcObject(record, 0)}); + } +} + } + + private String parseOrcObject(Object obj, int level) { +if (obj instanceof OrcStruct) { + Objects.requireNonNull(orcReader); + StructObjectInspector structObjectInspector = (StructObjectInspector) orcReader + .getObjectInspector(); + List<Object> value = structObjectInspector.getStructFieldsDataAsList(obj); + for (int i = 0; i < value.size(); i++) { +value.set(i,
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r450844187 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ParquetCarbonWriter.java ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.File; +import java.io.IOException; +import java.util.Arrays; +import java.util.List; + +import org.apache.avro.generic.GenericRecord; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.parquet.avro.AvroReadSupport; +import org.apache.parquet.hadoop.ParquetReader; + +/** + * Implementation to write parquet rows in avro format to carbondata file. 
+ */ +public class ParquetCarbonWriter extends AvroCarbonWriter { + private AvroCarbonWriter avroCarbonWriter = null; + private String filePath = ""; + private boolean isDirectory = false; + private List<String> fileList; + + ParquetCarbonWriter(AvroCarbonWriter avroCarbonWriter) { +this.avroCarbonWriter = avroCarbonWriter; + } + + @Override + public void setFilePath(String filePath) { +this.filePath = filePath; + } + + @Override + public void setIsDirectory(boolean isDirectory) { +this.isDirectory = isDirectory; + } + + @Override + public void setFileList(List<String> fileList) { +this.fileList = fileList; + } + + /** + * Load data of all parquet files at the given location iteratively. + * + * @throws IOException + */ + @Override + public void write() throws IOException { +if (this.filePath.length() == 0) { + throw new RuntimeException("'withParquetPath()' " + "must be called to load parquet files"); +} +if (this.avroCarbonWriter == null) { + throw new RuntimeException("avro carbon writer cannot be null"); +} +if (this.isDirectory) { + if (this.fileList == null || this.fileList.size() == 0) { +File[] dataFiles = new File(this.filePath).listFiles(); +if (dataFiles == null || dataFiles.length == 0) { + throw new RuntimeException("No Parquet file found at given location. Please provide " + "the correct folder location."); +} +Arrays.sort(dataFiles); +for (File dataFile : dataFiles) { + this.loadSingleFile(dataFile); +} + } else { +for (String file : this.fileList) { + this.loadSingleFile(new File(this.filePath + "/" + file)); +} + } +} else { + this.loadSingleFile(new File(this.filePath)); +} + } + + private void loadSingleFile(File file) throws IOException { +AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>(); +ParquetReader<GenericRecord> parquetReader = ParquetReader.builder(avroReadSupport, +new Path(String.valueOf(file))).withConf(new Configuration()).build(); +GenericRecord genericRecord = null; +while ((genericRecord = parquetReader.read()) != null) { + System.out.println(genericRecord); Review comment: removed.
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r450843943 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.File; +import java.io.IOException; +import java.util.*; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.io.orc.OrcFile; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + +/** + * Implementation to write ORC rows in CSV format to carbondata file. 
+ */ +public class ORCCarbonWriter extends CSVCarbonWriter { + private CSVCarbonWriter csvCarbonWriter = null; + private String filePath = ""; + private Reader orcReader = null; + private boolean isDirectory = false; + private List<String> fileList; + + ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter) { +this.csvCarbonWriter = csvCarbonWriter; + } + + @Override + public void setFilePath(String filePath) { +this.filePath = filePath; + } + + @Override + public void setIsDirectory(boolean isDirectory) { +this.isDirectory = isDirectory; + } + + @Override + public void setFileList(List<String> fileList) { +this.fileList = fileList; + } + + /** + * Load ORC files in an iterative way. + */ + @Override + public void write() throws IOException { +if (this.filePath.length() == 0) { + throw new RuntimeException("'withOrcPath()' must be called to load ORC files"); +} +if (this.csvCarbonWriter == null) { + throw new RuntimeException("csv carbon writer cannot be null"); +} +if (this.isDirectory) { + if (this.fileList == null || this.fileList.size() == 0) { +File[] dataFiles = new File(this.filePath).listFiles(); +if (dataFiles == null || dataFiles.length == 0) { + throw new RuntimeException("No ORC file found at given location. Please provide " + "the correct folder location."); +} +for (File dataFile : dataFiles) { + this.loadSingleFile(dataFile); +} + } else { +for (String file : this.fileList) { + this.loadSingleFile(new File(this.filePath + "/" + file)); +} + } +} else { + this.loadSingleFile(new File(this.filePath)); +} + } + + private void loadSingleFile(File file) throws IOException { +orcReader = OrcFile.createReader(new Path(String.valueOf(file)), +OrcFile.readerOptions(new Configuration())); +ObjectInspector objectInspector = orcReader.getObjectInspector(); +RecordReader recordReader = orcReader.rows(); +if (objectInspector instanceof StructObjectInspector) { + StructObjectInspector structObjectInspector = + (StructObjectInspector) orcReader.getObjectInspector(); + while (recordReader.hasNext()) { +Object record = recordReader.next(null); // pass null to avoid reusing the previous row object. +List<Object> valueList = structObjectInspector.getStructFieldsDataAsList(record); +for (int i = 0; i < valueList.size(); i++) { + valueList.set(i, parseOrcObject(valueList.get(i), 0)); +} +this.csvCarbonWriter.write(valueList.toArray()); + } +} else { + while (recordReader.hasNext()) { +Object record = recordReader.next(null); // pass null to avoid reusing the previous row object. +this.csvCarbonWriter.write(new Object[]{parseOrcObject(record, 0)}); + } +} + } + + private String parseOrcObject(Object obj, int level) { Review comment: done
[GitHub] [carbondata] QiangCai opened a new pull request #3828: [CARBONDATA-3889] Cleanup typo code for carbondata-core module
QiangCai opened a new pull request #3828: URL: https://github.com/apache/carbondata/pull/3828 ### Why is this PR needed? There are many typos in carbondata-core module ### What changes were proposed in this PR? Cleanup typo code for carbondata-core module ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No
[GitHub] [carbondata] akashrn5 commented on pull request #3802: [CARBONDATA-3885] [CARBONDATA-3884] Delete Stale Segment files from Metadata folders when SI segments are deleted and Fix for Concurrent
akashrn5 commented on pull request #3802: URL: https://github.com/apache/carbondata/pull/3802#issuecomment-654791824 @vikramahuja1001 please mention the issue reproduction steps in the JIRA, and in the PR description give the proper issue, root cause, and proposed solution.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3802: [CARBONDATA-3885] [CARBONDATA-3884] Delete Stale Segment files from Metadata folders when SI segments are deleted and Fix fo
akashrn5 commented on a change in pull request #3802: URL: https://github.com/apache/carbondata/pull/3802#discussion_r450316210 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/load/CarbonInternalLoaderUtil.java ## @@ -37,7 +38,6 @@ import org.apache.carbondata.core.util.path.CarbonTablePath; import org.apache.carbondata.processing.loading.model.CarbonLoadModel; import org.apache.carbondata.processing.util.CarbonLoaderUtil; - Review comment: revert this
[GitHub] [carbondata] akashrn5 commented on pull request #3786: [CARBONDATA-3842] Fix incorrect results on mv with limit (Missed code during mv refcatory)
akashrn5 commented on pull request #3786: URL: https://github.com/apache/carbondata/pull/3786#issuecomment-654788035 retest this please
[GitHub] [carbondata] akashrn5 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency
akashrn5 commented on pull request #3811: URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654787100 retest this please
[GitHub] [carbondata] asfgit closed pull request #3808: [CARBONDATA-3873] Secondary index compaction with maintable clean files causing exception
asfgit closed pull request #3808: URL: https://github.com/apache/carbondata/pull/3808
[GitHub] [carbondata] akashrn5 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency
akashrn5 commented on pull request #3811: URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654776383 LGTM
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.
VenuReddy2103 commented on a change in pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#discussion_r450765502 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala ## @@ -253,6 +255,14 @@ case class CarbonSQLHadoopMapReduceCommitProtocol(jobId: String, path: String, i if (size.isDefined) { dataSize = dataSize + java.lang.Long.parseLong(size.get) } +val indexSize = map.get("carbon.indexsize") +if (indexSize.isDefined) { + indexLen = indexLen + java.lang.Long.parseLong(indexSize.get) +} +val indexFiles = map.get("carbon.index.files.name") +if (indexFiles.isDefined) { + indexFilesName = indexFiles.get Review comment: It is a serialized map. Like "carbon.output.partitions.name" and "carbon.output.files.name", "carbon.index.files.name" is also serialized. We deserialize it and use it in writeSegmentWithoutMergeIndex().
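The mechanism described here — serializing a map to a string so it can ride through the Hadoop job configuration, then deserializing it on the committer side — can be sketched with plain Java serialization. This is a hypothetical stand-in for `ObjectSerializationUtil`, not its actual implementation:

```java
import java.io.*;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ObjectSerializationUtil: serializes a
// Serializable object to a Base64 string so it can travel through a
// Hadoop job configuration entry, and restores it on the other side.
public class MapSerializationDemo {
  static String convertObjectToString(Object obj) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(obj);  // obj must implement Serializable (HashMap does)
    }
    return Base64.getEncoder().encodeToString(bos.toByteArray());
  }

  static Object convertStringToObject(String s) throws IOException, ClassNotFoundException {
    byte[] bytes = Base64.getDecoder().decode(s);
    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return ois.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    // hypothetical partition -> index-file-name mapping
    Map<String, String> indexFileNameMap = new HashMap<>();
    indexFileNameMap.put("part=a", "0_1.carbonindex");
    String serialized = convertObjectToString(indexFileNameMap);
    @SuppressWarnings("unchecked")
    Map<String, String> restored = (Map<String, String>) convertStringToObject(serialized);
    System.out.println(restored.get("part=a"));  // prints 0_1.carbonindex
  }
}
```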
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3827: [CARBONDATA-3889] Cleanup code for carbondata-hadoop module
CarbonDataQA1 commented on pull request #3827: URL: https://github.com/apache/carbondata/pull/3827#issuecomment-654748600 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1576/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3827: [CARBONDATA-3889] Cleanup code for carbondata-hadoop module
CarbonDataQA1 commented on pull request #3827: URL: https://github.com/apache/carbondata/pull/3827#issuecomment-654746270 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3314/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.
VenuReddy2103 commented on a change in pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#discussion_r450746967 ## File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java ## @@ -282,10 +296,12 @@ private void commitJobForPartition(JobContext context, boolean overwriteSet, throw new IOException(e); } } -String segmentFileName = SegmentFileStore.genSegmentFileName( -loadModel.getSegmentId(), String.valueOf(loadModel.getFactTimeStamp())); newMetaEntry.setSegmentFile(segmentFileName + CarbonTablePath.SEGMENT_EXT); -newMetaEntry.setIndexSize("" + loadModel.getMetrics().getMergeIndexSize()); +if (isMergeIndex) { Review comment: loadModel.getMetrics().getMergeIndexSize() is filled in MergeIndexEventListener.onEvent() when the merge index is created. So, it cannot be made the else case of the check at line 280.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3808: [CARBONDATA-3873] Secondary index compaction with maintable clean files causing exception
CarbonDataQA1 commented on pull request #3808: URL: https://github.com/apache/carbondata/pull/3808#issuecomment-654734465 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1575/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3808: [CARBONDATA-3873] Secondary index compaction with maintable clean files causing exception
CarbonDataQA1 commented on pull request #3808: URL: https://github.com/apache/carbondata/pull/3808#issuecomment-654733164 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3313/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency
CarbonDataQA1 commented on pull request #3811: URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654732910 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1574/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency
CarbonDataQA1 commented on pull request #3811: URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654730812 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3312/
[GitHub] [carbondata] Indhumathi27 commented on pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…
Indhumathi27 commented on pull request #3817: URL: https://github.com/apache/carbondata/pull/3817#issuecomment-654728062 LGTM
[GitHub] [carbondata] QiangCai opened a new pull request #3827: [CARBONDATA-3889] Cleanup code for carbondata-hadoop module
QiangCai opened a new pull request #3827: URL: https://github.com/apache/carbondata/pull/3827 ### Why is this PR needed? need cleanup code for carbondata-hadoop module ### What changes were proposed in this PR? Cleanup code for carbondata-hadoop module ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No
[GitHub] [carbondata] akashrn5 commented on pull request #3808: [CARBONDATA-3873] Secondary index compaction with maintable clean files causing exception
akashrn5 commented on pull request #3808: URL: https://github.com/apache/carbondata/pull/3808#issuecomment-654663162 LGTM
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3808: [CARBONDATA-3873] Secondary index compaction with maintable clean files causing exception
akashrn5 commented on a change in pull request #3808: URL: https://github.com/apache/carbondata/pull/3808#discussion_r450658081 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/load/Compactor.scala ## @@ -121,10 +124,20 @@ object Compactor { segmentIdToLoadStartTimeMapping(validSegments.head), SegmentStatus.SUCCESS, carbonLoadModelForMergeDataFiles.getFactTimeStamp, rebuiltSegments.toList.asJava) - +siCompactionIndexList ::= indexCarbonTable } catch { case ex: Exception => LOGGER.error(s"Compaction failed for SI table ${secondaryIndex.indexName}", ex) + // If any compaction fails, disable all the SI tables that succeeded. + // They will be enabled in the next load + siCompactionIndexList.foreach { indexCarbonTable => +sparkSession.sql( + s""" + | ALTER TABLE ${carbonLoadModel.getDatabaseName}.${indexCarbonTable.getTableName} + | SET + | SERDEPROPERTIES ('isSITableEnabled' = 'false') Review comment: move this line above
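The catch block under review tracks index tables whose compaction succeeded (`siCompactionIndexList`) and, on a later failure, disables each of them so the next load can re-sync. That collect-then-compensate pattern can be sketched generically; the names below are hypothetical, not CarbonData code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Collect-then-compensate: remember the units of work that succeeded so a
// later failure can roll them back (here, "disable" each completed index),
// mirroring how Compactor disables successfully compacted SI tables when a
// subsequent index compaction fails.
public class CompensationDemo {
  public static void main(String[] args) {
    List<String> completed = new ArrayList<>();
    List<String> disabled = new ArrayList<>();
    List<String> indexes = Arrays.asList("idx1", "idx2", "idx_fails");
    try {
      for (String index : indexes) {
        if (index.endsWith("_fails")) {  // simulate a mid-sequence failure
          throw new RuntimeException("Compaction failed for SI table " + index);
        }
        completed.add(index);  // compaction for this index succeeded
      }
    } catch (RuntimeException ex) {
      // compensate: disable every index that already succeeded, so the
      // next load can rebuild and re-enable them consistently
      disabled.addAll(completed);
    }
    System.out.println(disabled);  // prints [idx1, idx2]
  }
}
```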