[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683604932 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CARBONDATA-3966) NullPointerException is thrown in case of reliability testing of load, compaction and query
Akash R Nilugal created CARBONDATA-3966: --- Summary: NullPointerException is thrown in case of reliability testing of load, compaction and query Key: CARBONDATA-3966 URL: https://issues.apache.org/jira/browse/CARBONDATA-3966 Project: CarbonData Issue Type: Bug Reporter: Akash R Nilugal Assignee: Akash R Nilugal Sometimes NullPointerException is thrown in case of reliability testing of load, compaction and query -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] akashrn5 opened a new pull request #3907: [CARBONDATA-3966]Fix nullPointerException issue in case of reliability testing of load and compaction
akashrn5 opened a new pull request #3907: URL: https://github.com/apache/carbondata/pull/3907 ### Why is this PR needed? During the CarbonData reliability and concurrency tests of load, compaction and query, a NullPointerException is sometimes thrown. This is because in `TableSegmentRefresher` we get the last modified timestamp of the segment file to decide whether to refresh the cache; under concurrency the segment file can be deleted, or during an update the file may not be present, and getLastModified then throws a NullPointerException. ### What changes were proposed in this PR? Before getting the last modified time, always check whether the file exists, as it can be deleted in the meantime due to concurrency; if it is not present, initialize the timestamp to zero. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No (verified with concurrency over 1000s of segments)
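The fix described above follows a common guard pattern: check that the file still exists before reading its last-modified time, and fall back to zero when it has vanished. A minimal standalone sketch of that pattern using `java.nio` (hypothetical illustration only; the actual change lives in `TableSegmentRefresher` and uses CarbonData's own file abstraction):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SegmentTimestampSketch {
  /**
   * Returns the last-modified time of the segment file in millis, or 0 if the
   * file has been deleted concurrently (e.g. by a parallel update/compaction).
   */
  static long lastModifiedOrZero(Path segmentFile) {
    try {
      if (Files.exists(segmentFile)) {
        return Files.getLastModifiedTime(segmentFile).toMillis();
      }
    } catch (IOException e) {
      // File vanished between the exists() check and the read; treat as absent.
    }
    return 0L;
  }
}
```

Note the try/catch: even with the exists() check, the file can still disappear between the check and the read, so the read itself is also guarded.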
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r479959497 ## File path: docs/index/secondary-index-guide.md ## @@ -188,4 +188,25 @@ where we have old stores. Syntax ``` REGISTER INDEX TABLE index_name ON [TABLE] [db_name.]table_name - ``` \ No newline at end of file + ``` + +### Reindex Command +This command is used to reload segments in the SI table when there is a mismatch in the number +of segments between the SI table and the main table. + +Syntax + +Reindex on all the secondary indexes on the main table + ``` + REINDEX ON TABLE [db_name.]main_table_name [WHERE SEGMENT.ID IN(0,1)] + ``` +Reindex at the index table level Review comment: done ## File path: docs/index/secondary-index-guide.md ## @@ -188,4 +188,25 @@ where we have old stores. Syntax ``` REGISTER INDEX TABLE index_name ON [TABLE] [db_name.]table_name - ``` \ No newline at end of file + ``` + +### Reindex Command +This command is used to reload segments in the SI table when there is a mismatch in the number +of segments between the SI table and the main table. + +Syntax + +Reindex on all the secondary indexes on the main table Review comment: done
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r479959112 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.index + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Repair logic for reindex command on maintable/indextable + */ +case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier, Review comment: done
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r479960111 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.index + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Show indexes on the table + */ +case class IndexRepairCommand(indexname: Option[String], tableNameOp: TableIdentifier, + dbName: String, + segments: Option[List[String]]) extends DataCommand{ + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + + def processData(sparkSession: SparkSession): Seq[Row] = { +if (dbName == null) { + // table level and index level + val databaseName = if (tableNameOp.database.isEmpty) { +SparkSession.getActiveSession.get.catalog.currentDatabase + } else { +tableNameOp.database.get.toString + } + triggerRepair(tableNameOp.table, databaseName, indexname.isEmpty, indexname, segments) +} else { + // for all tables in the db +sparkSession.sessionState.catalog.listTables(dbName).foreach { + tableIdent => +triggerRepair(tableIdent.table, dbName, indexname.isEmpty, indexname, segments) +} +} +Seq.empty + } + + def triggerRepair(tableNameOp: String, databaseName: String, allIndex: Boolean, +indexName: Option[String], segments: Option[List[String]]): Unit = { +val sparkSession = SparkSession.getActiveSession.get +// when Si creation and load to main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info 
+val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val carbonTable = metaStore + .lookupRelation(Some(databaseName), tableNameOp)(sparkSession) + .asInstanceOf[CarbonRelation].carbonTable + +val carbonLoadModel = new CarbonLoadModel +carbonLoadModel.setDatabaseName(databaseName) +carbonLoadModel.setTableName(tableNameOp) +carbonLoadModel.setTablePath(carbonTable.getTablePath) +val tableStatusFilePath = CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) +carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager Review comment: added
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479961626 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CSVCarbonWriter.java ## @@ -72,6 +90,72 @@ public void write(Object object) throws IOException { } } + public static CsvParser buildCsvParser(Configuration conf) { Review comment: can it be a private and non-static method?
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479962896 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +608,227 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.PARQUET_FILE_EXT); +org.apache.avro.Schema parquetSchema = ParquetCarbonWriter +.extractParquetSchema(dataFiles[0], this.hadoopConf); +this.dataFiles = dataFiles; +this.avroSchema = parquetSchema; +this.schema = AvroCarbonWriter.getCarbonSchemaFromAvroSchema(this.avroSchema); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withParquetPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withParquetPath(filePath); +return this; + } + + private CarbonFile[] extractDataFiles(String suf) { +List dataFiles; +if (this.isDirectory) { + if (CollectionUtils.isEmpty(this.fileList)) { +dataFiles = SDKUtil.extractFilesFromFolder(this.filePath, suf, this.hadoopConf); + } else { +dataFiles = this.appendFileListWithPath(); + } +} else { + dataFiles = new ArrayList<>(); + dataFiles.add(FileFactory.getCarbonFile(this.filePath, this.hadoopConf)); +} +if (CollectionUtils.isEmpty(dataFiles)) { + throw new RuntimeException("Data files can't be empty."); +} +return dataFiles.toArray(new CarbonFile[0]); + } + + /** + * to build a {@link CarbonWriter}, which accepts loading ORC files. + * + * @param f
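The `extractDataFiles` logic quoted above (list a directory filtered by suffix, or resolve an explicit file list against the directory, or take the path as a single file, failing when the result is empty) can be sketched in standalone form. This is a hypothetical `java.nio` version for illustration; the real builder works against `CarbonFile`/`FileFactory` and the Hadoop configuration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class DataFileSketch {
  /**
   * Collects the data files to load: all files under a directory matching the
   * suffix, an explicit file list resolved against that directory, or the path
   * itself when it points at a single file.
   */
  static List<Path> extractDataFiles(Path path, String suffix, List<String> fileList)
      throws IOException {
    List<Path> dataFiles = new ArrayList<>();
    if (Files.isDirectory(path)) {
      if (fileList == null || fileList.isEmpty()) {
        // No explicit list: pick up every file with the expected extension.
        try (Stream<Path> entries = Files.list(path)) {
          entries.filter(p -> p.toString().endsWith(suffix)).forEach(dataFiles::add);
        }
      } else {
        // Explicit list: resolve each listed file name against the base directory.
        for (String name : fileList) {
          dataFiles.add(path.resolve(name));
        }
      }
    } else {
      dataFiles.add(path);
    }
    if (dataFiles.isEmpty()) {
      throw new IllegalStateException("Data files can't be empty.");
    }
    return dataFiles;
  }
}
```

This mirrors why the two-argument `withCsvPath(filePath, fileList)` overloads simply store the list and delegate to the single-argument variant: the directory/list/single-file branching is centralized in one helper.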
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479973113 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -660,13 +895,39 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException { // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder // which will skip Conversion Step. loadModel.setLoadWithoutConverterStep(true); - return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema); + AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel, Review comment: We have some code duplications for each type of writer. Suggest to refactor it. Something like this - ```suggestion CarbonWriter carbonWriter; if (this.writerType == WRITER_TYPE.AVRO) { // AVRO records are pushed to Carbon as Object not as Strings. This was done in order to // handle multi level complex type support. As there are no conversion converter step is // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder // which will skip Conversion Step. 
loadModel.setLoadWithoutConverterStep(true); carbonWriter = new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema); } else if (this.writerType == WRITER_TYPE.JSON) { loadModel.setJsonFileLoad(true); carbonWriter = new JsonCarbonWriter(loadModel, hadoopConf); } else if (this.writerType == WRITER_TYPE.PARQUET) { loadModel.setLoadWithoutConverterStep(true); carbonWriter = new ParquetCarbonWriter(loadModel, hadoopConf, this.avroSchema); } else if (this.writerType == WRITER_TYPE.ORC) { carbonWriter = new ORCCarbonWriter(loadModel, hadoopConf); } else { // CSV CSVCarbonWriter csvCarbonWriter = new CSVCarbonWriter(loadModel, hadoopConf); if (!this.options.containsKey(CarbonCommonConstants.FILE_HEADER)) { csvCarbonWriter.setSkipHeader(true); } carbonWriter = csvCarbonWriter; } if (!StringUtils.isEmpty(filePath)) { carbonWriter.validateAndSetDataFiles(this.dataFiles); } return carbonWriter; ```
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683645642 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3927/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683645409 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2186/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683661702 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2187/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683663848 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3928/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3907: [CARBONDATA-3966]Fix NullPointerException issue in case of reliability testing of load and compaction
CarbonDataQA1 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-683675465 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3929/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
CarbonDataQA1 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-683681865 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2189/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3907: [CARBONDATA-3966]Fix NullPointerException issue in case of reliability testing of load and compaction
CarbonDataQA1 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-683683411 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2188/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r480021451 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -660,13 +895,39 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException { // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder // which will skip Conversion Step. loadModel.setLoadWithoutConverterStep(true); - return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema); + AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel, Review comment: done
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r480021578 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +608,227 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.PARQUET_FILE_EXT); +org.apache.avro.Schema parquetSchema = ParquetCarbonWriter +.extractParquetSchema(dataFiles[0], this.hadoopConf); +this.dataFiles = dataFiles; +this.avroSchema = parquetSchema; +this.schema = AvroCarbonWriter.getCarbonSchemaFromAvroSchema(this.avroSchema); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withParquetPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withParquetPath(filePath); +return this; + } + + private CarbonFile[] extractDataFiles(String suf) { +List dataFiles; +if (this.isDirectory) { + if (CollectionUtils.isEmpty(this.fileList)) { +dataFiles = SDKUtil.extractFilesFromFolder(this.filePath, suf, this.hadoopConf); + } else { +dataFiles = this.appendFileListWithPath(); + } +} else { + dataFiles = new ArrayList<>(); + dataFiles.add(FileFactory.getCarbonFile(this.filePath, this.hadoopConf)); +} +if (CollectionUtils.isEmpty(dataFiles)) { + throw new RuntimeException("Data files can't be empty."); +} +return dataFiles.toArray(new CarbonFile[0]); + } + + /** + * to build a {@link CarbonWriter}, which accepts loading ORC files. + * + * @param fileP
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r480021706 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CSVCarbonWriter.java ## @@ -72,6 +90,72 @@ public void write(Object object) throws IOException { } } + public static CsvParser buildCsvParser(Configuration conf) { Review comment: done
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
CarbonDataQA1 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-683685077 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3930/
[GitHub] [carbondata] kunal642 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
kunal642 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683729131 @akashrn5 @kumarvishal09 @QiangCai @ravipesala @ajantha-bhat Please review
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683728958 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2190/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683731944 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3931/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3890: [CARBONDATA-3952] After reset query not hitting MV
CarbonDataQA1 commented on pull request #3890: URL: https://github.com/apache/carbondata/pull/3890#issuecomment-683840584 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3932/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3890: [CARBONDATA-3952] After reset query not hitting MV
CarbonDataQA1 commented on pull request #3890: URL: https://github.com/apache/carbondata/pull/3890#issuecomment-683841817 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2191/
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3895: [WIP]SI fix for not equal to filter
vikramahuja1001 commented on pull request #3895: URL: https://github.com/apache/carbondata/pull/3895#issuecomment-683868709 @ajantha-bhat, I checked that PR; maybe you can add the test cases for not-equal and check SI pushdown
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
vikramahuja1001 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683871061 @ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?
[GitHub] [carbondata] vikramahuja1001 closed pull request #3895: [WIP]SI fix for not equal to filter
vikramahuja1001 closed pull request #3895: URL: https://github.com/apache/carbondata/pull/3895
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.
vikramahuja1001 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-683869471 @nihal0107 , can a test case be added for your fix?
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-683866796 @akashrn5 , @kunal642 please check
[GitHub] [carbondata] ajantha-bhat commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
ajantha-bhat commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683873006

> @ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?

A testcase related to notNull pushdown was already failing in `TestNIQueryWithIndex`; I will check and add notEquals as well.
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3906: Added test cases for hive read complex types and handled other issues
vikramahuja1001 commented on pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#issuecomment-683873583 Add jira ID
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3906: Added test cases for hive read complex types and handled other issues
vikramahuja1001 commented on a change in pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#discussion_r480230761

## File path: integration/hive/src/test/java/org/apache/carbondata/hive/HiveTestUtils.java
## @@ -65,7 +74,12 @@ public boolean checkAnswer(ResultSet actual, ResultSet expected) throws SQLException
     Assert.assertTrue(numOfColumnsExpected > 0);
     Assert.assertEquals(actual.getMetaData().getColumnCount(), numOfColumnsExpected);
     for (int i = 1; i <= numOfColumnsExpected; i++) {
-      Assert.assertEquals(actual.getString(i), actual.getString(i));
+      if (actual.getString(i).contains(":")) {
+        Assert.assertTrue(checkMapPairsIgnoringOrder(actual.getString(i), expected.getString(i)));
+      } else {
+        Assert.assertEquals(actual.getString(i), expected.getString(i));
+      }
+      // System.out.println(actual.getString(i));

Review comment: Remove this comment
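The comparison under review replaces plain string equality for map-typed columns with an order-insensitive check. A minimal sketch of why, with a hypothetical stand-in for `checkMapPairsIgnoringOrder` (this is not the actual helper in `HiveTestUtils`): Hive renders a map column as `key:value` pairs joined by commas, and two renderings of the same map may list entries in different orders, so direct string equality would spuriously fail.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MapPairCheck {
  // Illustrative stand-in for a helper like checkMapPairsIgnoringOrder:
  // split the rendered map string ("k1:v1,k2:v2") into pairs and compare
  // them as sets, so entry order does not matter.
  static boolean pairsEqualIgnoringOrder(String actual, String expected) {
    Set<String> a = new HashSet<>(Arrays.asList(actual.split(",")));
    Set<String> e = new HashSet<>(Arrays.asList(expected.split(",")));
    return a.equals(e);
  }

  public static void main(String[] args) {
    // Same map, entries rendered in different orders.
    System.out.println(pairsEqualIgnoringOrder("a:1,b:2", "b:2,a:1")); // true
    System.out.println(pairsEqualIgnoringOrder("a:1,b:2", "a:1,b:3")); // false
  }
}
```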
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3906: Added test cases for hive read complex types and handled other issues
vikramahuja1001 commented on a change in pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#discussion_r480233177

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
## @@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
     return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
   } else if (type.startsWith("map<")) {
     String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+    for (int i = 0; i < subType.length; i++) {
+      if (subType[i].startsWith("decimal")) {
+        subType[i] += ',' + subType[++i];

Review comment: Use CarbonCommonConstants.COMMA instead of ','

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
## @@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
     return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
   } else if (type.startsWith("map<")) {
     String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+    for (int i = 0; i < subType.length; i++) {
+      if (subType[i].startsWith("decimal")) {
+        subType[i] += ',' + subType[++i];
+        subType = (String[]) ArrayUtils.removeElement(subType, subType[i]);
+      }
+    }
     return DataTypes
         .createMapType(convertHiveTypeToCarbon(subType[0]), convertHiveTypeToCarbon(subType[1]));
   } else if (type.startsWith("struct<")) {
     String[] subTypes = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
     List structFieldList = new ArrayList<>();
-    for (String subType : subTypes) {
+    for (int i = 0; i < subTypes.length; i++) {
+      String subType = subTypes[i];
+      if (subType.startsWith("decimal")) {
+        subType += ',' + subTypes[++i];

Review comment: Use CarbonCommonConstants.COMMA
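The decimal handling being reviewed addresses a real parsing pitfall, sketched below with hypothetical inputs (a simplified illustration, not the actual `DataTypeUtil` code): splitting the inner part of a type string like `map<string,decimal(10,2)>` on every comma also splits the decimal's precision and scale, so the torn-off fragment has to be merged back into one sub-type.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeSplit {
  // Split the inner part of a Hive type string on commas, re-joining the
  // precision/scale fragment that a naive split tears off a decimal type.
  static List<String> splitSubTypes(String inner) {
    String[] raw = inner.split(",");
    List<String> out = new ArrayList<>();
    for (int i = 0; i < raw.length; i++) {
      if (raw[i].startsWith("decimal")) {
        // "decimal(10" and "2)" belong to one type: merge them back together.
        out.add(raw[i] + "," + raw[++i]);
      } else {
        out.add(raw[i]);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // A naive split of "string,decimal(10,2)" yields three fragments;
    // the merge restores the two real sub-types.
    System.out.println(splitSubTypes("string,decimal(10,2)"));
    // [string, decimal(10,2)]
  }
}
```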
[GitHub] [carbondata] ajantha-bhat removed a comment on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
ajantha-bhat removed a comment on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683873006

> @ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?

A testcase related to notNull pushdown was already failing in `TestNIQueryWithIndex`; I will check and add notEquals as well.
[GitHub] [carbondata] ajantha-bhat commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
ajantha-bhat commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683903456

> @ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?

@vikramahuja1001: Done, added.
[GitHub] [carbondata] kunal642 opened a new pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
kunal642 opened a new pull request #3908: URL: https://github.com/apache/carbondata/pull/3908

### Why is this PR needed?
### What changes were proposed in this PR?
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
### Is any new testcase added?
- No
- Yes
[jira] [Created] (CARBONDATA-3967) Cache partitions to improve partition pruning performance
Kunal Kapoor created CARBONDATA-3967: Summary: Cache partitions to improve partition pruning performance Key: CARBONDATA-3967 URL: https://issues.apache.org/jira/browse/CARBONDATA-3967 Project: CarbonData Issue Type: Improvement Reporter: Kunal Kapoor Assignee: Kunal Kapoor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3961) Reorder filter according to the column storage ordinal to improve reading
[ https://issues.apache.org/jira/browse/CARBONDATA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kunal Kapoor updated CARBONDATA-3961:
Issue Type: Improvement (was: Bug)

> Reorder filter according to the column storage ordinal to improve reading
> Key: CARBONDATA-3961
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3961
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Kunal Kapoor
> Assignee: Kunal Kapoor
> Priority: Major
> Time Spent: 2h 40m
> Remaining Estimate: 0h
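The improvement tracked in this issue can be sketched generically (hypothetical names and ordinals; not CarbonData's actual implementation): if each filter column carries the ordinal at which it is stored, sorting the predicates by that ordinal lets the reader visit column chunks in on-disk order instead of seeking back and forth.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class FilterReorder {
  // Sort filter column names by their storage ordinal so the scan touches
  // column chunks sequentially. The ordinal map here is hypothetical.
  static List<String> reorder(List<String> filterColumns, Map<String, Integer> storageOrdinal) {
    List<String> sorted = new ArrayList<>(filterColumns);
    sorted.sort(Comparator.comparingInt(storageOrdinal::get));
    return sorted;
  }

  public static void main(String[] args) {
    Map<String, Integer> ordinals = Map.of("c3", 2, "c1", 0, "c2", 1);
    // Filter arrives in query order; evaluate it in storage order instead.
    System.out.println(reorder(List.of("c3", "c1", "c2"), ordinals));
    // [c1, c2, c3]
  }
}
```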
[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3909: [WIP] Date/timestamp compatability between hive and carbon
ShreelekhyaG opened a new pull request #3909: URL: https://github.com/apache/carbondata/pull/3909

### Why is this PR needed?
To ensure that the date/timestamp values supported by hive are also supported by carbon. Ex: -01-01 is accepted by hive as a valid record and converted to 0001-01-01.
### What changes were proposed in this PR?
Changed the min value of date which is used for validation. When the setlenient flag is set to true, carbon can convert and support such a year.
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
### Is any new testcase added?
- No
- Yes
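The lenient behaviour described here mirrors the distinction built into the standard `java.text.SimpleDateFormat` API (shown below as a general illustration, not CarbonData's own parsing code): in strict mode an out-of-range field is rejected, while in lenient mode it is rolled over into a valid date.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class LenientDates {
  public static void main(String[] args) throws ParseException {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");

    // Lenient (the default): out-of-range fields are normalized,
    // so month 13 rolls over into January of the next year.
    fmt.setLenient(true);
    System.out.println(fmt.format(fmt.parse("2020-13-01"))); // 2021-01-01

    // Strict: the same value is rejected outright.
    fmt.setLenient(false);
    try {
      fmt.parse("2020-13-01");
    } catch (ParseException e) {
      System.out.println("rejected in strict mode");
    }
  }
}
```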
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480273615

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
## @@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
     }
   }
+  private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+      filter: Filter): Unit = {
+    // 1. check all the filter columns present in SI
+    val originalFilterAttributes = filter.condition collect {
+      case attr: AttributeReference =>
+        attr.name.toLowerCase
+    }
+    val filterAttributes = filter.condition collect {

Review comment: is filterAttributes same as originalFilterAttributes?? code looks to be same
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480276719

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
## @@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
     }
   }
+  private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+      filter: Filter): Unit = {
+    // 1. check all the filter columns present in SI
+    val originalFilterAttributes = filter.condition collect {
+      case attr: AttributeReference =>
+        attr.name.toLowerCase
+    }
+    val filterAttributes = filter.condition collect {
+      case attr: AttributeReference => attr.name.toLowerCase
+    }
+    val indexTableRelation = MatchIndexableRelation.unapply(filter.child).get
+    val matchingIndexTables = CarbonCostBasedOptimizer.identifyRequiredTables(
+      filterAttributes.toSet.asJava,
+      CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
+      .asScala
+    val databaseName = filter.child.asInstanceOf[LogicalRelation].relation

Review comment: why not use `indexTableRelation.carbonRelation.databaseName`
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480285525

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
## @@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
     }
   }
+  private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+      filter: Filter): Unit = {
+    // 1. check all the filter columns present in SI
+    val originalFilterAttributes = filter.condition collect {
+      case attr: AttributeReference =>
+        attr.name.toLowerCase
+    }
+    val filterAttributes = filter.condition collect {
+      case attr: AttributeReference => attr.name.toLowerCase
+    }
+    val indexTableRelation = MatchIndexableRelation.unapply(filter.child).get
+    val matchingIndexTables = CarbonCostBasedOptimizer.identifyRequiredTables(
+      filterAttributes.toSet.asJava,
+      CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
+      .asScala
+    val databaseName = filter.child.asInstanceOf[LogicalRelation].relation
+      .asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation.databaseName
+    // filter out all the index tables which are disabled
+    val enabledMatchingIndexTables = matchingIndexTables
+      .filter(table => {
+        sparkSession.sessionState.catalog
+          .getTableMetadata(TableIdentifier(table,
+            Some(databaseName))).storage
+          .properties
+          .getOrElse("isSITableEnabled", "true").equalsIgnoreCase("true")
+      })
+    // 2. check if only one SI matches for the filter columns
+    if (enabledMatchingIndexTables.nonEmpty && enabledMatchingIndexTables.size == 1 &&
+        filterAttributes.intersect(originalFilterAttributes).size ==
+        originalFilterAttributes.size) {
+      // 3. check if all the sort columns is in SI
+      val sortColumns = sort
+        .order
+        .map(_.child.asInstanceOf[AttributeReference].name.toLowerCase())
+        .toSet
+      val indexCarbonTable = CarbonEnv
+        .getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)

Review comment: use indexTableRelation.carbonTable to get indexCarbonTable
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480286695

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
## @@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
     }
   }
+  private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+      filter: Filter): Unit = {
+    // 1. check all the filter columns present in SI
+    val originalFilterAttributes = filter.condition collect {
+      case attr: AttributeReference =>
+        attr.name.toLowerCase
+    }
+    val filterAttributes = filter.condition collect {
+      case attr: AttributeReference => attr.name.toLowerCase
+    }
+    val indexTableRelation = MatchIndexableRelation.unapply(filter.child).get
+    val matchingIndexTables = CarbonCostBasedOptimizer.identifyRequiredTables(
+      filterAttributes.toSet.asJava,
+      CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
+      .asScala
+    val databaseName = filter.child.asInstanceOf[LogicalRelation].relation
+      .asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation.databaseName
+    // filter out all the index tables which are disabled
+    val enabledMatchingIndexTables = matchingIndexTables
+      .filter(table => {
+        sparkSession.sessionState.catalog
+          .getTableMetadata(TableIdentifier(table,
+            Some(databaseName))).storage
+          .properties
+          .getOrElse("isSITableEnabled", "true").equalsIgnoreCase("true")
+      })
+    // 2. check if only one SI matches for the filter columns
+    if (enabledMatchingIndexTables.nonEmpty && enabledMatchingIndexTables.size == 1 &&
+        filterAttributes.intersect(originalFilterAttributes).size ==
+        originalFilterAttributes.size) {
+      // 3. check if all the sort columns is in SI
+      val sortColumns = sort
+        .order
+        .map(_.child.asInstanceOf[AttributeReference].name.toLowerCase())
+        .toSet
+      val indexCarbonTable = CarbonEnv
+        .getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
+      var allColumnsFound = true

Review comment: use forall to check whether all columns exists or not
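The reviewer's `forall` suggestion can be illustrated in Java terms (an analogous sketch, not the CarbonData Scala code): instead of mutating a boolean flag inside a loop, express "every sort column exists in the index table" as a single all-match predicate; Scala's `forall` corresponds to `Stream.allMatch` here.

```java
import java.util.List;
import java.util.Set;

public class AllColumnsCheck {
  // Replace the mutable allColumnsFound flag with one predicate:
  // true only if every sort column is present in the index table's columns.
  static boolean allColumnsFound(List<String> sortColumns, Set<String> indexColumns) {
    return sortColumns.stream().allMatch(indexColumns::contains);
  }

  public static void main(String[] args) {
    Set<String> indexColumns = Set.of("name", "city");
    System.out.println(allColumnsFound(List.of("name"), indexColumns));        // true
    System.out.println(allColumnsFound(List.of("name", "age"), indexColumns)); // false
  }
}
```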
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480289433

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
## @@ -58,6 +58,16 @@ object NodeType extends Enumeration {
  */
 class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
+  // to store the sort node per query
+  var sortNodeForPushDown: Sort = _
+
+  // to store the limit literal per query
+  var limitLiteral : Literal = _
+
+  // by default do not push down notNull filter,
+  // but for orderby limit push down, push down notNull filter also. Else we get wrong results.
+  var pushDownNotNullFilter : Boolean = _

Review comment: Why not keep these as local variables in transformFilterToJoin and pass to rewritePlanForSecondaryIndex()?
[GitHub] [carbondata] Karan-c980 commented on pull request #3876: TestingCI
Karan-c980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683945124 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683964482 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3934/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683964722 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2192/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
CarbonDataQA1 commented on pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#issuecomment-683966337 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2193/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3909: [WIP] Date/timestamp compatability between hive and carbon
CarbonDataQA1 commented on pull request #3909: URL: https://github.com/apache/carbondata/pull/3909#issuecomment-683967436 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3935/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3909: [WIP] Date/timestamp compatability between hive and carbon
CarbonDataQA1 commented on pull request #3909: URL: https://github.com/apache/carbondata/pull/3909#issuecomment-683969333 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2194/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
CarbonDataQA1 commented on pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#issuecomment-683969864 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3933/
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3906: Added test cases for hive read complex types and handled other issues
akkio-97 commented on a change in pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#discussion_r480358731

## File path: integration/hive/src/test/java/org/apache/carbondata/hive/HiveTestUtils.java
## @@ -65,7 +74,12 @@ public boolean checkAnswer(ResultSet actual, ResultSet expected) throws SQLException
     Assert.assertTrue(numOfColumnsExpected > 0);
     Assert.assertEquals(actual.getMetaData().getColumnCount(), numOfColumnsExpected);
     for (int i = 1; i <= numOfColumnsExpected; i++) {
-      Assert.assertEquals(actual.getString(i), actual.getString(i));
+      if (actual.getString(i).contains(":")) {
+        Assert.assertTrue(checkMapPairsIgnoringOrder(actual.getString(i), expected.getString(i)));
+      } else {
+        Assert.assertEquals(actual.getString(i), expected.getString(i));
+      }
+      // System.out.println(actual.getString(i));

Review comment: done

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
## @@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
     return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
   } else if (type.startsWith("map<")) {
     String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+    for (int i = 0; i < subType.length; i++) {
+      if (subType[i].startsWith("decimal")) {
+        subType[i] += ',' + subType[++i];

Review comment: done

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
## @@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
     return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
   } else if (type.startsWith("map<")) {
     String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+    for (int i = 0; i < subType.length; i++) {
+      if (subType[i].startsWith("decimal")) {
+        subType[i] += ',' + subType[++i];
+        subType = (String[]) ArrayUtils.removeElement(subType, subType[i]);
+      }
+    }
     return DataTypes
         .createMapType(convertHiveTypeToCarbon(subType[0]), convertHiveTypeToCarbon(subType[1]));
   } else if (type.startsWith("struct<")) {
     String[] subTypes = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
     List structFieldList = new ArrayList<>();
-    for (String subType : subTypes) {
+    for (int i = 0; i < subTypes.length; i++) {
+      String subType = subTypes[i];
+      if (subType.startsWith("decimal")) {
+        subType += ',' + subTypes[++i];

Review comment: done
[jira] [Created] (CARBONDATA-3968) Hive read complex types issues
Akshay created CARBONDATA-3968:
Summary: Hive read complex types issues
Key: CARBONDATA-3968
URL: https://issues.apache.org/jira/browse/CARBONDATA-3968
Project: CarbonData
Issue Type: Bug
Components: hive-integration
Reporter: Akshay
# Issues in reading of byte, varchar and decimal types.
# Map of primitive type with only one row inserted has issues.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-684012482 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2195/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-684015193 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3936/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-684022546 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3937/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-684031807 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2197/
[jira] [Updated] (CARBONDATA-3968) Hive read complex types issues
[ https://issues.apache.org/jira/browse/CARBONDATA-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshay updated CARBONDATA-3968:
---
Description:
# Issues in reading array/map/struct of byte, varchar and decimal types.
# Map of primitive type with only one row inserted has issues.
was:
# Issues in reading of byte, varchar and decimal types.
# Map of primitive type with only one row inserted has issues.

> Hive read complex types issues
> --
>
> Key: CARBONDATA-3968
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3968
> Project: CarbonData
> Issue Type: Bug
> Components: hive-integration
> Reporter: Akshay
> Priority: Major
>
> # Issues in reading array/map/struct of byte, varchar and decimal types.
> # Map of primitive type with only one row inserted has issues.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [CARBONDATA-3968]Added test cases for hive read complex types and handled other issues
CarbonDataQA1 commented on pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#issuecomment-684057932 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2198/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [CARBONDATA-3968]Added test cases for hive read complex types and handled other issues
CarbonDataQA1 commented on pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#issuecomment-684059041 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3938/
[GitHub] [carbondata] nihal0107 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
nihal0107 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-684215300 We are getting this exception only in the case of more than 0.1 million records. We can't load that amount of data for the test case.
[GitHub] [carbondata] nihal0107 removed a comment on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
nihal0107 removed a comment on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-684215300 We are getting this exception only in the case of more than 0.1 million records. We can't load that amount of data for the test case.
[GitHub] [carbondata] nihal0107 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
nihal0107 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-684216370 > @nihal0107 , can a test case be added for your fix? We are getting this exception only in the case of more than 0.1 million records. We can't load that amount of data for the test case.
[GitHub] [carbondata] kunal642 commented on pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue
kunal642 commented on pull request #3903: URL: https://github.com/apache/carbondata/pull/3903#issuecomment-684406340 LGTM
[jira] [Resolved] (CARBONDATA-3963) timestamp data is wrong in case of reading carbon via hive and other issue
[ https://issues.apache.org/jira/browse/CARBONDATA-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3963.
--
Fix Version/s: 2.1.0
Resolution: Fixed

> timestamp data is wrong in case of reading carbon via hive and other issue
> --
>
> Key: CARBONDATA-3963
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3963
> Project: CarbonData
> Issue Type: Bug
> Reporter: Akash R Nilugal
> Assignee: Akash R Nilugal
> Priority: Major
> Fix For: 2.1.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> 1. timestamp data is wrong when a carbon table is read via hive
> 2. carbon is not giving any data in beeline when queried via hive
[GitHub] [carbondata] asfgit closed pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue
asfgit closed pull request #3903: URL: https://github.com/apache/carbondata/pull/3903