[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478869005 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/utils/SDKUtil.java ## @@ -79,4 +98,75 @@ public static ArrayList listFiles(String sourceImageFolder, return (Object[]) input[i]; } + public static List extractFilesFromFolder(String path, + String suf, Configuration hadoopConf) { +List dataFiles = listFiles(path, suf, hadoopConf); +List carbonFiles = new ArrayList<>(); +for (Object dataFile: dataFiles) { + carbonFiles.add(FileFactory.getCarbonFile(dataFile.toString(), hadoopConf)); +} +if (CollectionUtils.isEmpty(dataFiles)) { + throw new RuntimeException("No file found at given location. Please provide" + + " the correct folder location."); +} +return carbonFiles; + } + + public static DataFileStream buildAvroReader(CarbonFile carbonFile, Review comment: If the validateFiles methods are refactored into the respective type-specific CarbonWriter, as suggested in another review comment, these type-specific reader builders are no longer required in this file (SDKUtil.java). The same applies to the other readers below (buildOrcReader, buildParquetReader, buildCsvParser, buildJsonReader). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
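Independent of the reader-builder point, the quoted `extractFilesFromFolder` wraps every listed file before checking whether the listing is empty. A minimal, library-free sketch of the empty-check-first ordering, using `java.io.File` and a plain `List<String>` in place of CarbonData's `CarbonFile`/`FileFactory` (names here are illustrative, not the SDK's API):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class FolderScan {
  // Sketch: validate that the listing is non-empty *before* wrapping each
  // entry, so the error surfaces without doing any per-file work first.
  // java.io.File stands in for CarbonFile; the real SDK wraps paths via
  // FileFactory.getCarbonFile(path, hadoopConf).
  public static List<File> extractFilesFromFolder(List<String> listing) {
    if (listing == null || listing.isEmpty()) {
      throw new RuntimeException("No file found at given location. "
          + "Please provide the correct folder location.");
    }
    List<File> files = new ArrayList<>();
    for (String path : listing) {
      files.add(new File(path));
    }
    return files;
  }
}
```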
[GitHub] [carbondata] ajantha-bhat commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-682351651 @kumarvishal09 : PR is ready. Please check and merge once the build has passed. Also, I have prepared the commit message with co-authored-by @akkio-97. Please use the same commit message. Thanks
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat commented on a change in pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#discussion_r478856181 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java ## @@ -260,4 +265,65 @@ protected String debugInfo() { return this.toString(); } + public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo vectorInfo, int pageSize, + CarbonColumnVector vector, DataType vectorDataType) { +VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType); +Stack vectorStack = vectorInfo.getVectorStack(); +// check and update to child vector info +if (vectorStack != null && vectorStack.peek() != null && vectorDataType.isComplexType()) { Review comment: **Also, the stack is now always initialized and checked for emptiness; no null check is present.**
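The point of replacing the null check with an emptiness check can be sketched as follows; `hasChildVector` and the bare `Object` element type are illustrative, not CarbonData's API. Note that `peek() != null` is not a safe emptiness test: `java.util.Stack.peek()` throws `EmptyStackException` on an empty stack.

```java
import java.util.Stack;

public class StackCheckDemo {
  // Sketch: once the stack is always initialized (never null), the guard
  // reduces to !isEmpty(). The null check is kept here only to show the
  // defensive form; it can be dropped when initialization is guaranteed.
  public static boolean hasChildVector(Stack<Object> vectorStack) {
    return vectorStack != null && !vectorStack.isEmpty();
  }
}
```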
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat commented on a change in pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#discussion_r478855676 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java ## @@ -255,6 +255,12 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi CarbonColumnVector vector = vectorInfo.vector; BitSet deletedRows = vectorInfo.deletedRows; DataType vectorDataType = vector.getType(); + VectorUtil vectorUtil = new VectorUtil(vectorInfo, pageSize, vector, vectorDataType) Review comment: Removed this whole class, and now the vector is updated inside ColumnarVectorWrapperDirectFactory.getDirectVectorWrapperFactory.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat commented on a change in pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#discussion_r478855152 ## File path: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java ## @@ -102,6 +109,57 @@ public CarbonColumnVectorImpl(int batchSize, DataType dataType) { } + @Override + public List getChildrenVector() { +return childrenVector; + } + + public void setChildrenVector(ArrayList childrenVector) { +this.childrenVector = childrenVector; + } + + public ArrayList getNumberOfChildrenElementsInEachRow() { +return childElementsForEachRow; + } + + public void setNumberOfChildrenElementsInEachRow(ArrayList childrenElements) { +this.childElementsForEachRow = childrenElements; + } + + public void setNumberOfChildrenElementsForArray(byte[] childPageData, int pageSize) { +// for complex array type, go through parent page to get the child information +ByteBuffer childInfoBuffer = ByteBuffer.wrap(childPageData); +ArrayList childElementsForEachRow = new ArrayList<>(); +// offset will be an INT size and value will be another INT size, hence 2 * INT size Review comment: Added.
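The layout the quoted comment describes, (offset, length) stored as two 4-byte ints per row, can be sketched like this. The exact on-page layout is an assumption taken from the comment, not from the CarbonData source, and the method name is illustrative:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ChildElementCount {
  // Sketch: walk the parent page, consuming 2 * Integer.BYTES per row
  // (one int for the child-data offset, one for the child-element count),
  // and collect the per-row counts.
  public static List<Integer> childCountsPerRow(byte[] parentPage, int pageSize) {
    ByteBuffer buf = ByteBuffer.wrap(parentPage);
    List<Integer> counts = new ArrayList<>(pageSize);
    for (int row = 0; row < pageSize; row++) {
      buf.getInt();              // child data offset for this row (unused here)
      counts.add(buf.getInt());  // number of child elements in this row
    }
    return counts;
  }
}
```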
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478852900 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ParquetCarbonWriter.java ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Comparator; + +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.avro.generic.GenericRecord; +import org.apache.hadoop.conf.Configuration; +import org.apache.parquet.hadoop.ParquetReader; + +/** + * Implementation to write parquet rows in avro format to carbondata file. + */ +public class ParquetCarbonWriter extends AvroCarbonWriter { Review comment: Why extend `AvroCarbonWriter` and also have it as a field in this class? Both inheritance and composition at the same time? This case needs just one of them; you probably need to relook at this.
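The composition-only alternative the reviewer is pointing at can be sketched as follows. All names here are illustrative stand-ins, not the SDK's actual classes: the format-specific writer *holds* the inner writer as a field and delegates, rather than also extending it.

```java
import java.util.ArrayList;
import java.util.List;

public class CompositionDemo {
  interface RecordWriter { void write(String record); }

  // Test double that records what was written.
  static class CollectingWriter implements RecordWriter {
    final List<String> rows = new ArrayList<>();
    public void write(String record) { rows.add(record); }
  }

  // Composition: delegate to the inner writer instead of extending it.
  // In the PR's terms, ParquetCarbonWriter would hold an AvroCarbonWriter
  // field and forward converted rows to it.
  static class ParquetStyleWriter implements RecordWriter {
    private final RecordWriter delegate;
    ParquetStyleWriter(RecordWriter delegate) { this.delegate = delegate; }
    public void write(String record) { delegate.write(record); }
  }

  public static List<String> demo() {
    CollectingWriter sink = new CollectingWriter();
    RecordWriter writer = new ParquetStyleWriter(sink);
    writer.write("row1");
    return sink.rows;
  }
}
```

Picking one mechanism keeps a single code path: with delegation alone there is no ambiguity about whether inherited or composed state is in effect.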
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478853321 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. + */ +public class ORCCarbonWriter extends CSVCarbonWriter { Review comment: Why extending CSVCarbonWriter and also having it as field in this class? 
Both inheritance and composition at the same time? This case needs just one of them; you probably need to relook at this.
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r478837180 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -377,4 +381,212 @@ object CarbonIndexUtil { AlterTableUtil.releaseLocks(locks.asScala.toList) } } + + def processSIRepair(indexTableName: String, carbonTable: CarbonTable, +carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata, + mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String) + (sparkSession: SparkSession) : Unit = { +val sparkSession = SparkSession.getActiveSession.get +// val databaseName = sparkSession.catalog.currentDatabase +// when Si creation and load to main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val indexTable = metaStore + .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)( +sparkSession) + .asInstanceOf[CarbonRelation] + .carbonTable + +val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] = + SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) Review comment: added it from the caller
[GitHub] [carbondata] nihal0107 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
nihal0107 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-682327952 retest this please.
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682327208 retest this please
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r478831360 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.index + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Repair logic for reindex command on maintable/indextable + */ +case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier, + dbName: String, + segments: Option[List[String]]) extends DataCommand { + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + + def processData(sparkSession: SparkSession): Seq[Row] = { +if (dbName == null) { + // dbName is null, repair for index table or all the index table in main table + val databaseName = if (tableIdentifier.database.isEmpty) { +SparkSession.getActiveSession.get.catalog.currentDatabase + } else { +tableIdentifier.database.get + } + triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments) +} else { + // repairing si for all index tables in the mentioned database in the repair command + sparkSession.sessionState.catalog.listTables(dbName).foreach { +tableIdent => + triggerRepair(tableIdent.table, dbName, indexnameOp, segments) + } +} +Seq.empty + } + + def triggerRepair(tableNameOp: String, databaseName: String, +indexTableToRepair: Option[String], segments: Option[List[String]]): Unit = { +val sparkSession = SparkSession.getActiveSession.get +// when Si creation and load to 
main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val mainCarbonTable = metaStore + .lookupRelation(Some(databaseName), tableNameOp)(sparkSession) + .asInstanceOf[CarbonRelation].carbonTable + +val carbonLoadModel = new CarbonLoadModel +carbonLoadModel.setDatabaseName(databaseName) +carbonLoadModel.setTableName(tableNameOp) +carbonLoadModel.setTablePath(mainCarbonTable.getTablePath) +val tableStatusFilePath = CarbonTablePath.getTableStatusFilePath(mainCarbonTable.getTablePath) +carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager + .readTableStatusFile(tableStatusFilePath).toList.asJava) +carbonLoadModel.setCarbonDataLoadSchema(new CarbonDataLoadSchema(mainCarbonTable)) + +val indexMetadata = mainCarbonTable.getIndexMetadata +val secondaryIndexProvider = IndexType.SI.getIndexProviderName +if (null != indexMetadata && null != indexMetadata.getIndexesMap && + null != indexMetadata.getIndexesMap.get(secondaryIndexProvider)) { + val indexTables = indexMetadata.getIndexesMap +.get(secondaryIndexProvider).keySet().asScala + // if there are no index tables for a given fact table do not perform any action + if (indexTables.nonEmpty) { +val mainTableDetails = if (segments.isEmpty) { + carbonLoadModel.getLoadMetadataDetails.asScala.toList + // SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) +} else { + // get segments for main table + carbonLoadModel.getLoadMetadataDetails.asScala.toList.filt
[GitHub] [carbondata] kunal642 commented on pull request #3904: [CARBONDATA-3962]Remove unwanted empty fact directory in case of flat_folder table
kunal642 commented on pull request #3904: URL: https://github.com/apache/carbondata/pull/3904#issuecomment-682321292 @akashrn5 please fix the build
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r478826107 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.index + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Repair logic for reindex command on maintable/indextable + */ +case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier, + dbName: String, + segments: Option[List[String]]) extends DataCommand { + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + + def processData(sparkSession: SparkSession): Seq[Row] = { +if (dbName == null) { + // dbName is null, repair for index table or all the index table in main table + val databaseName = if (tableIdentifier.database.isEmpty) { +SparkSession.getActiveSession.get.catalog.currentDatabase + } else { +tableIdentifier.database.get + } + triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments) +} else { + // repairing si for all index tables in the mentioned database in the repair command + sparkSession.sessionState.catalog.listTables(dbName).foreach { +tableIdent => + triggerRepair(tableIdent.table, dbName, indexnameOp, segments) + } +} +Seq.empty + } + + def triggerRepair(tableNameOp: String, databaseName: String, +indexTableToRepair: Option[String], segments: Option[List[String]]): Unit = { +val sparkSession = SparkSession.getActiveSession.get +// when Si creation and load to 
main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val mainCarbonTable = metaStore + .lookupRelation(Some(databaseName), tableNameOp)(sparkSession) + .asInstanceOf[CarbonRelation].carbonTable + +val carbonLoadModel = new CarbonLoadModel +carbonLoadModel.setDatabaseName(databaseName) +carbonLoadModel.setTableName(tableNameOp) +carbonLoadModel.setTablePath(mainCarbonTable.getTablePath) +val tableStatusFilePath = CarbonTablePath.getTableStatusFilePath(mainCarbonTable.getTablePath) +carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager + .readTableStatusFile(tableStatusFilePath).toList.asJava) +carbonLoadModel.setCarbonDataLoadSchema(new CarbonDataLoadSchema(mainCarbonTable)) + +val indexMetadata = mainCarbonTable.getIndexMetadata +val secondaryIndexProvider = IndexType.SI.getIndexProviderName +if (null != indexMetadata && null != indexMetadata.getIndexesMap && + null != indexMetadata.getIndexesMap.get(secondaryIndexProvider)) { + val indexTables = indexMetadata.getIndexesMap +.get(secondaryIndexProvider).keySet().asScala + // if there are no index tables for a given fact table do not perform any action + if (indexTables.nonEmpty) { +val mainTableDetails = if (segments.isEmpty) { + carbonLoadModel.getLoadMetadataDetails.asScala.toList + // SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) +} else { + // get segments for main table + carbonLoadModel.getLoadMetadataDetails.asScala.toList.filter( +
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r478825910 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -377,4 +381,212 @@ object CarbonIndexUtil { AlterTableUtil.releaseLocks(locks.asScala.toList) } } + + def processSIRepair(indexTableName: String, carbonTable: CarbonTable, +carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata, + mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String) + (sparkSession: SparkSession) : Unit = { +val sparkSession = SparkSession.getActiveSession.get +// val databaseName = sparkSession.catalog.currentDatabase +// when Si creation and load to main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val indexTable = metaStore + .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)( +sparkSession) + .asInstanceOf[CarbonRelation] + .carbonTable + +val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] = + SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) Review comment: Why is "readLoadMetadata" still there for the main table?
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478824388 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) { + throw new RuntimeException("CSV files can't be empty."); +} +for (CarbonFile dataFile : dataFiles) { + try { +CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf); + csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(), +-1, this.hadoopConf)); + } catch (IllegalArgumentException ex) { +if (ex.getCause() instanceof FileNotFoundException) { + throw new FileNotFoundException("File " + dataFile + + " not found to build carbon writer."); +} +throw ex; + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.validateCsvFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + private void validateJsonFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +for (CarbonFile dataFile : dataFiles) { + try { +new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf)); + } catch (FileNotFoundException ex) { +throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer."); + } catch (ParseException ex) { +throw new RuntimeException("File " + dataFile + " is not in json format."); + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.validateJsonFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. 
+ * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +this.validateParquetFiles(); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be loaded.
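The chained `withCsvPath` flow shown in the diff (validate the path, record whether it is a directory, pick the input format, then validate the files) can be sketched with a stripped-down, stdlib-only builder. The `Builder` class below is hypothetical — it only mirrors the shape of the SDK code, not its actual validation logic:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class BuilderSketch {
  // Hypothetical builder mirroring the withCsvPath() flow from the diff:
  // validate the path, record whether it is a directory, set the format.
  static class Builder {
    String filePath;
    boolean isDirectory;
    String format;
    List<String> fileList;

    Builder withCsvPath(String filePath) {
      if (filePath == null || filePath.isEmpty()) {
        throw new IllegalArgumentException("filePath can not be empty");
      }
      this.filePath = filePath;
      this.isDirectory = Files.isDirectory(Path.of(filePath));
      this.format = "csv";
      return this;
    }

    // Overload with an explicit file list, delegating as in the diff.
    Builder withCsvPath(String filePath, List<String> fileList) {
      this.fileList = fileList;
      return withCsvPath(filePath);
    }
  }

  public static void main(String[] args) {
    Builder b = new Builder().withCsvPath(".", List.of("a.csv"));
    System.out.println(b.format + " dir=" + b.isDirectory);
  }
}
```

The overload sets the file list first and then delegates, so all path validation lives in one place — the same ordering the PR uses.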
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [WIP] Added test cases for hive read complex types and handled other issues
CarbonDataQA1 commented on pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#issuecomment-682233816 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3898/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [WIP] Added test cases for hive read complex types and handled other issues
CarbonDataQA1 commented on pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#issuecomment-682231829 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2157/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682212894 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3897/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682211893 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2156/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akkio-97 opened a new pull request #3906: [WIP] Added test cases for hive read complex types and handled other issues
akkio-97 opened a new pull request #3906: URL: https://github.com/apache/carbondata/pull/3906 ### Why is this PR needed? 1) Added test cases for hive read complex types. 2) Handled issues related to reading of byte, varchar and decimal types. ### What changes were proposed in this PR? 1) Added test cases for hive read complex types. 2) Handled issues related to reading of byte, varchar and decimal types. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - Yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682161092 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682146723 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2155/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682145902 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3896/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682106892 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2154/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682105315 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3895/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] brijoobopanna commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
brijoobopanna commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682087451 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
CarbonDataQA1 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-682063656 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2153/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
CarbonDataQA1 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-682062796 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3894/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478545280 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. 
+ */ +public class ORCCarbonWriter extends CSVCarbonWriter { + private Configuration configuration; + private CSVCarbonWriter csvCarbonWriter = null; + private Reader orcReader = null; + private CarbonFile[] dataFiles; + + ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) { +this.csvCarbonWriter = csvCarbonWriter; +this.configuration = configuration; + } + + @Override + public void setDataFiles(CarbonFile[] dataFiles) { +this.dataFiles = dataFiles; + } + + /** + * Load ORC file in iterative way. + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files"); +} +if (this.csvCarbonWriter == null) { + throw new RuntimeException("csv carbon writer can not be null"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +orcReader = SDKUtil.buildOrcReader(file.getPath(), this.configuration); +ObjectInspector objectInspector = orcReader.getObjectInspector(); +RecordReader recordReader = orcReader.rows(); +if (objectInspector instanceof StructObjectInspector) { + StructObjectInspector structObjectInspector = + (StructObjectInspector) orcReader.getObjectInspector(); + while (recordReader.hasNext()) { +Object record = recordReader.next(null); // to remove duplication. +List valueList = structObjectInspector.getStructFieldsDataAsList(record); +for (int i = 0; i < valueList.size(); i++) { + valueList.set(i, parseOrcObject(valueList.get(i), 0)); +} +this.csvCarbonWriter.write(valueList.toArray()); + } +} else { + while (recordReader.hasNext()) { Review comment: Curious to know when this else case is hit? Your testcase does not seem to cover it. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478495670 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. 
+ */ +public class ORCCarbonWriter extends CSVCarbonWriter { + private Configuration configuration; + private CSVCarbonWriter csvCarbonWriter = null; + private Reader orcReader = null; + private CarbonFile[] dataFiles; + + ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) { +this.csvCarbonWriter = csvCarbonWriter; +this.configuration = configuration; + } + + @Override + public void setDataFiles(CarbonFile[] dataFiles) { +this.dataFiles = dataFiles; + } + + /** + * Load ORC file in iterative way. + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files"); +} +if (this.csvCarbonWriter == null) { + throw new RuntimeException("csv carbon writer can not be null"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +orcReader = SDKUtil.buildOrcReader(file.getPath(), this.configuration); +ObjectInspector objectInspector = orcReader.getObjectInspector(); +RecordReader recordReader = orcReader.rows(); +if (objectInspector instanceof StructObjectInspector) { + StructObjectInspector structObjectInspector = + (StructObjectInspector) orcReader.getObjectInspector(); + while (recordReader.hasNext()) { +Object record = recordReader.next(null); // to remove duplication. Review comment: I was looking at the API documentation and references from the internet; the info in the documentation is limited, though. `recordReader.next()` takes the previous record as an argument, but we are always passing null? The same applies to the else case below. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
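For context on the reuse contract the review is pointing at: Hive's ORC `RecordReader.next(Object previous)` lets the caller hand back the previously returned row so the reader can refill it instead of allocating a new object on every iteration; always passing `null` forces a fresh allocation per row. A minimal stdlib-only analogue of that pattern (the `RowReader` class below is hypothetical, not the Hive API):

```java
public class ReuseSketch {
  // Hypothetical reader mimicking the ORC-style `next(Object previous)`
  // reuse contract: a non-null `previous` buffer is cleared, refilled,
  // and returned; a null `previous` forces a new allocation.
  static class RowReader {
    private final String[] rows = {"a", "b", "c"};
    private int pos = 0;
    int allocations = 0;

    boolean hasNext() { return pos < rows.length; }

    StringBuilder next(StringBuilder previous) {
      StringBuilder row = previous;
      if (row == null) {          // caller passed null: must allocate
        row = new StringBuilder();
        allocations++;
      }
      row.setLength(0);           // reuse: clear, then refill
      row.append(rows[pos++]);
      return row;
    }
  }

  public static void main(String[] args) {
    RowReader reader = new RowReader();
    StringBuilder row = null;
    while (reader.hasNext()) {
      row = reader.next(row);     // hand the previous row back for reuse
    }
    System.out.println(reader.allocations); // one allocation for three rows
  }
}
```

Passing `null` on every call, as the quoted diff does, would instead allocate once per row — harmless for small files, but it is the reason the API takes a `previous` argument at all.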
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-681994461 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-681987195 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2152/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-681978515 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3893/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681969792 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2151/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681965099 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3892/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
CarbonDataQA1 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-681942622 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2150/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
CarbonDataQA1 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-681940790 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3891/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478408315 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. 
+ */ +public class ORCCarbonWriter extends CSVCarbonWriter { + private Configuration configuration; + private CSVCarbonWriter csvCarbonWriter = null; + private Reader orcReader = null; + private CarbonFile[] dataFiles; + + ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) { +this.csvCarbonWriter = csvCarbonWriter; +this.configuration = configuration; + } + + @Override + public void setDataFiles(CarbonFile[] dataFiles) { +this.dataFiles = dataFiles; + } + + /** + * Load ORC file in iterative way. + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files"); +} +if (this.csvCarbonWriter == null) { Review comment: We shouldn't have created the writer instance in the first place if `csvCarbonWriter` was null. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
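The review's point can be sketched as a fail-fast constructor: reject the null delegate when the writer is built, so the re-check inside `write()` becomes unnecessary. The `OrcWriter` wrapper below is a hypothetical stdlib-only illustration, not the actual CarbonData class:

```java
import java.util.Objects;

public class FailFastSketch {
  // Hypothetical wrapper: reject a null delegate at construction time,
  // so a misconfigured writer can never reach write().
  static class OrcWriter {
    private final Object delegate;

    OrcWriter(Object csvDelegate) {
      this.delegate = Objects.requireNonNull(
          csvDelegate, "csv carbon writer can not be null");
    }

    void write() {
      // delegate is guaranteed non-null here; no re-check needed
      System.out.println("writing via " + delegate);
    }
  }

  public static void main(String[] args) {
    try {
      new OrcWriter(null);
    } catch (NullPointerException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing at construction surfaces the misconfiguration at the call site that caused it, rather than later inside the load loop.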
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478395584 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/JsonCarbonWriter.java ## @@ -91,4 +106,44 @@ public void close() throws IOException { throw new IOException(e); } } + + private void loadSingleFile(CarbonFile file) throws IOException { +Reader reader = null; +try { + reader = SDKUtil.buildJsonReader(file, configuration); + JSONParser jsonParser = new JSONParser(); + Object jsonRecord = jsonParser.parse(reader); + if (jsonRecord instanceof JSONArray) { +JSONArray jsonArray = (JSONArray) jsonRecord; +for (Object record : jsonArray) { + this.write(record.toString()); +} + } else { +this.write(jsonRecord.toString()); + } +} catch (Exception e) { Review comment: It is good to use specific exceptions wherever possible instead of a generic exception.
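A minimal illustration of the reviewer's point, using a hypothetical parse helper (not CarbonData code): catch the narrow exception the operation can actually throw and rethrow it with context, rather than a blanket `catch (Exception e)` that would also swallow unrelated programming errors.

```java
public class SpecificCatchSketch {
  // Catch only NumberFormatException -- the specific failure parseInt can
  // raise -- instead of a generic catch (Exception e).
  static int parseIntStrict(String input) {
    try {
      return Integer.parseInt(input.trim());
    } catch (NumberFormatException e) {
      // rethrow with context, preserving the original cause
      throw new IllegalArgumentException("Not a number: " + input, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(parseIntStrict(" 42 ")); // prints 42
  }
}
```

The same shape applies to the quoted `loadSingleFile`: the JSON parse step could catch its parser's specific exception type and wrap it, leaving genuine bugs to propagate.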
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478387596 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/JsonCarbonWriter.java ## @@ -91,4 +102,39 @@ public void close() throws IOException { throw new IOException(e); } } + + private void loadSingleFile(File file) throws IOException { +try { + Reader reader = SDKUtil.buildJsonReader(file); + JSONParser jsonParser = new JSONParser(); + Object jsonRecord = jsonParser.parse(reader); + if (jsonRecord instanceof JSONArray) { +JSONArray jsonArray = (JSONArray) jsonRecord; +for (Object record : jsonArray) { + this.write(record.toString()); +} + } else { +this.write(jsonRecord.toString()); + } +} catch (Exception e) { + e.printStackTrace(); + throw new IOException(e.getMessage()); Review comment: Need to close the stream in validateJsonFiles as well. Please check all the stream cases in this PR.
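The stream-leak concern raised here is usually addressed with try-with-resources, which closes the reader on every path, including when parsing throws. A hedged sketch with a hypothetical helper (not the real `validateJsonFiles`):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class CloseStreamSketch {
  static String firstLine(Reader raw) {
    // try-with-resources guarantees reader.close() even if readLine throws,
    // which is the guarantee the review asks for on all stream cases.
    try (BufferedReader reader = new BufferedReader(raw)) {
      String line = reader.readLine();
      return line == null ? "" : line;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(firstLine(new StringReader("hello\nworld"))); // prints hello
  }
}
```

Applied to the quoted code, declaring the JSON `Reader` inside a try-with-resources header would remove the need for a manual close in a finally block.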
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478378839 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) { + throw new RuntimeException("CSV files can't be empty."); +} +for (CarbonFile dataFile : dataFiles) { + try { +CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf); + csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(), +-1, this.hadoopConf)); + } catch (IllegalArgumentException ex) { +if (ex.getCause() instanceof FileNotFoundException) { + throw new FileNotFoundException("File " + dataFile + + " not found to build carbon writer."); +} +throw ex; + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.validateCsvFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + private void validateJsonFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +for (CarbonFile dataFile : dataFiles) { + try { +new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf)); + } catch (FileNotFoundException ex) { +throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer."); + } catch (ParseException ex) { +throw new RuntimeException("File " + dataFile + " is not in json format."); + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.validateJsonFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. 
+ * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +this.validateParquetFiles(); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be loa
[GitHub] [carbondata] ajantha-bhat commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681914750 > So for complex type user must configure table page size to get the better query performance while creating the table. @kumarvishal09: I too strongly agree that if a user stores complex types with huge data in one row, configuring the table page size is mandatory. If I make it the default now, some test case validations will fail (for example, the CLI test cases which check page size); I will handle that in another PR.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478375043 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) { + throw new RuntimeException("CSV files can't be empty."); +} +for (CarbonFile dataFile : dataFiles) { + try { +CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf); + csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(), +-1, this.hadoopConf)); + } catch (IllegalArgumentException ex) { +if (ex.getCause() instanceof FileNotFoundException) { + throw new FileNotFoundException("File " + dataFile + + " not found to build carbon writer."); +} +throw ex; + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.validateCsvFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + private void validateJsonFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +for (CarbonFile dataFile : dataFiles) { + try { +new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf)); + } catch (FileNotFoundException ex) { +throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer."); + } catch (ParseException ex) { +throw new RuntimeException("File " + dataFile + " is not in json format."); + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.validateJsonFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. 
+ * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +this.validateParquetFiles(); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be loa
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat commented on a change in pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#discussion_r478374714 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java ## @@ -260,4 +265,65 @@ protected String debugInfo() { return this.toString(); } + public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo vectorInfo, int pageSize, + CarbonColumnVector vector, DataType vectorDataType) { +VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType); +Stack vectorStack = vectorInfo.getVectorStack(); +// check and update to child vector info +if (vectorStack != null && vectorStack.peek() != null && vectorDataType.isComplexType()) { Review comment: Objects.isNull and Objects.nonNull internally just do the same thing. For a plain if check they are not very useful (we have not used them for if checks anywhere in our code either). **I will use it in the future if I use streams.** ![Screenshot from 2020-08-27 17-43-51](https://user-images.githubusercontent.com/5889404/91441253-7d649d00-e88d-11ea-8d30-aa73af1cd7d6.png)
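The distinction being debated: in a plain `if`, `Objects.nonNull(x)` is just `x != null`, but as a method reference it is the idiomatic null filter in a stream pipeline, which is the "streams" case the author mentions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class NonNullSketch {
  public static void main(String[] args) {
    List<String> raw = Arrays.asList("a", null, "b", null, "c");

    // In an if check, Objects.nonNull(x) is equivalent to x != null.
    // As a predicate, the method reference is where it pays off:
    List<String> clean = raw.stream()
        .filter(Objects::nonNull)
        .collect(Collectors.toList());

    System.out.println(clean); // [a, b, c]
  }
}
```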
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat commented on a change in pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#discussion_r478367442 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java ## @@ -376,6 +456,56 @@ private void fillVector(byte[] pageData, CarbonColumnVector vector, DataType vec DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; decimalConverter.fillVector(pageData, pageSize, vectorInfo, nullBits, pageDataType); } + } else if (pageDataType == DataTypes.BYTE_ARRAY) { +if (vectorDataType == DataTypes.STRING || vectorDataType == DataTypes.BINARY +|| vectorDataType == DataTypes.VARCHAR) { + // for complex primitive string, binary, varchar type + int offset = 0; + for (int i = 0; i < pageSize; i++) { +byte[] stringLen = new byte[DataTypes.INT.getSizeInBytes()]; Review comment: done ## File path: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java ## @@ -98,6 +98,9 @@ */ protected CarbonIterator queryIterator; + // Size of the ReusableDataBuffer based on the number of dimension projection columns + protected int reusableDimensionBufferSize = 0; Review comment: done ## File path: core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java ## @@ -153,6 +154,9 @@ private ReusableDataBuffer[] measureReusableBuffer; + // index used by dimensionReusableBuffer + int dimensionReusableBufferIndex = 0; Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
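The quoted `BYTE_ARRAY` decode loop above reads each string/binary/varchar value as a 4-byte length prefix followed by the payload (length-value encoding). A self-contained sketch of that layout, independent of the CarbonData page classes, which are assumed here:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LvDecodeSketch {
  // Decode an int-length-prefixed byte page back into strings.
  static List<String> decode(byte[] page) {
    List<String> out = new ArrayList<>();
    ByteBuffer buf = ByteBuffer.wrap(page);
    while (buf.remaining() >= Integer.BYTES) {
      int len = buf.getInt();          // 4-byte length prefix
      byte[] value = new byte[len];
      buf.get(value);                  // then len bytes of payload
      out.add(new String(value, StandardCharsets.UTF_8));
    }
    return out;
  }

  static byte[] encode(List<String> values) {
    int size = values.stream()
        .mapToInt(v -> Integer.BYTES + v.getBytes(StandardCharsets.UTF_8).length).sum();
    ByteBuffer buf = ByteBuffer.allocate(size);
    for (String v : values) {
      byte[] b = v.getBytes(StandardCharsets.UTF_8);
      buf.putInt(b.length);
      buf.put(b);
    }
    return buf.array();
  }

  public static void main(String[] args) {
    System.out.println(decode(encode(Arrays.asList("ab", "c")))); // [ab, c]
  }
}
```

Using `ByteBuffer.getInt()` replaces the manual `byte[] stringLen = new byte[DataTypes.INT.getSizeInBytes()]` copy the reviewer flagged in the original diff.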
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue
CarbonDataQA1 commented on pull request #3903: URL: https://github.com/apache/carbondata/pull/3903#issuecomment-681900533 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2149/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue
CarbonDataQA1 commented on pull request #3903: URL: https://github.com/apache/carbondata/pull/3903#issuecomment-681898190 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3890/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478354030 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) { + throw new RuntimeException("CSV files can't be empty."); +} +for (CarbonFile dataFile : dataFiles) { + try { +CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf); + csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(), +-1, this.hadoopConf)); + } catch (IllegalArgumentException ex) { +if (ex.getCause() instanceof FileNotFoundException) { + throw new FileNotFoundException("File " + dataFile + + " not found to build carbon writer."); +} +throw ex; + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.validateCsvFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + private void validateJsonFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +for (CarbonFile dataFile : dataFiles) { + try { +new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf)); + } catch (FileNotFoundException ex) { +throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer."); + } catch (ParseException ex) { +throw new RuntimeException("File " + dataFile + " is not in json format."); + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.validateJsonFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. 
+ * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +this.validateParquetFiles(); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be loa
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681894212 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3889/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681891693 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2148/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478343705 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -660,13 +1113,42 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException { // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder // which will skip Conversion Step. loadModel.setLoadWithoutConverterStep(true); - return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema); + AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel, + hadoopConf, this.avroSchema); + if (!StringUtils.isEmpty(filePath)) { Review comment: This condition never seems to fail?? Same for the json and csv writer cases below. We do not have a similar check for the parquet and orc types below.
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3893: [CARBONDATA-3959] Added new property to set the value of executor LRU cache size to 70% of the total executor memory in IndexServ
vikramahuja1001 commented on pull request #3893: URL: https://github.com/apache/carbondata/pull/3893#issuecomment-681886430 Please make the necessary changes in the index-server documentation as well for the property changes
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478333555 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -660,13 +1113,42 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException { // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder // which will skip Conversion Step. loadModel.setLoadWithoutConverterStep(true); - return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema); + AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel, + hadoopConf, this.avroSchema); + if (!StringUtils.isEmpty(filePath)) { +avroCarbonWriter.setDataFiles(this.dataFiles); + } + return avroCarbonWriter; } else if (this.writerType == WRITER_TYPE.JSON) { loadModel.setJsonFileLoad(true); + JsonCarbonWriter jsonCarbonWriter = new JsonCarbonWriter(loadModel, hadoopConf); - return new JsonCarbonWriter(loadModel, hadoopConf); + if (!StringUtils.isEmpty(filePath)) { +jsonCarbonWriter.setDataFiles(this.dataFiles); + } + return jsonCarbonWriter; +} else if (this.writerType == WRITER_TYPE.PARQUET) { + loadModel.setLoadWithoutConverterStep(true); + AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel, + hadoopConf, this.avroSchema); + ParquetCarbonWriter parquetCarbonWriter = new + ParquetCarbonWriter(avroCarbonWriter, hadoopConf); Review comment: Instead of creating an instance of `AvroCarbonWriter` and passing it to `ParquetCarbonWriter`, create the `AvroCarbonWriter` instance internally within the constructor of `ParquetCarbonWriter`. Suggest the same for `ORCCarbonWriter` below.
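The suggested refactor can be sketched abstractly (the class names and the `String` configuration stand-in are simplifications, not the real CarbonData signatures): the wrapper builds its delegate itself, so callers pass only the inputs the delegate needs.

```java
public class ConstructorCompositionSketch {
  static class AvroWriter {
    final String conf;
    AvroWriter(String conf) { this.conf = conf; }
  }

  static class ParquetWriter {
    private final AvroWriter delegate;

    // The caller no longer constructs the delegate; it becomes an
    // internal detail of ParquetWriter, as the review suggests.
    ParquetWriter(String conf) { this.delegate = new AvroWriter(conf); }

    String delegateConf() { return delegate.conf; }
  }

  public static void main(String[] args) {
    System.out.println(new ParquetWriter("hadoopConf").delegateConf()); // prints hadoopConf
  }
}
```

This keeps the builder's `build()` method free of delegate wiring and makes it impossible to hand `ParquetWriter` a delegate built with a mismatched configuration.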
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r478329089 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -377,4 +381,212 @@ object CarbonIndexUtil { AlterTableUtil.releaseLocks(locks.asScala.toList) } } + + def processSIRepair(indexTableName: String, carbonTable: CarbonTable, +carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata, + mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String) + (sparkSession: SparkSession) : Unit = { +val sparkSession = SparkSession.getActiveSession.get +// val databaseName = sparkSession.catalog.currentDatabase +// when Si creation and load to main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val indexTable = metaStore + .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)( +sparkSession) + .asInstanceOf[CarbonRelation] + .carbonTable + +val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] = + SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) Review comment: removed multiple readings, reading only once ## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -377,4 +381,212 @@ object CarbonIndexUtil { AlterTableUtil.releaseLocks(locks.asScala.toList) } } + + def processSIRepair(indexTableName: String, carbonTable: CarbonTable, +carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata, + mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String) + (sparkSession: SparkSession) : Unit = { +val sparkSession = SparkSession.getActiveSession.get +// val databaseName = sparkSession.catalog.currentDatabase +// when Si creation and load to main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val 
metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val indexTable = metaStore + .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)( +sparkSession) + .asInstanceOf[CarbonRelation] + .carbonTable + +val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] = + SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) +val siTblLoadMetadataDetails: Array[LoadMetadataDetails] = + SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath) +var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty +if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg( + mainTblLoadMetadataDetails, + siTblLoadMetadataDetails)) { + val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider, +indexTableName) + val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName), +indexMetadata.getParentTableName, +indexColumns.split(",").toList, +indexTableName) + + var details = SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath) Review comment: removed, it was redundant code
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
CarbonDataQA1 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-681874973 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2147/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478323052 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { Review comment: These validate methods (for Avro, Parquet, CSV, JSON, and ORC files) can live in the respective type of carbon writer, because the way the validate methods are implemented is specific to each format: they obtain readers/parsers, and very similar code already exists in the respective writers. Also, can the validation method be called from `CarbonWriterBuilder.build()` based on the respective `writerType` when `dataFiles` is not null? I think it can be an abstract method in `CarbonWriter` to set the input files to read, implemented by all the writers; the writers can then validate and set them.
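A minimal sketch of the refactoring suggested above — each writer subclass owns its format-specific validation, and the builder dispatches through an abstract method. All class and method names here are hypothetical and do not match the real CarbonData SDK API:

```java
// Hypothetical sketch of moving per-format validation into the writer
// subclasses, as the review suggests. Names are invented for illustration.
abstract class SketchCarbonWriter {
    protected String[] dataFiles;

    // Each concrete writer validates and stores its own input files.
    abstract void setDataFilesToRead(String[] files);
}

class SketchCsvWriter extends SketchCarbonWriter {
    @Override
    void setDataFilesToRead(String[] files) {
        // CSV-specific validation lives with the CSV writer.
        for (String f : files) {
            if (!f.endsWith(".csv")) {
                throw new IllegalArgumentException("Not a CSV file: " + f);
            }
        }
        this.dataFiles = files;
    }
}

class SketchAvroWriter extends SketchCarbonWriter {
    @Override
    void setDataFilesToRead(String[] files) {
        // Avro-specific validation lives with the Avro writer.
        for (String f : files) {
            if (!f.endsWith(".avro")) {
                throw new IllegalArgumentException("Not an Avro file: " + f);
            }
        }
        this.dataFiles = files;
    }
}

class SketchBuilder {
    private String writerType;
    private String[] dataFiles;

    SketchBuilder withType(String type) { this.writerType = type; return this; }
    SketchBuilder withFiles(String[] files) { this.dataFiles = files; return this; }

    // build() picks the writer by type and lets it validate the files itself,
    // only when dataFiles were supplied.
    SketchCarbonWriter build() {
        SketchCarbonWriter writer =
            "csv".equals(writerType) ? new SketchCsvWriter() : new SketchAvroWriter();
        if (dataFiles != null) {
            writer.setDataFilesToRead(dataFiles);
        }
        return writer;
    }
}
```

The builder stays format-agnostic; adding a new input format only means adding a subclass.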
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
CarbonDataQA1 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-681868472 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3888/
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r478314463 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.index + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Repair logic for reindex command on maintable/indextable + */ +case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier, + dbName: String, + segments: Option[List[String]]) extends DataCommand { + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + + def processData(sparkSession: SparkSession): Seq[Row] = { +if (dbName == null) { + // dbName is null, repair for index table or all the index table in main table + val databaseName = if (tableIdentifier.database.isEmpty) { +SparkSession.getActiveSession.get.catalog.currentDatabase + } else { +tableIdentifier.database.get + } + triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments) +} else { + // repairing si for all index tables in the mentioned database in the repair command + sparkSession.sessionState.catalog.listTables(dbName).foreach { +tableIdent => + triggerRepair(tableIdent.table, dbName, indexnameOp, segments) + } +} +Seq.empty + } + + def triggerRepair(tableNameOp: String, databaseName: String, +indexTableToRepair: Option[String], segments: Option[List[String]]): Unit = { +val sparkSession = SparkSession.getActiveSession.get +// when Si creation and load to 
main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val mainCarbonTable = metaStore + .lookupRelation(Some(databaseName), tableNameOp)(sparkSession) + .asInstanceOf[CarbonRelation].carbonTable + +val carbonLoadModel = new CarbonLoadModel +carbonLoadModel.setDatabaseName(databaseName) +carbonLoadModel.setTableName(tableNameOp) +carbonLoadModel.setTablePath(mainCarbonTable.getTablePath) +val tableStatusFilePath = CarbonTablePath.getTableStatusFilePath(mainCarbonTable.getTablePath) +carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager + .readTableStatusFile(tableStatusFilePath).toList.asJava) +carbonLoadModel.setCarbonDataLoadSchema(new CarbonDataLoadSchema(mainCarbonTable)) + +val indexMetadata = mainCarbonTable.getIndexMetadata +val secondaryIndexProvider = IndexType.SI.getIndexProviderName +if (null != indexMetadata && null != indexMetadata.getIndexesMap && + null != indexMetadata.getIndexesMap.get(secondaryIndexProvider)) { + val indexTables = indexMetadata.getIndexesMap +.get(secondaryIndexProvider).keySet().asScala + // if there are no index tables for a given fact table do not perform any action + if (indexTables.nonEmpty) { +val mainTableDetails = if (segments.isEmpty) { + carbonLoadModel.getLoadMetadataDetails.asScala.toList + // SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) +} else { + // get segments for main table + carbonLoadModel.getLoadMetadataDetails.asScala.toList.filt
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r478314130 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -377,4 +381,212 @@ object CarbonIndexUtil { AlterTableUtil.releaseLocks(locks.asScala.toList) } } + + def processSIRepair(indexTableName: String, carbonTable: CarbonTable, +carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata, + mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String) + (sparkSession: SparkSession) : Unit = { +val sparkSession = SparkSession.getActiveSession.get +// val databaseName = sparkSession.catalog.currentDatabase +// when Si creation and load to main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val indexTable = metaStore + .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)( +sparkSession) + .asInstanceOf[CarbonRelation] + .carbonTable + +val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] = + SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) +val siTblLoadMetadataDetails: Array[LoadMetadataDetails] = + SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath) +var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty +if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg( + mainTblLoadMetadataDetails, + siTblLoadMetadataDetails)) { + val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider, +indexTableName) + val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName), +indexMetadata.getParentTableName, +indexColumns.split(",").toList, +indexTableName) + + var details = SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath) + // If it empty, then no need to do further computations because the + // tabletstatus might not have been created and hence next load will 
take care + if (details.isEmpty) { +Seq.empty + } + + val failedLoadMetadataDetails: java.util.List[LoadMetadataDetails] = new util + .ArrayList[LoadMetadataDetails]() + + // read the details of SI table and get all the failed segments during SI + // creation which are MARKED_FOR_DELETE or invalid INSERT_IN_PROGRESS + details.collect { Review comment: done ## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -377,4 +381,212 @@ object CarbonIndexUtil { AlterTableUtil.releaseLocks(locks.asScala.toList) } } + + def processSIRepair(indexTableName: String, carbonTable: CarbonTable, +carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata, + mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String) + (sparkSession: SparkSession) : Unit = { +val sparkSession = SparkSession.getActiveSession.get +// val databaseName = sparkSession.catalog.currentDatabase +// when Si creation and load to main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val indexTable = metaStore + .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)( +sparkSession) + .asInstanceOf[CarbonRelation] + .carbonTable + +val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] = + SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) +val siTblLoadMetadataDetails: Array[LoadMetadataDetails] = + SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath) +var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty +if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg( + mainTblLoadMetadataDetails, + siTblLoadMetadataDetails)) { + val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider, +indexTableName) + val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName), Review comment: done ## File path: 
integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -377,4 +381,212 @@ object CarbonIndexUtil { AlterTableUtil.releaseLocks(locks.asScala.toList) } } + + def processSIRepair(indexTableName: String, carbonTable: CarbonTable, +carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata, + mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String) + (sparkSession: SparkSession) : Unit = { +val sparkSession = SparkSession.getActiveSession.get Review comment: done
[GitHub] [carbondata] akashrn5 commented on pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue
akashrn5 commented on pull request #3903: URL: https://github.com/apache/carbondata/pull/3903#issuecomment-681846351 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681845433 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2145/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3890: [CARBONDATA-3952] After reset query not hitting MV
CarbonDataQA1 commented on pull request #3890: URL: https://github.com/apache/carbondata/pull/3890#issuecomment-681845603 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2144/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681843843 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3886/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3890: [CARBONDATA-3952] After reset query not hitting MV
CarbonDataQA1 commented on pull request #3890: URL: https://github.com/apache/carbondata/pull/3890#issuecomment-681842224 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3885/
[jira] [Updated] (CARBONDATA-3964) Select * from table or select count(*) without filter is throwing null pointer exception.
[ https://issues.apache.org/jira/browse/CARBONDATA-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nihal kumar ojha updated CARBONDATA-3964: - Priority: Minor (was: Major) > Select * from table or select count(*) without filter is throwing null > pointer exception. > - > > Key: CARBONDATA-3964 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3964 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Steps to reproduce. > 1. Create a table. > 2. Load around 500 segments and more than 1 million records. > 3. Running query select(*) or select count(*) without filter is throwing null > pointer exception. > File: TableIndex.java > Method: pruneWithMultiThread > line: 447 > Reason: filter.getresolver() is null. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478286002 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -17,25 +17,19 @@ package org.apache.carbondata.sdk.file; +import java.io.FileNotFoundException; import java.io.IOException; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.HashMap; -import java.util.HashSet; -import java.util.List; -import java.util.Map; -import java.util.Objects; -import java.util.Set; -import java.util.TreeMap; -import java.util.UUID; +import java.util.*; Review comment: Wildcard import; same as above.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478284566 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CSVCarbonWriter.java ## @@ -72,6 +93,36 @@ public void write(Object object) throws IOException { } } + /** + * Load data of all or selected csv files at given location iteratively. + * + * @throws IOException + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withCsvPath()' must be called to support load files"); +} +this.csvParser = SDKUtil.buildCsvParser(this.configuration); +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +this.csvParser.beginParsing(FileFactory.getDataInputStream(file.getPath(), -1, configuration)); Review comment: The InputStream is not closed; same as above.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478195952 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ## @@ -823,6 +829,31 @@ public void write(Object object) throws IOException { } } + /** + * Load data of all avro files at given location iteratively. + * + * @throws IOException + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withAvroPath()' must be called to support loading avro files"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); Review comment: Is this sort required? The same question applies to the other writers too.
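For context on why a sort like the one under review can matter: directory listings are not guaranteed to come back in any particular order, so sorting by path makes the load order deterministic across filesystems. A standalone illustration using plain strings rather than CarbonFile objects:

```java
import java.util.Arrays;
import java.util.Comparator;

class SortSketch {
    // Sort file paths lexicographically so the load order is deterministic,
    // regardless of the order the filesystem returned them in.
    static String[] sortByPath(String[] paths) {
        String[] copy = Arrays.copyOf(paths, paths.length);
        Arrays.sort(copy, Comparator.naturalOrder());
        return copy;
    }
}
```

If the downstream load is order-insensitive, the sort could be dropped; if results must be reproducible across runs, it should stay.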
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478250989 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ## @@ -823,6 +829,31 @@ public void write(Object object) throws IOException { } } + /** + * Load data of all avro files at given location iteratively. + * + * @throws IOException + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withAvroPath()' must be called to support loading avro files"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +DataFileStream avroReader = SDKUtil +.buildAvroReader(file, this.configuration); Review comment: The `avroReader` stream is not closed in either the success case or the failure/exception cases. The file InputStream used to create this DataFileStream in buildAvroReader is not closed either. Check all the stream reader/writer cases in this PR.
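The leak the reviewer points at is the classic close-on-all-paths problem. A small sketch using plain java.io streams (not the Hadoop/Avro types from the PR) shows the try-with-resources pattern that guarantees the stream is closed whether the body returns normally or throws:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class StreamCloseSketch {
    // Reads one big-endian int from the data; try-with-resources closes the
    // stream on both the success path and the failure/exception path.
    static int readFirstInt(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same shape applies to the Avro reader case: constructing the reader inside the try-with-resources header (or closing it in a finally block) covers the exception paths the review calls out.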
[GitHub] [carbondata] nihal0107 opened a new pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
nihal0107 opened a new pull request #3905: URL: https://github.com/apache/carbondata/pull/3905 ### Why is this PR needed? With around 1 million records and 500 segments, a select query without a filter is throwing a null pointer exception. ### What changes were proposed in this PR? A select query without a filter should execute the pruneWithoutFilter method rather than pruneWithMultiThread. Added a null check for the filter. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No
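The dispatch the PR describes can be sketched in isolation. This is a simplified, hypothetical model (the real TableIndex code differs): when there is no filter resolver, take the no-filter pruning path instead of letting the multi-thread path dereference a null resolver:

```java
// Simplified model of the fix: choose the pruning path based on whether a
// filter resolver exists, rather than letting pruneWithMultiThread hit null.
class PruneSketch {
    static class Filter {
        private final Object resolver;
        Filter(Object resolver) { this.resolver = resolver; }
        Object getResolver() { return resolver; }
    }

    static String choosePrunePath(Filter filter) {
        if (filter == null || filter.getResolver() == null) {
            // select * / select count(*) with no filter
            return "pruneWithoutFilter";
        }
        // filter present: the resolver is safe to use
        return "pruneWithMultiThread";
    }
}
```
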
[jira] [Created] (CARBONDATA-3964) Select * from table or select count(*) without filter is throwing null pointer exception.
Nihal kumar ojha created CARBONDATA-3964: Summary: Select * from table or select count(*) without filter is throwing null pointer exception. Key: CARBONDATA-3964 URL: https://issues.apache.org/jira/browse/CARBONDATA-3964 Project: CarbonData Issue Type: Bug Reporter: Nihal kumar ojha Steps to reproduce. 1. Create a table. 2. Load around 500 segments and more than 1 million records. 3. Running query select(*) or select count(*) without filter is throwing null pointer exception. File: TableIndex.java Method: pruneWithMultiThread line: 447 Reason: filter.getresolver() is null.
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3890: [CARBONDATA-3952] After reset query not hitting MV
ShreelekhyaG commented on a change in pull request #3890: URL: https://github.com/apache/carbondata/pull/3890#discussion_r478240384 ## File path: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala ## @@ -252,6 +252,27 @@ class SetParameterTestCase extends QueryTest with BeforeAndAfterAll { sql("RESET") } + test("TC_014-test mv after reset properties") { +sql("drop table if exists maintable") +sql("drop MATERIALIZED VIEW if exists mv1") +sql("CREATE TABLE maintable(empno int,empname string,projectcode int, projectjoindate " + +"Timestamp, projectenddate date,salary double) STORED AS carbondata") +sql("CREATE MATERIALIZED VIEW mv1 as select timeseries(projectenddate,'day'), sum" + +"(projectcode) from maintable group by timeseries(projectenddate,'day')") +sql("insert into maintable select 1000,'PURUJIT',00012,'2015-07-26 12:07:28','2016-05-20'," + +"15000.00") +sql("insert into maintable select 1001,'PANKAJ',00010,'2015-07-26 17:32:20','2016-05-20'," + +"25000.00") +sql("set carbon.input.segments.defualt.maintable=1") +checkExistence(sql("EXPLAIN select timeseries(projectenddate,'day'), sum(projectcode) from " + Review comment: ok added check with `verifyMVHit`.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-681759193 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3884/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-681757236 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2143/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r478227905 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/utils/SDKUtil.java ## @@ -79,4 +98,75 @@ public static ArrayList listFiles(String sourceImageFolder, return (Object[]) input[i]; } + public static List extractFilesFromFolder(String path, + String suf, Configuration hadoopConf) { +List dataFiles = listFiles(path, suf, hadoopConf); +List carbonFiles = new ArrayList<>(); +for (Object dataFile: dataFiles) { + carbonFiles.add(FileFactory.getCarbonFile(dataFile.toString(), hadoopConf)); +} +if (CollectionUtils.isEmpty(dataFiles)) { + throw new RuntimeException("No file found at given location. Please provide" + + "the correct folder location."); +} +return carbonFiles; + } + + public static DataFileStream buildAvroReader(CarbonFile carbonFile, + Configuration configuration) throws IOException { +try { + GenericDatumReader genericDatumReader = + new GenericDatumReader<>(); + DataFileStream avroReader = + new DataFileStream<>(FileFactory.getDataInputStream(carbonFile.getPath(), + -1, configuration), genericDatumReader); + return avroReader; +} catch (FileNotFoundException ex) { + throw new FileNotFoundException("File " + carbonFile.getPath() + + " not found to build carbon writer."); +} catch (IOException ex) { + if (ex.getMessage().contains("Not a data file")) { Review comment: Why catch `IOException` and rethrow it as `RuntimeException` here? You converted a checked exception to an unchecked exception and consumed the original exception completely. Better to preserve the original exception, since there is no action after catching it other than producing a different exception message.
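The point about preserving the original exception can be shown in isolation: pass the caught exception as the cause when wrapping it, so the original stack trace survives in the new exception. A minimal sketch (hypothetical helper, not the SDKUtil API):

```java
class WrapSketch {
    // Wraps a checked exception in an unchecked one WITHOUT losing the
    // original: the cause (and its stack trace) is preserved via the
    // two-argument RuntimeException constructor.
    static RuntimeException wrap(Exception original, String message) {
        return new RuntimeException(message, original);
    }
}
```

Callers can still reach the underlying failure through `getCause()`, which is what gets lost when only the message is copied into the new exception.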
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat commented on a change in pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#discussion_r478227936 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java ## @@ -260,4 +265,52 @@ protected String debugInfo() { return this.toString(); } + // Utility class to update current vector to child vector in case of complex type handling + public static class VectorUtil { +private ColumnVectorInfo vectorInfo; +private int pageSize; +private CarbonColumnVector vector; +private DataType vectorDataType; + +public VectorUtil(ColumnVectorInfo vectorInfo, int pageSize, CarbonColumnVector vector, +DataType vectorDataType) { + this.vectorInfo = vectorInfo; + this.pageSize = pageSize; + this.vector = vector; + this.vectorDataType = vectorDataType; +} + +public int getPageSize() { + return pageSize; +} + +public CarbonColumnVector getVector() { + return vector; +} + +public DataType getVectorDataType() { + return vectorDataType; +} + +public VectorUtil checkAndUpdateToChildVector() { Review comment: Removed this whole class itself; now updating the vector inside **ColumnarVectorWrapperDirectFactory.getDirectVectorWrapperFactory**.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r478227589

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##########
@@ -260,4 +265,65 @@ protected String debugInfo() {
     return this.toString();
   }
 
+  public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo vectorInfo, int pageSize,
+      CarbonColumnVector vector, DataType vectorDataType) {
+    VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType);
+    Stack vectorStack = vectorInfo.getVectorStack();
+    // check and update to child vector info
+    if (vectorStack != null && vectorStack.peek() != null && vectorDataType.isComplexType()) {
+      if (DataTypes.isArrayType(vectorDataType)) {
+        List childElementsCountForEachRow =
+            ((CarbonColumnVectorImpl) vector.getColumnVector())
+                .getNumberOfChildrenElementsInEachRow();
+        int newPageSize = 0;
+        for (int childElementsCount : childElementsCountForEachRow) {
+          newPageSize += childElementsCount;
+        }
+        vectorUtil.setPageSize(newPageSize);
+      }
+      // child vector flow, so fill the child vector
+      CarbonColumnVector childVector = vectorStack.pop();
+      vectorUtil.setVector(childVector);
+      vectorUtil.setVectorDataType(childVector.getType());
+    }
+    return vectorUtil;
+  }
+
+  // Utility class to update current vector to child vector in case of complex type handling
+  public static class VectorUtil {
+    private int pageSize;
+    private CarbonColumnVector vector;
+    private DataType vectorDataType;
+
+    private VectorUtil(int pageSize, CarbonColumnVector vector, DataType vectorDataType) {
+      this.pageSize = pageSize;
+      this.vector = vector;
+      this.vectorDataType = vectorDataType;
+    }
+
+    public int getPageSize() {

Review comment:
Removed this whole class; the vector is now updated inside **ColumnarVectorWrapperDirectFactory.getDirectVectorWrapperFactory** instead.

##########
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##########
@@ -260,4 +265,52 @@ protected String debugInfo() {
     return this.toString();
   }
 
+  // Utility class to update current vector to child vector in case of complex type handling
+  public static class VectorUtil {

Review comment:
Removed this whole class; the vector is now updated inside **ColumnarVectorWrapperDirectFactory.getDirectVectorWrapperFactory** instead.
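The page-size recomputation in `checkAndUpdateToChildVector` above can be illustrated in isolation (a standalone sketch under the same assumption as the reviewed code, not the CarbonData class itself): when descending from an array column to its child vector, the child page size is the total number of child elements across all rows of the page.

```java
import java.util.Arrays;
import java.util.List;

public class ChildPageSizeDemo {

  // For an array column, each row contributes its own number of child
  // elements; the child vector's page size is the flattened total.
  static int childPageSize(List<Integer> childElementsCountForEachRow) {
    int newPageSize = 0;
    for (int childElementsCount : childElementsCountForEachRow) {
      newPageSize += childElementsCount;
    }
    return newPageSize;
  }

  public static void main(String[] args) {
    // Three array rows with 2, 0 and 5 elements flatten
    // into a 7-element child page.
    System.out.println(childPageSize(Arrays.asList(2, 0, 5))); // prints 7
  }
}
```

This is why the reviewed code replaces the parent's `pageSize` with the summed count before filling the child vector: the child page is indexed by flattened element position, not by row.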
[jira] [Resolved] (CARBONDATA-3955) Fix load failures due to daylight saving time changes
[ https://issues.apache.org/jira/browse/CARBONDATA-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash R Nilugal resolved CARBONDATA-3955.
-----------------------------------------
    Fix Version/s: 2.1.0
       Resolution: Fixed

> Fix load failures due to daylight saving time changes
> -----------------------------------------------------
>
>                 Key: CARBONDATA-3955
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3955
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: SHREELEKHYA GAMPA
>            Priority: Major
>             Fix For: 2.1.0
>
>       Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> 1) Fix load failures due to daylight saving time changes.
> 2) During load, date/timestamp year values with more than 4 digits should fail or be
> treated as null, according to the bad records action property.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [carbondata] asfgit closed pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
asfgit closed pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896