[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475378412

## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala

@@ -816,10 +816,13 @@ object CarbonDataRDDFactory {
       val partitionByRdd = keyRDD.partitionBy(
         new SegmentPartitioner(segmentIdIndex, segmentUpdateParallelism))
+      val carbonSessionInfoBroadcast = sqlContext.sparkSession.sparkContext

Review comment: In the normal load flow, NewCarbonDataLoadRDD extends CarbonRDD. While initializing, we get carbonSessionInfo from the current thread, and in CarbonRDD's compute we set it on the task thread with `ThreadLocalSessionInfo.setCarbonSessionInfo(carbonSessionInfo)`. We can either do the same here or broadcast it.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
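The hand-off described in this comment can be sketched in plain Java. This is a minimal stand-in, not CarbonData code: the class and field names here (`SessionInfoHandoff`, `SESSION_INFO`) are hypothetical, while the real pieces are `CarbonSessionInfo` and `ThreadLocalSessionInfo.setCarbonSessionInfo`. The point is that a ThreadLocal value set on the driver thread is invisible to a worker thread unless it is captured into the task closure and re-installed inside the task, which is what CarbonRDD's compute does.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SessionInfoHandoff {

  // Hypothetical stand-in for ThreadLocalSessionInfo's carbonSessionInfo.
  static final ThreadLocal<String> SESSION_INFO = new ThreadLocal<>();

  public static String runTask() throws Exception {
    // Capture the value on the calling ("driver") thread. Without this
    // capture and re-install, SESSION_INFO.get() on the worker thread
    // returns null: ThreadLocal values never cross thread boundaries.
    final String captured = SESSION_INFO.get();
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      return pool.submit(() -> {
        SESSION_INFO.set(captured); // analogous to what compute() does per task
        return SESSION_INFO.get();
      }).get();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    SESSION_INFO.set("carbon.load.properties");
    System.out.println(runTask());
  }
}
```

Broadcasting, the alternative mentioned in the comment, solves the same visibility problem by shipping the value with the job instead of re-installing it per thread.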
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377552

## File path: integration/spark/src/test/resources/badrecords/invalidTimeStampRange.csv

@@ -0,0 +1,2 @@
+ID,date,starttime,country,name,phonetype,serialname,salary

Review comment: OK, made the changes.
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377258

## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java

@@ -435,19 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
   private static Object parseTimestamp(String dimensionValue, String dateFormat) {
     Date dateToStr;
-    DateFormat dateFormatter;
+    DateFormat dateFormatter = null;
     try {
       if (null != dateFormat && !dateFormat.trim().isEmpty()) {
         dateFormatter = new SimpleDateFormat(dateFormat);
-        dateFormatter.setLenient(false);
       } else {
         dateFormatter = timestampFormatter.get();
       }
+      dateFormatter.setLenient(false);
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      return validateTimeStampRange(dateToStr.getTime());
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue);

Review comment: ok
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377325

## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java

@@ -435,19 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
   private static Object parseTimestamp(String dimensionValue, String dateFormat) {
     Date dateToStr;
-    DateFormat dateFormatter;
+    DateFormat dateFormatter = null;
     try {
       if (null != dateFormat && !dateFormat.trim().isEmpty()) {
         dateFormatter = new SimpleDateFormat(dateFormat);
-        dateFormatter.setLenient(false);
       } else {
         dateFormatter = timestampFormatter.get();
       }
+      dateFormatter.setLenient(false);
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      return validateTimeStampRange(dateToStr.getTime());
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue);
+          dateFormatter.setLenient(true);
+          dateToStr = dateFormatter.parse(dimensionValue);
+          LOGGER.info("Changing " + dimensionValue + " to " + dateToStr);
+          dateFormatter.setLenient(false);
+          LOGGER.info("Changing setLenient back to false");
+          return validateTimeStampRange(dateToStr.getTime());
+        } catch (ParseException ex) {
+          dateFormatter.setLenient(false);
+          LOGGER.info("Changing setLenient back to false");
+          throw new NumberFormatException(ex.getMessage());
+        }
+      } else {
+        throw new NumberFormatException(e.getMessage());
+      }
+    }
+  }
+
+  private static Long validateTimeStampRange(Long timeValue) {

Review comment: done

## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java

@@ -435,19 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
   private static Object parseTimestamp(String dimensionValue, String dateFormat) {
     Date dateToStr;
-    DateFormat dateFormatter;
+    DateFormat dateFormatter = null;
     try {
       if (null != dateFormat && !dateFormat.trim().isEmpty()) {
         dateFormatter = new SimpleDateFormat(dateFormat);
-        dateFormatter.setLenient(false);
       } else {
         dateFormatter = timestampFormatter.get();
       }
+      dateFormatter.setLenient(false);
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      return validateTimeStampRange(dateToStr.getTime());
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue);
+          dateFormatter.setLenient(true);
+          dateToStr = dateFormatter.parse(dimensionValue);
+          LOGGER.info("Changing " + dimensionValue + " to " + dateToStr);
+          dateFormatter.setLenient(false);
+          LOGGER.info("Changing setLenient back to false");
+          return validateTimeStampRange(dateToStr.getTime());
+        } catch (ParseException ex) {
+          dateFormatter.setLenient(false);
+          LOGGER.info("Changing setLenient back to false");
+          throw new NumberFormatException(ex.getMessage());
+        }
+      } else {
+        throw new NumberFormatException(e.getMessage());
+      }
+    }
+  }
+
+  private static Long validateTimeStampRange(Long timeValue) {
+    long minValue = DateDirectDictionaryGenerator.MIN_VALUE;
+    long maxValue = DateDirectDictionaryGenerator.MAX_VALUE;
+    if (timeValue < minValue || timeValue > maxValue) {
+      if (LOGGER.isDebugEnabled()) {
+        LOGGER.debug("Value for timestamp type column is not in valid range.");
+      }
+      throw new NumberFormatException("Value for timestamp type column is not in valid range.");

Review comment: ok
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377187

## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java

@@ -435,19 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue,
   private static Object parseTimestamp(String dimensionValue, String dateFormat) {
     Date dateToStr;
-    DateFormat dateFormatter;
+    DateFormat dateFormatter = null;
    try {
       if (null != dateFormat && !dateFormat.trim().isEmpty()) {
         dateFormatter = new SimpleDateFormat(dateFormat);
-        dateFormatter.setLenient(false);
       } else {
         dateFormatter = timestampFormatter.get();
       }
+      dateFormatter.setLenient(false);
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      return validateTimeStampRange(dateToStr.getTime());
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue);
+          dateFormatter.setLenient(true);
+          dateToStr = dateFormatter.parse(dimensionValue);
+          LOGGER.info("Changing " + dimensionValue + " to " + dateToStr);
+          dateFormatter.setLenient(false);
+          LOGGER.info("Changing setLenient back to false");
+          return validateTimeStampRange(dateToStr.getTime());
+        } catch (ParseException ex) {
+          dateFormatter.setLenient(false);
+          LOGGER.info("Changing setLenient back to false");
+          throw new NumberFormatException(ex.getMessage());
+        }
+      } else {
+        throw new NumberFormatException(e.getMessage());
+      }
+    }
+  }
+
+  private static Long validateTimeStampRange(Long timeValue) {
+    long minValue = DateDirectDictionaryGenerator.MIN_VALUE;
+    long maxValue = DateDirectDictionaryGenerator.MAX_VALUE;
+    if (timeValue < minValue || timeValue > maxValue) {
+      if (LOGGER.isDebugEnabled()) {

Review comment: OK, removed.
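The setLenient behavior discussed in these review threads can be reproduced in isolation. The sketch below (class and constant names are illustrative, not CarbonData's) parses a wall-clock time that never existed because of the spring-forward daylight saving transition: strict parsing rejects it with a ParseException, which is the load failure CARBONDATA-3955 addresses, while lenient parsing rolls it forward into a valid time.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DstGapParseDemo {

  // 2018-03-11 02:30:00 never existed in America/Los_Angeles:
  // clocks jumped from 02:00 straight to 03:00 that morning.
  static final String GAP_TIME = "2018-03-11 02:30:00";

  static Date parse(String value, boolean lenient) throws ParseException {
    SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    formatter.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"));
    formatter.setLenient(lenient);
    return formatter.parse(value);
  }

  public static void main(String[] args) throws ParseException {
    // Strict parsing (setLenient(false)) rejects the nonexistent wall-clock time.
    try {
      parse(GAP_TIME, false);
      System.out.println("strict: parsed");
    } catch (ParseException e) {
      System.out.println("strict: ParseException");
    }

    // Lenient parsing rolls the value forward (02:30 PST -> 03:30 PDT)
    // instead of failing, which is what the retry in the patch relies on.
    SimpleDateFormat printer = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    printer.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"));
    System.out.println("lenient: " + printer.format(parse(GAP_TIME, true)));
  }
}
```

This is why the patch parses strictly first and retries leniently only when the carbon property is enabled: leniency also silently accepts genuinely malformed values, so it should not be the default.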
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell
akkio-97 commented on a change in pull request #3866: URL: https://github.com/apache/carbondata/pull/3866#discussion_r475362156

## File path: docs/hive-guide.md

@@ -52,16 +52,11 @@ $HADOOP_HOME/bin/hadoop fs -put sample.csv /sample.csv
 ```
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.CarbonSession._
-val rootPath = "hdfs:///user/hadoop/carbon"
-val storeLocation = s"$rootPath/store"
-val warehouse = s"$rootPath/warehouse"
-val metaStoreDB = s"$rootPath/metastore_db"
-
-val carbon = SparkSession.builder().enableHiveSupport().config("spark.sql.warehouse.dir", warehouse).config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, storeLocation).getOrCreateCarbonSession(storeLocation, metaStoreDB)
-
-carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED AS carbondata")
-carbon.sql("LOAD DATA INPATH '/sample.csv' INTO TABLE hive_carbon")
-scala>carbon.sql("SELECT * FROM hive_carbon").show()
+val newSpark = SparkSession.builder().config(sc.getConf).enableHiveSupport.config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions").getOrCreate()
+newSpark.sql("drop table if exists hive_carbon").show
+newSpark.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED AS carbondata").show
+newSpark.sql("/sample.csv INTO TABLE hive_carbon").show

Review comment: done
[GitHub] [carbondata] Indhumathi27 commented on pull request #3893: Added new property to set the value of executor LRU cache size to 70% of the total executor memory in IndexServer, if executor LRU c
Indhumathi27 commented on pull request #3893: URL: https://github.com/apache/carbondata/pull/3893#issuecomment-678918781
@Karan980 Please create a JIRA and update the PR description.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
akashrn5 commented on a change in pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#discussion_r475353691

## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/AlterTableRevertTestCase.scala

@@ -30,6 +32,13 @@ import org.apache.carbondata.spark.exception.ProcessMetaDataException
 class AlterTableRevertTestCase extends QueryTest with BeforeAndAfterAll {
   override def beforeAll() {
+    new MockUp[MockClassForAlterRevertTests]() {

Review comment: Yeah, I tried that, but since I'm using MockUp[T], it needs the actual Scala class name, not the object, because MockUp identifies the class. So I followed this approach, referring to `TableStatusBackupTest`.
[GitHub] [carbondata] QiangCai commented on a change in pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
QiangCai commented on a change in pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#discussion_r475347003

## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/AlterTableRevertTestCase.scala

@@ -30,6 +32,13 @@ import org.apache.carbondata.spark.exception.ProcessMetaDataException
 class AlterTableRevertTestCase extends QueryTest with BeforeAndAfterAll {
   override def beforeAll() {
+    new MockUp[MockClassForAlterRevertTests]() {

Review comment: For Scala, the class is CarbonSessionCatalogUtil$. The method name is updateCatalogTableForAlter. It is private, but not a static method.
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475346002

## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java

@@ -1592,6 +1592,13 @@ private CarbonCommonConstants() {
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the data with setLenient = true in load flow if it fails with

Review comment: Done
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678905253
retest this please
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
akashrn5 commented on a change in pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#discussion_r475336910

## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/AlterTableRevertTestCase.scala

@@ -30,6 +32,13 @@ import org.apache.carbondata.spark.exception.ProcessMetaDataException
 class AlterTableRevertTestCase extends QueryTest with BeforeAndAfterAll {
   override def beforeAll() {
+    new MockUp[MockClassForAlterRevertTests]() {

Review comment: I checked for static methods, but all of them were defined in a Scala object, not a class, so I thought I could do it this way.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
akashrn5 commented on a change in pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#discussion_r475336497

## File path: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonSessionCatalogUtil.scala

@@ -61,21 +61,6 @@ object CarbonSessionCatalogUtil {
       s"'dbName'='${ newTableIdentifier.database.get }', 'tablePath'='${ newTablePath }')")
   }
 
-  /**
-   * Below method will be used to update serde properties
-   * @param tableIdentifier table identifier
-   * @param schemaParts schema parts
-   * @param cols cols
-   */
-  def alterTable(tableIdentifier: TableIdentifier,
-      schemaParts: String,
-      cols: Option[Seq[ColumnSchema]],
-      sparkSession: SparkSession): Unit = {
-    getClient(sparkSession)
-      .runSqlHive(s"ALTER TABLE `${tableIdentifier.database.get}`.`${tableIdentifier.table}` " +
-        s"SET TBLPROPERTIES($schemaParts)")

Review comment: Yes, we call the API `org.apache.spark.sql.hive.CarbonSessionUtil#alterExternalCatalogForTableWithUpdatedSchema`, which does this for us. All the alter tests pass, as you can see.
[GitHub] [carbondata] QiangCai commented on a change in pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
QiangCai commented on a change in pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#discussion_r475301804

## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/AlterTableRevertTestCase.scala

@@ -30,6 +32,13 @@ import org.apache.carbondata.spark.exception.ProcessMetaDataException
 class AlterTableRevertTestCase extends QueryTest with BeforeAndAfterAll {
   override def beforeAll() {
+    new MockUp[MockClassForAlterRevertTests]() {

Review comment: Can we mock up the static method directly?
[GitHub] [carbondata] QiangCai commented on a change in pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
QiangCai commented on a change in pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#discussion_r475298980

## File path: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonSessionCatalogUtil.scala

@@ -61,21 +61,6 @@ object CarbonSessionCatalogUtil {
       s"'dbName'='${ newTableIdentifier.database.get }', 'tablePath'='${ newTablePath }')")
   }
 
-  /**
-   * Below method will be used to update serde properties
-   * @param tableIdentifier table identifier
-   * @param schemaParts schema parts
-   * @param cols cols
-   */
-  def alterTable(tableIdentifier: TableIdentifier,
-      schemaParts: String,
-      cols: Option[Seq[ColumnSchema]],
-      sparkSession: SparkSession): Unit = {
-    getClient(sparkSession)
-      .runSqlHive(s"ALTER TABLE `${tableIdentifier.database.get}`.`${tableIdentifier.table}` " +
-        s"SET TBLPROPERTIES($schemaParts)")

Review comment: If we do not change it, will Spark/Hive update it?
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678825751
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2101/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678824999
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3842/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
CarbonDataQA1 commented on pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#issuecomment-678814227
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2100/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
CarbonDataQA1 commented on pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#issuecomment-678813779
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3841/
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678812610
retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678812332
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2099/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678811731
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3840/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-678811108
[GitHub] [carbondata] kunal642 commented on a change in pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
kunal642 commented on a change in pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#discussion_r475244586

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonIUD.java

@@ -0,0 +1,275 @@
+package org.apache.carbondata.sdk.file;
+
+import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.Field;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.LiteralExpression;
+import org.apache.carbondata.core.scan.expression.conditional.EqualToExpression;
+import org.apache.carbondata.core.scan.expression.logical.AndExpression;
+import org.apache.carbondata.hadoop.api.CarbonTableOutputFormat;
+import org.apache.carbondata.hadoop.internal.ObjectArrayWritable;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import com.jcraft.jsch.IO;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.RecordWriter;
+import org.apache.hadoop.mapreduce.TaskAttemptID;
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
+
+public class CarbonIUD {
+
+  private final Map> filterColumnToValueMapping;
+  private final Map> updateColumnToValueMapping;
+
+  private CarbonIUD() {
+    filterColumnToValueMapping = new HashMap<>();
+    updateColumnToValueMapping = new HashMap<>();
+  }
+
+  /**
+   * @return CarbonIUD object
+   */
+  public static CarbonIUD getInstance() {
+    return new CarbonIUD();
+  }
+
+  /**
+   * @param path is the segment path on which delete is performed
+   * @param column is the column name on which records have to be deleted
+   * @param value of column on which the records have to be deleted
+   * @return CarbonIUD object
+   *
+   * for eg: DELETE WHERE column = value
+   */
+  public CarbonIUD delete(String path, String column, String value) {
+    prepareDelete(path, column, value);
+    return this;
+  }
+
+  /**
+   * @param path is the segment path on which delete is performed
+   * @param filterExpression is the expression to delete the records
+   * @throws IOException
+   * @throws InterruptedException
+   */
+  public void delete(String path, Expression filterExpression)
+      throws IOException, InterruptedException {
+    CarbonReader reader = CarbonReader.builder(path, "_temp")
+        .projection(new String[] { CarbonCommonConstants.CARBON_IMPLICIT_COLUMN_TUPLEID })
+        .filter(filterExpression).build();
+
+    RecordWriter deleteDeltaWriter =
+        CarbonTableOutputFormat.getDeleteDeltaRecordWriter(path);
+    ObjectArrayWritable writable = new ObjectArrayWritable();
+    while (reader.hasNext()) {
+      Object[] row = (Object[]) reader.readNextRow();
+      writable.set(row);
+      deleteDeltaWriter.write(NullWritable.get(), writable);
+    }
+    deleteDeltaWriter.close(null);
+    reader.close();
+  }
+
+  /**
+   * Calling this method will start the execution of the delete process
+   * @throws IOException
+   * @throws InterruptedException
+   */
+  public void closeDelete() throws IOException, InterruptedException {
+    for (Map.Entry> path : this.filterColumnToValueMapping.entrySet()) {
+      deleteExecution(path.getKey());
+    }
+    this.filterColumnToValueMapping.clear();
+  }
+
+  /**
+   * @param path is the segment path on which update is performed
+   * @param column is the column name on which records have to be updated
+   * @param value of column on which the records have to be updated
+   * @param updColumn is the name of updatedColumn
+   * @param updValue is the value of updatedColumn
+   * @return CarbonIUD
+   *
+   * for eg: UPDATE updColumn = updValue WHERE column = value
+   */
+  public CarbonIUD update(String path, String column, String value, String updColumn,
+      String updValue) {
+    prepareUpdate(path, column, value, updColumn, updValue);
+    return this;
+  }
+
+  /**
+   * @param path is the segment path on which update is performed.
+   * @param filterExpression is the expression object to update the records
+   * @param updatedColumnToValueMapping contains the mapping of updatedColumns to updatedValues
+   * @throws IOException
+   * @throws InterruptedException
+   * @throws InvalidLoadOptionException
+   */
+  public void update(String path, Expression filterExpression,
+      Map updatedColumnToValueMapping)
+      throws IOException, In
[GitHub] [carbondata] kunal642 commented on pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
kunal642 commented on pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#issuecomment-678801642
retest this please
[jira] [Resolved] (CARBONDATA-3946) Support IndexServer with Presto Engine
[ https://issues.apache.org/jira/browse/CARBONDATA-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Kapoor resolved CARBONDATA-3946.
Fix Version/s: 2.1.0
Resolution: Fixed

> Support IndexServer with Presto Engine
>
> Key: CARBONDATA-3946
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3946
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Indhumathi Muthumurugesh
> Priority: Major
> Fix For: 2.1.0
>
> Time Spent: 6h 40m
> Remaining Estimate: 0h

This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] asfgit closed pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
asfgit closed pull request #3885: URL: https://github.com/apache/carbondata/pull/3885
[GitHub] [carbondata] kunal642 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
kunal642 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-678801176
LGTM
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678799256
retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-678798742
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2097/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-678798596
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3838/
[GitHub] [carbondata] kunal642 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
kunal642 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-678782325
please rebase