[GitHub] [carbondata] vikramahuja1001 commented on pull request #3894: [WIP] Added property to enable disable SIforFailed segments and added prope…
vikramahuja1001 commented on pull request #3894: URL: https://github.com/apache/carbondata/pull/3894#issuecomment-675868683 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-675868405 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3782/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-675867872 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2040/
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-675858348 retest this please
[jira] [Updated] (CARBONDATA-3925) flink-integration write carbon file to hdfs error
[ https://issues.apache.org/jira/browse/CARBONDATA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yutao updated CARBONDATA-3925: -- Fix Version/s: 2.1.0 Issue Type: Bug (was: Improvement) Priority: Major (was: Minor) Summary: flink-integration write carbon file to hdfs error (was: flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname) > flink-integration write carbon file to hdfs error > - > > Key: CARBONDATA-3925 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3925 > Project: CarbonData > Issue Type: Bug > Components: flink-integration >Affects Versions: 2.0.0 >Reporter: yutao >Priority: Major > Fix For: 2.1.0, 2.0.1 > > > In CarbonWriter.java you can find this: > public abstract class CarbonWriter extends > ProxyFileWriter { > private static final Logger LOGGER = > > LogServiceFactory.getLogService(CarbonS3Writer.class.getName()); } > so the log file always prints entries like: > 2020-07-27 14:19:25,107 DEBUG org.apache.carbon.flink.CarbonS3Writer > which is puzzling. -- This message was sent by Atlassian Jira (v8.3.4#803005)
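The issue above can be reproduced in a few lines. This is a standalone sketch, not the actual flink-integration code: the class names only mirror the CarbonData ones, and it uses `java.util.logging` in place of Carbon's `LogServiceFactory`. A logger created with another class's name attributes every log record to the wrong class, which is exactly the misleading `CarbonS3Writer` line in the reported DEBUG output.

```java
import java.util.logging.Logger;

// Stand-in for org.apache.carbon.flink.CarbonS3Writer (illustrative only).
class CarbonS3Writer { }

public class CarbonWriterLoggerDemo {
    // Buggy form, as in CARBONDATA-3925: records appear to come from CarbonS3Writer.
    static final Logger BUGGY = Logger.getLogger(CarbonS3Writer.class.getName());

    // Fixed form: the logger is named after the class that owns it.
    static final Logger FIXED = Logger.getLogger(CarbonWriterLoggerDemo.class.getName());

    public static void main(String[] args) {
        System.out.println(BUGGY.getName()); // prints "CarbonS3Writer"
        System.out.println(FIXED.getName()); // prints "CarbonWriterLoggerDemo"
    }
}
```

The fix in the linked PR amounts to replacing the `CarbonS3Writer.class` reference with the enclosing class when building the logger.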
[GitHub] [carbondata] yutaoChina commented on a change in pull request #3892: flink write carbon file to hdfs when file size is less than 1M,can't write
yutaoChina commented on a change in pull request #3892: URL: https://github.com/apache/carbondata/pull/3892#discussion_r472681680 ## File path: integration/flink/src/main/java/org/apache/carbon/core/metadata/StageManager.java ## @@ -81,7 +81,7 @@ public static void writeStageInput(final String stageInputPath, final StageInput private static void writeSuccessFile(final String successFilePath) throws IOException { final DataOutputStream segmentStatusSuccessOutputStream = FileFactory.getDataOutputStream(successFilePath, -CarbonCommonConstants.BYTEBUFFER_SIZE, 1024); +CarbonCommonConstants.BYTEBUFFER_SIZE, 1024 * 1024 * 2); Review comment: I set it to 2M because the HDFS configured minimum block size (dfs.namenode.fs-limits.min-block-size) is 1M, and in CarbonUtil.java the `getMaxOfBlockAndFileSize(long blockSize, long fileSize)` method uses `long maxSize = blockSize; if (fileSize > blockSize) { maxSize = fileSize; }`, so if the default size or the file size is less than 1M the program gets an error. Why 2M? The default is 1M, so default * 2 is safely bigger than it.
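The sizing argument in that review comment can be sketched as follows. This is an illustrative standalone mirror of the `CarbonUtil.getMaxOfBlockAndFileSize` logic quoted above, not the CarbonData source; the constants and the tiny success-file size are assumptions for the demo.

```java
public class BlockSizeDemo {
    static final long HDFS_MIN_BLOCK_SIZE = 1024L * 1024;     // dfs.namenode.fs-limits.min-block-size default: 1 MB
    static final long OLD_BLOCK_SIZE = 1024L;                 // previous hard-coded value in StageManager
    static final long NEW_BLOCK_SIZE = 1024L * 1024 * 2;      // the fix: 2 MB

    // Simplified mirror of CarbonUtil.getMaxOfBlockAndFileSize:
    // the effective block size is the larger of the configured size and the file size.
    static long getMaxOfBlockAndFileSize(long blockSize, long fileSize) {
        long maxSize = blockSize;
        if (fileSize > blockSize) {
            maxSize = fileSize;
        }
        return maxSize;
    }

    public static void main(String[] args) {
        long successFileSize = 512; // the success marker file is tiny
        // Old value: effective size 1024 bytes, below the 1 MB NameNode minimum -> write rejected.
        System.out.println(getMaxOfBlockAndFileSize(OLD_BLOCK_SIZE, successFileSize) >= HDFS_MIN_BLOCK_SIZE);
        // New value: 2 MB comfortably clears the 1 MB minimum regardless of file size.
        System.out.println(getMaxOfBlockAndFileSize(NEW_BLOCK_SIZE, successFileSize) >= HDFS_MIN_BLOCK_SIZE);
    }
}
```

Because the chosen size is a max, raising the constant to 2 MB guarantees the result stays above the 1 MB floor even when the file itself is only a few bytes.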
[jira] [Resolved] (CARBONDATA-3927) TupleID/Position reference is long , make it short
[ https://issues.apache.org/jira/browse/CARBONDATA-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3927. -- Fix Version/s: 2.1.0 Resolution: Fixed > TupleID/Position reference is long, make it short > -- > > Key: CARBONDATA-3927 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3927 > Project: CarbonData > Issue Type: Improvement >Reporter: Akash R Nilugal >Assignee: Akash R Nilugal >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > the current tuple id is long; some parts of it can be avoided to improve performance.
[GitHub] [carbondata] asfgit closed pull request #3837: [CARBONDATA-3927]Remove unwanted fields from tupleID to make it short and to improve store size and performance.
asfgit closed pull request #3837: URL: https://github.com/apache/carbondata/pull/3837
[GitHub] [carbondata] kunal642 commented on pull request #3837: [CARBONDATA-3927]Remove unwanted fields from tupleID to make it short and to improve store size and performance.
kunal642 commented on pull request #3837: URL: https://github.com/apache/carbondata/pull/3837#issuecomment-675843217 LGTM
[jira] [Resolved] (CARBONDATA-3863) index service goes back to embedded mode
[ https://issues.apache.org/jira/browse/CARBONDATA-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3863. -- Fix Version/s: 2.1.0 Resolution: Fixed > index service goes back to embedded mode > --- > > Key: CARBONDATA-3863 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3863 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Taoli >Priority: Major > Fix For: 2.1.0 > > Time Spent: 6h > Remaining Estimate: 0h > > when using the index service, some usage patterns may cause the folder "/tmp/indexservertmp" to hit the max-directory-item exception; in that case the index service falls back to embedded mode. > the error looks like this: > > Exception occured: The directory item limit of /tmp/indexservertmp is > exceeded: limit=1048576 > items=1048576.
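The fix in PR #3855 ("after using index service clean the temp data") boils down to removing the accumulated temp files so the directory never reaches HDFS's dfs.namenode.fs-limits.max-directory-items limit (1048576 by default). This is an illustrative standalone sketch on the local filesystem using `java.nio.file`; the actual patch works through CarbonData's file abstractions, and the method and path names here are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class IndexServerTmpCleanup {
    // Delete dir and everything under it, children before parents,
    // so the directory's item count returns to zero after each use.
    static void cleanRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a few leftover index-server temp files, then clean them up.
        Path tmp = Files.createTempDirectory("indexservertmp");
        Files.createFile(tmp.resolve("cache-entry-1"));
        Files.createFile(tmp.resolve("cache-entry-2"));
        cleanRecursively(tmp);
        System.out.println(Files.exists(tmp)); // prints "false"
    }
}
```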
[GitHub] [carbondata] asfgit closed pull request #3855: [CARBONDATA-3863], after using index service clean the temp data
asfgit closed pull request #3855: URL: https://github.com/apache/carbondata/pull/3855
[GitHub] [carbondata] kunal642 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data
kunal642 commented on pull request #3855: URL: https://github.com/apache/carbondata/pull/3855#issuecomment-675841436 LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-675783198 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2038/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-675778572 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3780/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-675713570 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2037/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-675690653 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2034/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [WIP] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-675687794 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3777/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-675687528 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3776/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [WIP] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-675687520 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2035/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-675662000 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3779/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r472408878 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/utils/SDKUtil.java ## @@ -79,4 +98,75 @@ public static ArrayList listFiles(String sourceImageFolder, return (Object[]) input[i]; } + public static List extractFilesFromFolder(String path, + String suf, Configuration hadoopConf) { +List dataFiles = listFiles(path, suf, hadoopConf); +List carbonFiles = new ArrayList<>(); +for (Object dataFile: dataFiles) { + carbonFiles.add(FileFactory.getCarbonFile(dataFile.toString(), hadoopConf)); +} +if (CollectionUtils.isEmpty(dataFiles)) { + throw new RuntimeException("No file found at given location. Please provide" + + "the correct folder location."); +} +return carbonFiles; + } + + public static DataFileStream buildAvroReader(CarbonFile carbonFile, + Configuration configuration) throws IOException { +try { + GenericDatumReader genericDatumReader = + new GenericDatumReader<>(); + DataFileStream avroReader = + new DataFileStream<>(FileFactory.getDataInputStream(carbonFile.getPath(), + -1, configuration), genericDatumReader); + return avroReader; +} catch (FileNotFoundException ex) { + throw new FileNotFoundException("File " + carbonFile.getPath() + + " not found to build carbon writer."); +} catch (IOException ex) { + if (ex.getMessage().contains("Not a data file")) { +throw new RuntimeException("File " + carbonFile.getPath() + " is not in avro format."); + } else { +throw ex; + } +} + } + + public static Reader buildOrcReader(String path, Configuration conf) throws IOException { +try { + Reader orcReader = OrcFile.createReader(new Path(path), + OrcFile.readerOptions(conf)); + return orcReader; +} catch (FileFormatException ex) { + throw new RuntimeException("File " + path + " is not in ORC format"); +} catch (FileNotFoundException ex) { + throw new FileNotFoundException("File " + path + " not found to build carbon writer."); +} + 
} + + public static ParquetReader buildPqrquetReader(String path, Configuration conf) Review comment: Please correct the spelling mistake for "parquet" in the method name.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r472407770 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) { + throw new RuntimeException("CSV files can't be empty."); +} +for (CarbonFile dataFile : dataFiles) { + try { +CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf); + csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(), +-1, this.hadoopConf)); + } catch (IllegalArgumentException ex) { +if (ex.getCause() instanceof FileNotFoundException) { + throw new FileNotFoundException("File " + dataFile + + " not found to build carbon writer."); +} +throw ex; + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.validateCsvFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + private void validateJsonFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +for (CarbonFile dataFile : dataFiles) { + try { +new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf)); + } catch (FileNotFoundException ex) { +throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer."); + } catch (ParseException ex) { +throw new RuntimeException("File " + dataFile + " is not in json format."); + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.validateJsonFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. 
+ * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +this.validateParquetFiles(); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); Review comment: Had checked the base code. In the base code, we seem to directly assign the return value of FileFactory.getConfiguration() instead of new Configuration. Suggest to check and keep it consistent.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r472388572 ## File path: examples/spark/pom.xml ## @@ -38,6 +38,12 @@ org.apache.carbondata carbondata-spark_${spark.binary.version} ${project.version} + Review comment: Was wondering why this exclusion is in examples/spark/pom.xml and integration/spark/pom.xml. You don't seem to have any change in these 2 modules. I think you want to exclude it elsewhere?
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
CarbonDataQA1 commented on pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#issuecomment-675631086 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2033/
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [WIP] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r472382210 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -434,20 +434,59 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, } private static Object parseTimestamp(String dimensionValue, String dateFormat) { -Date dateToStr; -DateFormat dateFormatter; +Date dateToStr = null; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + LOGGER.info( + "Changing setLenient to true for TimeStamp: " + dimensionValue + ". 
Changing " + + dimensionValue + " to " + dateToStr); + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + return validateTimeStampRange(dateToStr.getTime()); +} catch (ParseException ex) { + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + throw new NumberFormatException(ex.getMessage()); +} + } else { +throw new NumberFormatException(e.getMessage()); + } +} + } + + private static Long validateTimeStampRange(Long timeValue) { +SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); Review comment: rechecked and made use of the existing values from `DateDirectDictionaryGenerator.MIN_VALUE` and `DateDirectDictionaryGenerator.MAX_VALUE`
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [WIP] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r472381642 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -434,20 +434,59 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, } private static Object parseTimestamp(String dimensionValue, String dateFormat) { -Date dateToStr; -DateFormat dateFormatter; +Date dateToStr = null; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + LOGGER.info( + "Changing setLenient to true for TimeStamp: " + dimensionValue + ". Changing " Review comment: agree. removed.
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [WIP] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r472381408 ## File path: core/src/main/java/org/apache/carbondata/core/util/SessionParams.java ## @@ -153,6 +154,12 @@ private boolean validateKeyValue(String key, String value) throws InvalidConfigu case ENABLE_UNSAFE_IN_QUERY_EXECUTION: case ENABLE_AUTO_LOAD_MERGE: case CARBON_PUSH_ROW_FILTERS_FOR_VECTOR: + case CARBON_LOAD_SETLENIENT_ENABLE: Review comment: done
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [WIP] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r472380864 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -434,20 +434,59 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, } private static Object parseTimestamp(String dimensionValue, String dateFormat) { -Date dateToStr; -DateFormat dateFormatter; +Date dateToStr = null; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + LOGGER.info( + "Changing setLenient to true for TimeStamp: " + dimensionValue + ". 
Changing " + + dimensionValue + " to " + dateToStr); + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + return validateTimeStampRange(dateToStr.getTime()); +} catch (ParseException ex) { Review comment: ok added ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala ## @@ -306,6 +307,39 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA } } + test("test load, update data with daylight saving time from different timezone") { +CarbonProperties.getInstance().addProperty( + CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true") +val defaultTimeZone = TimeZone.getDefault +TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai")) +sql("DROP TABLE IF EXISTS t3") +sql( + """ + CREATE TABLE IF NOT EXISTS t3 + (ID Int, date date, starttime Timestamp, country String, + name String, phonetype String, serialname String, salary Int) + STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy/MM/dd', + 'timestampformat'='yyyy-MM-dd HH:mm') +""") +sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/timeStampFormatData3.csv' into table t3") +sql(s"insert into t3 select 11,'2015-7-23','1941-3-15 00:00:00','china','aaa1','phone197'," +s"'ASD69643',15000") +sql("update t3 set (starttime) = ('1941-3-15 00:00:00') where name='aaa2'") +checkAnswer( + sql("SELECT starttime FROM t3 WHERE ID = 1"), + Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))) +) +checkAnswer( + sql("SELECT starttime FROM t3 WHERE ID = 11"), + Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))) +) +checkAnswer( + sql("SELECT starttime FROM t3 WHERE ID = 2"), + Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))) +) +TimeZone.setDefault(defaultTimeZone) Review comment: done
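The strict-then-lenient fallback discussed in this review thread can be sketched in isolation. This is an illustrative standalone version, not the `DataTypeUtil.parseTimestamp` code above: a non-lenient `SimpleDateFormat` rejects calendar-invalid inputs, including wall-clock times skipped by a daylight-saving jump, while re-parsing with `setLenient(true)` rolls them forward instead of failing the load. Because the exact DST gap depends on the JVM's timezone data, the demo uses a date overflow ("2020-02-30") to show the same strict-vs-lenient behaviour deterministically.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class LenientParseDemo {
    public static Date parseWithFallback(String value, String pattern) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setLenient(false);
        try {
            return fmt.parse(value);       // strict parse first, as before the patch
        } catch (ParseException e) {
            fmt.setLenient(true);          // fallback the patch enables via a property
            Date rolled = fmt.parse(value);
            fmt.setLenient(false);         // restore strictness afterwards
            return rolled;
        }
    }

    public static void main(String[] args) throws ParseException {
        // Strict parsing rejects Feb 30; lenient parsing rolls it into March
        // (2020 is a leap year, so Feb 30 becomes Mar 1).
        Date d = parseWithFallback("2020-02-30 00:00:00", "yyyy-MM-dd HH:mm:ss");
        System.out.println(new SimpleDateFormat("yyyy-MM-dd").format(d)); // prints "2020-03-01"
    }
}
```

The same rolling behaviour is what turns a nonexistent DST timestamp such as 1941-3-15 00:00:00 in Asia/Shanghai into 01:00:00 in the PR's test expectations.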
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r472379038 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -2456,4 +2471,24 @@ private CarbonCommonConstants() { * property which defines the insert stage flow */ public static final String IS_INSERT_STAGE = "is_insert_stage"; + + /** + * the level 1 complex delimiter default value + */ + @CarbonProperty Review comment: This looks to be just a value. Not the user configuration property. If so, @CarbonProperty is not required. please check and remove. check the same for below 2 more properties as well.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
CarbonDataQA1 commented on pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#issuecomment-675621560 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3775/
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-675619431 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
CarbonDataQA1 commented on pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#issuecomment-675594693 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3773/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-675591167 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3774/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-675590164 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2031/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3896: [WIP] Fix load failures due to daylight saving time changes
VenuReddy2103 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r472332314 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -434,20 +434,59 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, } private static Object parseTimestamp(String dimensionValue, String dateFormat) { -Date dateToStr; -DateFormat dateFormatter; +Date dateToStr = null; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + LOGGER.info( + "Changing setLenient to true for TimeStamp: " + dimensionValue + ". Changing " + + dimensionValue + " to " + dateToStr); + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + return validateTimeStampRange(dateToStr.getTime()); +} catch (ParseException ex) { Review comment: `validateTimeStampRange()` throws `NumberFormatException`. You would want to do `dateFormatter.setLenient(false);` in that case too.
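The lenient fallback under review hinges on how `SimpleDateFormat` treats wall-clock times that fall into a daylight-saving gap: with `setLenient(false)` the parse is rejected, while with `setLenient(true)` the time is rolled past the gap. A minimal standalone sketch of that behavior (the class name, timezone, and timestamps are chosen for illustration; this is not CarbonData code, and it assumes the JDK's non-lenient calendar rejects nonexistent wall-clock times, which is the failure mode this PR works around):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class LenientParseDemo {
    // Parse `value` as "yyyy-MM-dd HH:mm:ss" in America/New_York.
    // Returns epoch millis, or null when strict (non-lenient) parsing rejects the value.
    static Long tryParse(String value, boolean lenient) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("America/New_York"));
        fmt.setLenient(lenient);
        try {
            return fmt.parse(value).getTime();
        } catch (ParseException e) {
            return null; // non-lenient parse rejected the wall-clock time
        }
    }

    public static void main(String[] args) {
        // 2015-03-08 02:30 does not exist in America/New_York: clocks jump 02:00 -> 03:00.
        String inGap = "2015-03-08 02:30:00";
        System.out.println("strict:  " + tryParse(inGap, false));
        System.out.println("lenient: " + tryParse(inGap, true));
    }
}
```

This is why the patch flips the formatter to lenient only after a strict parse fails, and why the reviewers insist on restoring `setLenient(false)` on every exit path: the thread-local formatter is reused for later rows.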
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
CarbonDataQA1 commented on pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#issuecomment-675586969 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2032/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3896: [WIP] Fix load failures due to daylight saving time changes
VenuReddy2103 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r472327182 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -434,20 +434,59 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, } private static Object parseTimestamp(String dimensionValue, String dateFormat) { -Date dateToStr; -DateFormat dateFormatter; +Date dateToStr = null; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + LOGGER.info( + "Changing setLenient to true for TimeStamp: " + dimensionValue + ". Changing " Review comment: `Changing setLenient to true for TimeStamp: " + dimensionValue` is redundant. We have already logged it in line 452.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-675571056 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3771/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-675569959 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2029/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [WIP] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-675563624 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3770/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
nihal0107 commented on a change in pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#discussion_r472300831 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.util + +import java.io.{File, FileFilter} + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.commons.io.FileUtils + +object BadRecordUtil { + + /** + * get the bad record redirected csv file path + * @param dbName Review comment: done ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.util + +import java.io.{File, FileFilter} + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.commons.io.FileUtils + +object BadRecordUtil { + + /** + * get the bad record redirected csv file path + * @param dbName + * @param tableName + * @param segment + * @param task + * @return csv File + */ + def getRedirectCsvPath(dbName: String, +tableName: String, segment: String, task: String): File = { +var badRecordLocation = CarbonProperties.getInstance() + .getProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC) +badRecordLocation = badRecordLocation + "/" + dbName + "/" + tableName + "/" + segment + "/" + + task +val listFiles = new File(badRecordLocation).listFiles(new FileFilter { + override def accept(pathname: File): Boolean = { +pathname.getPath.endsWith(".csv") + } +}) +listFiles(0) + } + + /** + * compare data of csvfile and redirected csv file. + * @param csvFilePath csv file path Review comment: done ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.util + +import java.io.{File, FileFilter} + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.commons.io.FileUtils + +object BadRecordUtil { + + /** + * get the
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
akashrn5 commented on a change in pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#discussion_r472296594 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.util + +import java.io.{File, FileFilter} + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.commons.io.FileUtils + +object BadRecordUtil { + + /** + * get the bad record redirected csv file path + * @param dbName Review comment: remove these @param, not required ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.util + +import java.io.{File, FileFilter} + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.commons.io.FileUtils + +object BadRecordUtil { + + /** + * get the bad record redirected csv file path + * @param dbName + * @param tableName + * @param segment + * @param task + * @return csv File + */ + def getRedirectCsvPath(dbName: String, +tableName: String, segment: String, task: String): File = { +var badRecordLocation = CarbonProperties.getInstance() + .getProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC) +badRecordLocation = badRecordLocation + "/" + dbName + "/" + tableName + "/" + segment + "/" + + task +val listFiles = new File(badRecordLocation).listFiles(new FileFilter { + override def accept(pathname: File): Boolean = { +pathname.getPath.endsWith(".csv") + } +}) +listFiles(0) + } + + /** + * compare data of csvfile and redirected csv file. 
+ * @param csvFilePath csv file path + * @param redirectCsvPath redirected csv file path + * @return boolean + */ + def checkRedirectedCsvContentAvailableInSource(csvFilePath: String, +redirectCsvPath: File): Boolean = { +val origFileLineList = FileUtils.readLines(new File(csvFilePath)) +val redirectedFileLineList = FileUtils.readLines(redirectCsvPath) +val iterator = redirectedFileLineList.iterator() +while (iterator.hasNext) { + if (!origFileLineList.contains(iterator.next())) { +return false; + } +} +true + } + + /** + * delete the files at bad record location + * @param dbName database name + * @param tableName table name + * @return boolean + */ Review comment: same as above ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [WIP] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r472294029 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -434,20 +434,59 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, } private static Object parseTimestamp(String dimensionValue, String dateFormat) { -Date dateToStr; -DateFormat dateFormatter; +Date dateToStr = null; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + LOGGER.info( + "Changing setLenient to true for TimeStamp: " + dimensionValue + ". Changing " + + dimensionValue + " to " + dateToStr); + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + return validateTimeStampRange(dateToStr.getTime()); +} catch (ParseException ex) { + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + throw new NumberFormatException(ex.getMessage()); +} + } else { +throw new NumberFormatException(e.getMessage()); + } +} + } + + private static Long validateTimeStampRange(Long timeValue) { +SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); Review comment: Here, the `DateDirectDictionaryGenerator.MIN_VALUE` is ("0001-01-01"), which is not equal to the timestamp min value ("0001-01-01 00:00:00"). As the format is different, we will get different long values after parse.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3896: [WIP] Fix load failures due to daylight saving time changes
VenuReddy2103 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r472288683 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala ## @@ -306,6 +307,39 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA } } + test("test load, update data with daylight saving time from different timezone") { +CarbonProperties.getInstance().addProperty( + CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true") +val defaultTimeZone = TimeZone.getDefault +TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai")) +sql("DROP TABLE IF EXISTS t3") +sql( + """ + CREATE TABLE IF NOT EXISTS t3 + (ID Int, date date, starttime Timestamp, country String, + name String, phonetype String, serialname String, salary Int) + STORED AS carbondata TBLPROPERTIES('dateformat'='yyyy/MM/dd', + 'timestampformat'='yyyy-MM-dd HH:mm') +""") +sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/timeStampFormatData3.csv' into table t3") +sql(s"insert into t3 select 11,'2015-7-23','1941-3-15 00:00:00','china','aaa1','phone197'," + +s"'ASD69643',15000") +sql("update t3 set (starttime) = ('1941-3-15 00:00:00') where name='aaa2'") +checkAnswer( + sql("SELECT starttime FROM t3 WHERE ID = 1"), + Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))) +) +checkAnswer( + sql("SELECT starttime FROM t3 WHERE ID = 11"), + Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))) +) +checkAnswer( + sql("SELECT starttime FROM t3 WHERE ID = 2"), + Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))) +) +TimeZone.setDefault(defaultTimeZone) Review comment: Remove `CARBON_LOAD_SETLENIENT_ENABLE` from carbon properties at the end of the testcase.
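The cleanup Venu asks for is the usual save-and-restore pattern: a test should put a global property back to its prior state even when an assertion fails mid-body. A generic Java sketch using `java.util.Properties` as a stand-in for `CarbonProperties` (the helper name and key are illustrative):

```java
import java.util.Properties;

public class PropertyRestoreDemo {
    // Run `body` with `key` temporarily set to `value`, then restore the prior state,
    // even if `body` throws. Distinguishes "was unset" from "had a previous value".
    static void withProperty(Properties props, String key, String value, Runnable body) {
        String saved = props.getProperty(key); // null when the key was not set before
        props.setProperty(key, value);
        try {
            body.run();
        } finally {
            if (saved == null) {
                props.remove(key);              // key was absent: remove it again
            } else {
                props.setProperty(key, saved);  // key existed: put the old value back
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        withProperty(props, "carbon.load.dateformat.setlenient.enable", "true",
            () -> System.out.println("inside: " +
                props.getProperty("carbon.load.dateformat.setlenient.enable")));
        System.out.println("after: " +
            props.getProperty("carbon.load.dateformat.setlenient.enable"));
    }
}
```

Wrapping the body in `try`/`finally` (or Scala's `BeforeAndAfterAll`) is safer than appending a cleanup line at the end of the test, since a failed `checkAnswer` would otherwise leak the property into later tests.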
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3896: [WIP] Fix load failures due to daylight saving time changes
VenuReddy2103 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r472285299 ## File path: core/src/main/java/org/apache/carbondata/core/util/SessionParams.java ## @@ -153,6 +154,12 @@ private boolean validateKeyValue(String key, String value) throws InvalidConfigu case ENABLE_UNSAFE_IN_QUERY_EXECUTION: case ENABLE_AUTO_LOAD_MERGE: case CARBON_PUSH_ROW_FILTERS_FOR_VECTOR: + case CARBON_LOAD_SETLENIENT_ENABLE: Review comment: It can be a fall-through case. Can remove lines 158-162.
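The fall-through Venu suggests groups every boolean-valued session property under one shared validation arm instead of duplicating the check per key. A simplified sketch of the idea (the property-name strings are illustrative, not the exact CarbonData constants):

```java
public class BooleanPropertyValidator {
    // Boolean-valued session properties share one validation arm via case fall-through:
    // adding a new boolean property means adding one `case` label, not a new block.
    static boolean validateKeyValue(String key, String value) {
        switch (key) {
            case "enable.unsafe.in.query.processing":
            case "carbon.enable.auto.load.merge":
            case "carbon.push.rowfilters.for.vector":
            case "carbon.load.dateformat.setlenient.enable": // new property joins the group
                return "true".equalsIgnoreCase(value) || "false".equalsIgnoreCase(value);
            default:
                // Non-boolean properties would get their own validation elsewhere.
                return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(validateKeyValue("carbon.enable.auto.load.merge", "maybe"));
    }
}
```

The review point is exactly this: since the new `CARBON_LOAD_SETLENIENT_ENABLE` needs the same true/false check as the existing boolean cases, a dedicated `case` body (the cited lines 158-162) is redundant.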
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [WIP] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-675546141 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2028/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3896: [WIP] Fix load failures due to daylight saving time changes
VenuReddy2103 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r472278473 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -434,20 +434,59 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, } private static Object parseTimestamp(String dimensionValue, String dateFormat) { -Date dateToStr; -DateFormat dateFormatter; +Date dateToStr = null; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + LOGGER.info( + "Changing setLenient to true for TimeStamp: " + dimensionValue + ". Changing " + + dimensionValue + " to " + dateToStr); + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + return validateTimeStampRange(dateToStr.getTime()); +} catch (ParseException ex) { + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + throw new NumberFormatException(ex.getMessage()); +} + } else { +throw new NumberFormatException(e.getMessage()); + } +} + } + + private static Long validateTimeStampRange(Long timeValue) { +SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); Review comment: Instead of creating an instance of SimpleDateFormat each time, suggest using the existing `DateDirectDictionaryGenerator.MIN_VALUE` and `DateDirectDictionaryGenerator.MAX_VALUE` to validate.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3894: [WIP] Added property to enable disable SIforFailed segments and added prope…
CarbonDataQA1 commented on pull request #3894: URL: https://github.com/apache/carbondata/pull/3894#issuecomment-675526677 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2027/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3894: [WIP] Added property to enable disable SIforFailed segments and added prope…
CarbonDataQA1 commented on pull request #3894: URL: https://github.com/apache/carbondata/pull/3894#issuecomment-675516186 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3769/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
CarbonDataQA1 commented on pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#issuecomment-675511761 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3768/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
CarbonDataQA1 commented on pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#issuecomment-675511043 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2026/
[jira] [Resolved] (CARBONDATA-3943) Handling the addition of geo column to hive at the time of table creation
[ https://issues.apache.org/jira/browse/CARBONDATA-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash R Nilugal resolved CARBONDATA-3943.
Fix Version/s: 2.1.0
Resolution: Fixed

> Handling the addition of geo column to hive at the time of table creation
> Key: CARBONDATA-3943
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3943
> Project: CarbonData
> Issue Type: Bug
> Reporter: SHREELEKHYA GAMPA
> Priority: Minor
> Fix For: 2.1.0
> Time Spent: 2h 20m
> Remaining Estimate: 0h

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] asfgit closed pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.
asfgit closed pull request #3879: URL: https://github.com/apache/carbondata/pull/3879
[GitHub] [carbondata] akashrn5 commented on pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.
akashrn5 commented on pull request #3879: URL: https://github.com/apache/carbondata/pull/3879#issuecomment-675494423 LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.
CarbonDataQA1 commented on pull request #3879: URL: https://github.com/apache/carbondata/pull/3879#issuecomment-675488619 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3764/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-675487964 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2025/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-675486176 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3765/
[GitHub] [carbondata] xubo245 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
xubo245 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-675484851 LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.
CarbonDataQA1 commented on pull request #3879: URL: https://github.com/apache/carbondata/pull/3879#issuecomment-675479510 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2024/
[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3896: [WIP] Fix load failures due to daylight saving time changes
ShreelekhyaG opened a new pull request #3896: URL: https://github.com/apache/carbondata/pull/3896 ### Why is this PR needed? 1. Fix load failures due to daylight saving time changes. 2. During load, date/timestamp values with a year of more than 4 digits should fail or be treated as null according to the bad records action property. ### What changes were proposed in this PR? Added a new property to set lenient as true and parse the timestamp format. Added validation for timestamp range values. ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - Yes
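The lenient-parse-plus-range-validation approach this PR describes can be sketched roughly as follows. This is only an illustration of the underlying `SimpleDateFormat` behavior, not CarbonData code: the actual property name and validation logic live in the PR, and `parsesStrictly`, `parseLenient`, and `yearInRange` are hypothetical helper names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class TimestampParseSketch {
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Strict parsing: rejects impossible calendar values outright
    // (and can fail on edge cases around daylight saving changes).
    static boolean parsesStrictly(String value) {
        SimpleDateFormat strict = new SimpleDateFormat(PATTERN);
        strict.setLenient(false);
        try {
            strict.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    // Lenient parsing (the behavior the new property opts into):
    // tolerates values that strict parsing rejects, so the whole load
    // does not fail on a single odd timestamp.
    static Date parseLenient(String value) throws ParseException {
        SimpleDateFormat lenient = new SimpleDateFormat(PATTERN);
        lenient.setLenient(true);
        return lenient.parse(value);
    }

    // Hypothetical range check: a 5-digit year parses without error in
    // either mode, so it must be validated explicitly and then handled
    // according to the bad-records-action property.
    static boolean yearInRange(Date d) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(d);
        int year = cal.get(Calendar.YEAR);
        return year >= 1 && year <= 9999;
    }
}
```

For example, `"2020-02-30 10:00:00"` fails strict parsing but lenient parsing rolls it into March, while `"20202-02-01 10:00:00"` parses fine in both modes and is only caught by the explicit year-range check.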
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3894: [WIP] Added property to enable disable SIforFailed segments and added prope…
CarbonDataQA1 commented on pull request #3894: URL: https://github.com/apache/carbondata/pull/3894#issuecomment-675439020 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2023/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3894: [WIP] Added property to enable disable SIforFailed segments and added prope…
CarbonDataQA1 commented on pull request #3894: URL: https://github.com/apache/carbondata/pull/3894#issuecomment-675437596 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3763/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
nihal0107 commented on a change in pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#discussion_r472122694

## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
## @@ -145,47 +150,162 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach {
     sql("drop table if exists carbon_table")
   }
-  test("test insert / update with data more than 32000 characters") {
+  test("test load / insert / update with data more than 32000 characters and bad record action as Redirect") {
+    val testdata = s"$resourcesPath/MoreThan32KChar.csv"
+    FileFactory.deleteAllFilesOfDir(new File(CarbonProperties.getInstance()
+      .getProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC)))
+    sql("CREATE TABLE longerthan32kchar(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
+    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table longerThan32kChar OPTIONS('FILEHEADER'='dim1,dim2,mes1', " +
+      s"'BAD_RECORDS_ACTION'='REDIRECT','BAD_RECORDS_LOGGER_ENABLE'='TRUE')")
+    var redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", "0", "0")
+    assert(checkRedirectedCsvContentAvailableInSource(testdata, redirectCsvPath))
+    val longChar: String = RandomStringUtils.randomAlphabetic(33000)
     CarbonProperties.getInstance()
       .addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT, "true")
-    val testdata = s"$resourcesPath/32000char.csv"
-    sql("drop table if exists load32000chardata")
-    sql("drop table if exists load32000chardata_dup")
-    sql("CREATE TABLE load32000chardata(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
-    sql("CREATE TABLE load32000chardata_dup(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
-    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "REDIRECT");
+    sql(s"insert into longerthan32kchar values('33000', '$longChar', 4)")
+    checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 1), Row("itsok", "hello", 2)))
+    redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", "1", "0")
+    var redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+    var iterator = redirectedFileLineList.iterator()
+    while (iterator.hasNext) {
+      assert(iterator.next().equals("33000,"+longChar+",4"))
+    }
+
+    // Update strings of length greater than 32000
+    sql(s"update longerthan32kchar set(longerthan32kchar.dim2)=('$longChar') " +
+      "where longerthan32kchar.mes1=1").show()
+    checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("itsok", "hello", 2)))
+    redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", "0", "1")
+    redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+    iterator = redirectedFileLineList.iterator()
+    while (iterator.hasNext) {
+      assert(iterator.next().equals("ok,"+longChar+",1"))
+    }
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT, "false")
+
+    // Insert longer string without converter step will throw exception
     intercept[Exception] {

Review comment: Added at some place but here exception message is not user formatted
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
nihal0107 commented on a change in pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#discussion_r472122959

## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
## @@ -145,47 +150,162 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach {
     sql("drop table if exists carbon_table")
   }
-  test("test insert / update with data more than 32000 characters") {
+  test("test load / insert / update with data more than 32000 characters and bad record action as Redirect") {
+    val testdata = s"$resourcesPath/MoreThan32KChar.csv"
+    FileFactory.deleteAllFilesOfDir(new File(CarbonProperties.getInstance()
+      .getProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC)))
+    sql("CREATE TABLE longerthan32kchar(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
+    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table longerThan32kChar OPTIONS('FILEHEADER'='dim1,dim2,mes1', " +
+      s"'BAD_RECORDS_ACTION'='REDIRECT','BAD_RECORDS_LOGGER_ENABLE'='TRUE')")
+    var redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", "0", "0")
+    assert(checkRedirectedCsvContentAvailableInSource(testdata, redirectCsvPath))
+    val longChar: String = RandomStringUtils.randomAlphabetic(33000)
     CarbonProperties.getInstance()
       .addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT, "true")
-    val testdata = s"$resourcesPath/32000char.csv"
-    sql("drop table if exists load32000chardata")
-    sql("drop table if exists load32000chardata_dup")
-    sql("CREATE TABLE load32000chardata(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
-    sql("CREATE TABLE load32000chardata_dup(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
-    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "REDIRECT");
+    sql(s"insert into longerthan32kchar values('33000', '$longChar', 4)")
+    checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 1), Row("itsok", "hello", 2)))
+    redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", "1", "0")
+    var redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+    var iterator = redirectedFileLineList.iterator()
+    while (iterator.hasNext) {
+      assert(iterator.next().equals("33000,"+longChar+",4"))
+    }
+
+    // Update strings of length greater than 32000
+    sql(s"update longerthan32kchar set(longerthan32kchar.dim2)=('$longChar') " +
+      "where longerthan32kchar.mes1=1").show()
+    checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("itsok", "hello", 2)))
+    redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", "0", "1")
+    redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+    iterator = redirectedFileLineList.iterator()
+    while (iterator.hasNext) {
+      assert(iterator.next().equals("ok,"+longChar+",1"))
+    }
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT, "false")
+
+    // Insert longer string without converter step will throw exception
     intercept[Exception] {
-      sql("insert into load32000chardata_dup select dim1,concat(load32000chardata.dim2,''),mes1 from load32000chardata").show()
+      sql(s"insert into longerthan32kchar values('32000', '$longChar', 3)")
     }
-    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata_dup OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
+
+    FileFactory.deleteAllFilesOfDir(new File(CarbonProperties.getInstance()
+      .getProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC)))
+  }
+
+  test("test load / insert / update with data more than 32000 characters and bad record action as Force") {
+    val testdata = s"$resourcesPath/MoreThan32KChar.csv"
+    sql("CREATE TABLE longerthan32kchar(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
+    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table longerThan32kChar OPTIONS('FILEHEADER'='dim1,dim2,mes1', " +
+      s"'BAD_RECORDS_ACTION'='FORCE','BAD_RECORDS_LOGGER_ENABLE'='TRUE')")
+    checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 1), Row("itsok", "hello", 2), Row("32123", null, 3)))
+    val longChar: String = RandomStringUtils.randomAlphabetic(33000)
+
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT, "true")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "FORCE");
+    sql(s"insert into longerthan32kchar values('33000', '$longChar', 4)")
+    checkAnswer(sql("select * from longerthan32kchar"),
+      Seq(Row("ok", "hi", 1), Row("itsok", "hello", 2),
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
nihal0107 commented on a change in pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#discussion_r472122318

## File path: integration/spark/src/main/scala/org/apache/spark/sql/test/util/QueryTest.scala
## @@ -207,6 +208,45 @@ class QueryTest extends PlanTest {
       }
     }
   }
+
+  def getRedirectCsvPath(dbName: String,

Review comment: done
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
nihal0107 commented on a change in pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#discussion_r472122106

## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
## @@ -145,47 +150,162 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach {
     sql("drop table if exists carbon_table")
   }
-  test("test insert / update with data more than 32000 characters") {
+  test("test load / insert / update with data more than 32000 characters and bad record action as Redirect") {
+    val testdata = s"$resourcesPath/MoreThan32KChar.csv"
+    FileFactory.deleteAllFilesOfDir(new File(CarbonProperties.getInstance()
+      .getProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC)))
+    sql("CREATE TABLE longerthan32kchar(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
+    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table longerThan32kChar OPTIONS('FILEHEADER'='dim1,dim2,mes1', " +
+      s"'BAD_RECORDS_ACTION'='REDIRECT','BAD_RECORDS_LOGGER_ENABLE'='TRUE')")
+    var redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", "0", "0")
+    assert(checkRedirectedCsvContentAvailableInSource(testdata, redirectCsvPath))
+    val longChar: String = RandomStringUtils.randomAlphabetic(33000)
     CarbonProperties.getInstance()
       .addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT, "true")
-    val testdata = s"$resourcesPath/32000char.csv"
-    sql("drop table if exists load32000chardata")
-    sql("drop table if exists load32000chardata_dup")
-    sql("CREATE TABLE load32000chardata(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
-    sql("CREATE TABLE load32000chardata_dup(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
-    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "REDIRECT");
+    sql(s"insert into longerthan32kchar values('33000', '$longChar', 4)")
+    checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 1), Row("itsok", "hello", 2)))
+    redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", "1", "0")
+    var redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+    var iterator = redirectedFileLineList.iterator()
+    while (iterator.hasNext) {
+      assert(iterator.next().equals("33000,"+longChar+",4"))
+    }
+
+    // Update strings of length greater than 32000
+    sql(s"update longerthan32kchar set(longerthan32kchar.dim2)=('$longChar') " +
+      "where longerthan32kchar.mes1=1").show()
+    checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("itsok", "hello", 2)))
+    redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", "0", "1")
+    redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+    iterator = redirectedFileLineList.iterator()
+    while (iterator.hasNext) {
+      assert(iterator.next().equals("ok,"+longChar+",1"))
+    }
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT, "false")
+
+    // Insert longer string without converter step will throw exception
     intercept[Exception] {
-      sql("insert into load32000chardata_dup select dim1,concat(load32000chardata.dim2,''),mes1 from load32000chardata").show()
+      sql(s"insert into longerthan32kchar values('32000', '$longChar', 3)")
     }
-    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata_dup OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
+
+    FileFactory.deleteAllFilesOfDir(new File(CarbonProperties.getInstance()
+      .getProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC)))
+  }
+
+  test("test load / insert / update with data more than 32000 characters and bad record action as Force") {
+    val testdata = s"$resourcesPath/MoreThan32KChar.csv"
+    sql("CREATE TABLE longerthan32kchar(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
+    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table longerThan32kChar OPTIONS('FILEHEADER'='dim1,dim2,mes1', " +
+      s"'BAD_RECORDS_ACTION'='FORCE','BAD_RECORDS_LOGGER_ENABLE'='TRUE')")
+    checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 1), Row("itsok", "hello", 2), Row("32123", null, 3)))

Review comment: done
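The review threads above exercise CarbonData's bad-records actions for strings longer than 32000 characters. The row-level semantics the tests assert (REDIRECT drops the row and writes it to a redirect CSV; FORCE keeps the row with the over-long value replaced by null) can be sketched roughly like this. This is an illustration of the tested behavior only, not CarbonData's implementation; all names here are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class BadRecordSketch {
    static final int MAX_STRING_CHARS = 32000;

    enum Action { FAIL, IGNORE, REDIRECT, FORCE }

    // Returns the row to load, or null if the row is dropped.
    // Rows dropped by REDIRECT are collected so they can be written
    // to the redirect CSV, which is what the tests above verify.
    static String[] handle(String[] row, Action action, List<String[]> redirected) {
        boolean bad = false;
        for (String col : row) {
            if (col != null && col.length() > MAX_STRING_CHARS) {
                bad = true;
                break;
            }
        }
        if (!bad) {
            return row; // good record, loaded unchanged
        }
        switch (action) {
            case FAIL:
                throw new RuntimeException(
                    "bad record: string exceeds " + MAX_STRING_CHARS + " characters");
            case IGNORE:
                return null; // silently dropped
            case REDIRECT:
                redirected.add(row); // dropped, but preserved in the redirect CSV
                return null;
            case FORCE:
                // Keep the row; replace each over-long value with null.
                String[] fixed = row.clone();
                for (int i = 0; i < fixed.length; i++) {
                    if (fixed[i] != null && fixed[i].length() > MAX_STRING_CHARS) {
                        fixed[i] = null;
                    }
                }
                return fixed;
        }
        return null;
    }
}
```

This mirrors the expectations in the tests: under FORCE the table still contains a row like Row("32123", null, 3), while under REDIRECT that row is absent from the table and appears verbatim in the redirected CSV.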
[GitHub] [carbondata] akashrn5 commented on pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.
akashrn5 commented on pull request #3879: URL: https://github.com/apache/carbondata/pull/3879#issuecomment-675417600 retest this please
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-675417286 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3895: [WIP]SI fix for not equal to filter
CarbonDataQA1 commented on pull request #3895: URL: https://github.com/apache/carbondata/pull/3895#issuecomment-675415732 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3760/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3895: [WIP]SI fix for not equal to filter
CarbonDataQA1 commented on pull request #3895: URL: https://github.com/apache/carbondata/pull/3895#issuecomment-675397590 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2019/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.
CarbonDataQA1 commented on pull request #3879: URL: https://github.com/apache/carbondata/pull/3879#issuecomment-675395912 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3759/
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3894: [WIP] Added property to enable disable SIforFailed segments and added prope…
vikramahuja1001 commented on pull request #3894: URL: https://github.com/apache/carbondata/pull/3894#issuecomment-675389481 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-675386736 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3758/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-675382970 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3762/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-675382272 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2022/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.
CarbonDataQA1 commented on pull request #3879: URL: https://github.com/apache/carbondata/pull/3879#issuecomment-675379676 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2018/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-675377981 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2017/
[jira] [Updated] (CARBONDATA-3954) Global sorting with array, if read from ORC format, write to carbon, error; If you use no_sort, success;
[ https://issues.apache.org/jira/browse/CARBONDATA-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiaohui updated CARBONDATA-3954:
Attachment: wx20200818-174...@2x.png, wx20200818-174...@2x.png

> Global sorting with array, if read from ORC format, write to carbon, error; If you use no_sort, success;
> Key: CARBONDATA-3954
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3954
> Project: CarbonData
> Issue Type: Bug
> Components: spark-integration
> Affects Versions: 2.0.0
> Reporter: xiaohui
> Priority: Major
> Attachments: wx20200818-174...@2x.png, wx20200818-174...@2x.png
>
> 0: jdbc:hive2://localhost:1> use dict;
> No rows selected (0.391 seconds)
> 0: jdbc:hive2://localhost:1> select * from array_orc;
> | name  | col                                    | fee |
> | xiao3 | ["",null,"j"]                          | 3   |
> | xiao2 | ["上呼吸道疾病1","白内障1","胃溃疡1"]         | 2   |
> | xiao3 | ["",null,"j"]                          | 3   |
> | xiao1 | ["上呼吸道疾病","白内障","胃溃疡"]            | 1   |
> | xiao9 | NULL                                   | 3   |
> | xiao9 | NULL                                   | 3   |
> | xiao3 | NULL                                   | 3   |
> | xiao6 | NULL                                   | 3   |
> | xiao2 | ["上呼吸道疾病 1","白内障 1","胃溃疡 1"]      | 2   |
> | xiao1 | ["上呼吸道疾病 ","白内障 ","胃溃疡 "]         | 1   |
> | xiao3 | NULL                                   | 3   |
> | xiao3 | [null]                                 | 3   |
> | xiao3 | [""]                                   | 3   |
> 13 rows selected (0.416 seconds)
> 0: jdbc:hive2://localhost:1> create table array_carbon4(name string, col array, fee int) STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='name', 'TABLE_BLOCKSIZE'='128', 'TABLE_BLOCKLET_SIZE'='128', 'SORT_SCOPE'='no_SORT');
> No rows selected (1.04 seconds)
> 0: jdbc:hive2://localhost:1> insert overwrite table array_carbon4 select name,col,fee from array_orc;
> No rows selected (5.065 seconds)
> 0: jdbc:hive2://localhost:1> create table array_carbon5(name string, col array, fee int) STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='name', 'TABLE_BLOCKSIZE'='128', 'TABLE_BLOCKLET_SIZE'='128', 'SORT_SCOPE'='global_SORT');
> No rows selected (0.098 seconds)
> 0: jdbc:hive2://localhost:1> insert overwrite table array_carbon5 select name,col,fee from array_orc;
> Error: java.lang.Exception: DataLoad failure (state=,code=0)
[jira] [Updated] (CARBONDATA-3954) Global sorting with array, if read from ORC format, write to carbon, error; If you use no_sort, success;
[ https://issues.apache.org/jira/browse/CARBONDATA-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiaohui updated CARBONDATA-3954:
Description:

0: jdbc:hive2://localhost:1> use dict;
No rows selected (0.391 seconds)
0: jdbc:hive2://localhost:1> select * from array_orc;
| name  | col                                    | fee |
| xiao3 | ["",null,"j"]                          | 3   |
| xiao2 | ["上呼吸道疾病1","白内障1","胃溃疡1"]         | 2   |
| xiao3 | ["",null,"j"]                          | 3   |
| xiao1 | ["上呼吸道疾病","白内障","胃溃疡"]            | 1   |
| xiao9 | NULL                                   | 3   |
| xiao9 | NULL                                   | 3   |
| xiao3 | NULL                                   | 3   |
| xiao6 | NULL                                   | 3   |
| xiao2 | ["上呼吸道疾病 1","白内障 1","胃溃疡 1"]      | 2   |
| xiao1 | ["上呼吸道疾病 ","白内障 ","胃溃疡 "]         | 1   |
| xiao3 | NULL                                   | 3   |
| xiao3 | [null]                                 | 3   |
| xiao3 | [""]                                   | 3   |
13 rows selected (0.416 seconds)
0: jdbc:hive2://localhost:1> create table array_carbon4(name string, col array, fee int) STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='name', 'TABLE_BLOCKSIZE'='128', 'TABLE_BLOCKLET_SIZE'='128', 'SORT_SCOPE'='no_SORT');
No rows selected (1.04 seconds)
0: jdbc:hive2://localhost:1> insert overwrite table array_carbon4 select name,col,fee from array_orc;
No rows selected (5.065 seconds)
0: jdbc:hive2://localhost:1> create table array_carbon5(name string, col array, fee int) STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='name', 'TABLE_BLOCKSIZE'='128', 'TABLE_BLOCKLET_SIZE'='128', 'SORT_SCOPE'='global_SORT');
No rows selected (0.098 seconds)
0: jdbc:hive2://localhost:1> insert overwrite table array_carbon5 select name,col,fee from array_orc;
Error: java.lang.Exception: DataLoad failure (state=,code=0)

was:

0: jdbc:hive2://localhost:1> use dict;
No rows selected (0.391 seconds)
0: jdbc:hive2://localhost:1> insert overwrite table array_carbon3 select name,col,fee from array_orc;
Error: java.lang.Exception: DataLoad failure (state=,code=0)
0: jdbc:hive2://localhost:1> create table array_carbon4(name string, col array, fee int) STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='name', 'TABLE_BLOCKSIZE'='128', 'TABLE_BLOCKLET_SIZE'='128', 'SORT_SCOPE'='no_SORT');
No rows selected (1.04 seconds)
0: jdbc:hive2://localhost:1> insert overwrite table array_carbon4 select name,col,fee from array_orc;
No rows selected (5.065 seconds)
0: jdbc:hive2://localhost:1> create table array_carbon5(name string, col array, fee int) STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='name', 'TABLE_BLOCKSIZE'='128', 'TABLE_BLOCKLET_SIZE'='128', 'SORT_SCOPE'='global_SORT');
No rows selected (0.098 seconds)
0: jdbc:hive2://localhost:1> insert overwrite table array_carbon5 select name,col,fee from array_orc;
Error: java.lang.Exception: DataLoad failure (state=,code=0)
0: jdbc:hive2://localhost:1> select * from array_orc;
| name  | col                                    | fee |
| xiao3 | ["",null,"j"]                          | 3   |
| xiao2 | ["上呼吸道疾病1","白内障1","胃溃疡1"]         | 2   |
| xiao3 | ["",null,"j"]                          | 3   |
| xiao1 | ["上呼吸道疾病","白内障","胃溃疡"]            | 1   |
| xiao9 | NULL                                   | 3   |
| xiao9 | NULL                                   | 3   |
| xiao3 | NULL                                   | 3   |
| xiao6 | NULL                                   | 3   |
| xiao2 | ["上呼吸道疾病 1","白内障 1","胃溃疡 1"]      | 2   |
| xiao1 | ["上呼吸道疾病 ","白内障 ","胃溃疡 "]         | 1   |
| xiao3 | NULL                                   | 3   |
| xiao3 | [null]                                 | 3   |
| xiao3 | [""]                                   | 3   |
13 rows selected (0.416 seconds)

> Global
[jira] [Created] (CARBONDATA-3954) Global sorting with array, if read from ORC format, write to carbon, error; If you use no_sort, success;
xiaohui created CARBONDATA-3954:
-----------------------------------
Summary: Global sorting with array, if read from ORC format, write to carbon, error; If you use no_sort, success;
Key: CARBONDATA-3954
URL: https://issues.apache.org/jira/browse/CARBONDATA-3954
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 2.0.0
Reporter: xiaohui

0: jdbc:hive2://localhost:1> use dict;
No rows selected (0.391 seconds)
0: jdbc:hive2://localhost:1> insert overwrite table array_carbon3 select name,col,fee from array_orc;
Error: java.lang.Exception: DataLoad failure (state=,code=0)
0: jdbc:hive2://localhost:1> create table array_carbon4(name string, col array, fee int) STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='name', 'TABLE_BLOCKSIZE'='128', 'TABLE_BLOCKLET_SIZE'='128', 'SORT_SCOPE'='no_SORT');
No rows selected (1.04 seconds)
0: jdbc:hive2://localhost:1> insert overwrite table array_carbon4 select name,col,fee from array_orc;
No rows selected (5.065 seconds)
0: jdbc:hive2://localhost:1> create table array_carbon5(name string, col array, fee int) STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='name', 'TABLE_BLOCKSIZE'='128', 'TABLE_BLOCKLET_SIZE'='128', 'SORT_SCOPE'='global_SORT');
No rows selected (0.098 seconds)
0: jdbc:hive2://localhost:1> insert overwrite table array_carbon5 select name,col,fee from array_orc;
Error: java.lang.Exception: DataLoad failure (state=,code=0)
0: jdbc:hive2://localhost:1> select * from array_orc;
+--------+------------------------------------------+------+
| name   | col                                      | fee  |
+--------+------------------------------------------+------+
| xiao3  | ["",null,"j"]                            | 3    |
| xiao2  | ["上呼吸道疾病1","白内障1","胃溃疡1"]        | 2    |
| xiao3  | ["",null,"j"]                            | 3    |
| xiao1  | ["上呼吸道疾病","白内障","胃溃疡"]           | 1    |
| xiao9  | NULL                                     | 3    |
| xiao9  | NULL                                     | 3    |
| xiao3  | NULL                                     | 3    |
| xiao6  | NULL                                     | 3    |
| xiao2  | ["上呼吸道疾病 1","白内障 1","胃溃疡 1"]     | 2    |
| xiao1  | ["上呼吸道疾病 ","白内障 ","胃溃疡 "]        | 1    |
| xiao3  | NULL                                     | 3    |
| xiao3  | [null]                                   | 3    |
| xiao3  | [""]                                     | 3    |
+--------+------------------------------------------+------+
13 rows selected (0.416 seconds)

-- This message was sent by Atlassian Jira (v8.3.4#803005)
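[Editor's note] The transcript in this issue boils down to a short reproduction, sketched below. This is a hedged reconstruction: the element type of `col` is assumed to be STRING (the type parameter of `array` was lost when the message was archived), and the table names follow the transcript.

```sql
-- Source table: ORC, with an array column (element type assumed STRING).
-- Loading it into a carbon table succeeds with NO_SORT:
CREATE TABLE array_carbon4 (name STRING, col ARRAY<STRING>, fee INT)
STORED AS carbondata
TBLPROPERTIES ('SORT_COLUMNS'='name', 'SORT_SCOPE'='NO_SORT');
INSERT OVERWRITE TABLE array_carbon4 SELECT name, col, fee FROM array_orc;  -- ok

-- The identical table with GLOBAL_SORT fails:
CREATE TABLE array_carbon5 (name STRING, col ARRAY<STRING>, fee INT)
STORED AS carbondata
TBLPROPERTIES ('SORT_COLUMNS'='name', 'SORT_SCOPE'='GLOBAL_SORT');
INSERT OVERWRITE TABLE array_carbon5 SELECT name, col, fee FROM array_orc;
-- Error: java.lang.Exception: DataLoad failure (state=,code=0)
```

Note that only the sort scope differs between the two tables; the source rows include empty strings, NULL arrays, and arrays containing null elements, which is likely where the global-sort path diverges from the no-sort path.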
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-675366072 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3761/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akkio-97 closed pull request #3773: [CARBONDATA-3830]Presto array columns read support
akkio-97 closed pull request #3773: URL: https://github.com/apache/carbondata/pull/3773 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akkio-97 commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support
akkio-97 commented on pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#issuecomment-675365410 okay, thanks for all your suggestions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-675365569 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2021/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
akkio-97 commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r472038054

## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveIntegralCodec.java
@@ -23,6 +23,7 @@
 import java.util.BitSet;
 import java.util.List;
 import java.util.Map;
+import java.util.Stack;

Review comment: okay

## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/FillVector.java
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.page.encoding;

Review comment: okay

## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/FillVector.java
@@ -0,0 +1,347 @@
+(Apache License 2.0 header, as above)
+
+package org.apache.carbondata.core.datastore.page.encoding;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+import org.apache.carbondata.core.util.ByteUtil;
+
+public class FillVector {
+  private byte[] pageData;
+  private float floatFactor = 0;
+  private double factor = 0;
+  private ColumnVectorInfo vectorInfo;
+  private BitSet nullBits;
+
+  public FillVector(byte[] pageData, ColumnVectorInfo vectorInfo, BitSet nullBits) {
+    this.pageData = pageData;
+    this.vectorInfo = vectorInfo;
+    this.nullBits = nullBits;
+  }
+
+  public void setFactor(double factor) {
+    this.factor = factor;
+  }
+
+  public void setFloatFactor(float floatFactor) {
+    this.floatFactor = floatFactor;
+  }
+
+  public void basedOnType(CarbonColumnVector vector, DataType vectorDataType, int pageSize,
+      DataType pageDataType) {
+    if (vectorInfo.vector.getColumnVector() != null && ((CarbonColumnVectorImpl) vectorInfo.vector
+        .getColumnVector()).isComplex()) {
+      fillComplexType(vector.getColumnVector(), pageDataType);
+    } else {
+      fillPrimitiveType(vector, vectorDataType, pageSize, pageDataType);
+      vector.setIndex(0);
+    }
+  }
+
+  private void fillComplexType(CarbonColumnVector vector, DataType pageDataType) {
+    CarbonColumnVectorImpl vectorImpl = (CarbonColumnVectorImpl) vector;
+    if (vector != null && vector.getChildrenVector() != null) {
+      ArrayList childElements = ((CarbonColumnVectorImpl) vector).getChildrenElements();
+      for (int i = 0; i < childElements.size(); i++) {
+        int count = childElements.get(i);
+        typeComplexObject(vectorImpl.getChildrenVector().get(0), count, pageDataType);
+        vector.putArrayObject();
+      }
+      vectorImpl.getChildrenVector().get(0).setIndex(0);
+    }
+  }
+
+  private void fillPrimitiveType(CarbonColumnVector vector, DataType vectorDataType, int pageSize,
+      DataType pageDataType) {
+    // offset which denotes the start index for pageData
+    int pageIndex = vector.getIndex();
+    int rowId = 0;
+
+    // Filling into vector is done based on page
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #1431: [WIP] DataMap Access Path Optimization
CarbonDataQA1 commented on pull request #1431: URL: https://github.com/apache/carbondata/pull/1431#issuecomment-675362465 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2020/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (CARBONDATA-3919) Improve concurrent query performance
[ https://issues.apache.org/jira/browse/CARBONDATA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal resolved CARBONDATA-3919.
-----------------------------------------
Fix Version/s: 2.1.0
Resolution: Fixed

> Improve concurrent query performance
>
> Key: CARBONDATA-3919
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3919
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Ajantha Bhat
> Assignee: Ajantha Bhat
> Priority: Major
> Fix For: 2.1.0
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Problem 1: when 500 queries were executed concurrently, the checkIfRefreshIsNeeded method was synchronized, so only one thread could make progress at a time. Synchronization is actually required only when the schema is modified to drop tables, not for the whole method.
>
> Solution: synchronize only the table-removal part. With this change, the total time for 500 concurrent queries improved from 10 seconds to 3 seconds in a cluster.
>
> Problem 2: TokenCache.obtainTokensForNamenodes was causing a performance bottleneck for concurrent queries, so it was removed.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
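[Editor's note] The "Problem 1" fix described above — narrowing a method-wide `synchronized` down to just the mutation — can be sketched generically. The class and method names below are illustrative, not CarbonData's actual API; the point is the pattern: lock-free reads over a concurrent map, with a short synchronized block (plus a re-check inside it) only around the stale-entry removal.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SchemaCache {
    private final Map<String, Long> tableModifiedTimes = new ConcurrentHashMap<>();
    private final Object removalLock = new Object();

    // Before the fix: the whole method was synchronized, serializing all readers.
    // After: the read-mostly freshness check runs without a lock; only the
    // removal of a stale entry is synchronized.
    public boolean checkIfRefreshIsNeeded(String tableName, long latestModifiedTime) {
        Long cachedTime = tableModifiedTimes.get(tableName);
        if (cachedTime == null) {
            return true; // not cached yet; caller should load the schema
        }
        if (cachedTime < latestModifiedTime) {
            synchronized (removalLock) {
                // Re-check inside the lock: another thread may have removed it already.
                Long current = tableModifiedTimes.get(tableName);
                if (current != null && current < latestModifiedTime) {
                    tableModifiedTimes.remove(tableName);
                }
            }
            return true;
        }
        return false;
    }

    public void put(String tableName, long modifiedTime) {
        tableModifiedTimes.put(tableName, modifiedTime);
    }

    public static void main(String[] args) {
        SchemaCache cache = new SchemaCache();
        cache.put("t1", 100L);
        System.out.println(cache.checkIfRefreshIsNeeded("t1", 100L)); // false: up to date
        System.out.println(cache.checkIfRefreshIsNeeded("t1", 200L)); // true: stale, removed
        System.out.println(cache.checkIfRefreshIsNeeded("t1", 200L)); // true: no longer cached
    }
}
```

Because concurrent readers never contend on a lock in the common (cache-hit) path, throughput under many parallel queries improves, which matches the 10s-to-3s number reported in the issue.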
[GitHub] [carbondata] asfgit closed pull request #3858: [CARBONDATA-3919] Improve concurrent query performance
asfgit closed pull request #3858: URL: https://github.com/apache/carbondata/pull/3858 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance
akashrn5 commented on pull request #3858: URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675355454 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance
CarbonDataQA1 commented on pull request #3858: URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675351077 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2015/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance
CarbonDataQA1 commented on pull request #3858: URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675349093 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3756/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3894: [WIP] Added property to enable disable SIforFailed segments and added prope…
CarbonDataQA1 commented on pull request #3894: URL: https://github.com/apache/carbondata/pull/3894#issuecomment-675347031 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CARBONDATA-3953) Dead lock when doing dataframe persist and loading
ChenKai created CARBONDATA-3953:
-----------------------------------
Summary: Dead lock when doing dataframe persist and loading
Key: CARBONDATA-3953
URL: https://issues.apache.org/jira/browse/CARBONDATA-3953
Project: CarbonData
Issue Type: Bug
Affects Versions: 2.1.0
Reporter: ChenKai
Attachments: image-2020-08-18-15-59-46-108.png, image-2020-08-18-16-03-33-370.png

Thread-1: !image-2020-08-18-15-59-46-108.png!
Thread-2: !image-2020-08-18-16-03-33-370.png!

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] vikramahuja1001 opened a new pull request #3895: [WIP]SI fix fr not equal to filter
vikramahuja1001 opened a new pull request #3895: URL: https://github.com/apache/carbondata/pull/3895

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.
ShreelekhyaG commented on a change in pull request #3879: URL: https://github.com/apache/carbondata/pull/3879#discussion_r471986162

## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonSource.scala
@@ -281,10 +281,22 @@ object CarbonSource {
 isExternal)
 val updatedFormat = CarbonToSparkAdapter
 .getUpdatedStorageFormat(storageFormat, updatedTableProperties, tableInfo.getTablePath)

Review comment: Added validation and changed the schema ordinal value of the geo column from -1 to 0, so it is now added to the schema and handled. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ShreelekhyaG commented on pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.
ShreelekhyaG commented on pull request #3879: URL: https://github.com/apache/carbondata/pull/3879#issuecomment-675319661 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
vikramahuja1001 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-675315576 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-675307163 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2013/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-675299911 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3754/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org