Re: About measure in carbon
Hi Ravi, Currently we are using the Hive parser to parse Carbon CREATE TABLE DDL. Hive does not support a "short" datatype by that name; instead it supports short and byte under different names, as smallInt (16-bit) and tinyInt (8-bit). In Carbon we do not support these datatypes. Do we need to handle them in Carbon? If yes, I can create a JIRA for this issue and track this feature.

Regards
Manish Gupta

--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/About-measure-in-carbon-tp3627p3802.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.
[GitHub] incubator-carbondata pull request #292: [CARBONDATA-375] Dictionary cache no...
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/292

    [CARBONDATA-375] Dictionary cache not getting cleared after task completion in dictionary decoder

Problem: The dictionary cache is not getting cleared after task completion in the dictionary decoder.

Analysis: Currently the LRU cache eviction policy is based on the dictionary access count: for the cache to remove an entry, its access count must be 0. In the dictionary decoder, after converting surrogate keys to actual values, the access count for the dictionary columns in the query is not decremented, so those entries will never be cleared from memory when an LRU cache size is configured.

Fix: Add a task completion listener that takes care of clearing the dictionary in case of both success and failure.

Impact area: LRU cache eviction policy, which can lead to query and data load failure.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/incubator-carbondata dictionary_decoder_clear_dictionary

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/292.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #292

commit b305f34e1014267b3706c287cef7070189fc3c28
Author: manishgupta88 <tomanishgupt...@gmail.com>
Date: 2016-11-03T15:48:03Z

    Problem: Dictionary cache not getting cleared after task completion in dictionary decoder

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
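The access-count mechanism described in the analysis can be illustrated with a minimal, self-contained sketch (class and method names are hypothetical, not the actual CarbonData LRU cache API): an entry stays pinned while its access count is above zero, so every lookup must be paired with a clear call — which is exactly what a task-completion listener that runs on both success and failure guarantees.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of an access-count-gated cache: an entry only becomes
// evictable once its access count drops back to zero, so a missed clear()
// pins the dictionary in memory forever.
public class DictionaryCacheSketch {
    static class Entry {
        int accessCount;
    }

    private final Map<String, Entry> cache = new HashMap<>();

    public Entry get(String column) {
        Entry e = cache.computeIfAbsent(column, k -> new Entry());
        e.accessCount++;          // pinned: eviction skips entries with count > 0
        return e;
    }

    public void clear(String column) {
        Entry e = cache.get(column);
        if (e != null && e.accessCount > 0) {
            e.accessCount--;      // unpin; entry becomes evictable at zero
        }
    }

    public boolean isEvictable(String column) {
        Entry e = cache.get(column);
        return e != null && e.accessCount == 0;
    }

    public static void main(String[] args) {
        DictionaryCacheSketch cache = new DictionaryCacheSketch();
        cache.get("city");                               // decoder pins the dictionary
        System.out.println(cache.isEvictable("city"));   // false -- still pinned
        cache.clear("city");                             // task-completion listener unpins
        System.out.println(cache.isEvictable("city"));   // true
    }
}
```

In the decoder scenario above, the `clear` call corresponds to the listener body, which must run whether the task succeeds or fails.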
[GitHub] incubator-carbondata pull request #234: [CARBONDATA-315] Data loading fails ...
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/234

    [CARBONDATA-315] Data loading fails if parsing a double value returns infinity

Problem: Data loading fails if parsing a double value returns infinity.

Analysis: During data load, if a value is too big for a double DataType column, then parsing that value as a double returns "Infinity". Because of this, an exception is thrown while calculating the min and max values for measures in the carbon data writer step.

Fix: If the result of parsing a double value is Infinity or NaN, make the value null and add the row to bad records.

Impact area: Data loads that contain non-parseable values for a datatype.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/incubator-carbondata double_value_range_failure

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/234.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #234

commit f7225f974828edd8b340f88fbfaa2f60d8a7d582
Author: manishgupta88 <tomanishgupt...@gmail.com>
Date: 2016-10-13T09:47:52Z

    Problem: Data loading fails if parsing a double value returns infinity
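The root cause here is that Java's double parsing does not throw for out-of-range magnitudes; it returns Infinity. A small sketch of the described fix (the parseOrNull helper is illustrative, not the actual CarbonData code):

```java
// A value too large for double parses to Infinity rather than throwing,
// so it must be checked explicitly and treated as null / routed to bad records.
public class DoubleParseSketch {

    // Returns null for non-finite or unparseable results, mirroring the
    // "make it null and add to bad records" behaviour described above.
    static Double parseOrNull(String value) {
        try {
            double d = Double.parseDouble(value);
            return (Double.isInfinite(d) || Double.isNaN(d)) ? null : d;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(Double.parseDouble("1e400")); // Infinity, no exception
        System.out.println(parseOrNull("1e400"));        // null -> bad record
        System.out.println(parseOrNull("42.5"));         // 42.5
    }
}
```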
[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81299063

--- Diff: format/src/main/thrift/schema.thrift ---
@@ -124,6 +124,7 @@ struct TableSchema{
   1: required string table_id;  // ID used to
   2: required list table_columns;  // Columns in the table
   3: required SchemaEvolution schema_evolution;  // History of schema evolution of this table
+  4: optional i32 block_size
--- End diff --

@Zhangshunyu ...if you register the property in the Hive metastore, the property will never be lost even on restart of the thrift server, as we do for tablepath...you can refer to the create table flow
[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81279628

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
@@ -196,11 +197,22 @@ public AbstractFactDataWriter(String storeLocation, int measureCount, int mdKeyL
     blockIndexInfoList = new ArrayList<>();
     // get max file size;
     CarbonProperties propInstance = CarbonProperties.getInstance();
-    this.fileSizeInBytes = Long.parseLong(propInstance
-        .getProperty(CarbonCommonConstants.MAX_FILE_SIZE,
-            CarbonCommonConstants.MAX_FILE_SIZE_DEFAULT_VAL))
-        * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
-        * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR * 1L;
+    // If blocksize is defined by create table ddl, then use this value otherwise
+    // use system level property
+    if (blocksize >= CarbonCommonConstants.MAX_FILE_SIZE_DEFAULT_VAL_MIN_VAL &&
+        blocksize <= CarbonCommonConstants.MAX_FILE_SIZE_DEFAULT_VAL_MAX_VAL) {
+      LOGGER.audit("The max file size is set in table level. ");
+      this.fileSizeInBytes = blocksize * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
+          * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR * 1L;
+    } else {
+      LOGGER.audit("The max file size in create table ddl is not set or is invalid, "
+          + "so here use " + CarbonCommonConstants.MAX_FILE_SIZE + "to set. ");
--- End diff --

@Zhangshunyu ...instead of validating the block size specified by the user here, I think we can validate it at table creation time and, in the error message, display the allowed block size range so that the user is not confused. If the user does not specify the block size, then consider the default value and update it in the schema.
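The suggestion above — validate at CREATE TABLE time and name the allowed range in the error — could look roughly like this (class name, limits, and default are assumptions for illustration, not CarbonData's actual constants):

```java
// Sketch of create-table-time block size validation: reject an out-of-range
// value with the allowed range in the error message, and fall back to the
// default (stored in the schema) when no value is specified.
public class BlockSizeValidation {
    static final int MIN_BLOCK_SIZE_MB = 1;       // assumed lower limit
    static final int MAX_BLOCK_SIZE_MB = 2048;    // assumed upper limit
    static final int DEFAULT_BLOCK_SIZE_MB = 1024;

    static int resolveBlockSize(Integer requested) {
        if (requested == null) {
            // Not specified: use the default and persist it in the schema.
            return DEFAULT_BLOCK_SIZE_MB;
        }
        if (requested < MIN_BLOCK_SIZE_MB || requested > MAX_BLOCK_SIZE_MB) {
            // Fail early, naming the allowed range so the user is not confused.
            throw new IllegalArgumentException(
                "Invalid table_blocksize " + requested + "; allowed range is "
                + MIN_BLOCK_SIZE_MB + "-" + MAX_BLOCK_SIZE_MB + " MB");
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(resolveBlockSize(null));  // 1024
        System.out.println(resolveBlockSize(512));   // 512
        try {
            resolveBlockSize(999999);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());      // names the allowed range
        }
    }
}
```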
[GitHub] incubator-carbondata pull request #180: [CARBONDATA-260] Equal or lesser val...
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/180

    [CARBONDATA-260] Equal or lesser value of MAXCOLUMNS option than column count in CSV header results into array index out of bound exception

Problem: A MAXCOLUMNS option value equal to or less than the column count in the CSV header results in an ArrayIndexOutOfBoundsException.

Analysis: If the column count in the CSV header is greater than or equal to the MAXCOLUMNS option value, an ArrayIndexOutOfBoundsException is thrown by the Univocity CSV parser. While parsing a row, the parser stores each value in an array and increments the index, and after incrementing it performs one more array access using the incremented index. Code snippet from the CSV parser:

    public void valueParsed() {
      this.parsedValues[column++] = appender.getAndReset();
      this.appender = appenders[column];
    }

e.g. If MAXCOLUMNS is 7, the valid array indexes are 0-6. When column is 6, the first line stores the last value and increments column to 7, and the second line then throws ArrayIndexOutOfBoundsException on appenders[7].

Fix: Whenever the column count in the CSV header is equal to or more than the MAXCOLUMNS option value (or default value), increment it by 1.

Impact: Data load flow

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/incubator-carbondata maxcolumns_array_indexOfBound

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/180.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #180

commit 3f32424e55615c8e45470d5169b817f9f703dc3e
Author: manishgupta88 <tomanishgupt...@gmail.com>
Date: 2016-09-20T14:21:33Z

    Problem: Equal or lesser value of MAXCOLUMNS option than column count in CSV header results into array index out of bound exception
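The off-by-one in the quoted valueParsed() code can be reproduced with a small standalone sketch: sizing the arrays exactly to the column count makes the trailing appenders[column] read overflow, while sizing them one larger (the fix) keeps it in bounds.

```java
// Reproduces the pattern from the parser's valueParsed(): store at
// parsedValues[column++], then immediately read appenders[column].
// After the last column, that trailing read indexes one past the count.
public class MaxColumnsSketch {

    static boolean failsOnLastColumn(String[] appenders, int columnCount) {
        int column = 0;
        try {
            for (int i = 0; i < columnCount; i++) {
                column++;                        // mirrors parsedValues[column++] = ...
                String next = appenders[column]; // ...followed by appenders[column]
                if (next != null) {
                    // the real parser would switch to this appender here
                }
            }
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int columnCount = 7;

        // Sized exactly to the column count: the lookahead read overflows.
        System.out.println(failsOnLastColumn(new String[columnCount], columnCount));     // true

        // Sized columnCount + 1 (the fix): the trailing index stays in bounds.
        System.out.println(failsOnLastColumn(new String[columnCount + 1], columnCount)); // false
    }
}
```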
[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/161#discussion_r79282917

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/Compactor.scala ---
@@ -69,7 +69,9 @@ object Compactor {
       schemaName,
       factTableName,
       validSegments,
-      carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableId
+      carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableId,
+      colCardinality = Array[Int](0),
--- End diff --

Instead of creating an empty array and reassigning it, you can create a reference of Array[Int] type, like: var colCardinality: Array[Int] = null
[GitHub] incubator-carbondata pull request #156: [WIP] Added table Status lock while ...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/156#discussion_r79111960

--- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
@@ -306,13 +330,17 @@ private Integer compareDateValues(Long loadValue, Long userValue) {
    * @return
    */
   public List updateDeletionStatus(String loadDate, String tableFolderPath,
-      Long loadStartTime) {
-    ICarbonLock carbonLock = CarbonLockFactory
+      Long loadStartTime, String dbName, String tableName) throws Exception {
--- End diff --

Same comment as above...take the database and table name from the CarbonTableIdentifier instance
[GitHub] incubator-carbondata pull request #156: [WIP] Added table Status lock while ...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/156#discussion_r79111804

--- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
@@ -246,15 +247,22 @@ private Integer compareDateValues(Long loadValue, Long userValue) {
    * updates deletion status
    * @param loadIds
    * @param tableFolderPath
+   * @param dbName
+   * @param tableName
    * @return
    */
-  public List updateDeletionStatus(List loadIds, String tableFolderPath) {
-    ICarbonLock carbonLock = CarbonLockFactory
+  public List updateDeletionStatus(List loadIds, String tableFolderPath,
+      String dbName, String tableName) throws Exception {
--- End diff --

I think there is no need to change the method signature here...the database name and table name can be taken from the CarbonTableIdentifier, and the carbonTableIdentifier object is available in the method
[GitHub] incubator-carbondata pull request #148: [CARBONDATA-233] bad record logger s...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/148#discussion_r78314030

--- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java ---
@@ -1133,6 +1149,9 @@ else if(isComplexTypeColumn[j]) {
           .getDirectDictionaryGenerator(details.getColumnType());
       surrogateKeyForHrrchy[0] = directDictionaryGenerator1.generateDirectSurrogateKey(tuple);
+      if (badRecordsLoggerEnable && surrogateKeyForHrrchy[0] == 1) {
+        surrogateKeyForHrrchy[0] = -1;
--- End diff --

Replace -1 and 1 with proper identifiers from CarbonCommonConstants to give proper meaning and improve code readability
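The readability point can be sketched as follows (the constant names and the meanings of 1 and -1 are assumptions for illustration; the real identifiers would live in CarbonCommonConstants):

```java
// Named sentinel values instead of magic numbers: the condition and the
// assignment now state what the literals mean.
public class SurrogateConstantsSketch {
    // Assumed meanings of the literals from the diff, for illustration only.
    static final int DIRECT_DICT_NULL_SURROGATE = 1;
    static final int BAD_RECORD_SURROGATE = -1;

    static int markBadRecord(int surrogate, boolean badRecordsLoggerEnable) {
        if (badRecordsLoggerEnable && surrogate == DIRECT_DICT_NULL_SURROGATE) {
            return BAD_RECORD_SURROGATE;   // route to bad-record handling
        }
        return surrogate;                  // valid surrogate, untouched
    }

    public static void main(String[] args) {
        System.out.println(markBadRecord(1, true));   // -1: flagged as bad record
        System.out.println(markBadRecord(42, true));  // 42: valid surrogate untouched
    }
}
```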
[GitHub] incubator-carbondata pull request #124: [CARBONDATA-125] Remove usage of cur...
GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/124

    [CARBONDATA-125] Remove usage of currentRestructureNumber from the code

This PR is mainly to improve the code quality of Carbon, which includes removal of an unwanted feature.

Problem: The code contains a currentRestructureNumber variable that is no longer used.

Impact area: Data load flow

Fix: Remove the usage of the currentRestructureNumber variable everywhere in the code.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/incubator-carbondata remove_current_restructure_number

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/124.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #124

commit fdb52d7f774acc89cfc27b830ed851926b71bd7e
Author: manishgupta88 <tomanishgupt...@gmail.com>
Date: 2016-09-02T10:36:05Z

    Problem: Code contains currentRestructureNumber variable which is not used in the code now
[GitHub] incubator-carbondata pull request #107: [CARBONDATA-191]Fix bug when quote c...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/107#discussion_r76787331

--- Diff: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithSingleQuotechar.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.dataload
+
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+/**
+ * Test Class for data loading when there is single quote in fact data
+ */
+class TestLoadDataWithSingleQuotechar extends QueryTest with BeforeAndAfterAll {
+  override def beforeAll {
+    sql("DROP TABLE IF EXISTS carbontable")
+    sql(
+      "CREATE TABLE carbontable (id Int, name String) STORED BY 'carbondata'")
+  }
+
+  test("test data loading with single quote char") {
+    try {
+      sql(
+        "LOAD DATA LOCAL INPATH './src/test/resources/dataWithSingleQuote.csv' INTO TABLE " +
+        "carbontable OPTIONS('DELIMITER'= ',')")
+      checkAnswer(
+        sql("SELECT * from carbontable"),
+        Seq(Row("Tom",1),
+          Row("Tony\n3,Lily",2),
+          Row("Games\"",4),
+          Row("prival\"\n6,\"hello\"",5)
+        )
--- End diff --

@Jay357089 ...Please perform the below mentioned activities:
1. Test the same scenario with the same set of data in Spark and Hive to check their behavior, and paste the result screenshots here.
2. What about cases when, instead of the default escape character (\), some other escape character is given (like @, ^, /, etc.)? Test with other escape characters in Carbon, Spark and Hive, verify that the behavior is the same everywhere, and paste the snapshots here.
[GitHub] incubator-carbondata pull request #82: [CARBONDATA-165] Support loading fact...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/82#discussion_r76625697

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -1443,5 +1446,32 @@ public static int getDictionaryChunkSize() {
     }
     return dictionaryOneChunkSize;
   }
+
+  /**
+   * @param csvFilePath
+   * @return
+   */
+  public static String readHeader(String csvFilePath) {
+
+    DataInputStream fileReader = null;
+    BufferedReader bufferedReader = null;
+    String readLine = null;
+
+    try {
+      fileReader =
+          FileFactory.getDataInputStream(csvFilePath, FileFactory.getFileType(csvFilePath));
+      bufferedReader =
+          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));
--- End diff --

@foryou2030 instead of using Charset.defaultCharset(), use the line of code below: Charset.forName(CarbonCommonConstants.DEFAULT_CHARSET)
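A sketch of the suggested change, using an explicit charset so the header read does not depend on the JVM's platform default (assuming CarbonCommonConstants.DEFAULT_CHARSET resolves to "UTF-8"; the stream-based helper here is simplified from the quoted readHeader):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Reading the CSV header with an explicit charset: the result is the same on
// every platform, unlike Charset.defaultCharset(), which varies with the
// JVM's locale/encoding settings.
public class ReadHeaderSketch {

    static String readHeader(InputStream in) throws IOException {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine(); // first line is the CSV header
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] csv = "id,name\n1,Tom\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readHeader(new ByteArrayInputStream(csv))); // id,name
    }
}
```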
[GitHub] incubator-carbondata pull request #97: [CARBONDATA-180]give proper error mes...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/97#discussion_r76381309

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---
@@ -689,6 +689,10 @@ object GlobalDictionaryUtil extends Logging {
       generatePredefinedColDictionary(colDictFilePath, table,
         dimensions, carbonLoadModel, sqlContext, hdfsLocation, dictfolderPath)
     }
+    if (headers.length > df.columns.length) {
+      logError("Maybe either delimiter or fileheader is wrong")
+      throw new DataLoadingException("Maybe either delimiter or fileheader is wrong")
--- End diff --

@QiangCai ...Please modify this message as below: "Either delimiter or fileheader provided is not correct"
[GitHub] incubator-carbondata pull request #97: [CARBONDATA-180]give proper error mes...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/97#discussion_r76373466

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---
@@ -705,7 +709,7 @@ object GlobalDictionaryUtil extends Logging {
       // check result status
       checkStatus(carbonLoadModel, sqlContext, model, statusList)
     } else {
-      logInfo("have no column need to generate global dictionary in Fact file")
+      logInfo("have no column need to generate global dictionary in source data files")
--- End diff --

@QiangCai ...Please modify the logger message as below: "No column found for generating global dictionary in source data files"
[GitHub] incubator-carbondata pull request #90: [CARBONDATA-175]Fix Bug: Load Decimal...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/90#discussion_r76257293

--- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -62,7 +62,12 @@ public static Object getMeasureValueBasedOnDataType(String msrValue, DataType da
       case INT:
         return Double.valueOf(msrValue).longValue();
       case LONG:
-        return Long.valueOf(msrValue);
+        int index = msrValue.indexOf(".");
+        if (index != -1) {
+          return Long.valueOf(msrValue.substring(0, index));
+        } else {
+          return Long.valueOf(msrValue);
+        }
--- End diff --

@lion-x ...It is correct behavior to return a null value in this case, as our validation is strict. If any column with BigInt datatype is included as a dimension, we also make the decimal value null because of the strict validation. That is why I suggested: don't change the behavior for Long parsing; instead change the parsing logic for the Int datatype to keep the strict validation behavior.
[GitHub] incubator-carbondata pull request #90: [CARBONDATA-175]Fix Bug: Load Decimal...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/90#discussion_r76028231

--- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -62,7 +62,12 @@ public static Object getMeasureValueBasedOnDataType(String msrValue, DataType da
       case INT:
         return Double.valueOf(msrValue).longValue();
       case LONG:
-        return Long.valueOf(msrValue);
+        int index = msrValue.indexOf(".");
+        if (index != -1) {
+          return Long.valueOf(msrValue.substring(0, index));
+        } else {
+          return Long.valueOf(msrValue);
+        }
--- End diff --

@lion-x ...keep the code the same for the INT and LONG datatype cases...I think that would be better than writing this code. Sample code as below:

    case INT:
    case LONG:
      return Double.valueOf(msrValue).longValue();
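The difference between the two parsing strategies discussed in this thread can be seen in isolation: Long.valueOf rejects a decimal string outright (the strict path), while Double.valueOf(...).longValue() silently truncates the fraction. A standalone sketch:

```java
// Contrasts strict vs lenient parsing of a decimal string into a long,
// matching the two branches discussed in the review.
public class MeasureParseSketch {

    // Lenient path: route through double, truncating any fraction.
    static long lenientParse(String v) {
        return Double.valueOf(v).longValue();
    }

    // Strict path: Long.valueOf throws NumberFormatException on "12.7".
    static boolean strictRejects(String v) {
        try {
            Long.valueOf(v);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lenientParse("12.7"));   // 12 (fraction truncated)
        System.out.println(strictRejects("12.7"));  // true  (strict validation)
        System.out.println(strictRejects("12"));    // false (plain integer parses)
    }
}
```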
[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/81#discussion_r76017047

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
@@ -773,20 +776,27 @@ object CarbonDataRDDFactory extends Logging {
     } catch {
       case ex: Throwable =>
         loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
-        logInfo("DataLoad failure")
+        ex match {
+          case sparkException: SparkException =>
+            if (sparkException.getCause.isInstanceOf[DataLoadingException]) {
+              executorMessage = sparkException.getCause.getMessage
+              errorMessage = errorMessage + ": " + executorMessage
+            }
+          case _ =>
--- End diff --

@Zhangshunyu ...for the case _ branch, also get the message from the throwable and add it to the error message, the same as you have done for the sparkException case
[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/81#discussion_r75869589

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
@@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging {
     loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
     logInfo("DataLoad failure")
     logger.error(ex)
--- End diff --

@Zhangshunyu Move logInfo and logger.error below the case match block. Also, in logInfo, log the executorMessage. You can move the message variable above as well, with a default value of "DataLoad failure: ", so that the same can be used for logInfo.
[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/81#discussion_r75869120

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
@@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging {
     loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
     logInfo("DataLoad failure")
     logger.error(ex)
+    ex match {
+      case sparkException: SparkException =>
+        if (sparkException.getCause.isInstanceOf[DataLoadingException]) {
+          executorMessage = sparkException.getCause.getMessage
+        }
+      case _ =>
--- End diff --

@Zhangshunyu ...for the underscore case, also get the message and assign it to the executorMessage variable, because let's say control comes to case _: then executorMessage will remain empty and the message will stay as "DataLoad failure: ", which would be incorrect
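The review point — make sure every exception shape yields a non-empty executorMessage, not only the SparkException-wrapping-DataLoadingException case — can be sketched in plain Java (hypothetical helper, not the actual CarbonDataRDDFactory code): unwrap the cause when one exists, and fall back to the exception's own message otherwise.

```java
// Extracts a human-readable message from any throwable: prefer the wrapped
// cause's message (the SparkException case in the diff), otherwise fall back
// to the exception's own message so the "case _" branch is never empty.
public class ErrorMessageSketch {

    static String executorMessage(Throwable ex) {
        Throwable cause = ex.getCause();
        if (cause != null && cause.getMessage() != null) {
            return cause.getMessage();   // wrapped-cause case
        }
        return ex.getMessage() != null ? ex.getMessage() : ex.toString();
    }

    public static void main(String[] args) {
        RuntimeException wrapped =
            new RuntimeException("wrapper", new IllegalStateException("root failure"));
        System.out.println(executorMessage(wrapped));                       // root failure
        System.out.println(executorMessage(new RuntimeException("plain"))); // plain
    }
}
```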
[GitHub] incubator-carbondata pull request #83: [CARBONDATA-169]ColDict and All Dicti...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/83#discussion_r75865451

--- Diff: integration/spark/src/test/scala/org/apache/carbondata/spark/util/ExternalColumnDictionaryTestCase.scala ---
@@ -205,6 +205,20 @@ class ExternalColumnDictionaryTestCase extends QueryTest with BeforeAndAfterAll
       loadSqlRelation, "deviceInformationId", "10086")
   }
+
+  test("COLUMNDICT and ALL_DICTIONARY_PATH can not be used together") {
+    try {
+      sql(s"""
+        LOAD DATA LOCAL INPATH "$complexFilePath1" INTO TABLE loadSqlTest
+        OPTIONS('COLUMNDICT'='$extColDictFilePath1',"ALL_DICTIONARY_PATH"='$extColDictFilePath1')
+        """)
+      assert(false)
--- End diff --

ok
[GitHub] incubator-carbondata pull request #83: [CARBONDATA-169]ColDict and All Dicti...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/83#discussion_r75860680

--- Diff: integration/spark/src/test/scala/org/apache/carbondata/spark/util/ExternalColumnDictionaryTestCase.scala ---
@@ -205,6 +205,20 @@ class ExternalColumnDictionaryTestCase extends QueryTest with BeforeAndAfterAll
       loadSqlRelation, "deviceInformationId", "10086")
   }
+
+  test("COLUMNDICT and ALL_DICTIONARY_PATH can not be used together") {
+    try {
+      sql(s"""
+        LOAD DATA LOCAL INPATH "$complexFilePath1" INTO TABLE loadSqlTest
+        OPTIONS('COLUMNDICT'='$extColDictFilePath1',"ALL_DICTIONARY_PATH"='$extColDictFilePath1')
+        """)
+      assert(false)
--- End diff --

assert(false) is not required, as the Load command will throw an exception and control will go to the exception block
[GitHub] incubator-carbondata pull request #83: [CARBONDATA-169]ColDict and All Dicti...
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/83#discussion_r75860533

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala ---
@@ -1001,6 +1001,12 @@ class CarbonSqlParser()
       "SERIALIZATION_NULL_FORMAT",
       "ALL_DICTIONARY_PATH"
     )
+    if (options.exists(_._1.equalsIgnoreCase("COLUMNDICT")) &&
+        options.exists(_._1.equalsIgnoreCase("ALL_DICTIONARY_PATH"))) {
+      val errorMessage = "Error: COLUMNDICT and ALL_DICTIONARY_PATH can not be used together" +
+        " in options"
+      throw new MalformedCarbonCommandException(errorMessage)
+    }
--- End diff --

@Jay357089 ...Move this newly added code below the if check for the not-isSupported case. I think ideally we should first validate whether the options are among the ones allowed for the load command, and then perform the other checks.