[GitHub] carbondata pull request #2993: [CARBONDATA-3179] Map data load failure
Github user qiuchenjian commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2993#discussion_r242435030 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala --- @@ -442,4 +441,45 @@ class TestCreateDDLForComplexMapType extends QueryTest with BeforeAndAfterAll { "sort_columns is unsupported for map datatype column: mapfield")) } + test("Data Load Fail Issue") { +sql("DROP TABLE IF EXISTS carbon") +sql( + s""" + | CREATE TABLE carbon( + | mapField map<int, string> + | ) + | STORED BY 'carbondata' + | """ +.stripMargin) +sql( + s""" + | LOAD DATA LOCAL INPATH '$path' + | INTO TABLE carbon OPTIONS( + | 'header' = 'false') + """.stripMargin) +sql("INSERT INTO carbon SELECT * FROM carbon") +checkAnswer(sql("select * from carbon"), Seq( + Row(Map(1 -> "Nalla", 2 -> "Singh", 4 -> "Kumar")), + Row(Map(1 -> "Nalla", 2 -> "Singh", 4 -> "Kumar")), + Row(Map(10 -> "Nallaa", 20 -> "Sissngh", 100 -> "Gusspta", 40 -> "Kumar")), + Row(Map(10 -> "Nallaa", 20 -> "Sissngh", 100 -> "Gusspta", 40 -> "Kumar")) + )) + } + + test("Struct inside map") { +sql("DROP TABLE IF EXISTS carbon") --- End diff -- Why is there no result check for the "Struct inside map" test case? ---
[GitHub] carbondata pull request #2713: [WIP][CARBONDATA-2931][BloomDataMap] Optimize...
Github user qiuchenjian commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2713#discussion_r242430992 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java --- @@ -218,56 +218,46 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, this.bloomFilterSize, this.bloomFilterFpp, bloomCompress); } - /** - * returns all shard directories of bloom index files for query - * if bloom index files are merged we should get only one shard path - */ - private Set<String> getAllShardPaths(String tablePath, String segmentId) { -String dataMapStorePath = CarbonTablePath.getDataMapStorePath( -tablePath, segmentId, dataMapName); -CarbonFile[] carbonFiles = FileFactory.getCarbonFile(dataMapStorePath).listFiles(); -Set<String> shardPaths = new HashSet<>(); + + private boolean isAllShardsMerged(String dmSegmentPath) { +boolean mergeShardExist = false; boolean mergeShardInprogress = false; -CarbonFile mergeShardFile = null; +CarbonFile[] carbonFiles = FileFactory.getCarbonFile(dmSegmentPath).listFiles(); for (CarbonFile carbonFile : carbonFiles) { - if (carbonFile.getName().equals(BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME)) { -mergeShardFile = carbonFile; - } else if (carbonFile.getName().equals(BloomIndexFileStore.MERGE_INPROGRESS_FILE)) { + String fileName = carbonFile.getName(); + if (fileName.equals(BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME)) { +mergeShardExist = true; + } else if (fileName.equals(BloomIndexFileStore.MERGE_INPROGRESS_FILE)) { mergeShardInprogress = true; --- End diff -- If MERGE_INPROGRESS_FILE exists, the shard's index files may be deleted at some point, so this scenario needs attention; however, this issue existed before this PR. ---
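For reference, the merged-shard check being reviewed reduces to two booleans computed over the segment's datamap directory listing. Below is a minimal standalone sketch of that logic; the class name, constants, and the use of a plain file-name list are simplified stand-ins for illustration, not CarbonData's actual API:

```java
import java.util.Arrays;
import java.util.List;

public class MergeShardCheck {
    // Simplified stand-ins for BloomIndexFileStore's file-name constants.
    public static final String MERGE_SHARD = "mergeShard";
    public static final String MERGE_INPROGRESS = "mergeShard.inprogress";

    // All shards count as merged only when the merged shard exists AND no
    // in-progress marker is present; otherwise per-shard files must be used.
    public static boolean isAllShardsMerged(List<String> fileNames) {
        boolean mergeShardExist = fileNames.contains(MERGE_SHARD);
        boolean mergeShardInprogress = fileNames.contains(MERGE_INPROGRESS);
        return mergeShardExist && !mergeShardInprogress;
    }

    public static void main(String[] args) {
        // Merge finished: only the merged shard is consulted.
        System.out.println(isAllShardsMerged(Arrays.asList("shard_0", MERGE_SHARD)));
        // Merge still running: fall back to the individual shards.
        System.out.println(isAllShardsMerged(Arrays.asList(MERGE_SHARD, MERGE_INPROGRESS)));
    }
}
```

Note that this check alone does not address qiuchenjian's concern: while the in-progress marker exists, the individual shard files could still be deleted concurrently by the merge.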
[GitHub] carbondata pull request #2919: [CARBONDATA-3097] Support folder path in getV...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2919#discussion_r242429609 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -241,12 +241,52 @@ private static Schema readSchemaFromIndexFile(String indexFilePath) throws IOExc /** * This method return the version details in formatted string by reading from carbondata file + * If validate is true, it will check the version details between different carbondata files. + * And if version details are not the same, it will throw exception * - * @param dataFilePath - * @return + * @param path carbondata file path or folder path + * @param validate whether validate the version details between different carbondata files. + * @return string with information of who has written this file + * in which carbondata project version * @throws IOException */ - public static String getVersionDetails(String dataFilePath) throws IOException { + public static String getVersionDetails(String path, boolean validate) throws IOException { --- End diff -- ok, done ---
[GitHub] carbondata pull request #2931: [CARBONDATA-2999] support read schema from S3
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2931#discussion_r242428274 --- Diff: store/CSDK/test/main.cpp --- @@ -822,16 +831,17 @@ int main(int argc, char *argv[]) { // init jvm JNIEnv *env; env = initJVM(); -char *S3WritePath = "s3a://csdk/WriterOutput/carbondata2"; -char *S3ReadPath = "s3a://csdk/WriterOutput/carbondata"; +char *S3WritePath = "s3a://xubo/WriterOutput/carbondata2"; --- End diff -- please change the path ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2990 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2023/ ---
[GitHub] carbondata issue #2991: [CARBONDATA-3043] Add build script and add test case...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2991 @BJangir Please rebase it. ---
[GitHub] carbondata issue #2992: [CARBONDATA-3176] Optimize quick-start-guide documen...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2992 @xuchuanyin I mean it's planned; I was just explaining why we changed `=>`, not requiring that it be added to the document now. ---
[GitHub] carbondata issue #2988: [CARBONDATA-3174] Fix trailing space issue with varc...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2988 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1815/ ---
[GitHub] carbondata issue #2931: [CARBONDATA-2999] support read schema from S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2931 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1816/ ---
[GitHub] carbondata issue #2931: [CARBONDATA-2999] support read schema from S3
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2931 LGTM ---
[GitHub] carbondata pull request #2966: [CARBONDATA-3162][CARBONDATA-3163][CARBONDATA...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2966#discussion_r242423446 --- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java --- @@ -959,6 +959,14 @@ public static void setDataTypeConverter(DataTypeConverter converterLocal) { } } + /** + * As each load can have it's own time format. Reset the thread local for each load. + */ + public static void initializeFormatter() { --- End diff -- ok done ---
[GitHub] carbondata pull request #2897: [CARBONDATA-3080] Supporting local dictionary...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2897#discussion_r242423291 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -94,10 +93,9 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d } @Override public void fillRow(int rowId, CarbonColumnVector vector, int vectorRow) { -if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); -} +// always set dictionary otherwise +// empty dictionary will get set if same col is called again in projection. +vector.setDictionary(dictionary); --- End diff -- @BJangir 1. Please check and confirm if the same problem occurs with CarbonSession also 2. Modify the PR description and specify the details for bug fixed in this PR after completion of point 1 ---
[GitHub] carbondata issue #2988: [CARBONDATA-3174] Fix trailing space issue with varc...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2988 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1814/ ---
[jira] [Resolved] (CARBONDATA-3160) Compaction support with MAP data type
[ https://issues.apache.org/jira/browse/CARBONDATA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-3160. Resolution: Fixed. Fix Version/s: 1.5.2 > Compaction support with MAP data type > Key: CARBONDATA-3160 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3160 > Project: CarbonData > Issue Type: Sub-task > Reporter: dhatchayani > Assignee: dhatchayani > Priority: Minor > Fix For: 1.5.2 > Time Spent: 1.5h > Remaining Estimate: 0h > Support compaction with MAP type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2931: [CARBONDATA-2999] support read schema from S3
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2931 rebase and fix the conflict ---
[GitHub] carbondata pull request #2988: [CARBONDATA-3174] Fix trailing space issue wi...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2988#discussion_r242421926 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -747,7 +747,7 @@ private Schema updateSchemaFields(Schema schema, Set longStringColumns) Field[] fields = schema.getFields(); for (int i = 0; i < fields.length; i++) { if (fields[i] != null) { -fields[i].updateNameToLowerCase(); +//fields[i].updateName(); --- End diff -- remove this ---
[GitHub] carbondata pull request #2995: [CARBONDATA-3160] Compaction support with MAP...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2995 ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2990 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10072/ ---
[GitHub] carbondata issue #2995: [CARBONDATA-3160] Compaction support with MAP data t...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2995 LGTM ---
[jira] [Assigned] (CARBONDATA-3161) Pipe "|" delimiter is not working for streaming table
[ https://issues.apache.org/jira/browse/CARBONDATA-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pawan Malwal reassigned CARBONDATA-3161: Assignee: (was: Pawan Malwal) > Pipe "|" delimiter is not working for streaming table > Key: CARBONDATA-3161 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3161 > Project: CarbonData > Issue Type: Bug > Components: data-load > Reporter: Pawan Malwal > Priority: Minor > CSV data with "|" as the delimiter is not getting loaded into the streaming table correctly. > *DDL:* > create table table1_st(begintime TIMESTAMP, deviceid STRING, statcycle INT, topologypath STRING, devicetype STRING, rebootnum INT) stored by 'carbondata' TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT','sort_columns'='deviceid,begintime','streaming'='true'); > *Run in spark shell:* > import org.apache.spark.sql.SparkSession; > import org.apache.spark.sql.SparkSession.Builder; > import org.apache.spark.sql.CarbonSession; > import org.apache.spark.sql.CarbonSession.CarbonBuilder; > import org.apache.spark.sql.streaming._ > import org.apache.carbondata.streaming.parser._ > val enableHiveSupport = SparkSession.builder().enableHiveSupport(); > val carbon = new CarbonBuilder(enableHiveSupport).getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/") > val df = carbon.readStream.text("/user/*.csv") > val qrymm_0001 = df.writeStream.format("carbondata").option(CarbonStreamParser.CARBON_STREAM_PARSER, CarbonStreamParser.CARBON_STREAM_PARSER_CSV).option("delimiter","|").option("header","false").option("dbName","stdb").option("checkpointLocation", "/tmp/tb1").option("bad_records_action","FORCE").option("tableName","table1_st").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("TIMESTAMPFORMAT","-dd-MM HH:mm:ss").start > *Sample records:* > begintime|deviceid|statcycle|topologypath|devicetype|rebootnum > 2018-10-01 00:00:00|Device1|0|dsad|STB|9 > 2018-10-01 00:05:00|Device1|0|Rsad|STB|4 > 2018-10-01 00:10:00|Device1|0|fsf|STB|6 > 2018-10-01 00:15:00|Device1|0|fdgf|STB|8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-3159) Issue with SDK Write when empty array is given
[ https://issues.apache.org/jira/browse/CARBONDATA-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3159. Resolution: Fixed. Fix Version/s: 1.5.2 > Issue with SDK Write when empty array is given > Key: CARBONDATA-3159 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3159 > Project: CarbonData > Issue Type: Bug > Reporter: Shivam Goyal > Priority: Minor > Fix For: 1.5.2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2985: [HOTFIX] Fixed Query performance issue
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2985 ---
[GitHub] carbondata issue #2985: [HOTFIX] Fixed Query performance issue
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2985 LGTM ---
[GitHub] carbondata pull request #2966: [CARBONDATA-3162][CARBONDATA-3163][CARBONDATA...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2966#discussion_r242417861 --- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java --- @@ -959,6 +959,14 @@ public static void setDataTypeConverter(DataTypeConverter converterLocal) { } } + /** + * As each load can have it's own time format. Reset the thread local for each load. + */ + public static void initializeFormatter() { --- End diff -- Better rename to `clearFormatter` ---
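The doc comment under review says each load can have its own time format, so the per-thread cached formatter must be reset between loads. A minimal sketch of that thread-local pattern (class and method names are simplified illustrations, not CarbonData's actual DataTypeUtil):

```java
import java.text.SimpleDateFormat;

public class FormatterReset {
    // Per-thread formatter cache. Once populated, the pattern argument is
    // ignored, which is exactly why a stale entry breaks the next load.
    private static final ThreadLocal<SimpleDateFormat> FORMATTER = new ThreadLocal<>();

    public static SimpleDateFormat getFormatter(String pattern) {
        SimpleDateFormat f = FORMATTER.get();
        if (f == null) {
            f = new SimpleDateFormat(pattern);
            FORMATTER.set(f);
        }
        return f;
    }

    // The method under discussion: clear the thread local before each load
    // so the next load builds a formatter from its own configured format.
    public static void clearFormatter() {
        FORMATTER.remove();
    }

    public static void main(String[] args) {
        SimpleDateFormat first = getFormatter("yyyy-MM-dd");
        clearFormatter();
        SimpleDateFormat second = getFormatter("dd/MM/yyyy");
        System.out.println(first == second); // false: a fresh formatter after reset
    }
}
```

The rename ravipesala suggests (`clearFormatter` over `initializeFormatter`) matches what the method actually does: it removes state rather than creating it.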
[GitHub] carbondata issue #2995: [CARBONDATA-3160] Compaction support with MAP data t...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2995 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2022/ ---
[GitHub] carbondata issue #2995: [CARBONDATA-3160] Compaction support with MAP data t...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2995 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10071/ ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2990 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1813/ ---
[GitHub] carbondata issue #2951: [SDV] Add datasource testcases for Spark File Format
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2951 @shivamasn Please add test cases for map type too ---
[GitHub] carbondata issue #2988: [CARBONDATA-3174] Fix trailing space issue with varc...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2988 @Shubh18s : I have checked the code; in the SDK all column names are stored without trimming, but the long_string_columns table property values are trimmed. The stored column name is the untrimmed string while the property holds the trimmed one, hence the schema mismatch. After this change, sort_columns and invertedIndexFor are affected as well. Since CarbonWriterBuilder.sortBy() is exposed to the user and, as per the previous code, also does not trim, add trim() there too, and make a similar change in CarbonWriterBuilder.invertedIndexFor(). ---
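The mismatch ajantha-bhat describes comes down to comparing an untrimmed stored column name against a trimmed property value. A minimal sketch under assumed, simplified names (the column value `"longCol "` and the helper methods are hypothetical; only the trim behavior is the point):

```java
public class TrimMismatch {
    // equals() as used for schema validation: fails when only one side was trimmed.
    public static boolean schemaMatches(String storedColumn, String propertyColumn) {
        return storedColumn.equals(propertyColumn);
    }

    // The proposed fix: trim both sides before comparing.
    public static boolean schemaMatchesTrimmed(String storedColumn, String propertyColumn) {
        return storedColumn.trim().equals(propertyColumn.trim());
    }

    public static void main(String[] args) {
        String userInput = "longCol ";       // user passed a trailing space
        String stored = userInput;           // SDK stores the column name without trim
        String property = userInput.trim();  // long_string_columns property is trimmed
        System.out.println(schemaMatches(stored, property));        // false: schema mismatch
        System.out.println(schemaMatchesTrimmed(stored, property)); // true
    }
}
```

Trimming consistently on both the column-name and property paths (including sortBy() and invertedIndexFor()) removes the asymmetry entirely.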
[GitHub] carbondata issue #2995: [CARBONDATA-3160] Compaction support with MAP data t...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2995 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1812/ ---
[jira] [Resolved] (CARBONDATA-3073) Support other interface in carbon writer of C++ SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3073. Resolution: Fixed. Fix Version/s: 1.5.2 (was: NONE) > Support other interface in carbon writer of C++ SDK > Key: CARBONDATA-3073 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3073 > Project: CarbonData > Issue Type: Sub-task > Affects Versions: 1.5.1 > Reporter: xubo245 > Assignee: xubo245 > Priority: Major > Fix For: 1.5.2 > Time Spent: 14h 10m > Remaining Estimate: 0h > When users create tables and write data with the C++ SDK, they sometimes need to configure table properties, so the carbon writer of the C++ SDK should support configuring TableProperties. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2983: [CARBONDATA-3119] Fixed SDK Write for Complex...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2983 ---
[GitHub] carbondata issue #2983: [CARBONDATA-3119] Fixed SDK Write for Complex Array ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2983 LGTM ---
[jira] [Created] (CARBONDATA-3179) DataLoad Failure in Map Data Type
MANISH NALLA created CARBONDATA-3179: Summary: DataLoad Failure in Map Data Type Key: CARBONDATA-3179 URL: https://issues.apache.org/jira/browse/CARBONDATA-3179 Project: CarbonData Issue Type: Bug Reporter: MANISH NALLA Assignee: MANISH NALLA > Data load fails for "insert into table select * from table" on a table containing a Map datatype column. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2899: [CARBONDATA-3073][CARBONDATA-3044] Support co...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2899 ---
[GitHub] carbondata issue #2899: [CARBONDATA-3073][CARBONDATA-3044] Support configure...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2899 LGTM ---
[GitHub] carbondata pull request #2983: [CARBONDATA-3119] Fixed SDK Write for Complex...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2983#discussion_r242403727 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/ArrayParserImpl.java --- @@ -56,6 +56,10 @@ public ArrayObject parse(Object data) { } return new ArrayObject(array); } + } else if (value.isEmpty()) { +Object[] array = new Object[1]; +array[0] = child.parse(value); --- End diff -- Why not use value instead of child.parse?? ---
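The patched branch treats an empty array value as a one-element array instead of failing the load, which is what kunal642's question is about. A minimal standalone sketch of that parse path (the echo-style child parser and delimiter handling are simplified assumptions, not the real ArrayParserImpl/GenericParser):

```java
import java.util.Arrays;

public class ArrayParserSketch {
    // Simplified stand-in for the child parser: here it just echoes the string.
    // In the real code this is why child.parse(value) vs. value matters: the
    // child may wrap the element in its own object model.
    public static Object childParse(String value) {
        return value;
    }

    // Empty input yields a single (empty) element rather than failing the load.
    public static Object[] parseArray(String value, String delimiterRegex) {
        if (value.isEmpty()) {
            return new Object[] { childParse(value) };
        }
        String[] parts = value.split(delimiterRegex);
        Object[] array = new Object[parts.length];
        for (int i = 0; i < parts.length; i++) {
            array[i] = childParse(parts[i]);
        }
        return array;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseArray("", "\\$")));    // one empty element
        System.out.println(Arrays.toString(parseArray("a$b", "\\$"))); // two elements
    }
}
```

With an echoing child parser the two options are equivalent, but routing through `childParse` keeps empty values on the same code path as non-empty ones, which is presumably why the patch calls it.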
[GitHub] carbondata issue #2914: [CARBONDATA-3093] Provide property builder for carbo...
Github user sraghunandan commented on the issue: https://github.com/apache/carbondata/pull/2914 @ravipesala @KanakaKumar Please review this PR ---
[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2969#discussion_r242400169 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonInputFormat.java --- @@ -106,9 +107,17 @@ private static CarbonTable getCarbonTable(Configuration configuration, String pa CarbonInputSplit split; for (int i = 0; i < splitList.size(); i++) { split = (CarbonInputSplit) splitList.get(i); - splits[i] = new CarbonHiveInputSplit(split.getSegmentId(), split.getPath(), split.getStart(), - split.getLength(), split.getLocations(), split.getNumberOfBlocklets(), split.getVersion(), - split.getBlockStorageIdMap()); + CarbonHiveInputSplit inputSplit = new CarbonHiveInputSplit(split.getSegmentId(), + split.getPath(), split.getStart(), split.getLength(), + split.getLocations(), split.getNumberOfBlocklets(), + split.getVersion(), split.getBlockStorageIdMap()); + BlockletDetailInfo info = new BlockletDetailInfo(); + info.setBlockSize(split.getLength()); + info.setBlockFooterOffset(split.getDetailInfo().getBlockFooterOffset()); + info.setVersionNumber(split.getVersion().number()); + info.setUseMinMaxForPruning(false); --- End diff -- Why do you set false in here? ---
[GitHub] carbondata issue #2931: [CARBONDATA-2999] support read schema from S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2931 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10070/ ---
[GitHub] carbondata issue #2931: [CARBONDATA-2999] support read schema from S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2931 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2021/ ---
[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2969#discussion_r242395640 --- Diff: integration/hive/src/main/scala/org/apache/carbondata/hiveexample/HiveExample.scala --- @@ -193,5 +196,4 @@ object HiveExample { } hiveEmbeddedServer2.stop() --- End diff -- Please fix the issue that HiveExample does not stop after the code finishes running; it should exit. ---
[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2969#discussion_r242394285 --- Diff: integration/hive/src/main/scala/org/apache/carbondata/hiveexample/HiveExample.scala --- @@ -31,7 +31,7 @@ object HiveExample { def main(args: Array[String]) { --- End diff -- Please optimize this example and RunHiveExampleTest like org.apache.carbondata.examplesCI.RunExamples. We should add the example to CI and add asserts for exceptions and results, so that when a developer's PR changes code and breaks HiveExample, the developer must fix it before the PR is merged. ---
[GitHub] carbondata pull request #2713: [WIP][CARBONDATA-2931][BloomDataMap] Optimize...
Github user kevinjmh commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2713#discussion_r242394373 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java --- @@ -178,15 +178,9 @@ private String getAncestorTablePath(CarbonTable currentTable) { for (BloomQueryModel bloomQueryModel : bloomQueryModels) { Set tempHitBlockletsResult = new HashSet<>(); LOGGER.debug("prune blocklet for query: " + bloomQueryModel); - BloomCacheKeyValue.CacheKey cacheKey = new BloomCacheKeyValue.CacheKey( - this.indexPath.toString(), bloomQueryModel.columnName); - BloomCacheKeyValue.CacheValue cacheValue = cache.get(cacheKey); - List bloomIndexList = cacheValue.getBloomFilters(); - for (CarbonBloomFilter bloomFilter : bloomIndexList) { -if (needShardPrune && !filteredShard.contains(bloomFilter.getShardName())) { --- End diff -- usage of this info is moved to `getBloomFilters` ---
[GitHub] carbondata pull request #2713: [WIP][CARBONDATA-2931][BloomDataMap] Optimize...
Github user qiuchenjian commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2713#discussion_r242393730 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java --- @@ -178,15 +178,9 @@ private String getAncestorTablePath(CarbonTable currentTable) { for (BloomQueryModel bloomQueryModel : bloomQueryModels) { Set tempHitBlockletsResult = new HashSet<>(); LOGGER.debug("prune blocklet for query: " + bloomQueryModel); - BloomCacheKeyValue.CacheKey cacheKey = new BloomCacheKeyValue.CacheKey( - this.indexPath.toString(), bloomQueryModel.columnName); - BloomCacheKeyValue.CacheValue cacheValue = cache.get(cacheKey); - List<CarbonBloomFilter> bloomIndexList = cacheValue.getBloomFilters(); - for (CarbonBloomFilter bloomFilter : bloomIndexList) { -if (needShardPrune && !filteredShard.contains(bloomFilter.getShardName())) { --- End diff -- Why delete the filteredShard.contains(bloomFilter.getShardName()) check? I think that check can reduce query time. ---
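The check qiuchenjian is asking about skips bloom filters whose shard was already pruned out by the default datamap (kevinjmh notes above that this usage moved into `getBloomFilters`). A minimal standalone sketch of that pruning step; the stub class and method names are simplified illustrations, not the real CarbonBloomFilter API:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ShardPruneSketch {
    // Minimal stand-in for CarbonBloomFilter: only the shard name matters here.
    public static class BloomFilterStub {
        public final String shardName;
        public BloomFilterStub(String shardName) { this.shardName = shardName; }
    }

    // Keep only the filters whose shard survived default-datamap pruning;
    // skipping the rest avoids needless bloom membership tests per query model.
    public static List<BloomFilterStub> pruneByShard(List<BloomFilterStub> filters,
                                                     Set<String> filteredShards,
                                                     boolean needShardPrune) {
        if (!needShardPrune) {
            return filters; // e.g. merged-shard case: prune later, at blocklet level
        }
        return filters.stream()
            .filter(f -> filteredShards.contains(f.shardName))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<BloomFilterStub> all = Arrays.asList(
            new BloomFilterStub("shard_0"), new BloomFilterStub("shard_1"));
        Set<String> surviving = new HashSet<>(Collections.singleton("shard_0"));
        System.out.println(pruneByShard(all, surviving, true).size());  // only shard_0 remains
        System.out.println(pruneByShard(all, surviving, false).size()); // no shard pruning
    }
}
```

Whether the check lives in the prune loop or in `getBloomFilters`, the cost argument is the same: each skipped filter saves one bloom membership test per query model.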
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2969 @SteNicholas Please optimize the title ---
[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2969#discussion_r242391201 --- Diff: integration/hive/src/main/scala/org/apache/carbondata/hiveexample/HiveExample.scala --- @@ -85,19 +82,25 @@ object HiveExample { logger.info(s"HIVE CLI IS STARTED ON PORT $port ==") -statement.execute("CREATE TABLE IF NOT EXISTS " + "HIVE_CARBON_EXAMPLE " + - " (ID int, NAME string,SALARY double)") +statement.execute("DROP TABLE IF EXISTS HIVE_CARBON_EXAMPLE") + +statement.execute("CREATE TABLE HIVE_CARBON_EXAMPLE " + + " (ID int, NAME string,SALARY double) " + + "ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' " + + "WITH SERDEPROPERTIES ('mapreduce.input.carboninputformat.databaseName'='default', " + + "'mapreduce.input.carboninputformat.tableName'='HIVE_CARBON_EXAMPLE')") + statement .execute( "ALTER TABLE HIVE_CARBON_EXAMPLE SET FILEFORMAT INPUTFORMAT \"org.apache.carbondata." + -"hive.MapredCarbonInputFormat\"OUTPUTFORMAT \"org.apache.carbondata.hive." + -"MapredCarbonOutputFormat\"SERDE \"org.apache.carbondata.hive." + -"CarbonHiveSerDe\" ") + "hive.MapredCarbonInputFormat\"OUTPUTFORMAT \"org.apache.carbondata.hive." + --- End diff -- Can you add space before OUTPUTFORMAT? ---
[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2969#discussion_r242391224 --- Diff: integration/hive/src/main/scala/org/apache/carbondata/hiveexample/HiveExample.scala --- @@ -85,19 +82,25 @@ object HiveExample { logger.info(s"HIVE CLI IS STARTED ON PORT $port ==") -statement.execute("CREATE TABLE IF NOT EXISTS " + "HIVE_CARBON_EXAMPLE " + - " (ID int, NAME string,SALARY double)") +statement.execute("DROP TABLE IF EXISTS HIVE_CARBON_EXAMPLE") + +statement.execute("CREATE TABLE HIVE_CARBON_EXAMPLE " + + " (ID int, NAME string,SALARY double) " + + "ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' " + + "WITH SERDEPROPERTIES ('mapreduce.input.carboninputformat.databaseName'='default', " + + "'mapreduce.input.carboninputformat.tableName'='HIVE_CARBON_EXAMPLE')") + statement .execute( "ALTER TABLE HIVE_CARBON_EXAMPLE SET FILEFORMAT INPUTFORMAT \"org.apache.carbondata." + -"hive.MapredCarbonInputFormat\"OUTPUTFORMAT \"org.apache.carbondata.hive." + -"MapredCarbonOutputFormat\"SERDE \"org.apache.carbondata.hive." + -"CarbonHiveSerDe\" ") + "hive.MapredCarbonInputFormat\"OUTPUTFORMAT \"org.apache.carbondata.hive." + + "MapredCarbonOutputFormat\"SERDE \"org.apache.carbondata.hive." + --- End diff -- Can you add space before SERDE? ---
[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2969#discussion_r242390063 --- Diff: integration/hive/src/main/scala/org/apache/carbondata/hiveexample/HiveExample.scala --- @@ -43,28 +43,25 @@ object HiveExample { import org.apache.spark.sql.CarbonSession._ -val carbonSession = SparkSession - .builder() - .master("local") - .appName("HiveExample") - .config("carbonSession.sql.warehouse.dir", warehouse).enableHiveSupport() - .getOrCreateCarbonSession( -store, metaStore_Db) +val carbonSession = SparkSession.builder() + .master("local").appName("HiveExample") + .enableHiveSupport() + .config("spark.sql.warehouse.dir", warehouse) + .getOrCreateCarbonSession(store, metaStore_Db) carbonSession.sql("""DROP TABLE IF EXISTS HIVE_CARBON_EXAMPLE""".stripMargin) carbonSession .sql( -"""CREATE TABLE HIVE_CARBON_EXAMPLE (ID int,NAME string,SALARY double) STORED BY - |'CARBONDATA' """ - .stripMargin) +"CREATE TABLE HIVE_CARBON_EXAMPLE (ID int,NAME string,SALARY double) " + + "STORED BY 'CARBONDATA' ") carbonSession.sql( s""" - LOAD DATA LOCAL INPATH '$rootPath/integration/hive/src/main/resources/data.csv' INTO - TABLE - HIVE_CARBON_EXAMPLE - """) + LOAD DATA LOCAL INPATH '$rootPath/integration/hive/src/main/resources/data.csv' INTO + TABLE + HIVE_CARBON_EXAMPLE --- End diff -- please optimize it or revert it. ---
[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2969#discussion_r242389254 --- Diff: integration/hive/src/main/scala/org/apache/carbondata/hiveexample/HiveExample.scala --- @@ -43,28 +43,25 @@ object HiveExample { import org.apache.spark.sql.CarbonSession._ -val carbonSession = SparkSession - .builder() - .master("local") - .appName("HiveExample") - .config("carbonSession.sql.warehouse.dir", warehouse).enableHiveSupport() - .getOrCreateCarbonSession( -store, metaStore_Db) +val carbonSession = SparkSession.builder() + .master("local").appName("HiveExample") --- End diff -- Please keep it; no need to change. ---
[GitHub] carbondata issue #2931: [CARBONDATA-2999] support read schema from S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2931 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1811/ ---
[GitHub] carbondata pull request #2713: [WIP][CARBONDATA-2931][BloomDataMap] Optimize...
Github user qiuchenjian commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2713#discussion_r242387613

--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java ---
@@ -218,56 +218,46 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
         this.bloomFilterSize, this.bloomFilterFpp, bloomCompress);
   }

-  /**
-   * returns all shard directories of bloom index files for query
-   * if bloom index files are merged we should get only one shard path
-   */
-  private Set<String> getAllShardPaths(String tablePath, String segmentId) {
-    String dataMapStorePath = CarbonTablePath.getDataMapStorePath(
-        tablePath, segmentId, dataMapName);
-    CarbonFile[] carbonFiles = FileFactory.getCarbonFile(dataMapStorePath).listFiles();
-    Set<String> shardPaths = new HashSet<>();
+
+  private boolean isAllShardsMerged(String dmSegmentPath) {
+    boolean mergeShardExist = false;
     boolean mergeShardInprogress = false;
-    CarbonFile mergeShardFile = null;
+    CarbonFile[] carbonFiles = FileFactory.getCarbonFile(dmSegmentPath).listFiles();
     for (CarbonFile carbonFile : carbonFiles) {
-      if (carbonFile.getName().equals(BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME)) {
-        mergeShardFile = carbonFile;
-      } else if (carbonFile.getName().equals(BloomIndexFileStore.MERGE_INPROGRESS_FILE)) {
+      String fileName = carbonFile.getName();
+      if (fileName.equals(BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME)) {
+        mergeShardExist = true;
+      } else if (fileName.equals(BloomIndexFileStore.MERGE_INPROGRESS_FILE)) {
         mergeShardInprogress = true;
-      } else if (carbonFile.isDirectory()) {
-        shardPaths.add(FileFactory.getPath(carbonFile.getAbsolutePath()).toString());
       }
     }
-    if (mergeShardFile != null && !mergeShardInprogress) {
-      // should only get one shard path if mergeShard is generated successfully
-      shardPaths.clear();
-      shardPaths.add(FileFactory.getPath(mergeShardFile.getAbsolutePath()).toString());
-    }
-    return shardPaths;
+    return mergeShardExist && !mergeShardInprogress;
   }

   @Override
   public List<CoarseGrainDataMap> getDataMaps(Segment segment) throws IOException {
     List<CoarseGrainDataMap> dataMaps = new ArrayList<>();
     try {
-      Set<String> shardPaths = segmentMap.get(segment.getSegmentNo());
-      if (shardPaths == null) {
-        shardPaths = getAllShardPaths(getCarbonTable().getTablePath(), segment.getSegmentNo());
-        segmentMap.put(segment.getSegmentNo(), shardPaths);
-      }
-      Set<String> filteredShards = segment.getFilteredIndexShardNames();
-      for (String shard : shardPaths) {
-        if (shard.endsWith(BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME) ||
-            filteredShards.contains(new File(shard).getName())) {
-          // Filter out the tasks which are filtered through Main datamap.
-          // for merge shard, shard pruning delay to be done before pruning blocklet
-          BloomCoarseGrainDataMap bloomDM = new BloomCoarseGrainDataMap();
-          bloomDM.init(new BloomDataMapModel(shard, cache, segment.getConfiguration()));
-          bloomDM.initIndexColumnConverters(getCarbonTable(), dataMapMeta.getIndexedColumns());
-          bloomDM.setFilteredShard(filteredShards);
-          dataMaps.add(bloomDM);
-        }
+      String dmSegmentPath = CarbonTablePath.getDataMapStorePath(
+          getCarbonTable().getTablePath(), segment.getSegmentNo(), dataMapName);
+      boolean useMergeShard = isAllShardsMerged(dmSegmentPath);
+
+      // make use of filtered shard info from default datamap to build bloom datamap
+      BloomCoarseGrainDataMap bloomDM = new BloomCoarseGrainDataMap();
+      bloomDM.init(new BloomDataMapModel(dmSegmentPath, cache, FileFactory.getConfiguration()));
+      bloomDM.initIndexColumnConverters(getCarbonTable(), dataMapMeta.getIndexedColumns());
+      bloomDM.setFilteredShard(segment.getFilteredIndexShardNames(), useMergeShard);
+      dataMaps.add(bloomDM);
+
+      // save shard info for clearing cache
+      Set<String> shardPaths = new HashSet<>();
+      if (useMergeShard) {
+        shardPaths.add(dmSegmentPath + File.separator +
+            BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME);
+      } else {
+        shardPaths.addAll(segment.getFilteredIndexShardNames());
      }
+      segmentMap.put(segment.getSegmentNo(), shardPaths);

--- End diff --

segmentMap is used to cache the shardPaths, but now it is useless. I don't think it's necessary to collect shardPaths at all; it would be enough to change segmentMap to a Set that stores the segment numbers.

---
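The merge-shard check in the diff above can be sketched standalone. This is a hedged rewrite against plain java.io.File so it runs without CarbonData on the classpath; the marker file names are stand-ins for BloomIndexFileStore's constants, and the class name is hypothetical.

```java
import java.io.File;

public class MergeShardCheck {
  // Assumed stand-ins for BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME
  // and BloomIndexFileStore.MERGE_INPROGRESS_FILE.
  static final String MERGE_SHARD_NAME = "mergeShard";
  static final String MERGE_INPROGRESS = "mergeShard.inprogress";

  /**
   * The merged bloom index is usable only when the merge-shard directory
   * exists AND no in-progress marker is left behind (the reviewer's concern:
   * while the marker exists, per-shard index files may be deleted mid-query).
   */
  static boolean isAllShardsMerged(File dmSegmentDir) {
    boolean mergeShardExist = false;
    boolean mergeShardInprogress = false;
    File[] files = dmSegmentDir.listFiles();
    if (files == null) {
      return false;  // segment path missing or not a directory
    }
    for (File f : files) {
      if (f.getName().equals(MERGE_SHARD_NAME)) {
        mergeShardExist = true;
      } else if (f.getName().equals(MERGE_INPROGRESS)) {
        mergeShardInprogress = true;
      }
    }
    return mergeShardExist && !mergeShardInprogress;
  }

  public static void main(String[] args) throws Exception {
    File dir = java.nio.file.Files.createTempDirectory("dmSegment").toFile();
    new File(dir, MERGE_SHARD_NAME).mkdir();
    System.out.println(isAllShardsMerged(dir));  // merge finished, no marker
    new File(dir, MERGE_INPROGRESS).createNewFile();
    System.out.println(isAllShardsMerged(dir));  // merge still in progress
  }
}
```

This also makes the reviewer's point concrete: once the boolean is computed per segment, caching a Set of shard paths per segment adds nothing a Set of segment numbers would not.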
[GitHub] carbondata pull request #2931: [CARBONDATA-2999] support read schema from S3
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2931#discussion_r242387341

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java ---
@@ -147,34 +170,55 @@
         throw new CarbonDataLoadingException("No carbonindex file in this path.");
       }
     } else {
-      String indexFilePath = getCarbonFile(path, INDEX_FILE_EXT)[0].getAbsolutePath();
-      return readSchemaFromIndexFile(indexFilePath);
+      String indexFilePath = getCarbonFile(path, INDEX_FILE_EXT, conf)[0].getAbsolutePath();
+      return readSchemaFromIndexFile(indexFilePath, conf);
     }
   }

+  /**
+   * read schema from path,
+   * path can be folder path, carbonindex file path, and carbondata file path
+   * and user can decide whether check all files schema
+   *
+   * @param path file/folder path
+   * @param validateSchema whether check all files schema
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path, boolean validateSchema) throws IOException {
+    Configuration conf = new Configuration();
+    return readSchema(path, validateSchema, conf);
+  }
+
   /**
    * Read carbondata file and return the schema
    * This interface will be removed,
    * please use readSchema instead of this interface
    *
    * @param dataFilePath carbondata file store path
+   * @param conf hadoop configuration support, can set s3a AK,SK,
+   *             end point and other conf with this
    * @return Schema object
    * @throws IOException
    */
   @Deprecated
-  public static Schema readSchemaInDataFile(String dataFilePath) throws IOException {
-    return readSchema(dataFilePath, false);
+  public static Schema readSchemaInDataFile(String dataFilePath, Configuration conf)

--- End diff --

ok, done

---
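The diff above keeps the old one-argument entry point source-compatible by delegating to a new overload that accepts an explicit Hadoop configuration (so S3 users can pass access key, secret key, endpoint). A minimal sketch of that overload-delegation pattern, with all names here being stand-ins rather than the real CarbonData SDK API:

```java
import java.util.HashMap;

public class SchemaReaderSketch {
  /** Stand-in for org.apache.hadoop.conf.Configuration. */
  static class Conf extends HashMap<String, String> {}

  /** Old signature: builds a default configuration and delegates. */
  static String readSchema(String path, boolean validateSchema) {
    return readSchema(path, validateSchema, new Conf());
  }

  /** New signature: a caller-supplied configuration (e.g. carrying
   *  fs.s3a.* properties) takes effect for all file access. */
  static String readSchema(String path, boolean validateSchema, Conf conf) {
    String endpoint = conf.getOrDefault("fs.s3a.endpoint", "default");
    return "schema(" + path + ", validate=" + validateSchema
        + ", endpoint=" + endpoint + ")";
  }
}
```

The design point being reviewed: adding the overload instead of changing the existing signature means existing SDK callers keep compiling, while S3 callers gain a way to inject credentials.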
[GitHub] carbondata pull request #2988: [CARBONDATA-3174] Fix trailing space issue wi...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2988#discussion_r242377558 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala --- @@ -2490,6 +2490,54 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll { FileUtils.deleteDirectory(new File(writerPath)) } + test("check varchar with trailing space") { --- End diff -- besides, this is for varchar columns, why not update the code there? ---
[GitHub] carbondata issue #2992: [CARBONDATA-3176] Optimize quick-start-guide documen...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2992 @xubo245 "and plan to support alluxio path too." --- I think there is no need to add this currently. We should only describe features that are already implemented. ---
[GitHub] carbondata issue #2991: [CARBONDATA-3043] Add build script and add test case...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2991 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2020/ ---
[GitHub] carbondata issue #2991: [CARBONDATA-3043] Add build script and add test case...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2991 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10069/ ---
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2969 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10068/ ---
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2969 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2019/ ---
[GitHub] carbondata issue #2991: [CARBONDATA-3043] Add build script and add test case...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2991 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1810/ ---
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception
Github user SteNicholas commented on the issue: https://github.com/apache/carbondata/pull/2969 @xubo245 @zzcclp @xuchuanyin I have already fixed the hive integration bug based on HiveExample. Please review these updates. ---
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2969 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1809/ ---
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2969 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1808/ ---
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception
Github user SteNicholas commented on the issue: https://github.com/apache/carbondata/pull/2969 @xubo245 @zzcclp @xuchuanyin I have already fixed the hive integration bug based on HiveExample. Please review these updates. ---
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2969 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1806/ ---
[GitHub] carbondata issue #2991: [WIP] Add build script and add test case with Google...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2991 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10063/ ---
[GitHub] carbondata issue #2991: [WIP] Add build script and add test case with Google...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2991 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2014/ ---
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2969 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1805/ ---
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception
Github user SteNicholas commented on the issue: https://github.com/apache/carbondata/pull/2969 @xubo245 @zzcclp @xuchuanyin I have already fixed the hive integration bug based on HiveExample. Please review these updates. ---
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exception
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2969 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1804/ ---
[GitHub] carbondata issue #2991: [WIP] Add build script and add test case with Google...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2991 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1803/ ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2990 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10061/ ---
[GitHub] carbondata issue #2988: [CARBONDATA-3174] Fix trailing space issue with varc...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2988 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10062/ ---
[GitHub] carbondata issue #2988: [CARBONDATA-3174] Fix trailing space issue with varc...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2988 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2013/ ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2990 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2012/ ---
[GitHub] carbondata issue #2995: [CARBONDATA-3160] Compaction support with MAP data t...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2995 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2011/ ---
[jira] [Assigned] (CARBONDATA-3043) Add test framework for CSDK
[ https://issues.apache.org/jira/browse/CARBONDATA-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Babulal reassigned CARBONDATA-3043:
-----------------------------------
    Assignee: Babulal  (was: xubo245)

> Add test framework for CSDK
> ---------------------------
>
>                 Key: CARBONDATA-3043
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3043
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: xubo245
>            Assignee: Babulal
>            Priority: Major
>             Fix For: 1.5.2
>
>
> Add test framework for CSDK, for unit testing.
> googletest is a popular test framework; we can try to use it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2990 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1802/ ---
[GitHub] carbondata issue #2995: [CARBONDATA-3160] Compaction support with MAP data t...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2995 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10060/ ---
[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2949 ---
[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2949 LGTM ---
[GitHub] carbondata issue #2988: [CARBONDATA-3174] Fix trailing space issue with varc...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2988 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1801/ ---
[GitHub] carbondata issue #2991: [WIP] Add build script and add test case with Google...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2991 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2010/ ---
[GitHub] carbondata issue #2991: [WIP] Add build script and add test case with Google...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2991 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10058/ ---
[GitHub] carbondata issue #2995: [CARBONDATA-3160] Compaction support with MAP data t...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2995 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1800/ ---
[jira] [Updated] (CARBONDATA-3177) Nameservice support for presto on carbondata
[ https://issues.apache.org/jira/browse/CARBONDATA-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Sun updated CARBONDATA-3177:
---------------------------------
    Description:

Hey team,

As per the carbondata-presto integration reference, the carbondata-store is configured with a specific namenode address, e.g. hdfs://namenode:9000/test/carbondata. However, we leverage namenode federation, so the hdfs entry is configured with a nameservice, e.g. hdfs://nameservice1/test/carbondata. The nameservice information is configured in hdfs-site.xml. It seems that the carbondata connector has no way to load this configuration file, so it fails with the exception message:

{code:java}
Query 20181217_142352_01851_paya2 failed: java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
    at org.apache.carbondata.presto.impl.CarbonTableReader.updateCarbonFile(CarbonTableReader.java:204)
    at org.apache.carbondata.presto.impl.CarbonTableReader.updateSchemaList(CarbonTableReader.java:216)
    at org.apache.carbondata.presto.impl.CarbonTableReader.getSchemaNames(CarbonTableReader.java:189)
    at org.apache.carbondata.presto.CarbondataMetadata.listSchemaNamesInternal(CarbondataMetadata.java:86)
    at org.apache.carbondata.presto.CarbondataMetadata.getTableMetadata(CarbondataMetadata.java:135)
    at org.apache.carbondata.presto.CarbondataMetadata.getTableMetadataInternal(CarbondataMetadata.java:240)
    at org.apache.carbondata.presto.CarbondataMetadata.getTableMetadata(CarbondataMetadata.java:232)
    at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.getTableMetadata(ClassLoaderSafeConnectorMetadata.java:162)
    at com.facebook.presto.metadata.MetadataManager.getTableMetadata(MetadataManager.java:423)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:857)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:262)
    at com.facebook.presto.sql.tree.Table.accept(Table.java:53)
    at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:276)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.analyzeFrom(StatementAnalyzer.java:1780)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:962)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:262)
    at com.facebook.presto.sql.tree.QuerySpecification.accept(QuerySpecification.java:127)
    at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:276)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:286)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:683)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:262)
    at com.facebook.presto.sql.tree.Query.accept(Query.java:94)
    at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:276)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:248)
    at com.facebook.presto.sql.analyzer.Analyzer.analyze(Analyzer.java:76)
    at com.facebook.presto.sql.analyzer.Analyzer.analyze(Analyzer.java:68)
    at com.facebook.presto.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:206)
    at com.facebook.presto.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:96)
    at com.facebook.presto.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:752)
    at com.facebook.presto.execution.SqlQueryManager.createQueryInternal(SqlQueryManager.java:361)
    at com.facebook.presto.execution.SqlQueryManager.lambda$createQuery$4(SqlQueryManager.java:303)
    at com.facebook.presto.$gen.Presto_0_214_9_g36965f8_dirty__0_214_100_220181217_102607_1.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at
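The failure above happens because "nameservice1" is a logical HDFS nameservice, not a resolvable host: the HDFS client can only map it to real namenode addresses through the dfs.ha.namenodes.* and dfs.namenode.rpc-address.* properties normally supplied by hdfs-site.xml (in standard Hadoop clients this is why hdfs-site.xml is loaded into the Configuration, e.g. via Configuration.addResource). A toy, stdlib-only illustration of that mapping, where the resolver class and its behavior are a simplification, though the property key names follow the real HDFS client keys:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NameserviceResolve {
  /**
   * Resolve a logical nameservice to its namenode RPC addresses using
   * the same property keys hdfs-site.xml would provide. Without those
   * properties the logical name is treated as a host and lookup fails,
   * mirroring the UnknownHostException in the stack trace.
   */
  static List<String> resolve(String nameservice, Map<String, String> conf) {
    String nns = conf.get("dfs.ha.namenodes." + nameservice);
    if (nns == null) {
      // No hdfs-site.xml loaded: "nameservice1" cannot be resolved.
      throw new IllegalArgumentException("UnknownHostException: " + nameservice);
    }
    List<String> addresses = new ArrayList<>();
    for (String nn : nns.split(",")) {
      addresses.add(conf.get("dfs.namenode.rpc-address." + nameservice + "." + nn));
    }
    return addresses;
  }
}
```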
[GitHub] carbondata issue #2988: [CARBONDATA-3174] Fix trailing space issue with varc...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2988 @Shubh18s: why only for varchar columns? How was it handled for other columns? I guess this problem exists for other columns as well. ---
[GitHub] carbondata pull request #2988: [CARBONDATA-3174] Fix trailing space issue wi...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2988#discussion_r242161063 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala --- @@ -2490,6 +2490,54 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll { FileUtils.deleteDirectory(new File(writerPath)) } + test("check varchar with trailing space") { --- End diff -- No need to duplicate test cases. In the existing varchar columns test case, add a trailing space to one of the columns. ---
[GitHub] carbondata pull request #2995: [CARBONDATA-3160] Compaction support with MAP...
GitHub user dhatchayani opened a pull request:

    https://github.com/apache/carbondata/pull/2995

    [CARBONDATA-3160] Compaction support with MAP data type

    Support compaction with MAP data type in table.

     - [ ] Any interfaces changed?
     - [ ] Any backward compatibility impacted?
     - [ ] Document update required?
     - [x] Testing done
           UT added
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/carbondata CARBONDATA-3160

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2995.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2995

----

commit 4a4fe66a1363cfc5166d92f25d8f92ab3c1f5b9b
Author: dhatchayani
Date:   2018-12-17T14:08:07Z

    [CARBONDATA-3160] Compaction support with MAP data type

---
[GitHub] carbondata issue #2994: [WIP][CARBONDATA-2670] changed the impl of s3 rename...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2994 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10059/ ---
[GitHub] carbondata pull request #2988: [CARBONDATA-3174] Fix trailing space issue wi...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2988#discussion_r242157568

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/Field.java ---
@@ -55,7 +55,7 @@
    * @param type datatype of field, specified in strings.
    */
   public Field(String name, String type) {
-    this.name = name;
+    this.name = name.toLowerCase().trim();

--- End diff --

CarbonWriterBuilder.updateSchemaFields() is already converting to lowercase; just add trim in that method. There is no need to handle it in each constructor here.

---
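The normalization under discussion is small but worth pinning down. A hedged sketch of the field-name cleanup (the class name here is hypothetical; in the PR it lives in Field's constructor, while the reviewer would move it into CarbonWriterBuilder.updateSchemaFields(), which already lowercases):

```java
public class FieldNameNormalize {
  /**
   * Trim trailing/leading whitespace and lowercase the field name so that
   * "CUST_NAME " and "cust_name" refer to the same column. The order of
   * trim() and toLowerCase() does not matter here, since lowercasing never
   * introduces or removes whitespace.
   */
  static String normalize(String name) {
    return name.trim().toLowerCase();
  }
}
```

The reviewer's point is about placement, not logic: doing this once at schema-build time avoids repeating it in every constructor overload.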
[GitHub] carbondata issue #2994: [WIP][CARBONDATA-2670] changed the impl of s3 rename...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2994 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1799/ ---
[GitHub] carbondata pull request #2994: [WIP][CARBONDATA-2670] changed the impl of s3...
Github user kunal642 closed the pull request at: https://github.com/apache/carbondata/pull/2994 ---
[GitHub] carbondata pull request #2994: [WIP][CARBONDATA-2670] changed the impl of s3...
GitHub user kunal642 opened a pull request:

    https://github.com/apache/carbondata/pull/2994

    [WIP][CARBONDATA-2670] changed the impl of s3 renameforce to rewrite

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

     - [ ] Any interfaces changed?
     - [ ] Any backward compatibility impacted?
     - [ ] Document update required?
     - [ ] Testing done
           Please provide details on
           - Whether new unit test cases have been added or why no new tests are required?
           - How it is tested? Please attach test report.
           - Is it a performance related change? Please attach the performance test report.
           - Any additional information to help reviewers in testing this change.
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kunal642/carbondata bug/CARBONDATA-2670

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2994.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2994

----

commit b96fa42ee1fa0d3ba2d2081574d3a71eadb26ac2
Author: kunal642
Date:   2018-12-17T13:53:14Z

    [CARBONDATA-2670] changed the impl of s3 renameforce to rewrite

---
[jira] [Created] (CARBONDATA-3178) select query with in clause on timestamp column inconsistent with filter on same column
Anshul Topnani created CARBONDATA-3178:
------------------------------------------

             Summary: select query with in clause on timestamp column inconsistent with filter on same column
                 Key: CARBONDATA-3178
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3178
             Project: CarbonData
          Issue Type: Bug
          Components: data-query
    Affects Versions: 1.5.1
         Environment: spark 2.2
            Reporter: Anshul Topnani

Steps:

Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,36), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format';

Load data:

LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

Select queries:

select * from uniqdata where dob in ('1970-01-01 01:00:03.0');
No rows selected (0.702 seconds)

select * from uniqdata where dob = '1970-01-01 01:00:03.0';
1 row selected (0.57 seconds):
cust_id=9000, cust_name=CUST_NAME_0, active_emui_version=ACTIVE_EMUI_VERSION_0, dob=1970-01-01 01:00:03.0, doj=1970-01-01 02:00:03.0, bigint_column1=123372036854, bigint_column2=-223372036854, decimal_column1=12345678901.123400, decimal_column2=NULL, double_column1=1.12345674897976E10, double_column2=-1.12345674897976E10, integer_column1=1

Actual Issue: Correct data is projected for the filter query with '='. For the same column with an in clause, no data is projected.

Expected: Both select queries should show the correct result (as projected by the second select query).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] carbondata issue #2899: [CARBONDATA-3073][CARBONDATA-3044] Support configure...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2899 LGTM ---
[GitHub] carbondata issue #2161: [CARBONDATA-2218] AlluxioCarbonFile while trying to ...
Github user chandrasaripaka commented on the issue: https://github.com/apache/carbondata/pull/2161 I am changing the implementation to read and copy the data from the file system, as suggested by @ravipesala. I get a similar issue with the previous code: since we need a force rename here, we have to delete the file again and redo the previous run. ---
[GitHub] carbondata issue #2991: [WIP] Add build script and add test case with Google...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2991 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1798/ ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2990 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2009/ ---
[GitHub] carbondata pull request #2989: [CARBONDATA-3175]Fix Testcase failures in com...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2989 ---