[GitHub] [carbondata] xubo245 edited a comment on issue #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow
xubo245 edited a comment on issue #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow URL: https://github.com/apache/carbondata/pull/3479#issuecomment-573391580 Optimized; please review it again. @jackylk This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow
xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow URL: https://github.com/apache/carbondata/pull/3479#discussion_r365564279 ## File path: python/pycarbon/etl/carbon_dataset_metadata.py ## @@ -0,0 +1,235 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more Review comment: Already moved to the core folder.
[GitHub] [carbondata] xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow
xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow URL: https://github.com/apache/carbondata/pull/3479#discussion_r365564219 ## File path: python/pycarbon/integration/tensorflow.py ## @@ -0,0 +1,358 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +"""A set of Tensorflow specific helper functions for the unischema""" Review comment: Done. MNIST example: tf_external_example_carbon_unified_api.py
[GitHub] [carbondata] xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow
xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow URL: https://github.com/apache/carbondata/pull/3479#discussion_r365563240 ## File path: python/pycarbon/reader.py ## @@ -0,0 +1,202 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from obs import ObsClient Review comment: Yes. We didn't test on S3, and didn't test compatibility between S3 and OBS; we can do that in a follow-up PR.
[GitHub] [carbondata] xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow
xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow URL: https://github.com/apache/carbondata/pull/3479#discussion_r365562983 ## File path: python/pycarbon/README.md ## @@ -0,0 +1,53 @@ +# PyCarbon + +Optimized data access for AI based on CarbonData files, and we can use PyCarbon lib to read CarbonData, also prepare training data for different computing framework, e.g. TensorFlow, PyTorch, MXNet. Review comment: done
[GitHub] [carbondata] xubo245 commented on issue #3526: [CARBONDATA-3627] C++ SDK support write data withSchemaFile
xubo245 commented on issue #3526: [CARBONDATA-3627] C++ SDK support write data withSchemaFile URL: https://github.com/apache/carbondata/pull/3526#issuecomment-573389824 @jackylk @zzcclp Addressed the review comments. Please review it again.
[GitHub] [carbondata] xubo245 commented on a change in pull request #3526: [CARBONDATA-3627] C++ SDK support write data withSchemaFile
xubo245 commented on a change in pull request #3526: [CARBONDATA-3627] C++ SDK support write data withSchemaFile URL: https://github.com/apache/carbondata/pull/3526#discussion_r365562824 ## File path: store/CSDK/test/main.cpp ## @@ -740,6 +740,130 @@ bool testWriteData(JNIEnv *env, char *path, int argc, char *argv[]) { } } +bool testWriteDataWithSchemaFile(JNIEnv *env, char *path, int argc, char *argv[]) { + +try { +CarbonWriter writer; +writer.builder(env); +writer.outputPath(path); +writer.withCsvInput(); + writer.withSchemaFile("../../../integration/spark-common/target/warehouse/add_segment_test/Metadata/schema"); +writer.writtenBy("CSDK"); +writer.taskNo(15541554.81); +writer.withThreadSafe(1); +writer.uniqueIdentifier(154991181400); +writer.withBlockSize(1); +writer.withBlockletSize(16); +writer.enableLocalDictionary(true); +writer.localDictionaryThreshold(1); +if (argc > 3) { +writer.withHadoopConf("fs.s3a.access.key", argv[1]); +writer.withHadoopConf("fs.s3a.secret.key", argv[2]); +writer.withHadoopConf("fs.s3a.endpoint", argv[3]); +} +writer.build(); + +int rowNum = 10; +int size = 14; +long longValue = 0; +double doubleValue = 0; +float floatValue = 0; +jclass objClass = env->FindClass("java/lang/String"); +for (int i = 0; i < rowNum; ++i) { +jobjectArray arr = env->NewObjectArray(size, objClass, 0); +char ctrInt[10]; +gcvt(i, 10, ctrInt); + +char a[15] = "robot"; +strcat(a, ctrInt); + + +jobject intField = env->NewStringUTF(ctrInt); +env->SetObjectArrayElement(arr, 0, intField); + +jobject stringField = env->NewStringUTF(a); +env->SetObjectArrayElement(arr, 1, stringField); + + +jobject string2Field = env->NewStringUTF(a); +env->SetObjectArrayElement(arr, 2, string2Field); + + +jobject timeField = env->NewStringUTF("2019-02-12 03:03:34"); +env->SetObjectArrayElement(arr, 3, timeField); + + +jobject int4Field = env->NewStringUTF(ctrInt); +env->SetObjectArrayElement(arr, 4, int4Field); + +jobject string5Field = env->NewStringUTF(a); 
+env->SetObjectArrayElement(arr, 5, string5Field); + +jobject int6Field = env->NewStringUTF(ctrInt); +env->SetObjectArrayElement(arr, 6, int6Field); + +jobject string7Field = env->NewStringUTF(a); +env->SetObjectArrayElement(arr, 7, string7Field); + +jobject int8Field = env->NewStringUTF(ctrInt); +env->SetObjectArrayElement(arr, 8, int8Field); + +jobject time9Field = env->NewStringUTF("2019-02-12 03:03:34"); +env->SetObjectArrayElement(arr, 9, time9Field); + +jobject dateField = env->NewStringUTF(" 2019-03-02"); +env->SetObjectArrayElement(arr, 10, dateField); + +jobject int11Field = env->NewStringUTF(ctrInt); +env->SetObjectArrayElement(arr, 11, int11Field); + +jobject int12Field = env->NewStringUTF(ctrInt); +env->SetObjectArrayElement(arr, 12, int12Field); + +jobject int13Field = env->NewStringUTF(ctrInt); +env->SetObjectArrayElement(arr, 13, int13Field); + +writer.write(arr); + +env->DeleteLocalRef(stringField); +env->DeleteLocalRef(string2Field); +env->DeleteLocalRef(intField); +env->DeleteLocalRef(int4Field); +env->DeleteLocalRef(string5Field); +env->DeleteLocalRef(int6Field); +env->DeleteLocalRef(dateField); +env->DeleteLocalRef(timeField); +env->DeleteLocalRef(string7Field); +env->DeleteLocalRef(int8Field); +env->DeleteLocalRef(int11Field); +env->DeleteLocalRef(int12Field); +env->DeleteLocalRef(int13Field); +env->DeleteLocalRef(arr); +} +writer.close(); + +CarbonReader carbonReader; +carbonReader.builder(env, path); +carbonReader.build(); +int i = 0; +int printNum = 10; +CarbonRow carbonRow(env); +while (carbonReader.hasNext()) { +jobject row = carbonReader.readNextRow(); +i++; +carbonRow.setCarbonRow(row); +if (i < printNum) { +printf("%s\t%d\t%ld\t", carbonRow.getString(1)); +} +env->DeleteLocalRef(row); +} +carbonReader.close(); +} catch (jthrowable ex) { +env->ExceptionDescribe(); +env->ExceptionClear(); +} Review comment: done
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions
CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions URL: https://github.com/apache/carbondata/pull/3574#issuecomment-573377016 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1608/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-573371968 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1607/
[GitHub] [carbondata] zzcclp commented on issue #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test
zzcclp commented on issue #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test URL: https://github.com/apache/carbondata/pull/3575#issuecomment-573366714 Closing this PR; spark-carbon-common-test will be removed in PR #3574.
[GitHub] [carbondata] zzcclp closed pull request #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test
zzcclp closed pull request #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test URL: https://github.com/apache/carbondata/pull/3575
[GitHub] [carbondata] kunal642 opened a new pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment
kunal642 opened a new pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment URL: https://github.com/apache/carbondata/pull/3474

**Problem:** Query on bloom datamap fails when there are multiple data files in one segment.

**Solution:** Old pruned index files were cleared from the FilteredIndexSharedNames list, so further pruning was not done on all the valid index files. Hence added a check to clear the index files only in valid scenarios. Also handled the case where a wrong blocklet id is passed while creating the blocklet from the relative blocklet id.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach the test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [carbondata] kunal642 closed pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment
kunal642 closed pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment URL: https://github.com/apache/carbondata/pull/3474
[GitHub] [carbondata] kunal642 commented on a change in pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment
kunal642 commented on a change in pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment URL: https://github.com/apache/carbondata/pull/3474#discussion_r365541911 ## File path: core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java ## @@ -145,18 +147,23 @@ public static DataMapJob getEmbeddedJob() { * Prune the segments from the already pruned blocklets. */ public static void pruneSegments(List segments, List prunedBlocklets) { -Set validSegments = new HashSet<>(); +Map> validSegments = new HashMap<>(); for (ExtendedBlocklet blocklet : prunedBlocklets) { - // Clear the old pruned index files if any present - blocklet.getSegment().getFilteredIndexShardNames().clear(); // Set the pruned index file to the segment // for further pruning. String shardName = CarbonTablePath.getShardName(blocklet.getFilePath()); - blocklet.getSegment().setFilteredIndexShardName(shardName); - validSegments.add(blocklet.getSegment()); + // Add the existing shards to corresponding segments + Set existingShards = new HashSet<>(); + existingShards = validSegments.putIfAbsent(blocklet.getSegment(), existingShards); Review comment: done
[GitHub] [carbondata] kunal642 commented on a change in pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment
kunal642 commented on a change in pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment URL: https://github.com/apache/carbondata/pull/3474#discussion_r365541908 ## File path: core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java ## @@ -145,18 +147,27 @@ public static DataMapJob getEmbeddedJob() { * Prune the segments from the already pruned blocklets. */ public static void pruneSegments(List segments, List prunedBlocklets) { -Set validSegments = new HashSet<>(); +Map> validSegments = new HashMap<>(); for (ExtendedBlocklet blocklet : prunedBlocklets) { - // Clear the old pruned index files if any present - blocklet.getSegment().getFilteredIndexShardNames().clear(); // Set the pruned index file to the segment // for further pruning. String shardName = CarbonTablePath.getShardName(blocklet.getFilePath()); - blocklet.getSegment().setFilteredIndexShardName(shardName); - validSegments.add(blocklet.getSegment()); + // Add the existing shards to corresponding segments + Set existingShards = validSegments.get(blocklet.getSegment()); + if (existingShards == null) { +existingShards = new HashSet<>(); +validSegments.put(blocklet.getSegment(), existingShards); + } else { +existingShards.add(shardName); Review comment: fixed
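The review rounds on this patch converge on accumulating shard names per segment rather than clearing them. A minimal stand-alone sketch of that pattern (segments and shards reduced to strings; `collectShards` is a hypothetical helper for illustration, not CarbonData's actual `DataMapUtil.pruneSegments`):

```java
import java.util.*;

public class PruneSketch {
    // Hypothetical stand-in for the pruning loop: accumulate the pruned
    // shard names per segment instead of clearing the previous ones.
    static Map<String, Set<String>> collectShards(List<String[]> prunedBlocklets) {
        Map<String, Set<String>> validSegments = new HashMap<>();
        for (String[] blocklet : prunedBlocklets) {
            String segment = blocklet[0];
            String shardName = blocklet[1];
            // computeIfAbsent sidesteps the pitfalls discussed in review:
            // putIfAbsent returns null for a fresh key, and the shard must be
            // added on every iteration, not only in the else branch.
            validSegments.computeIfAbsent(segment, k -> new HashSet<>()).add(shardName);
        }
        return validSegments;
    }

    public static void main(String[] args) {
        List<String[]> blocklets = Arrays.asList(
                new String[]{"seg0", "shard_a"},
                new String[]{"seg0", "shard_b"},
                new String[]{"seg1", "shard_a"});
        // Both shards of seg0 survive, so later pruning sees every valid index file.
        System.out.println(collectShards(blocklets));
    }
}
```

With the original clear-and-overwrite logic, only the last shard of seg0 would remain, which is exactly the multiple-data-files failure the PR fixes.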
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
CarbonDataQA1 commented on issue #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table URL: https://github.com/apache/carbondata/pull/3568#issuecomment-57504 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1605/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions
CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions URL: https://github.com/apache/carbondata/pull/3574#issuecomment-573331457 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1606/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test
CarbonDataQA1 commented on issue #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test URL: https://github.com/apache/carbondata/pull/3575#issuecomment-573330635 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1603/
[GitHub] [carbondata] zzcclp opened a new pull request #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test
zzcclp opened a new pull request #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test URL: https://github.com/apache/carbondata/pull/3575

Upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test.

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#issuecomment-573323098 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1600/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-573322241 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1602/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-573320963 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1601/
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
Indhumathi27 commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table URL: https://github.com/apache/carbondata/pull/3568#discussion_r365522833 ## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ## @@ -136,9 +136,22 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, getTableBlockIndexUniqueIdentifiers(segment); for (TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier : identifiers) { -tableBlockIndexUniqueIdentifierWrappers.add( -new TableBlockIndexUniqueIdentifierWrapper(tableBlockIndexUniqueIdentifier, -this.getCarbonTable())); +if (null != partitionsToPrune && !partitionsToPrune.isEmpty()) { + // add only tableBlockUniqueIdentifier that matches the partition + for (PartitionSpec partitionSpec : partitionsToPrune) { +if (partitionSpec.getLocation().toString() + .equalsIgnoreCase(tableBlockIndexUniqueIdentifier.getIndexFilePath())) { Review comment: Actually, `tableBlockIndexUniqueIdentifier.getIndexFilePath()` is the index file's parent path. I will add a comment explaining this.
[GitHub] [carbondata] jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table URL: https://github.com/apache/carbondata/pull/3568#discussion_r365522109 ## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ## @@ -136,9 +136,22 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, getTableBlockIndexUniqueIdentifiers(segment); for (TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier : identifiers) { -tableBlockIndexUniqueIdentifierWrappers.add( -new TableBlockIndexUniqueIdentifierWrapper(tableBlockIndexUniqueIdentifier, -this.getCarbonTable())); +if (null != partitionsToPrune && !partitionsToPrune.isEmpty()) { + // add only tableBlockUniqueIdentifier that matches the partition + for (PartitionSpec partitionSpec : partitionsToPrune) { +if (partitionSpec.getLocation().toString() + .equalsIgnoreCase(tableBlockIndexUniqueIdentifier.getIndexFilePath())) { Review comment: It is not easy to understand why the partition spec location is compared with the index file path; can you make it more readable?
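One way to address this readability concern is to extract the comparison into a named helper that states why it works. A hedged sketch (`isIndexFileInPartition` is a hypothetical name; the case-insensitive match mirrors the quoted diff):

```java
public class PartitionMatchSketch {
    // Hypothetical helper: the identifier's "index file path" is actually the
    // index file's parent directory, which for a partitioned table is the
    // partition directory itself, hence the direct comparison.
    static boolean isIndexFileInPartition(String partitionLocation, String indexFileParentPath) {
        return partitionLocation.equalsIgnoreCase(indexFileParentPath);
    }

    public static void main(String[] args) {
        String partition = "/warehouse/t1/country=US";
        System.out.println(isIndexFileInPartition(partition, "/warehouse/t1/country=US"));
        System.out.println(isIndexFileInPartition(partition, "/warehouse/t1/country=CN"));
    }
}
```

A named predicate like this, plus the comment Indhumathi27 offered to add, makes the partition-location-versus-index-path comparison self-documenting at the call site.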
[GitHub] [carbondata] jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table URL: https://github.com/apache/carbondata/pull/3568#discussion_r365522014 ## File path: core/src/main/java/org/apache/carbondata/core/indexstore/SegmentPropertiesFetcher.java ## @@ -34,7 +35,7 @@ * @return * @throws IOException */ - SegmentProperties getSegmentProperties(Segment segment) + SegmentProperties getSegmentProperties(Segment segment, List partitionSpecs) Review comment: I suggest adding a new interface method instead of modifying the existing one; keep both so that callers can use the original when no partition pruning is required. Please also update the function description accordingly.
[GitHub] [carbondata] jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table
jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table URL: https://github.com/apache/carbondata/pull/3568#discussion_r365521952 ## File path: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java ## @@ -81,19 +82,20 @@ public abstract DataMapBuilder createBuilder(Segment segment, String shardName, /** * Get the datamap for all segments */ - public Map> getDataMaps(List segments) - throws IOException { + public Map> getDataMaps(List segments, + List partitions) throws IOException { Map> dataMaps = new HashMap<>(); for (Segment segment : segments) { - dataMaps.put(segment, (List) this.getDataMaps(segment)); + dataMaps.put(segment, (List) this.getDataMaps(segment, partitions)); } return dataMaps; } /** * Get the datamap for segmentId */ - public abstract List getDataMaps(Segment segment) throws IOException; + public abstract List getDataMaps(Segment segment, List partitions) Review comment: I suggest adding a new interface method instead of modifying the existing one.
[jira] [Resolved] (CARBONDATA-3656) There are some conflicts when concurrently write data to S3
[ https://issues.apache.org/jira/browse/CARBONDATA-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-3656. -- Fix Version/s: 2.0.0 Resolution: Fixed > There are some conflicts when concurrently write data to S3 > --- > > Key: CARBONDATA-3656 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3656 > Project: CarbonData > Issue Type: Bug >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > Fix For: 2.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > There are some conflicts when concurrently write data to S3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
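The fix tracked by this issue gives each SDK writer a distinct default task number so concurrent writers to the same S3 path do not produce colliding file names. A stdlib-only sketch of one way to derive such a default (a hypothetical helper for illustration; the actual CarbonData change may compute the value differently):

```java
import java.util.UUID;

public class TaskNoSketch {
    // Hypothetical default: derive a practically unique, non-negative task
    // number so two concurrent SDK writers do not emit colliding file names.
    static long defaultTaskNo() {
        return UUID.randomUUID().getMostSignificantBits() & Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        long a = defaultTaskNo();
        long b = defaultTaskNo();
        // Collisions between two random 63-bit values are astronomically unlikely.
        System.out.println(a != b);
    }
}
```

The SDK's builder already exposes a `taskNo(...)` option (used in the C++ test quoted earlier in this digest); a randomized default only matters when callers omit it.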
[GitHub] [carbondata] asfgit closed pull request #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK
asfgit closed pull request #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK URL: https://github.com/apache/carbondata/pull/3567
[GitHub] [carbondata] jackylk commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK
jackylk commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK URL: https://github.com/apache/carbondata/pull/3567#issuecomment-573317811 LGTM
[GitHub] [carbondata] asfgit closed pull request #3518: [DOC] add performance-tuning with codegen parameters support
asfgit closed pull request #3518: [DOC] add performance-tuning with codegen parameters support URL: https://github.com/apache/carbondata/pull/3518
[GitHub] [carbondata] jackylk commented on issue #3518: [DOC] add performance-tuning with codegen parameters support
jackylk commented on issue #3518: [DOC] add performance-tuning with codegen parameters support URL: https://github.com/apache/carbondata/pull/3518#issuecomment-573316089 LGTM
[GitHub] [carbondata] jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port.
jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port. URL: https://github.com/apache/carbondata/pull/3571#discussion_r365520784

## File path: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ##

@@ -198,9 +198,10 @@ private static BlockMetaInfo createBlockMetaInfo(
   Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers = new HashSet<>();
   Map<String, String> indexFiles = segment.getCommittedIndexFile();
   for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
-    Path indexFile = new Path(indexFileEntry.getKey());
+    String indexFile = indexFileEntry.getKey();
     tableBlockIndexUniqueIdentifiers.add(
-        new TableBlockIndexUniqueIdentifier(indexFile.getParent().toString(), indexFile.getName(),
+        new TableBlockIndexUniqueIdentifier(indexFile.substring(0, indexFile.lastIndexOf("/")),

Review comment: use FilenameUtils from Apache Commons.
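The reviewer suggests Apache Commons IO's FilenameUtils in place of the manual substring logic. The sketch below writes out pure-Java equivalents of the two relevant calls (FilenameUtils.getFullPathNoEndSeparator and FilenameUtils.getName) so the intent of the diff is visible; the helper names and the sample path are illustrative, not taken from the CarbonData codebase.

```java
public class IndexFilePathSplit {
  // Equivalent of FilenameUtils.getFullPathNoEndSeparator for '/'-separated
  // paths: everything before the last separator.
  static String parentPath(String path) {
    int sep = path.lastIndexOf('/');
    return sep < 0 ? "" : path.substring(0, sep);
  }

  // Equivalent of FilenameUtils.getName: everything after the last separator.
  static String fileName(String path) {
    return path.substring(path.lastIndexOf('/') + 1);
  }

  public static void main(String[] args) {
    // sample path is illustrative only
    String indexFile = "/store/default/t1/Fact/Part0/Segment_0/0_batchno0.carbonindex";
    System.out.println(parentPath(indexFile)); // /store/default/t1/Fact/Part0/Segment_0
    System.out.println(fileName(indexFile));   // 0_batchno0.carbonindex
  }
}
```

Using the library calls instead of ad-hoc substring arithmetic avoids off-by-one mistakes and handles the no-separator edge case consistently across call sites.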
[GitHub] [carbondata] jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port.
jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port. URL: https://github.com/apache/carbondata/pull/3571#discussion_r365520720

## File path: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ##

@@ -189,7 +189,7 @@ private static BlockMetaInfo createBlockMetaInfo(
       CarbonFile carbonFile = FileFactory.getCarbonFile(carbonDataFile);
       return new BlockMetaInfo(new String[] { "localhost" }, carbonFile.getSize());
     default:
-      return fileNameToMetaInfoMapping.get(carbonDataFile);
+      return fileNameToMetaInfoMapping.get(FileFactory.getCarbonFile(carbonDataFile).getPath());

Review comment: Please add a comment explaining why this is required. Can we add a separate case for Alluxio?
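Given the PR title (alluxio paths without host and port), a plausible reason for routing the key through FileFactory first is that the map was populated with normalized paths, so an unnormalized variant of the same path misses the entry. The toy sketch below illustrates that failure mode; `normalize` is a hypothetical stand-in written with `java.net.URI`, not CarbonData's FileFactory logic.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class PathKeyNormalization {
  // Hypothetical stand-in for path normalization: reduce a URI-style path to
  // its path component so that "alluxio:///x" and "alluxio://host:port/x"
  // resolve to the same map key.
  static String normalize(String path) {
    return URI.create(path).getPath();
  }

  public static void main(String[] args) {
    Map<String, Long> fileNameToMetaInfo = new HashMap<>();
    // metadata was registered under the normalized form of the path
    fileNameToMetaInfo.put(normalize("alluxio://host:19998/store/part-0.carbondata"), 1024L);

    String lookupKey = "alluxio:///store/part-0.carbondata"; // no host and port
    // a raw string lookup misses; normalizing first finds the entry
    System.out.println(fileNameToMetaInfo.get(lookupKey));            // null
    System.out.println(fileNameToMetaInfo.get(normalize(lookupKey))); // 1024
  }
}
```

A dedicated Alluxio case, as the reviewer asks, would make this normalization explicit instead of burying it in the default branch.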
[GitHub] [carbondata] jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port.
jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port. URL: https://github.com/apache/carbondata/pull/3571#discussion_r365520617

## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java ##

@@ -137,12 +137,14 @@ public void init(DataMapModel dataMapModel)
       // if the segment data is written in tablepath then no need to store whole path of file.
       !blockletDataMapInfo.getFilePath().startsWith(
           blockletDataMapInfo.getCarbonTable().getTablePath())) {
-    filePath = path.getParent().toString().getBytes(CarbonCommonConstants.DEFAULT_CHARSET);
+    filePath =
+        path.substring(0, path.lastIndexOf("/")).getBytes(CarbonCommonConstants.DEFAULT_CHARSET);

Review comment: I found that you can use `FilenameUtils.getFullPathNoEndSeparator(file)` from Apache Commons.
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#issuecomment-573310579 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1599/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions
CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions URL: https://github.com/apache/carbondata/pull/3574#issuecomment-573309170 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1598/
[GitHub] [carbondata] QiangCai opened a new pull request #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions
QiangCai opened a new pull request #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions URL: https://github.com/apache/carbondata/pull/3574

### Why is this PR needed?
1. MV is not supported
2. The parser still uses CarbonAstBuilder
3. Some test cases still run through CarbonSession

### What changes were proposed in this PR?
1. Support MV
2. New order of parsers (CarbonParser -> SparkParser)
3. Remove the spark-carbon-common-test module, move tests back to the spark-common-test module
4. ...

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-573300811 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1597/
[jira] [Resolved] (CARBONDATA-3622) Error when table is dropped and is being accessed in Index Server afterwards
[ https://issues.apache.org/jira/browse/CARBONDATA-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash R Nilugal resolved CARBONDATA-3622.
-----------------------------------------
    Resolution: Fixed

> Error when table is dropped and is being accessed in Index Server afterwards
> ----------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3622
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3622
>             Project: CarbonData
>          Issue Type: Bug
>          Components: spark-integration
>    Affects Versions: 2.0.0
>            Reporter: Mainak Bin
>            Priority: Major
>             Fix For: 2.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If a table is dropped and is then accessed in the Index Server while clearing
> datamaps, it will lead to an error, as that table is no longer present.
[jira] [Resolved] (CARBONDATA-3620) Update does not load cache in memory, behavior inconsistent with scenario when index server is not running
[ https://issues.apache.org/jira/browse/CARBONDATA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash R Nilugal resolved CARBONDATA-3620.
-----------------------------------------
    Resolution: Fixed

> Update does not load cache in memory, behavior inconsistent with scenario
> when index server is not running
> --------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3620
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3620
>             Project: CarbonData
>          Issue Type: Bug
>          Components: spark-integration
>    Affects Versions: 2.0.0
>            Reporter: Vikram Ahuja
>            Priority: Minor
>             Fix For: 2.0.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Show metacache after update command returns:
> SHOW METACACHE ON TABLE;
> +------------+-------+------------------------+----------------+
> | Field      | Size  | Comment                | Cache Location |
> +------------+-------+------------------------+----------------+
> | Index      | 0 B   | 0/2 index files cached | DRIVER         |
> | Dictionary | 0 B   |                        | DRIVER         |
> | Index      | 553 B | 1/2 index files cached | INDEX SERVER   |
> | Dictionary | 0 B   |                        | INDEX SERVER   |
> +------------+-------+------------------------+----------------+
[GitHub] [carbondata] asfgit closed pull request #3511: [CARBONDATA-3620][CARBONDATA-3622]: Update does not load cache in memory, behavior inconsistent with scenario when index server is not running
asfgit closed pull request #3511: [CARBONDATA-3620][CARBONDATA-3622]: Update does not load cache in memory, behavior inconsistent with scenario when index server is not running URL: https://github.com/apache/carbondata/pull/3511
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK
CarbonDataQA1 commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK URL: https://github.com/apache/carbondata/pull/3567#issuecomment-573298523 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1596/
[GitHub] [carbondata] Zhangshunyu commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK
Zhangshunyu commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK URL: https://github.com/apache/carbondata/pull/3567#issuecomment-573294141 LGTM