[GitHub] [carbondata] xubo245 edited a comment on issue #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow

2020-01-11 Thread GitBox
xubo245 edited a comment on issue #3479: [ CARBONDATA-3271] Integrating deep 
learning framework TensorFlow
URL: https://github.com/apache/carbondata/pull/3479#issuecomment-573391580
 
 
   Optimized; please review it again. @jackylk 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] xubo245 commented on issue #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow

2020-01-11 Thread GitBox
xubo245 commented on issue #3479: [ CARBONDATA-3271] Integrating deep learning 
framework TensorFlow
URL: https://github.com/apache/carbondata/pull/3479#issuecomment-573391580
 
 
   Optimized; please review it again.




[GitHub] [carbondata] xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow

2020-01-11 Thread GitBox
xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] 
Integrating deep learning framework TensorFlow
URL: https://github.com/apache/carbondata/pull/3479#discussion_r365564279
 
 

 ##
 File path: python/pycarbon/etl/carbon_dataset_metadata.py
 ##
 @@ -0,0 +1,235 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
 
 Review comment:
   Already moved to the core folder.




[GitHub] [carbondata] xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow

2020-01-11 Thread GitBox
xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] 
Integrating deep learning framework TensorFlow
URL: https://github.com/apache/carbondata/pull/3479#discussion_r365564219
 
 

 ##
 File path: python/pycarbon/integration/tensorflow.py
 ##
 @@ -0,0 +1,358 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+"""A set of Tensorflow specific helper functions for the unischema"""
 
 Review comment:
   Done. MNIST example: tf_external_example_carbon_unified_api.py




[GitHub] [carbondata] xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow

2020-01-11 Thread GitBox
xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] 
Integrating deep learning framework TensorFlow
URL: https://github.com/apache/carbondata/pull/3479#discussion_r365563240
 
 

 ##
 File path: python/pycarbon/reader.py
 ##
 @@ -0,0 +1,202 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from obs import ObsClient
 
 Review comment:
   Yes. We didn't test on S3, and didn't test compatibility between S3 and OBS. 
We can do that in the next PR. 




[GitHub] [carbondata] xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] Integrating deep learning framework TensorFlow

2020-01-11 Thread GitBox
xubo245 commented on a change in pull request #3479: [ CARBONDATA-3271] 
Integrating deep learning framework TensorFlow
URL: https://github.com/apache/carbondata/pull/3479#discussion_r365562983
 
 

 ##
 File path: python/pycarbon/README.md
 ##
 @@ -0,0 +1,53 @@
+# PyCarbon
+
+PyCarbon provides optimized data access for AI based on CarbonData files. It 
can be used to read CarbonData and to prepare training data for different 
computing frameworks, e.g. TensorFlow, PyTorch, MXNet. 
 
 Review comment:
   done




[GitHub] [carbondata] xubo245 commented on issue #3526: [CARBONDATA-3627] C++ SDK support write data withSchemaFile

2020-01-11 Thread GitBox
xubo245 commented on issue #3526: [CARBONDATA-3627] C++ SDK support write data 
withSchemaFile
URL: https://github.com/apache/carbondata/pull/3526#issuecomment-573389824
 
 
   @jackylk @zzcclp optimized the comments. Please review it again.




[GitHub] [carbondata] xubo245 commented on a change in pull request #3526: [CARBONDATA-3627] C++ SDK support write data withSchemaFile

2020-01-11 Thread GitBox
xubo245 commented on a change in pull request #3526: [CARBONDATA-3627] C++ SDK 
support write data withSchemaFile
URL: https://github.com/apache/carbondata/pull/3526#discussion_r365562824
 
 

 ##
 File path: store/CSDK/test/main.cpp
 ##
 @@ -740,6 +740,130 @@ bool testWriteData(JNIEnv *env, char *path, int argc, char *argv[]) {
     }
 }
 
+bool testWriteDataWithSchemaFile(JNIEnv *env, char *path, int argc, char *argv[]) {
+    try {
+        CarbonWriter writer;
+        writer.builder(env);
+        writer.outputPath(path);
+        writer.withCsvInput();
+        writer.withSchemaFile("../../../integration/spark-common/target/warehouse/add_segment_test/Metadata/schema");
+        writer.writtenBy("CSDK");
+        writer.taskNo(15541554.81);
+        writer.withThreadSafe(1);
+        writer.uniqueIdentifier(154991181400);
+        writer.withBlockSize(1);
+        writer.withBlockletSize(16);
+        writer.enableLocalDictionary(true);
+        writer.localDictionaryThreshold(1);
+        if (argc > 3) {
+            writer.withHadoopConf("fs.s3a.access.key", argv[1]);
+            writer.withHadoopConf("fs.s3a.secret.key", argv[2]);
+            writer.withHadoopConf("fs.s3a.endpoint", argv[3]);
+        }
+        writer.build();
+
+        int rowNum = 10;
+        int size = 14;
+        long longValue = 0;
+        double doubleValue = 0;
+        float floatValue = 0;
+        jclass objClass = env->FindClass("java/lang/String");
+        for (int i = 0; i < rowNum; ++i) {
+            jobjectArray arr = env->NewObjectArray(size, objClass, 0);
+            char ctrInt[10];
+            gcvt(i, 10, ctrInt);
+
+            char a[15] = "robot";
+            strcat(a, ctrInt);
+
+            jobject intField = env->NewStringUTF(ctrInt);
+            env->SetObjectArrayElement(arr, 0, intField);
+
+            jobject stringField = env->NewStringUTF(a);
+            env->SetObjectArrayElement(arr, 1, stringField);
+
+            jobject string2Field = env->NewStringUTF(a);
+            env->SetObjectArrayElement(arr, 2, string2Field);
+
+            jobject timeField = env->NewStringUTF("2019-02-12 03:03:34");
+            env->SetObjectArrayElement(arr, 3, timeField);
+
+            jobject int4Field = env->NewStringUTF(ctrInt);
+            env->SetObjectArrayElement(arr, 4, int4Field);
+
+            jobject string5Field = env->NewStringUTF(a);
+            env->SetObjectArrayElement(arr, 5, string5Field);
+
+            jobject int6Field = env->NewStringUTF(ctrInt);
+            env->SetObjectArrayElement(arr, 6, int6Field);
+
+            jobject string7Field = env->NewStringUTF(a);
+            env->SetObjectArrayElement(arr, 7, string7Field);
+
+            jobject int8Field = env->NewStringUTF(ctrInt);
+            env->SetObjectArrayElement(arr, 8, int8Field);
+
+            jobject time9Field = env->NewStringUTF("2019-02-12 03:03:34");
+            env->SetObjectArrayElement(arr, 9, time9Field);
+
+            jobject dateField = env->NewStringUTF(" 2019-03-02");
+            env->SetObjectArrayElement(arr, 10, dateField);
+
+            jobject int11Field = env->NewStringUTF(ctrInt);
+            env->SetObjectArrayElement(arr, 11, int11Field);
+
+            jobject int12Field = env->NewStringUTF(ctrInt);
+            env->SetObjectArrayElement(arr, 12, int12Field);
+
+            jobject int13Field = env->NewStringUTF(ctrInt);
+            env->SetObjectArrayElement(arr, 13, int13Field);
+
+            writer.write(arr);
+
+            env->DeleteLocalRef(stringField);
+            env->DeleteLocalRef(string2Field);
+            env->DeleteLocalRef(intField);
+            env->DeleteLocalRef(int4Field);
+            env->DeleteLocalRef(string5Field);
+            env->DeleteLocalRef(int6Field);
+            env->DeleteLocalRef(dateField);
+            env->DeleteLocalRef(timeField);
+            env->DeleteLocalRef(string7Field);
+            env->DeleteLocalRef(int8Field);
+            env->DeleteLocalRef(int11Field);
+            env->DeleteLocalRef(int12Field);
+            env->DeleteLocalRef(int13Field);
+            env->DeleteLocalRef(arr);
+        }
+        writer.close();
+
+        CarbonReader carbonReader;
+        carbonReader.builder(env, path);
+        carbonReader.build();
+        int i = 0;
+        int printNum = 10;
+        CarbonRow carbonRow(env);
+        while (carbonReader.hasNext()) {
+            jobject row = carbonReader.readNextRow();
+            i++;
+            carbonRow.setCarbonRow(row);
+            if (i < printNum) {
+                // Only one value is fetched per row, so the format string
+                // takes a single %s (the original "%s\t%d\t%ld\t" had no
+                // matching arguments for %d and %ld).
+                printf("%s\t", carbonRow.getString(1));
+            }
+            env->DeleteLocalRef(row);
+        }
+        carbonReader.close();
+    } catch (jthrowable ex) {
+        env->ExceptionDescribe();
+        env->ExceptionClear();
+    }
 
 Review comment:
   done



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon 
SparkExtensions
URL: https://github.com/apache/carbondata/pull/3574#issuecomment-573377016
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1608/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-573371968
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1607/
   




[GitHub] [carbondata] zzcclp commented on issue #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test

2020-01-11 Thread GitBox
zzcclp commented on issue #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module 
spark-carbon-common-test
URL: https://github.com/apache/carbondata/pull/3575#issuecomment-573366714
 
 
   Closing this PR; spark-carbon-common-test will be removed in PR #3574.




[GitHub] [carbondata] zzcclp closed pull request #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test

2020-01-11 Thread GitBox
zzcclp closed pull request #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module 
spark-carbon-common-test
URL: https://github.com/apache/carbondata/pull/3575
 
 
   




[GitHub] [carbondata] kunal642 opened a new pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment

2020-01-11 Thread GitBox
kunal642 opened a new pull request #3474: [CARBONDATA-3592] Fix query on bloom 
in case of multiple data files in one segment
URL: https://github.com/apache/carbondata/pull/3474
 
 
   **Problem:** Query on bloom datamap fails when there are multiple data files 
in one segment.
   **Solution:** Old pruned index files were cleared from the 
FilteredIndexShardNames list, so further pruning was not done on all the valid 
index files. Hence a check was added to clear the index files only in valid 
scenarios. Also handled the case where a wrong blocklet id was passed while 
creating the blocklet from the relative blocklet id.
   
   Be sure to do all of the following checklist to help us incorporate 
   your contribution quickly and easily:
   
- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?
   
- [ ] Testing done
   Please provide details on 
   - Whether new unit test cases have been added or why no new tests 
are required?
   - How it is tested? Please attach test report.
   - Is it a performance related change? Please attach the performance 
test report.
   - Any additional information to help reviewers in testing this 
change.
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
   
   




[GitHub] [carbondata] kunal642 closed pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment

2020-01-11 Thread GitBox
kunal642 closed pull request #3474: [CARBONDATA-3592] Fix query on bloom in 
case of multiple data files in one segment
URL: https://github.com/apache/carbondata/pull/3474
 
 
   




[GitHub] [carbondata] kunal642 commented on a change in pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment

2020-01-11 Thread GitBox
kunal642 commented on a change in pull request #3474: [CARBONDATA-3592] Fix 
query on bloom in case of multiple data files in one segment
URL: https://github.com/apache/carbondata/pull/3474#discussion_r365541911
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java
 ##
 @@ -145,18 +147,23 @@ public static DataMapJob getEmbeddedJob() {
* Prune the segments from the already pruned blocklets.
*/
  public static void pruneSegments(List<Segment> segments,
      List<ExtendedBlocklet> prunedBlocklets) {
-    Set<Segment> validSegments = new HashSet<>();
+    Map<Segment, Set<String>> validSegments = new HashMap<>();
    for (ExtendedBlocklet blocklet : prunedBlocklets) {
-      // Clear the old pruned index files if any present
-      blocklet.getSegment().getFilteredIndexShardNames().clear();
      // Set the pruned index file to the segment
      // for further pruning.
      String shardName = CarbonTablePath.getShardName(blocklet.getFilePath());
-      blocklet.getSegment().setFilteredIndexShardName(shardName);
-      validSegments.add(blocklet.getSegment());
+      // Add the existing shards to corresponding segments
+      Set<String> existingShards = new HashSet<>();
+      existingShards = validSegments.putIfAbsent(blocklet.getSegment(), existingShards);
 
 Review comment:
   done
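
The quoted diff groups shard names per segment with `putIfAbsent`, which returns the *previous* mapping (null on first insert) and is therefore easy to misuse; `Map.computeIfAbsent` expresses the same grouping without that pitfall. Below is a minimal self-contained sketch of the pattern, using plain strings as hypothetical stand-ins for the Segment and shard types (these are not the CarbonData classes):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GroupShards {
    // Group shard names by segment id. computeIfAbsent creates the set on
    // first use and always returns the mapped value, so the add() below
    // can never operate on a null reference (unlike putIfAbsent, which
    // returns null the first time a key is inserted).
    static Map<String, Set<String>> group(List<String[]> prunedBlocklets) {
        Map<String, Set<String>> validSegments = new HashMap<>();
        for (String[] b : prunedBlocklets) { // b[0] = segment id, b[1] = shard name
            validSegments.computeIfAbsent(b[0], k -> new HashSet<>()).add(b[1]);
        }
        return validSegments;
    }

    public static void main(String[] args) {
        List<String[]> pruned = Arrays.asList(
            new String[]{"seg0", "shard-a"},
            new String[]{"seg0", "shard-b"},
            new String[]{"seg1", "shard-a"});
        // seg0 maps to two shards, seg1 to one
        System.out.println(group(pruned));
    }
}
```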




[GitHub] [carbondata] kunal642 commented on a change in pull request #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment

2020-01-11 Thread GitBox
kunal642 commented on a change in pull request #3474: [CARBONDATA-3592] Fix 
query on bloom in case of multiple data files in one segment
URL: https://github.com/apache/carbondata/pull/3474#discussion_r365541908
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java
 ##
 @@ -145,18 +147,27 @@ public static DataMapJob getEmbeddedJob() {
* Prune the segments from the already pruned blocklets.
*/
  public static void pruneSegments(List<Segment> segments,
      List<ExtendedBlocklet> prunedBlocklets) {
-    Set<Segment> validSegments = new HashSet<>();
+    Map<Segment, Set<String>> validSegments = new HashMap<>();
    for (ExtendedBlocklet blocklet : prunedBlocklets) {
-      // Clear the old pruned index files if any present
-      blocklet.getSegment().getFilteredIndexShardNames().clear();
      // Set the pruned index file to the segment
      // for further pruning.
      String shardName = CarbonTablePath.getShardName(blocklet.getFilePath());
-      blocklet.getSegment().setFilteredIndexShardName(shardName);
-      validSegments.add(blocklet.getSegment());
+      // Add the existing shards to corresponding segments
+      Set<String> existingShards = validSegments.get(blocklet.getSegment());
+      if (existingShards == null) {
+        existingShards = new HashSet<>();
+        validSegments.put(blocklet.getSegment(), existingShards);
+      } else {
+        existingShards.add(shardName);
 
 Review comment:
   fixed




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3568: [CARBONDATA-3658] Prune and Cache only 
Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#issuecomment-57504
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1605/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon 
SparkExtensions
URL: https://github.com/apache/carbondata/pull/3574#issuecomment-573331457
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1606/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for 
module spark-carbon-common-test
URL: https://github.com/apache/carbondata/pull/3575#issuecomment-573330635
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1603/
   




[GitHub] [carbondata] zzcclp opened a new pull request #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test

2020-01-11 Thread GitBox
zzcclp opened a new pull request #3575: [HOTFIX] upgrade jdk 1.7 to 1.8 for 
module spark-carbon-common-test
URL: https://github.com/apache/carbondata/pull/3575
 
 
   upgrade jdk 1.7 to 1.8 for module spark-carbon-common-test
   
### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-573323098
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1600/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-573322241
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1602/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-573320963
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1601/
   




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table

2020-01-11 Thread GitBox
Indhumathi27 commented on a change in pull request #3568: [CARBONDATA-3658] 
Prune and Cache only Matched partitioned segments for filter on Partitioned 
table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r365522833
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##
 @@ -136,9 +136,22 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
         getTableBlockIndexUniqueIdentifiers(segment);
 
    for (TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier : identifiers) {
-      tableBlockIndexUniqueIdentifierWrappers.add(
-          new TableBlockIndexUniqueIdentifierWrapper(tableBlockIndexUniqueIdentifier,
-              this.getCarbonTable()));
+      if (null != partitionsToPrune && !partitionsToPrune.isEmpty()) {
+        // add only tableBlockUniqueIdentifier that matches the partition
+        for (PartitionSpec partitionSpec : partitionsToPrune) {
+          if (partitionSpec.getLocation().toString()
+              .equalsIgnoreCase(tableBlockIndexUniqueIdentifier.getIndexFilePath())) {
 
 Review comment:
   Actually, `tableBlockIndexUniqueIdentifier.getIndexFilePath()` is the index 
file's parent path. I will add comments explaining this.
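
The point above is that the identifier stores the index file's parent *directory*, which is exactly what a partition location is. A small sketch of why the comparison works, using hypothetical paths and `java.nio.file` in place of Carbon's own file utilities:

```java
import java.nio.file.Paths;

public class PartitionMatch {
    // A partition location is a directory; an index file written into that
    // partition lives directly under it, so the file's parent path can be
    // compared with the partition location (case-insensitively here, as in
    // the quoted diff).
    static boolean matchesPartition(String partitionLocation, String indexFileParentPath) {
        return indexFileParentPath.equalsIgnoreCase(partitionLocation);
    }

    public static void main(String[] args) {
        String partition = "/warehouse/t1/country=us";                // hypothetical
        String indexFile = "/warehouse/t1/country=us/0.carbonindex";  // hypothetical
        String parent = Paths.get(indexFile).getParent().toString();
        System.out.println(matchesPartition(partition, parent));
    }
}
```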




[GitHub] [carbondata] jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table

2020-01-11 Thread GitBox
jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune 
and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r365522109
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##
 @@ -136,9 +136,22 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName,
   getTableBlockIndexUniqueIdentifiers(segment);
 
   for (TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier : 
identifiers) {
-tableBlockIndexUniqueIdentifierWrappers.add(
-new 
TableBlockIndexUniqueIdentifierWrapper(tableBlockIndexUniqueIdentifier,
-this.getCarbonTable()));
+if (null != partitionsToPrune && !partitionsToPrune.isEmpty()) {
+  // add only tableBlockUniqueIdentifier that matches the partition
+  for (PartitionSpec partitionSpec : partitionsToPrune) {
+if (partitionSpec.getLocation().toString()
+
.equalsIgnoreCase(tableBlockIndexUniqueIdentifier.getIndexFilePath())) {
 
 Review comment:
   It is not easy to understand why the partition spec location is compared 
with the index file path; can you make it more readable?




[GitHub] [carbondata] jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table

2020-01-11 Thread GitBox
jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune 
and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r365522014
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/SegmentPropertiesFetcher.java
 ##
 @@ -34,7 +35,7 @@
* @return
* @throws IOException
*/
-  SegmentProperties getSegmentProperties(Segment segment)
+  SegmentProperties getSegmentProperties(Segment segment, List<PartitionSpec> partitionSpecs)
 
 Review comment:
   Suggest adding one more interface instead of modifying the existing one; 
keep both interfaces so that users can choose the original one if no partition 
pruning is required.
   Please also modify the function description accordingly.
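
One common way to follow this suggestion without breaking existing implementers is a default method that delegates to the original entry point. A sketch with hypothetical simplified types (strings standing in for Segment, SegmentProperties, and PartitionSpec):

```java
import java.util.Collections;
import java.util.List;

interface SegmentPropertiesFetcherSketch {
    // Original entry point, kept for callers that need no partition pruning.
    String getSegmentProperties(String segment);

    // New overload; the default simply delegates, so existing
    // implementations compile unchanged and only override it when
    // they actually support partition pruning.
    default String getSegmentProperties(String segment, List<String> partitionSpecs) {
        return getSegmentProperties(segment);
    }
}

public class FetcherDemo implements SegmentPropertiesFetcherSketch {
    public String getSegmentProperties(String segment) {
        return "props(" + segment + ")";
    }

    public static void main(String[] args) {
        SegmentPropertiesFetcherSketch f = new FetcherDemo();
        // Both entry points resolve to the same underlying lookup.
        System.out.println(f.getSegmentProperties("seg0"));
        System.out.println(f.getSegmentProperties("seg0", Collections.emptyList()));
    }
}
```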




[GitHub] [carbondata] jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table

2020-01-11 Thread GitBox
jackylk commented on a change in pull request #3568: [CARBONDATA-3658] Prune 
and Cache only Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#discussion_r365521952
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java
 ##
 @@ -81,19 +82,20 @@ public abstract DataMapBuilder createBuilder(Segment 
segment, String shardName,
   /**
* Get the datamap for all segments
*/
-  public Map<Segment, List<CoarseGrainDataMap>> getDataMaps(List<Segment> segments)
-      throws IOException {
+  public Map<Segment, List<CoarseGrainDataMap>> getDataMaps(List<Segment> segments,
+      List<PartitionSpec> partitions) throws IOException {
    Map<Segment, List<CoarseGrainDataMap>> dataMaps = new HashMap<>();
    for (Segment segment : segments) {
-      dataMaps.put(segment, (List<CoarseGrainDataMap>) this.getDataMaps(segment));
+      dataMaps.put(segment, (List<CoarseGrainDataMap>) this.getDataMaps(segment, partitions));
    }
    return dataMaps;
  }

  /**
   * Get the datamap for segmentId
   */
-  public abstract List<T> getDataMaps(Segment segment) throws IOException;
+  public abstract List<T> getDataMaps(Segment segment, List<PartitionSpec> partitions)
 
 Review comment:
   Suggest adding one more interface instead of modifying the existing one.




[jira] [Resolved] (CARBONDATA-3656) There are some conflicts when concurrently write data to S3

2020-01-11 Thread Jacky Li (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-3656.
--
Fix Version/s: 2.0.0
   Resolution: Fixed

> There are some conflicts when concurrently write data to S3
> ---
>
> Key: CARBONDATA-3656
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3656
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Bo Xu
>Assignee: Bo Xu
>Priority: Major
> Fix For: 2.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> There are some conflicts when concurrently write data to S3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
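The conflict described in the Jira issue above arises when concurrent SDK writers default to the same task number, so their output files collide on identical names. A hedged sketch of the idea behind a unique default task number (the helper name and derivation are illustrative, not the actual SDK code):

```java
import java.util.UUID;

// Illustrative helper: derive a default task number that is effectively
// unique per writer, so two concurrent SDK writers targeting the same S3
// path do not generate identical carbondata/index file names.
final class DefaultTaskNo {
  private DefaultTaskNo() { }

  static long next() {
    // take 64 random bits from a UUID and mask the sign bit so the
    // resulting task number is always non-negative
    return UUID.randomUUID().getLeastSignificantBits() & Long.MAX_VALUE;
  }
}
```

Any scheme with enough entropy per writer (random bits, timestamps plus a host id, etc.) serves the same purpose; the point is that a shared constant default cannot.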


[GitHub] [carbondata] asfgit closed pull request #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK

2020-01-11 Thread GitBox
asfgit closed pull request #3567: [CARBONDATA-3656] set Default TaskNo To Avoid 
Conflicts when concurrently write data by SDK
URL: https://github.com/apache/carbondata/pull/3567
 
 
   




[GitHub] [carbondata] jackylk commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK

2020-01-11 Thread GitBox
jackylk commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid 
Conflicts when concurrently write data by SDK
URL: https://github.com/apache/carbondata/pull/3567#issuecomment-573317811
 
 
   LGTM




[GitHub] [carbondata] asfgit closed pull request #3518: [DOC] add performance-tuning with codegen parameters support

2020-01-11 Thread GitBox
asfgit closed pull request #3518: [DOC] add performance-tuning with codegen 
parameters support
URL: https://github.com/apache/carbondata/pull/3518
 
 
   




[GitHub] [carbondata] jackylk commented on issue #3518: [DOC] add performance-tuning with codegen parameters support

2020-01-11 Thread GitBox
jackylk commented on issue #3518: [DOC] add performance-tuning with codegen 
parameters support
URL: https://github.com/apache/carbondata/pull/3518#issuecomment-573316089
 
 
   LGTM




[GitHub] [carbondata] jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port.

2020-01-11 Thread GitBox
jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix 
issues with alluxio without host and port.
URL: https://github.com/apache/carbondata/pull/3571#discussion_r365520784
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java
 ##
 @@ -198,9 +198,10 @@ private static BlockMetaInfo createBlockMetaInfo(
     Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers = new HashSet<>();
     Map<String, String> indexFiles = segment.getCommittedIndexFile();
     for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
 -      Path indexFile = new Path(indexFileEntry.getKey());
 +      String indexFile = indexFileEntry.getKey();
       tableBlockIndexUniqueIdentifiers.add(
 -          new TableBlockIndexUniqueIdentifier(indexFile.getParent().toString(), indexFile.getName(),
 +          new TableBlockIndexUniqueIdentifier(indexFile.substring(0, indexFile.lastIndexOf("/")),
 
 Review comment:
   use `FilenameUtils` from Apache Commons IO




[GitHub] [carbondata] jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port.

2020-01-11 Thread GitBox
jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix 
issues with alluxio without host and port.
URL: https://github.com/apache/carbondata/pull/3571#discussion_r365520720
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java
 ##
 @@ -189,7 +189,7 @@ private static BlockMetaInfo createBlockMetaInfo(
     CarbonFile carbonFile = FileFactory.getCarbonFile(carbonDataFile);
     return new BlockMetaInfo(new String[] { "localhost" }, carbonFile.getSize());
   default:
 -    return fileNameToMetaInfoMapping.get(carbonDataFile);
 +    return fileNameToMetaInfoMapping.get(FileFactory.getCarbonFile(carbonDataFile).getPath());
 
 Review comment:
   please add a comment explaining why this is required. Can we add a case for Alluxio separately?




[GitHub] [carbondata] jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port.

2020-01-11 Thread GitBox
jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix 
issues with alluxio without host and port.
URL: https://github.com/apache/carbondata/pull/3571#discussion_r365520617
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java
 ##
 @@ -137,12 +137,14 @@ public void init(DataMapModel dataMapModel)
         // if the segment data is written in tablepath then no need to store whole path of file.
         !blockletDataMapInfo.getFilePath().startsWith(
             blockletDataMapInfo.getCarbonTable().getTablePath())) {
 -      filePath = path.getParent().toString().getBytes(CarbonCommonConstants.DEFAULT_CHARSET);
 +      filePath = path.substring(0, path.lastIndexOf("/"))
 +          .getBytes(CarbonCommonConstants.DEFAULT_CHARSET);
 
 Review comment:
   I found that you can use `FilenameUtils.getFullPathNoEndSeparator(file)` 
from Apache Commons
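For context on the suggestion: the manual `lastIndexOf("/")` split in the diff and `FilenameUtils.getFullPathNoEndSeparator` both strip the last path segment. A small stdlib-only sketch of the manual variant (the helper name is illustrative, and Commons IO is not required to run it):

```java
// Illustrative stand-in for the substring logic in the diff above.
// With Apache Commons IO on the classpath, the equivalent call would be
// FilenameUtils.getFullPathNoEndSeparator(path), which also handles
// drive prefixes and missing separators that the manual split ignores.
final class PathSplitSketch {
  private PathSplitSketch() { }

  static String parentOf(String path) {
    int idx = path.lastIndexOf('/');
    // the raw substring(0, lastIndexOf("/")) throws when there is no
    // separator at all; guarding that case is one reason a well-tested
    // utility was suggested in review
    return idx >= 0 ? path.substring(0, idx) : path;
  }
}
```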
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-573310579
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1599/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3574: [CARBONDATA-3503] Optimize Carbon 
SparkExtensions
URL: https://github.com/apache/carbondata/pull/3574#issuecomment-573309170
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1598/
   




[GitHub] [carbondata] QiangCai opened a new pull request #3574: [CARBONDATA-3503] Optimize Carbon SparkExtensions

2020-01-11 Thread GitBox
QiangCai opened a new pull request #3574: [CARBONDATA-3503] Optimize Carbon 
SparkExtensions
URL: https://github.com/apache/carbondata/pull/3574
 
 
### Why is this PR needed?
   1. MV is not supported.
   2. The parser still uses CarbonAstBuilder.
   3. Some test cases still use CarbonSession.

### What changes were proposed in this PR?
   1. Support MV.
   2. New order of parsers (CarbonParser -> SparkParser).
   3. Remove the spark-carbon-common-test module and move its tests back to the spark-common-test module.
   4. ...
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
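The "CarbonParser -> SparkParser" ordering in the PR description is a try-then-fallback chain: attempt the Carbon syntax first and delegate to the Spark parser only when it does not match. A minimal sketch of that pattern (the interface and class names here are illustrative, not the actual extension classes):

```java
import java.util.List;

// Illustrative try-then-fallback parser chain.
interface SqlParser {
  Object parse(String sql) throws IllegalArgumentException;
}

final class ChainedParser implements SqlParser {
  private final List<SqlParser> delegates;

  ChainedParser(List<SqlParser> delegates) {
    this.delegates = delegates;
  }

  @Override
  public Object parse(String sql) {
    IllegalArgumentException last = null;
    for (SqlParser p : delegates) {
      try {
        return p.parse(sql); // first parser that accepts the statement wins
      } catch (IllegalArgumentException e) {
        last = e; // fall through to the next parser in the chain
      }
    }
    throw last != null ? last : new IllegalArgumentException("no parser matched");
  }
}
```

Putting the Carbon parser first means Carbon-specific statements never reach Spark's parser, while everything else falls through unchanged.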
   
   
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-573300811
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1597/
   




[jira] [Resolved] (CARBONDATA-3622) Error when table is dropped and is being accessed in Index Server afterwards

2020-01-11 Thread Akash R Nilugal (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash R Nilugal resolved CARBONDATA-3622.
-
Resolution: Fixed

> Error when table is dropped and is being accessed in Index Server afterwards
> 
>
> Key: CARBONDATA-3622
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3622
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: Mainak Bin
>Priority: Major
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If a table is dropped and is then accessed in the Index Server while clearing
> datamaps, it will lead to an error, as that table is no longer present.





[jira] [Resolved] (CARBONDATA-3620) Update does not load cache in memory, behavior inconsistent with scenario when index server is not running

2020-01-11 Thread Akash R Nilugal (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash R Nilugal resolved CARBONDATA-3620.
-
Resolution: Fixed

> Update does not load cache in memory, behavior inconsistent with scenario 
> when index server is not running
> --
>
> Key: CARBONDATA-3620
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3620
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Show metacache after update command returns:
>  SHOW METACACHE ON TABLE;
> +------------+-------+-------------------------+----------------+
> | Field      | Size  | Comment                 | Cache Location |
> +------------+-------+-------------------------+----------------+
> | Index      | 0 B   | 0/2 index files cached  | DRIVER         |
> | Dictionary | 0 B   |                         | DRIVER         |
> *| Index     | 553 B | 1/2 index files cached  | INDEX SERVER   |*
> | Dictionary | 0 B   |                         | INDEX SERVER   |
> +------------+-------+-------------------------+----------------+





[GitHub] [carbondata] asfgit closed pull request #3511: [CARBONDATA-3620][CARBONDATA-3622]: Update does not load cache in memory, behavior inconsistent with scenario when index server is not running

2020-01-11 Thread GitBox
asfgit closed pull request #3511: [CARBONDATA-3620][CARBONDATA-3622]: Update 
does not load cache in memory, behavior inconsistent with scenario when index 
server is not running
URL: https://github.com/apache/carbondata/pull/3511
 
 
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK

2020-01-11 Thread GitBox
CarbonDataQA1 commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To 
Avoid Conflicts when concurrently write data by SDK
URL: https://github.com/apache/carbondata/pull/3567#issuecomment-573298523
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1596/
   




[GitHub] [carbondata] Zhangshunyu commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK

2020-01-11 Thread GitBox
Zhangshunyu commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To 
Avoid Conflicts when concurrently write data by SDK
URL: https://github.com/apache/carbondata/pull/3567#issuecomment-573294141
 
 
   LGTM

