[GitHub] [carbondata] QiangCai commented on pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


QiangCai commented on pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#issuecomment-651450852


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation

2020-06-29 Thread GitBox


QiangCai commented on a change in pull request #3816:
URL: https://github.com/apache/carbondata/pull/3816#discussion_r447373586



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##
@@ -232,47 +233,40 @@
   private List getFilteredSegment(JobContext job, List 
validSegments,
   boolean validationRequired, ReadCommittedScope readCommittedScope) {
 Segment[] segmentsToAccess = getSegmentsToAccess(job, readCommittedScope);
-List segmentToAccessSet =
-new ArrayList<>(new HashSet<>(Arrays.asList(segmentsToAccess)));
-List filteredSegmentToAccess = new ArrayList<>();
 if (segmentsToAccess.length == 0 || 
segmentsToAccess[0].getSegmentNo().equalsIgnoreCase("*")) {
-  filteredSegmentToAccess.addAll(validSegments);
+  return validSegments;
 } else {

Review comment:
   remove this line

##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##
@@ -232,47 +233,40 @@
   private List getFilteredSegment(JobContext job, List 
validSegments,
   boolean validationRequired, ReadCommittedScope readCommittedScope) {
 Segment[] segmentsToAccess = getSegmentsToAccess(job, readCommittedScope);
-List segmentToAccessSet =
-new ArrayList<>(new HashSet<>(Arrays.asList(segmentsToAccess)));
-List filteredSegmentToAccess = new ArrayList<>();
 if (segmentsToAccess.length == 0 || 
segmentsToAccess[0].getSegmentNo().equalsIgnoreCase("*")) {
-  filteredSegmentToAccess.addAll(validSegments);
+  return validSegments;
 } else {
+  Map segmentToAccessMap = new 
HashMap<>(segmentsToAccess.length);
+  for (Segment segment : segmentsToAccess) {
+segmentToAccessMap.put(segment.getSegmentNo(), segment);
+  }
+  Map filteredSegmentToAccess = new HashMap<>();
   for (Segment validSegment : validSegments) {
-int index = segmentToAccessSet.indexOf(validSegment);
-if (index > -1) {
-  // In case of in progress reading segment, segment file name is set 
to the property itself
-  if (segmentToAccessSet.get(index).getSegmentFileName() != null
-  && validSegment.getSegmentFileName() == null) {
-filteredSegmentToAccess.add(segmentToAccessSet.get(index));
+String segmentNoOfValidSegment = validSegment.getSegmentNo();
+if (segmentToAccessMap.containsKey(segmentNoOfValidSegment)) {
+  if (segmentToAccessMap.get(segmentNoOfValidSegment)
+  .getSegmentFileName() != null && 
validSegment.getSegmentFileName() == null) {
+filteredSegmentToAccess.put(segmentNoOfValidSegment,
+segmentToAccessMap.get(segmentNoOfValidSegment));
   } else {
-filteredSegmentToAccess.add(validSegment);
-  }
-}
-  }
-  if (filteredSegmentToAccess.size() != segmentToAccessSet.size() && 
!validationRequired) {
-for (Segment segment : segmentToAccessSet) {
-  if (!filteredSegmentToAccess.contains(segment)) {
-filteredSegmentToAccess.add(segment);
+filteredSegmentToAccess.put(segmentNoOfValidSegment, validSegment);
   }

Review comment:
   String validSegmentNo = validSegment.getSegmentNo();
   Segment segmentToAccess = segmentToAccessMap.get(validSegmentNo);
   if (null != segmentToAccess) {
     if (validSegment.getSegmentFileName() != null ||
         segmentToAccess.getSegmentFileName() == null) {
       segmentToAccess = validSegment;
     }
     filteredSegmentToAccess.put(validSegmentNo, segmentToAccess);
   }
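
For readers following the thread, the suggested single-lookup loop can be sketched in isolation. This is only an illustration under assumptions: `Seg` is a hypothetical stand-in for CarbonData's `Segment` (just the segment number and segment file name), `SegmentFilterSketch` and `filter` are made-up names, and the `validationRequired` handling from the original method is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SegmentFilterSketch {
  // Hypothetical stand-in for Segment: only the two fields the loop reads.
  static final class Seg {
    final String segmentNo;
    final String segmentFileName; // may be null

    Seg(String segmentNo, String segmentFileName) {
      this.segmentNo = segmentNo;
      this.segmentFileName = segmentFileName;
    }
  }

  static List<Seg> filter(List<Seg> validSegments, Seg[] segmentsToAccess) {
    // Index the requested segments by segment number for O(1) lookup.
    Map<String, Seg> segmentToAccessMap = new HashMap<>(segmentsToAccess.length);
    for (Seg segment : segmentsToAccess) {
      segmentToAccessMap.put(segment.segmentNo, segment);
    }
    // LinkedHashMap keeps the order of validSegments (a choice made for this sketch).
    Map<String, Seg> filtered = new LinkedHashMap<>();
    for (Seg validSegment : validSegments) {
      // One get() replaces the containsKey() + repeated get() calls.
      Seg segmentToAccess = segmentToAccessMap.get(validSegment.segmentNo);
      if (segmentToAccess != null) {
        // Keep the requested segment only when it carries a file name
        // and the valid segment does not; otherwise use the valid one.
        if (validSegment.segmentFileName != null
            || segmentToAccess.segmentFileName == null) {
          segmentToAccess = validSegment;
        }
        filtered.put(validSegment.segmentNo, segmentToAccess);
      }
    }
    return new ArrayList<>(filtered.values());
  }
}
```

The condition is the negation of the one in the diff (`requested file name != null && valid file name == null`), so the branch that keeps the requested segment is the same in both versions.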


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3816:
URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651530149


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3268/
   







[GitHub] [carbondata] asfgit closed pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


asfgit closed pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804


   







[GitHub] [carbondata] niuge01 commented on pull request #3800: [CARBONDATA-3877] Reduce read tablestatus overhead during inserting into partition table

2020-06-29 Thread GitBox


niuge01 commented on pull request #3800:
URL: https://github.com/apache/carbondata/pull/3800#issuecomment-651481639


   LGTM







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3816:
URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651538060


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1533/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3816:
URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651538493


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3269/
   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.m

2020-06-29 Thread GitBox


ajantha-bhat commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r447427801



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -302,6 +318,61 @@ private void commitJobForPartition(JobContext context, 
boolean overwriteSet,
 commitJobFinal(context, loadModel, operationContext, carbonTable, 
uniqueId);
   }
 
+  /**
+   * Method to create and write the segment file, removes the temporary 
directories from all the
+   * respective partition directories. This method is invoked only when {@link
+   * CarbonCommonConstants#CARBON_MERGE_INDEX_IN_SEGMENT} is disabled.
+   * @param context Job context
+   * @param loadModel Load model
+   * @param segmentFileName Segment file name to write
+   * @param partitionPath Serialized list of partition location
+   * @throws IOException
+   */
+  @SuppressWarnings("unchecked")
+  private void writeSegmentWithoutMergeIndex(JobContext context, 
CarbonLoadModel loadModel,
+  String segmentFileName, String partitionPath) throws IOException {
+Map indexFileNameMap = (Map) 
ObjectSerializationUtil
+
.convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
+List partitionList =
+(List) 
ObjectSerializationUtil.convertStringToObject(partitionPath);
+SegmentFileStore.SegmentFile finalSegmentFile = null;
+boolean isRelativePath;
+String partitionLoc;
+for (String partition : partitionList) {
+  isRelativePath = false;
+  partitionLoc = partition;
+  if (partitionLoc.startsWith(loadModel.getTablePath())) {
+partitionLoc = 
partitionLoc.substring(loadModel.getTablePath().length());
+isRelativePath = true;
+  }
+  SegmentFileStore.SegmentFile segmentFile = new 
SegmentFileStore.SegmentFile();
+  SegmentFileStore.FolderDetails folderDetails = new 
SegmentFileStore.FolderDetails();
+  
folderDetails.setFiles(Collections.singleton(indexFileNameMap.get(partition)));
+  folderDetails.setPartitions(
+  
Collections.singletonList(partitionLoc.substring(partitionLoc.indexOf("/") + 
1)));
+  folderDetails.setRelative(isRelativePath);
+  folderDetails.setStatus(SegmentStatus.SUCCESS.getMessage());
+  segmentFile.getLocationMap().put(partitionLoc, folderDetails);
+  if (finalSegmentFile != null) {

Review comment:
   finalSegmentFile will always be null at this point. Is some handling code 
missing, or should this check be removed?









[GitHub] [carbondata] asfgit closed pull request #3800: [CARBONDATA-3877] Reduce read tablestatus overhead during inserting into partition table

2020-06-29 Thread GitBox


asfgit closed pull request #3800:
URL: https://github.com/apache/carbondata/pull/3800


   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.m

2020-06-29 Thread GitBox


ajantha-bhat commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r447425474



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -282,10 +296,12 @@ private void commitJobForPartition(JobContext context, 
boolean overwriteSet,
 throw new IOException(e);
   }
 }
-String segmentFileName = SegmentFileStore.genSegmentFileName(
-loadModel.getSegmentId(), 
String.valueOf(loadModel.getFactTimeStamp()));
 newMetaEntry.setSegmentFile(segmentFileName + CarbonTablePath.SEGMENT_EXT);
-newMetaEntry.setIndexSize("" + loadModel.getMetrics().getMergeIndexSize());
+if (isMergeIndex) {

Review comment:
   Can you move this into the else branch at line 280 of the same method? 
Scattered checks are hard to read; it is better to keep them together.









[GitHub] [carbondata] niuge01 commented on a change in pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation

2020-06-29 Thread GitBox


niuge01 commented on a change in pull request #3816:
URL: https://github.com/apache/carbondata/pull/3816#discussion_r447366192



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##
@@ -232,47 +233,40 @@
   private List getFilteredSegment(JobContext job, List 
validSegments,
   boolean validationRequired, ReadCommittedScope readCommittedScope) {
 Segment[] segmentsToAccess = getSegmentsToAccess(job, readCommittedScope);
-List segmentToAccessSet =
-new ArrayList<>(new HashSet<>(Arrays.asList(segmentsToAccess)));
-List filteredSegmentToAccess = new ArrayList<>();
+Map filteredSegmentToAccess = new HashMap<>();

Review comment:
   Move this line to line 244

##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##
@@ -232,47 +233,40 @@
   private List getFilteredSegment(JobContext job, List 
validSegments,
   boolean validationRequired, ReadCommittedScope readCommittedScope) {
 Segment[] segmentsToAccess = getSegmentsToAccess(job, readCommittedScope);
-List segmentToAccessSet =
-new ArrayList<>(new HashSet<>(Arrays.asList(segmentsToAccess)));
-List filteredSegmentToAccess = new ArrayList<>();
+Map filteredSegmentToAccess = new HashMap<>();
 if (segmentsToAccess.length == 0 || 
segmentsToAccess[0].getSegmentNo().equalsIgnoreCase("*")) {
-  filteredSegmentToAccess.addAll(validSegments);
+  return validSegments;
 } else {
+  Map segmentToAccessMap = new HashMap<>();

Review comment:
   Should initialize the map capacity with the size of segmentsToAccess.
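
A note on what presizing buys, as a sketch rather than a recommendation for this PR: `java.util.HashMap` resizes once its size exceeds capacity times the load factor (0.75 by default), so passing the element count as the initial capacity avoids most rehashing but can still trigger one resize. The helper below is hypothetical (the names `MapCapacitySketch`, `capacityFor` and `presized` are not from CarbonData) and shows one common idiom for choosing a capacity that holds the expected entries without resizing.

```java
import java.util.HashMap;
import java.util.Map;

class MapCapacitySketch {
  // Capacity large enough that expectedEntries stays within the default
  // 0.75 load factor threshold, so no resize happens while filling the map.
  static int capacityFor(int expectedEntries) {
    return (int) (expectedEntries / 0.75f) + 1;
  }

  // Hypothetical usage: index segment numbers into a presized map.
  static Map<String, Integer> presized(String[] segmentNos) {
    Map<String, Integer> map = new HashMap<>(capacityFor(segmentNos.length));
    for (int i = 0; i < segmentNos.length; i++) {
      map.put(segmentNos[i], i);
    }
    return map;
  }
}
```

For the small segment lists typical here, `new HashMap<>(segmentsToAccess.length)` is likely sufficient; the difference only matters when the map is large.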









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3816:
URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651530514


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1532/
   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.m

2020-06-29 Thread GitBox


ajantha-bhat commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r447426104



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala
##
@@ -640,6 +653,10 @@ class StandardPartitionTableLoadingTestCase extends 
QueryTest with BeforeAndAfte
 }
   }
 
+  override def afterEach(): Unit = {
+CarbonProperties.getInstance()

Review comment:
   No need for afterEach; you can reset the property in the same test case 
where you set it.









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto complex columns read support

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-651374421


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3267/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto complex columns read support

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-651375743


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1531/
   







[GitHub] [carbondata] QiangCai commented on a change in pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


QiangCai commented on a change in pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#discussion_r446839075



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/merger/UnsafeInMemoryIntermediateDataMerger.java
##
@@ -141,25 +140,26 @@ private UnsafeCarbonRowForMerge 
getSortedRecordFromMemory() {
 // be based on comparator we are passing the heap
 // when will call poll it will always delete root of the tree and then
 // it does trickel down operation complexity is log(n)
-UnsafeInmemoryMergeHolder poll = this.recordHolderHeap.poll();
+UnsafeInmemoryMergeHolder poll = this.recordHolderHeap.peek();
 
 // get the row from chunk
 row = poll.getRow();
 
 // check if there no entry present
 if (!poll.hasNext()) {
+  this.recordHolderHeap.poll();

Review comment:
   to release resource, invoke this.recordHolderHeap.close()
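
The peek-then-poll pattern in this hunk can be illustrated with a small k-way merge. This is a sketch under stated assumptions, not the PR's code: the thread does not show the heap implementation, so `CursorHeap`, `Cursor` and `siftTopDown` below are hypothetical, and sorted `int[]` runs stand in for the merge holders. The idea is that the top element is only removed when its source is exhausted; otherwise it is advanced in place and sifted down, which saves the remove-and-reinsert pair per row.

```java
import java.util.ArrayList;
import java.util.List;

class PeekMergeSketch {
  // Hypothetical cursor over one sorted run (stand-in for a merge holder).
  static final class Cursor {
    final int[] run;
    int pos;
    Cursor(int[] run) { this.run = run; }
    int current() { return run[pos]; }
    boolean hasNext() { return pos + 1 < run.length; }
    void advance() { pos++; }
  }

  // Minimal array-backed min-heap of cursors, ordered by current().
  static final class CursorHeap {
    private final Cursor[] items;
    private int size;
    CursorHeap(int capacity) { items = new Cursor[capacity]; }
    boolean isEmpty() { return size == 0; }
    Cursor peek() { return size == 0 ? null : items[0]; }
    void add(Cursor c) {
      int i = size++;
      items[i] = c;
      while (i > 0) { // sift the new element up
        int parent = (i - 1) / 2;
        if (items[parent].current() <= items[i].current()) break;
        swap(i, parent);
        i = parent;
      }
    }
    Cursor poll() {
      Cursor top = items[0];
      items[0] = items[--size];
      items[size] = null;
      siftTopDown();
      return top;
    }
    // Restore heap order after the root's key has changed.
    void siftTopDown() {
      int i = 0;
      while (true) {
        int left = 2 * i + 1, right = left + 1, min = i;
        if (left < size && items[left].current() < items[min].current()) min = left;
        if (right < size && items[right].current() < items[min].current()) min = right;
        if (min == i) return;
        swap(i, min);
        i = min;
      }
    }
    private void swap(int a, int b) {
      Cursor t = items[a]; items[a] = items[b]; items[b] = t;
    }
  }

  static List<Integer> merge(int[][] runs) {
    CursorHeap heap = new CursorHeap(runs.length);
    for (int[] run : runs) {
      if (run.length > 0) heap.add(new Cursor(run));
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor top = heap.peek();   // look at the smallest row without removing it
      out.add(top.current());
      if (!top.hasNext()) {
        heap.poll();              // source exhausted: remove it from the heap
      } else {
        top.advance();            // otherwise advance in place and re-order
        heap.siftTopDown();
      }
    }
    return out;
  }
}
```

`java.util.PriorityQueue` has no public sift-down operation, which is why the sketch carries its own tiny heap; with the standard class the same loop would need `poll()` followed by `add()` for every row.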









[jira] [Closed] (CARBONDATA-3877) Reduce read tablestatus overhead during inserting into partition table

2020-06-29 Thread Xingjun Hao (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingjun Hao closed CARBONDATA-3877.
---
Resolution: Fixed

> Reduce read tablestatus overhead during inserting into partition table
> --
>
> Key: CARBONDATA-3877
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3877
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Major
> Fix For: 2.0.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, inserting into a partition table performs many tablestatus read 
> operations. When the table status file is stored in an object store, reading 
> it may fail (with an IOException or JsonSyntaxException) while the file is 
> being modified, which leads to a high failure rate for concurrent inserts 
> into a partition table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] QiangCai commented on pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


QiangCai commented on pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#issuecomment-651010686


   How about changing RowResultMergerProcessor?







[GitHub] [carbondata] kevinjmh commented on a change in pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


kevinjmh commented on a change in pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#discussion_r446854580



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/merger/UnsafeInMemoryIntermediateDataMerger.java
##
@@ -141,25 +140,26 @@ private UnsafeCarbonRowForMerge 
getSortedRecordFromMemory() {
 // be based on comparator we are passing the heap
 // when will call poll it will always delete root of the tree and then
 // it does trickel down operation complexity is log(n)
-UnsafeInmemoryMergeHolder poll = this.recordHolderHeap.poll();
+UnsafeInmemoryMergeHolder poll = this.recordHolderHeap.peek();
 
 // get the row from chunk
 row = poll.getRow();
 
 // check if there no entry present
 if (!poll.hasNext()) {
+  this.recordHolderHeap.poll();

Review comment:
   ok









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3810: [WIP] Fix missing Table status lock in some SI flows

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3810:
URL: https://github.com/apache/carbondata/pull/3810#issuecomment-651041002


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1517/
   







[GitHub] [carbondata] asfgit closed pull request #3813: [CARBONDATA-3876] Update cli test case

2020-06-29 Thread GitBox


asfgit closed pull request #3813:
URL: https://github.com/apache/carbondata/pull/3813


   







[GitHub] [carbondata] kevinjmh commented on pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


kevinjmh commented on pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#issuecomment-651002898


   retest this please







[GitHub] [carbondata] QiangCai edited a comment on pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


QiangCai edited a comment on pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#issuecomment-651010686


   can you change RowResultMergerProcessor?







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #1380: [CARBONDATA-1485] Timestamp no dictionary bug

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #1380:
URL: https://github.com/apache/carbondata/pull/1380#issuecomment-651065141


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3253/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3815: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3815:
URL: https://github.com/apache/carbondata/pull/3815#issuecomment-651139374


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3254/
   







[GitHub] [carbondata] QiangCai commented on a change in pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


QiangCai commented on a change in pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#discussion_r446910728



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/RowResultMergerProcessor.java
##
@@ -126,7 +125,7 @@ public boolean execute(List 
unsortedResultIteratorList,
   RawResultIterator iterator = null;
   while (index > 1) {
 // iterator the top record
-iterator = this.recordHolderHeap.poll();
+iterator = this.recordHolderHeap.peek();
 Object[] convertedRow = iterator.next();
 if (null == convertedRow) {
   index--;

Review comment:
   insert code: this.recordHolderHeap.poll();









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3815: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3815:
URL: https://github.com/apache/carbondata/pull/3815#issuecomment-651123569


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1519/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3815: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3815:
URL: https://github.com/apache/carbondata/pull/3815#issuecomment-651126279


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3249/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#issuecomment-651144733


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3251/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3810: [WIP] Fix missing Table status lock in some SI flows

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3810:
URL: https://github.com/apache/carbondata/pull/3810#issuecomment-651042182


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3245/
   







[GitHub] [carbondata] kevinjmh commented on a change in pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


kevinjmh commented on a change in pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#discussion_r446924645



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/RowResultMergerProcessor.java
##
@@ -126,7 +125,7 @@ public boolean execute(List 
unsortedResultIteratorList,
   RawResultIterator iterator = null;
   while (index > 1) {
 // iterator the top record
-iterator = this.recordHolderHeap.poll();
+iterator = this.recordHolderHeap.peek();
 Object[] convertedRow = iterator.next();
 if (null == convertedRow) {
   index--;

Review comment:
   done









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#issuecomment-651078587


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1518/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-651137417


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1520/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-651147884


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3256/
   







[GitHub] [carbondata] nihal0107 opened a new pull request #3815: support carbon SDK to load data from different files

2020-06-29 Thread GitBox


nihal0107 opened a new pull request #3815:
URL: https://github.com/apache/carbondata/pull/3815


   ### Why is this PR needed?
To support loading data from different files through the carbon SDK.

   ### What changes were proposed in this PR?
   
   Users can now load a single file, all files under a given directory, or 
selected files under a given directory.
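
   The three input modes listed above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `InputFileResolverSketch`, the method `resolve`, and the `selectedNames` parameter are hypothetical names and are not part of the carbon SDK API or of this PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not the SDK implementation.
final class InputFileResolverSketch {

  // Resolves the input into a list of files to load:
  //  - a regular file            -> that single file
  //  - a directory, no selection -> every regular file directly under it, sorted
  //  - a directory + selection   -> only the named files under it
  static List<Path> resolve(Path path, List<String> selectedNames) throws IOException {
    if (Files.isRegularFile(path)) {
      return Collections.singletonList(path);
    }
    if (!Files.isDirectory(path)) {
      throw new IOException("Not a file or directory: " + path);
    }
    if (selectedNames == null || selectedNames.isEmpty()) {
      // Files.list holds a directory handle, so close it with try-with-resources.
      try (Stream<Path> entries = Files.list(path)) {
        return entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
      }
    }
    List<Path> result = new ArrayList<>();
    for (String name : selectedNames) {
      Path candidate = path.resolve(name);
      if (!Files.isRegularFile(candidate)) {
        throw new IOException("Selected file not found: " + candidate);
      }
      result.add(candidate);
    }
    return result;
  }
}
```

   Failing fast on a missing selected file is a design choice of this sketch; the actual behaviour is described in the design document linked below.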
   
   Please go through the design document for better understanding-
   https://issues.apache.org/jira/browse/CARBONDATA-3855
   
   ### Does this PR introduce any user interface change?
- No
   
   ### Is any new test case added?
- Yes
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-651142751


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3252/
   







[GitHub] [carbondata] marchpure commented on pull request #3800: [CARBONDATA-3877] Reduce read tablestatus overhead during inserting into partition table

2020-06-29 Thread GitBox


marchpure commented on pull request #3800:
URL: https://github.com/apache/carbondata/pull/3800#issuecomment-651146016


   retest this please







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#issuecomment-651162149


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1521/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3800: [CARBONDATA-3877] Reduce read tablestatus overhead during inserting into partition table

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3800:
URL: https://github.com/apache/carbondata/pull/3800#issuecomment-651227739


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1522/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3815: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3815:
URL: https://github.com/apache/carbondata/pull/3815#issuecomment-651186407


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3259/
   







[GitHub] [carbondata] marchpure opened a new pull request #3816: [CARBONDATA-3879] Filtering Segmets Optimazation

2020-06-29 Thread GitBox


marchpure opened a new pull request #3816:
URL: https://github.com/apache/carbondata/pull/3816


### Why is this PR needed?
The segment-filtering flow makes many List.contains calls, which become 
expensive when there are tens of thousands of segments.
For example, with 5 segments, List.contains is called once for each 
segment against a list that also has about 5 elements, so the time 
complexity is O(5 * 5).

### What changes were proposed in this PR?
   Replace List.contains with Map.containsKey.
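
   The difference between the two lookups can be illustrated with a minimal Java sketch. It is not the code from this PR: the class `SegmentFilterSketch`, both method names, and the use of plain `String` segment numbers in place of `Segment` objects are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the CarbonData implementation.
final class SegmentFilterSketch {

  // Quadratic variant: List.contains scans the whole list on every call,
  // so n valid segments checked against m requested segments cost O(n * m).
  static List<String> filterWithList(List<String> validSegments, List<String> toAccess) {
    List<String> result = new ArrayList<>();
    for (String segmentNo : validSegments) {
      if (toAccess.contains(segmentNo)) {
        result.add(segmentNo);
      }
    }
    return result;
  }

  // Map variant: build the map once, then each containsKey call is an
  // expected constant-time hash lookup, giving roughly O(n + m) overall.
  static List<String> filterWithMap(List<String> validSegments, List<String> toAccess) {
    Map<String, String> toAccessMap = new HashMap<>();
    for (String segmentNo : toAccess) {
      toAccessMap.put(segmentNo, segmentNo);
    }
    List<String> result = new ArrayList<>();
    for (String segmentNo : validSegments) {
      if (toAccessMap.containsKey(segmentNo)) {
        result.add(segmentNo);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> valid = Arrays.asList("0", "1", "2", "3");
    List<String> wanted = Arrays.asList("1", "3", "7");
    // Both variants return the same segments; only the lookup cost differs.
    System.out.println(filterWithList(valid, wanted));
    System.out.println(filterWithMap(valid, wanted));
  }
}
```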
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3815: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3815:
URL: https://github.com/apache/carbondata/pull/3815#issuecomment-651223370


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1523/
   







[jira] [Created] (CARBONDATA-3879) Filtering Segmets Optimazation

2020-06-29 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-3879:
---

 Summary: Filtering Segmets Optimazation
 Key: CARBONDATA-3879
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3879
 Project: CarbonData
  Issue Type: Improvement
  Components: data-query
Affects Versions: 2.0.0
Reporter: Xingjun Hao
 Fix For: 2.0.2


The segment-filtering flow makes many List.contains calls, which become 
expensive when there are tens of thousands of segments.

For example, with 5 segments, List.contains is called once for each 
segment against a list that also has about 5 elements, so the time 
complexity is O(5 * 5).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#issuecomment-651160152


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3257/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3800: [CARBONDATA-3877] Reduce read tablestatus overhead during inserting into partition table

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3800:
URL: https://github.com/apache/carbondata/pull/3800#issuecomment-651225029


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3258/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3816:
URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651264615


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3261/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-651267813


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1526/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto complex columns read support

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-651273377


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3265/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-651266518


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3262/
   







[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-06-29 Thread GitBox


ShreelekhyaG opened a new pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817


   …y BUCKET_NUMBER and BUCKET_COLUMNS
   
### Why is this PR needed?
Bucket table creation fails with an exception for empty BUCKET_NUMBER and 
BUCKET_COLUMNS

### What changes were proposed in this PR?
   Wrapped the BUCKET_NUMBER-to-Int conversion in Try to prevent a 
NumberFormatException for empty or other non-numeric string values.
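
   The change itself uses Scala's Try. As a rough Java analogue of the same idea, the sketch below catches the NumberFormatException and turns it into an empty Optional. The class `BucketNumberSketch`, its method names, the error message, and the requirement that the number be positive are assumptions for illustration, not the PR's code.

```java
import java.util.Optional;

// Hypothetical Java analogue of wrapping the conversion in Try.
final class BucketNumberSketch {

  // Returns the parsed value, or Optional.empty() for null, empty,
  // or non-numeric input instead of letting NumberFormatException escape.
  static Optional<Integer> parseBucketNumber(String raw) {
    if (raw == null) {
      return Optional.empty();
    }
    try {
      return Optional.of(Integer.parseInt(raw.trim()));
    } catch (NumberFormatException e) {
      return Optional.empty();
    }
  }

  // Converts a failed parse into a descriptive validation error.
  // Requiring a positive value is an assumption of this sketch.
  static int requireBucketNumber(String raw) {
    return parseBucketNumber(raw)
        .filter(n -> n > 0)
        .orElseThrow(() -> new IllegalArgumentException(
            "Invalid bucket number: '" + raw + "'"));
  }

  public static void main(String[] args) {
    System.out.println(parseBucketNumber("4").isPresent());
    System.out.println(parseBucketNumber("").isPresent());
  }
}
```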
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto complex columns read support

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-651275203


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1529/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3816:
URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651265133


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1525/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto complex columns read support

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-651270444


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1528/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto complex columns read support

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-651270919


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3264/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#issuecomment-651318156


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1527/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#issuecomment-651318828


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3263/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation

2020-06-29 Thread GitBox


CarbonDataQA1 commented on pull request #3816:
URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651350898






