[jira] [Resolved] (CARBONDATA-3924) Should add default dynamic parameters only one time in one JVM process

2020-07-27 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-3924.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Should add default dynamic parameters only one time in one JVM process
> --
>
> Key: CARBONDATA-3924
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3924
> Project: CarbonData
>  Issue Type: Bug
>Reporter: David Cai
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Because the ConfigEntry.registerEntry method cannot register the same entry 
> more than once, the default dynamic parameters should be added only once per 
> JVM process.
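A minimal sketch of the once-per-JVM guard described above. This is illustrative only, not the actual CarbonData fix; the class and method names are assumptions, and the counter stands in for the real ConfigEntry.registerEntry calls.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy sketch (not CarbonData code): register the default dynamic
// parameters at most once per JVM, since registering the same
// ConfigEntry twice is not allowed.
class DynamicParamRegistry {
    private static final AtomicBoolean REGISTERED = new AtomicBoolean(false);
    private static int registrations = 0;

    /** Returns true only for the first caller in this JVM. */
    static boolean registerDefaultsOnce() {
        if (REGISTERED.compareAndSet(false, true)) {
            registrations++;  // stand-in for the ConfigEntry.registerEntry calls
            return true;
        }
        return false;         // already registered; skip re-registration
    }

    static int registrationCount() {
        return registrations;
    }
}
```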



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3860: [CARBONDATA-3889] Cleanup duplicated code in carbondata-core module

2020-07-27 Thread GitBox


CarbonDataQA1 commented on pull request #3860:
URL: https://github.com/apache/carbondata/pull/3860#issuecomment-664293945


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1761/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3860: [CARBONDATA-3889] Cleanup duplicated code in carbondata-core module

2020-07-27 Thread GitBox


CarbonDataQA1 commented on pull request #3860:
URL: https://github.com/apache/carbondata/pull/3860#issuecomment-664286181


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3503/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3837: [CARBONDATA-3927]Remove compressor name from tupleID to make it short to improve store size and performance.

2020-07-27 Thread GitBox


CarbonDataQA1 commented on pull request #3837:
URL: https://github.com/apache/carbondata/pull/3837#issuecomment-664283269


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1759/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array with SI

2020-07-27 Thread GitBox


ajantha-bhat commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r460795333



##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithComplexArrayType.scala
##
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+import 
org.apache.carbondata.spark.testsuite.secondaryindex.TestSecondaryIndexUtils.isFilterPushedDownToSI
+
+class TestSIWithComplexArrayType extends QueryTest with BeforeAndAfterEach {
+
+  override def beforeEach(): Unit = {
+sql("drop table if exists complextable")
+  }
+
+  override def afterEach(): Unit = {
+sql("drop index if exists index_1 on complextable")
+sql("drop table if exists complextable")
+  }
+
+  test("test array on secondary index") {

Review comment:
   d) If two array_contains() filters are combined with AND in a query, and 
each is pushed down as an equals filter to the SI, the query returns 0 rows: 
the SI table is flattened, so no single SI row can match two values. This case 
needs to be handled as well.
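A toy model of the flattened-SI problem described in the comment above (plain Java collections, not CarbonData code; all names here are illustrative). Each array element becomes its own (value, rowId) pair in the index, so an AND of equality filters on one index row is always empty; the lookups have to be done per value and the rowIds intersected.

```java
import java.util.*;

class FlattenedSiDemo {
    // Flattened SI: array value -> rowIds of main-table rows containing it
    static Map<String, Set<Integer>> si = new HashMap<>();

    static void index(int rowId, List<String> array) {
        for (String v : array) {
            si.computeIfAbsent(v, k -> new HashSet<>()).add(rowId);
        }
    }

    // Pushing both predicates as equality on one flattened row: a row holds
    // exactly one value, so requiring two different values matches nothing.
    static Set<Integer> pushedAsEquals(String a, String b) {
        return a.equals(b) ? si.getOrDefault(a, Set.of()) : Set.of();
    }

    // Handling it correctly: one lookup per value, then intersect the rowIds.
    static Set<Integer> lookupAndIntersect(String a, String b) {
        Set<Integer> out = new HashSet<>(si.getOrDefault(a, Set.of()));
        out.retainAll(si.getOrDefault(b, Set.of()));
        return out;
    }
}
```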





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3863: [CARBONDATA-3924] Add default dynamic parameters only one time in a JVM process

2020-07-27 Thread GitBox


ajantha-bhat commented on pull request #3863:
URL: https://github.com/apache/carbondata/pull/3863#issuecomment-664257717


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3863: [CARBONDATA-3924] Add default dynamic parameters only one time in a JVM process

2020-07-27 Thread GitBox


CarbonDataQA1 commented on pull request #3863:
URL: https://github.com/apache/carbondata/pull/3863#issuecomment-664251374


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3500/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (CARBONDATA-3927) TupleID/Position reference is long , make it short

2020-07-27 Thread Akash R Nilugal (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash R Nilugal updated CARBONDATA-3927:

Issue Type: Improvement  (was: Bug)

> TupleID/Position reference is long , make it short
> --
>
> Key: CARBONDATA-3927
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3927
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Akash R Nilugal
>Assignee: Akash R Nilugal
>Priority: Minor
>
> The current tuple ID is long; some parts of it can be dropped to improve 
> store size and performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3929) Improve the CDC merge feature time

2020-07-27 Thread Akash R Nilugal (Jira)
Akash R Nilugal created CARBONDATA-3929:
---

 Summary: Improve the CDC merge feature time
 Key: CARBONDATA-3929
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3929
 Project: CarbonData
  Issue Type: Improvement
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal


Improve the CDC merge feature time



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-27 Thread GitBox


ajantha-bhat commented on a change in pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#discussion_r46076



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeParallelReadMergeSorterWithColumnRangeImpl.java
##
@@ -99,6 +101,8 @@ public void initialize(SortParameters sortParameters) {
 UnsafeSortDataRows[] sortDataRows = new 
UnsafeSortDataRows[columnRangeInfo.getNumOfRanges()];
 intermediateFileMergers = new 
UnsafeIntermediateMerger[columnRangeInfo.getNumOfRanges()];
 SortParameters[] sortParameterArray = new 
SortParameters[columnRangeInfo.getNumOfRanges()];
+this.writeService = 
Executors.newFixedThreadPool(originSortParameters.getNumberOfCores(),

Review comment:
   @kevinjmh : Yes, if cores are available, adding threads horizontally can 
speed up not just the sort but the other data-loading steps as well.
   If cores are not available, adding threads vertically is also of no use, as 
they will just end up waiting for the CPU.
   
   So I felt this PR's changes are not required; the user can instead increase 
`carbon.number.of.cores.while.loading`.
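For reference, the property mentioned above is typically set in carbon.properties; the value below is only an example and should be tuned to the cores actually available on the loading nodes.

```properties
# carbon.properties -- illustrative value; tune to the machine's core count
carbon.number.of.cores.while.loading=8
```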





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] marchpure commented on pull request #3864: [HOTFIX] Show Segment with stage returns empty

2020-07-27 Thread GitBox


marchpure commented on pull request #3864:
URL: https://github.com/apache/carbondata/pull/3864#issuecomment-664245504


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3863: [CARBONDATA-3924] Add default dynamic parameters only one time in a JVM process

2020-07-27 Thread GitBox


CarbonDataQA1 commented on pull request #3863:
URL: https://github.com/apache/carbondata/pull/3863#issuecomment-664244778


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1758/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3928) Handle the Strings which length is greater than 32000 as a bad record.

2020-07-27 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-3928:


 Summary: Handle the Strings which length is greater than 32000 as 
a bad record.
 Key: CARBONDATA-3928
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3928
 Project: CarbonData
  Issue Type: Task
Reporter: Nihal kumar ojha


Currently, when a string's length exceeds 32000, the whole load fails.
Suggestions:
1. Bad-record handling could accept strings longer than 32000 characters, so 
that the load does not fail just because a few records exceed the limit.
2. Include more information in the log message, such as which record and 
column have the problem.
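A sketch of the suggested behaviour (illustrative only; the names and the bad-record mechanism here are assumptions, not CarbonData APIs): an over-long string is diverted to a bad-record log that records the row number and column, instead of failing the load.

```java
import java.util.ArrayList;
import java.util.List;

class StringLengthCheck {
    static final int MAX_CHARS = 32000;
    static final List<String> badRecordLog = new ArrayList<>();

    // Returns true if the value is loadable; otherwise logs it as a bad
    // record (with row and column context) and rejects just that row.
    static boolean accept(String value, int rowNum, String column) {
        if (value != null && value.length() > MAX_CHARS) {
            badRecordLog.add("row " + rowNum + ", column '" + column
                    + "': length " + value.length() + " exceeds " + MAX_CHARS);
            return false;
        }
        return true;
    }
}
```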



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3927) TupleID/Position reference is long , make it short

2020-07-27 Thread Akash R Nilugal (Jira)
Akash R Nilugal created CARBONDATA-3927:
---

 Summary: TupleID/Position reference is long , make it short
 Key: CARBONDATA-3927
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3927
 Project: CarbonData
  Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal


The current tuple ID is long; some parts of it can be dropped to improve store 
size and performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-07-27 Thread GitBox


CarbonDataQA1 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-664239278


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3499/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org








[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-07-27 Thread GitBox


CarbonDataQA1 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-664237649


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1757/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (CARBONDATA-3926) flink-integration i find it can't move file to stage_data directory

2020-07-27 Thread yutao (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165566#comment-17165566
 ] 

yutao commented on CARBONDATA-3926:
---

But I think it should be possible for it to be an HDFS directory.

> flink-integration i find it can't move file to stage_data directory 
> 
>
> Key: CARBONDATA-3926
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3926
> Project: CarbonData
>  Issue Type: Bug
>  Components: flink-integration
>Affects Versions: 2.0.0, 2.0.1
> Environment: my hadoop is cdh-5.16.1 and spark 2.3.3, flink 
> 1.10.1,hive 1.1.0
>Reporter: yutao
>Priority: Critical
> Fix For: 2.1.0
>
>
> [https://github.com/apache/carbondata/blob/master/docs/flink-integration-guide.md]
>  I followed this guide: I used Spark SQL to create a CarbonData table, and I 
> can see:
>  -rw-r--r-- 3 hadoop dc_cbss 2650 2020-07-25 21:06 
> hdfs://beh/user/dc_cbss/warehouse/testyu.db/userpolicy/Metadata/schema
> Then I wrote a Flink app and ran it on YARN. It works: I can see CarbonData 
> files in the directory defined in my code:
> val dataTempPath = "hdfs://beh/user/dc_cbss/temp/"
> [dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls hdfs://beh/user/dc_cbss/temp/
>  Found 10 items
>  drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:47 
> hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8
>  drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:35 
> hdfs://beh/user/dc_cbss/temp/359a873ec9624623af9beae18b630fde
>  drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:44 
> hdfs://beh/user/dc_cbss/temp/372f6065515e41a5b1d5e01af0a78d61
>  drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 
> hdfs://beh/user/dc_cbss/temp/3735b94780484f96b211ff6d6974ce3a
>  drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:38 
> hdfs://beh/user/dc_cbss/temp/8411793f4c5547dc930aacaeea3177cd
>  drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:29 
> hdfs://beh/user/dc_cbss/temp/915ff23f0d9e4c2dab699d1dcc5a8b4e
>  drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:32 
> hdfs://beh/user/dc_cbss/temp/bea0bef07d5f47cd92541c69b16aa64e
>  drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:26 
> hdfs://beh/user/dc_cbss/temp/c42c760144da4f9d83104af270ed46c1
>  drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:41 
> hdfs://beh/user/dc_cbss/temp/d8af69e47a5844a3a8ed7090ea13a278
>  drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 
> hdfs://beh/user/dc_cbss/temp/db6dceb913444c92a3453903fb50f486
>  [dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls 
> hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/
>  Found 8 items
>  -rw-r--r-- 3 dc_cbss dc_cbss 3100 2020-07-27 14:45 
> hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.carbonindex
>  -rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 
> hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.carbonindex
>  -rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 
> hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.carbonindex
>  -rw-r--r-- 3 dc_cbss dc_cbss 3110 2020-07-27 14:46 
> hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.carbonindex
>  -rw-r--r-- 3 dc_cbss dc_cbss 54526 2020-07-27 14:45 
> hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.snappy.carbondata
>  -rw-r--r-- 3 dc_cbss dc_cbss 54710 2020-07-27 14:47 
> hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.snappy.carbondata
>  -rw-r--r-- 3 dc_cbss dc_cbss 38684 2020-07-27 14:47 
> hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.snappy.carbondata
>  -rw-r--r-- 3 dc_cbss dc_cbss 55229 2020-07-27 14:46 
> hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.snappy.carbondata
>  
> But there is no stage_data directory, and the data is not moved to 
> stage_data when the Flink app commits.
> While debugging, I found that this method in CarbonWriter.java causes it:
> protected StageInput uploadSegmentDataFiles(final String localPath, final 
> String remotePath) {
>   if (!this.table.isHivePartitionTable()) {
>     final File[] files = new File(localPath).listFiles();
>     if (files == null) { LOGGER.error("files is null"); return null; }
>     Map fileNameMapLength = new HashMap<>(files.length);
>     for (File file : files) {
>       fileNameMapLength.put(file.getName(), file.length());
>       if (LOGGER.isDebugEnabled()) { LOGGER.debug( "Upload file[" + 

[jira] [Closed] (CARBONDATA-3926) flink-integration i find it can't move file to stage_data directory

2020-07-27 Thread yutao (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yutao closed CARBONDATA-3926.
-
Resolution: Not A Bug

The temp directory must be a local directory; an HDFS directory is not allowed.
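The reason, visible in the quoted uploadSegmentDataFiles code, is that the temp path is listed with java.io.File, which only understands the local filesystem; an hdfs:// URI never names an existing local directory, so listFiles() returns null and no stage files are picked up. A minimal demonstration:

```java
import java.io.File;

class LocalPathOnly {
    // java.io.File.listFiles() returns null when the path does not name an
    // existing local directory -- which an hdfs:// URI never does.
    static File[] list(String path) {
        return new File(path).listFiles();
    }
}
```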


[GitHub] [carbondata] kevinjmh commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-27 Thread GitBox


kevinjmh commented on a change in pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#discussion_r460750438



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeParallelReadMergeSorterWithColumnRangeImpl.java
##
@@ -99,6 +101,8 @@ public void initialize(SortParameters sortParameters) {
 UnsafeSortDataRows[] sortDataRows = new 
UnsafeSortDataRows[columnRangeInfo.getNumOfRanges()];
 intermediateFileMergers = new 
UnsafeIntermediateMerger[columnRangeInfo.getNumOfRanges()];
 SortParameters[] sortParameterArray = new 
SortParameters[columnRangeInfo.getNumOfRanges()];
+this.writeService = 
Executors.newFixedThreadPool(originSortParameters.getNumberOfCores(),

Review comment:
   @ajantha-bhat Good point. So the only difference is adding threads 
horizontally or vertically. If each thread takes the same time to process its 
data and all threads write at the same time, performance may degrade due to 
I/O preemption. But the difference may not be big when the number of input 
splits is large enough. @shunlean could you please run some tests to confirm?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shunlean commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-27 Thread GitBox


shunlean commented on a change in pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#discussion_r460741367



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java
##
@@ -200,25 +203,44 @@ public void startSorting() {
* @param file file
* @throws CarbonSortKeyAndGroupByException
*/
-  private void writeDataToFile(UnsafeCarbonRowPage rowPage, File file)
-  throws CarbonSortKeyAndGroupByException {
-DataOutputStream stream = null;
-try {
-  // open stream
-  stream = FileFactory.getDataOutputStream(file.getPath(),
-  parameters.getFileWriteBufferSize(), 
parameters.getSortTempCompressorName());
-  int actualSize = rowPage.getBuffer().getActualSize();
-  // write number of entries to the file
-  stream.writeInt(actualSize);
-  for (int i = 0; i < actualSize; i++) {
-rowPage.writeRow(
-rowPage.getBuffer().get(i) + 
rowPage.getDataBlock().getBaseOffset(), stream);
+  private void writeDataToFile(UnsafeCarbonRowPage rowPage, File file) {
+writeService.submit(new WriteThread(rowPage, file));
+  }
+
+  public class WriteThread implements Runnable {
+private File file;
+private UnsafeCarbonRowPage rowPage;
+
+public WriteThread(UnsafeCarbonRowPage rowPage, File file) {
+  this.rowPage = rowPage;
+  this.file = file;
+
+}
+
+@Override
+public void run() {
+  DataOutputStream stream = null;
+  try {
+// open stream
+stream = FileFactory.getDataOutputStream(this.file.getPath(),
+parameters.getFileWriteBufferSize(), 
parameters.getSortTempCompressorName());
+int actualSize = rowPage.getBuffer().getActualSize();
+// write number of entries to the file
+stream.writeInt(actualSize);
+for (int i = 0; i < actualSize; i++) {
+  rowPage.writeRow(
+  rowPage.getBuffer().get(i) + 
rowPage.getDataBlock().getBaseOffset(), stream);
+}
+// add sort temp filename to and arrayList. When the list size reaches 
20 then
+// intermediate merging of sort temp files will be triggered
+unsafeInMemoryIntermediateFileMerger.addFileToMerge(file);
+  } catch (IOException | MemoryException e) {
+e.printStackTrace();

Review comment:
   ok, done.

##
File path: 
processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortParameters.java
##
@@ -37,6 +40,13 @@
 import org.apache.log4j.Logger;
 
 public class SortParameters implements Serializable {
+  
+  private ExecutorService writeService = Executors.newFixedThreadPool(5,

Review comment:
   ok, done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3860: [CARBONDATA-3889] Cleanup duplicated code in carbondata-core module

2020-07-27 Thread GitBox


QiangCai commented on a change in pull request #3860:
URL: https://github.com/apache/carbondata/pull/3860#discussion_r460716343



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/FixedLengthDimensionColumnPage.java
##
@@ -136,15 +131,29 @@ public int fillVector(ColumnVectorInfo[] vectorInfo, int 
chunkIndex) {
   } else if (dataType == DataTypes.LONG) {
 vector.putLong(vectorOffset++, (long) valueFromSurrogate);
   } else {
-throw new IllegalArgumentException("unsupported data type: " +
-columnVectorInfo.directDictionaryGenerator.getReturnType());
+throw new IllegalArgumentException(
+"unsupported data type: " + 
columnVectorInfo.directDictionaryGenerator
+.getReturnType());

Review comment:
   reverted





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3860: [CARBONDATA-3889] Cleanup duplicated code in carbondata-core module

2020-07-27 Thread GitBox


QiangCai commented on a change in pull request #3860:
URL: https://github.com/apache/carbondata/pull/3860#discussion_r460716433



##
File path: 
core/src/main/java/org/apache/carbondata/core/index/dev/expr/AndIndexExprWrapper.java
##
@@ -47,25 +47,20 @@ public AndIndexExprWrapper(IndexExprWrapper left, 
IndexExprWrapper right,
   }
 
   @Override
-  public List prune(List segments, 
List partitionsToPrune)
-  throws IOException {
-List leftPrune = left.prune(segments, partitionsToPrune);
-List rightPrune = right.prune(segments, 
partitionsToPrune);
-List andBlocklets = new ArrayList<>();
-for (ExtendedBlocklet blocklet : leftPrune) {
-  if (rightPrune.contains(blocklet)) {
-andBlocklets.add(blocklet);
-  }
-}
-return andBlocklets;
+  public List prune(List segments,
+  List partitionsToPrune) throws IOException {
+return and(left.prune(segments, partitionsToPrune), right.prune(segments, 
partitionsToPrune));
   }
 
   @Override
   public List prune(IndexInputSplit distributable,
-  List partitionsToPrune)
-  throws IOException {
-List leftPrune = left.prune(distributable, 
partitionsToPrune);
-List rightPrune = right.prune(distributable, 
partitionsToPrune);
+  List partitionsToPrune) throws IOException {
+return and(left.prune(distributable, partitionsToPrune),
+right.prune(distributable, partitionsToPrune));
+  }
+
+  private List and(List leftPrune,
+  List rightPrune) {

Review comment:
   done









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3864: [HOTFIX] Show Segment with stage returns empty

2020-07-27 Thread GitBox


CarbonDataQA1 commented on pull request #3864:
URL: https://github.com/apache/carbondata/pull/3864#issuecomment-664187028


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3502/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3864: [HOTFIX] Show Segment with stage returns empty

2020-07-27 Thread GitBox


CarbonDataQA1 commented on pull request #3864:
URL: https://github.com/apache/carbondata/pull/3864#issuecomment-664186370


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1760/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3837: [wip]remove compressor name from tupleID

2020-07-27 Thread GitBox


CarbonDataQA1 commented on pull request #3837:
URL: https://github.com/apache/carbondata/pull/3837#issuecomment-664184402


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3501/
   







[GitHub] [carbondata] ajantha-bhat commented on pull request #3863: [CARBONDATA-3924] Add default dynamic parameters only one time in a JVM process

2020-07-27 Thread GitBox


ajantha-bhat commented on pull request #3863:
URL: https://github.com/apache/carbondata/pull/3863#issuecomment-664184155


   LGTM.
   can merge once build passes







[GitHub] [carbondata] marchpure opened a new pull request #3864: [HOTFIX] Show Segment with stage returns empty

2020-07-27 Thread GitBox


marchpure opened a new pull request #3864:
URL: https://github.com/apache/carbondata/pull/3864


### Why is this PR needed?
The ListStageFiles function has a bug that makes listing stage files fail.

### What changes were proposed in this PR?
   The code that lists stage files has been fixed.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes
   
   
   







[GitHub] [carbondata] akashrn5 commented on pull request #3837: [wip]remove compressor name from tupleID

2020-07-27 Thread GitBox


akashrn5 commented on pull request #3837:
URL: https://github.com/apache/carbondata/pull/3837#issuecomment-664178336


   retest this please







[jira] [Updated] (CARBONDATA-3926) flink-integration i find it can't move file to stage_data directory

2020-07-27 Thread yutao (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yutao updated CARBONDATA-3926:
--
Description: 
[https://github.com/apache/carbondata/blob/master/docs/flink-integration-guide.md]
 I followed this guide: I created a CarbonData table with Spark SQL, and I can see
 -rw-r--r-- 3 hadoop dc_cbss 2650 2020-07-25 21:06 
hdfs://beh/user/dc_cbss/warehouse/testyu.db/userpolicy/Metadata/schema

Then I wrote a Flink app and ran it on YARN.

It works: I can see CarbonData files in the directory defined in my code:

val dataTempPath = "hdfs://beh/user/dc_cbss/temp/"

[dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls hdfs://beh/user/dc_cbss/temp/
 Found 10 items
 drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:47 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8
 drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:35 
hdfs://beh/user/dc_cbss/temp/359a873ec9624623af9beae18b630fde
 drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:44 
hdfs://beh/user/dc_cbss/temp/372f6065515e41a5b1d5e01af0a78d61
 drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 
hdfs://beh/user/dc_cbss/temp/3735b94780484f96b211ff6d6974ce3a
 drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:38 
hdfs://beh/user/dc_cbss/temp/8411793f4c5547dc930aacaeea3177cd
 drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:29 
hdfs://beh/user/dc_cbss/temp/915ff23f0d9e4c2dab699d1dcc5a8b4e
 drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:32 
hdfs://beh/user/dc_cbss/temp/bea0bef07d5f47cd92541c69b16aa64e
 drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:26 
hdfs://beh/user/dc_cbss/temp/c42c760144da4f9d83104af270ed46c1
 drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:41 
hdfs://beh/user/dc_cbss/temp/d8af69e47a5844a3a8ed7090ea13a278
 drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 
hdfs://beh/user/dc_cbss/temp/db6dceb913444c92a3453903fb50f486
 [dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/
 Found 8 items
 -rw-r--r-- 3 dc_cbss dc_cbss 3100 2020-07-27 14:45 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.carbonindex
 -rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.carbonindex
 -rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.carbonindex
 -rw-r--r-- 3 dc_cbss dc_cbss 3110 2020-07-27 14:46 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.carbonindex
 -rw-r--r-- 3 dc_cbss dc_cbss 54526 2020-07-27 14:45 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.snappy.carbondata
 -rw-r--r-- 3 dc_cbss dc_cbss 54710 2020-07-27 14:47 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.snappy.carbondata
 -rw-r--r-- 3 dc_cbss dc_cbss 38684 2020-07-27 14:47 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.snappy.carbondata
 -rw-r--r-- 3 dc_cbss dc_cbss 55229 2020-07-27 14:46 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.snappy.carbondata

 

But there is no stage_data directory, and the data is not moved to stage_data when the Flink app commits.

Debugging the code, I found that this method in CarbonWriter.java causes it:

protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
  if (!this.table.isHivePartitionTable()) {
    final File[] files = new File(localPath).listFiles();
    if (files == null) {
      LOGGER.error("files is null");
      return null;
    }
    Map fileNameMapLength = new HashMap<>(files.length);
    for (File file : files) {
      fileNameMapLength.put(file.getName(), file.length());
      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
      }
      try {
        CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
      } catch (CarbonDataWriterException exception) {
        LOGGER.error(exception.getMessage(), exception);
        throw exception;
      }
      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
      }
    }
    return new StageInput(remotePath, fileNameMapLength);
  } else {
    final List partitionLocationList = new ArrayList<>();
    final List partitions = new ArrayList<>();
    uploadSegmentDataFiles(new File(localPath), remotePath, partitionLocationList, partitions);
    if (partitionLocationList.isEmpty()) {
      return null;
    } else {
      return new StageInput(remotePath,
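
The null return from listFiles() is consistent with java.io.File only understanding local filesystem paths: an hdfs:// URI is not a local directory, so listing it yields null. A minimal, hypothetical illustration (the path is the reporter's; listing a remote path would instead need something like Hadoop's FileSystem.listStatus):

```java
import java.io.File;

public class HdfsListFilesDemo {
  public static void main(String[] args) {
    // java.io.File treats this URI as a plain (non-existent) local path
    File hdfsPath = new File("hdfs://beh/user/dc_cbss/temp/");
    // listFiles() returns null when the path is not an existing local directory
    System.out.println(hdfsPath.listFiles() == null); // prints true
  }
}
```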

[jira] [Created] (CARBONDATA-3926) flink-integration i find it can't move file to stage_data directory

2020-07-27 Thread yutao (Jira)
yutao created CARBONDATA-3926:
-

 Summary: flink-integration i find it can't move file to stage_data 
directory 
 Key: CARBONDATA-3926
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3926
 Project: CarbonData
  Issue Type: Bug
  Components: flink-integration
Affects Versions: 2.0.0, 2.0.1
 Environment: my hadoop is cdh-5.16.1 and spark 2.3.3, flink 
1.10.1,hive 1.1.0
Reporter: yutao
 Fix For: 2.1.0


[https://github.com/apache/carbondata/blob/master/docs/flink-integration-guide.md]
 I followed this guide: I created a CarbonData table with Spark SQL, and I can see
-rw-r--r-- 3 hadoop dc_cbss 2650 2020-07-25 21:06 
hdfs://beh/user/dc_cbss/warehouse/testyu.db/userpolicy/Metadata/schema

Then I wrote a Flink app and ran it on YARN.

It works: I can see CarbonData files in the directory defined in my code:

val dataTempPath = "hdfs://beh/user/dc_cbss/temp/"

[dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls hdfs://beh/user/dc_cbss/temp/
Found 10 items
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:47 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:35 
hdfs://beh/user/dc_cbss/temp/359a873ec9624623af9beae18b630fde
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:44 
hdfs://beh/user/dc_cbss/temp/372f6065515e41a5b1d5e01af0a78d61
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 
hdfs://beh/user/dc_cbss/temp/3735b94780484f96b211ff6d6974ce3a
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:38 
hdfs://beh/user/dc_cbss/temp/8411793f4c5547dc930aacaeea3177cd
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:29 
hdfs://beh/user/dc_cbss/temp/915ff23f0d9e4c2dab699d1dcc5a8b4e
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:32 
hdfs://beh/user/dc_cbss/temp/bea0bef07d5f47cd92541c69b16aa64e
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:26 
hdfs://beh/user/dc_cbss/temp/c42c760144da4f9d83104af270ed46c1
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:41 
hdfs://beh/user/dc_cbss/temp/d8af69e47a5844a3a8ed7090ea13a278
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 
hdfs://beh/user/dc_cbss/temp/db6dceb913444c92a3453903fb50f486
[dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/
Found 8 items
-rw-r--r-- 3 dc_cbss dc_cbss 3100 2020-07-27 14:45 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 3110 2020-07-27 14:46 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 54526 2020-07-27 14:45 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.snappy.carbondata
-rw-r--r-- 3 dc_cbss dc_cbss 54710 2020-07-27 14:47 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.snappy.carbondata
-rw-r--r-- 3 dc_cbss dc_cbss 38684 2020-07-27 14:47 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.snappy.carbondata
-rw-r--r-- 3 dc_cbss dc_cbss 55229 2020-07-27 14:46 
hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.snappy.carbondata

 

But there is no stage_data directory, and the data is not moved to stage_data when the Flink app commits.

Debugging the code, I found this in CarbonWriter.java:

protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {

if (!this.table.isHivePartitionTable()) {
 final File[] files = new File(localPath).listFiles();
 if (files == null) {
 LOGGER.error("files is null" );
 return null;
 }
 Map fileNameMapLength = new HashMap<>(files.length);
 for (File file : files) {
 fileNameMapLength.put(file.getName(), file.length());
 if (LOGGER.isDebugEnabled()) {
 LOGGER.debug(
 "Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
 }
 try {
 CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), 
remotePath, 1024);
 } catch (CarbonDataWriterException exception) {
 LOGGER.error(exception.getMessage(), exception);
 throw exception;
 }
 if (LOGGER.isDebugEnabled()) {
 LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + 
"] end.");
 }
 }
 return new StageInput(remotePath, fileNameMapLength);
} else {
 final List 

[GitHub] [carbondata] QiangCai opened a new pull request #3863: [CARBONDATA-3924] Add default dynamic parameters only one time in a JVM process

2020-07-27 Thread GitBox


QiangCai opened a new pull request #3863:
URL: https://github.com/apache/carbondata/pull/3863


### Why is this PR needed?
PR #3805 introduced a problem: under concurrent queries, the system adds default dynamic parameters many times in a JVM process. 
   If the ConfigEntry.registerEntry method registers an existing entry again, it throws an exception.
   
### What changes were proposed in this PR?
   Invoking CarbonSQLConf.addDefaultParams only one time in a JVM process
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3774: [CARBONDATA-3833] Make geoID visible

2020-07-27 Thread GitBox


ajantha-bhat commented on a change in pull request #3774:
URL: https://github.com/apache/carbondata/pull/3774#discussion_r460688922



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala
##
@@ -112,6 +238,23 @@ class GeoTest extends QueryTest with BeforeAndAfterAll 
with BeforeAndAfterEach {
   result)
   }
 
+  test("test insert into non-geo table select from geo table") {

Review comment:
   Please add a test case of insert into a geo table where the inserted rows do not have geo data, but select * still shows the geo data.









[jira] [Commented] (CARBONDATA-3925) flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname

2020-07-27 Thread yutao (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165483#comment-17165483
 ] 

yutao commented on CARBONDATA-3925:
---

 I want to resolve this bug.

> flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname
> 
>
> Key: CARBONDATA-3925
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3925
> Project: CarbonData
>  Issue Type: Improvement
>  Components: flink-integration
>Affects Versions: 2.0.0
>Reporter: yutao
>Priority: Minor
> Fix For: 2.0.1
>
>
> In the CarbonWriter.java code you can find this:
> public abstract class CarbonWriter extends ProxyFileWriter {
>   private static final Logger LOGGER =
>       LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
> }
> So the log file always prints lines like:
> 2020-07-27 14:19:25,107 DEBUG org.apache.carbon.flink.CarbonS3Writer
> This is confusing.
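
A sketch of the conventional fix the report implies: the logger should be created with the declaring class's own name. This uses java.util.logging in place of CarbonData's LogServiceFactory purely for illustration; the class names mirror the snippet above:

```java
import java.util.logging.Logger;

abstract class CarbonWriter {
  // correct: the logger is named after the declaring class itself,
  // not after one of its subclasses
  static final Logger LOGGER = Logger.getLogger(CarbonWriter.class.getName());
}

// subclass shown only to mirror the report's class hierarchy
class CarbonS3Writer extends CarbonWriter { }

public class LoggerNameDemo {
  public static void main(String[] args) {
    // log lines are now attributed to CarbonWriter
    System.out.println(CarbonWriter.LOGGER.getName()); // prints CarbonWriter
  }
}
```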



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3925) flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname

2020-07-27 Thread yutao (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yutao updated CARBONDATA-3925:
--
Description: 
In the CarbonWriter.java code you can find this:
public abstract class CarbonWriter extends ProxyFileWriter {
  private static final Logger LOGGER =
      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
}
So the log file always prints lines like:
2020-07-27 14:19:25,107 DEBUG org.apache.carbon.flink.CarbonS3Writer
This is confusing.

  was:in CarbonWriter.java code ,you can find this


> flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname
> 
>
> Key: CARBONDATA-3925
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3925
> Project: CarbonData
>  Issue Type: Improvement
>  Components: flink-integration
>Affects Versions: 2.0.0
>Reporter: yutao
>Priority: Minor
> Fix For: 2.0.1
>
>
> In the CarbonWriter.java code you can find this:
> public abstract class CarbonWriter extends ProxyFileWriter {
>   private static final Logger LOGGER =
>       LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
> }
> So the log file always prints lines like:
> 2020-07-27 14:19:25,107 DEBUG org.apache.carbon.flink.CarbonS3Writer
> This is confusing.





[jira] [Created] (CARBONDATA-3925) flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname

2020-07-27 Thread yutao (Jira)
yutao created CARBONDATA-3925:
-

 Summary: flink-integration CarbonWriter.java LOG print use 
CarbonS3Writer's classname
 Key: CARBONDATA-3925
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3925
 Project: CarbonData
  Issue Type: Improvement
  Components: flink-integration
Affects Versions: 2.0.0
Reporter: yutao
 Fix For: 2.0.1


in CarbonWriter.java code ,you can find this





[jira] [Created] (CARBONDATA-3924) Should add default dynamic parameters only one time in one JVM process

2020-07-27 Thread David Cai (Jira)
David Cai created CARBONDATA-3924:
-

 Summary: Should add default dynamic parameters only one time in 
one JVM process
 Key: CARBONDATA-3924
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3924
 Project: CarbonData
  Issue Type: Bug
Reporter: David Cai


Because the ConfigEntry.registerEntry method cannot register the same entry twice, default dynamic parameters should be added only once per JVM process.
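
A minimal, hypothetical sketch of the once-per-JVM guard this implies — an AtomicBoolean stand-in, while the real fix is to invoke CarbonSQLConf's parameter registration only once:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class OnceOnlyInit {
  private static final AtomicBoolean DEFAULTS_ADDED = new AtomicBoolean(false);
  static int registerCount = 0; // visible for the demo only

  public static void addDefaultParamsOnce() {
    // compareAndSet succeeds for exactly one caller, even under concurrency,
    // so the stand-in for ConfigEntry.registerEntry runs only once
    if (DEFAULTS_ADDED.compareAndSet(false, true)) {
      registerCount++;
    }
  }

  public static void main(String[] args) {
    addDefaultParamsOnce();
    addDefaultParamsOnce(); // second call is a no-op
    System.out.println(registerCount); // prints 1
  }
}
```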





[GitHub] [carbondata] akkio-97 closed pull request #3859: [CARBONDATA-3921] SI load fails with 'unable to get filestatus error' in concurrent scenario

2020-07-27 Thread GitBox


akkio-97 closed pull request #3859:
URL: https://github.com/apache/carbondata/pull/3859


   







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3857: [CARBONDATA-3914] Fixed issue on reading data from carbon table through hive beeline when no data is present in table.

2020-07-27 Thread GitBox


akashrn5 commented on a change in pull request #3857:
URL: https://github.com/apache/carbondata/pull/3857#discussion_r460677188



##
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
##
@@ -2137,7 +2138,7 @@ public static String getFilePathExternalFilePath(String 
path, Configuration conf
 if (fistFilePath == null) {
   // Check if we can infer the schema from the hive metastore.
   LOGGER.error("CarbonData file is not present in the table location");
-  throw new IOException("CarbonData file is not present in the table 
location");
+  throw new FileNotFoundException("CarbonData file is not present in the 
table location");

Review comment:
   @Karan980 , `inferSchema` is called from many places. Can you check and confirm from the code that, when you throw the `FileNotFoundException`, the exception is properly handled by all the callers (or callers of callers), and confirm that nowhere will the exception be hidden due to this change?
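
One point worth noting when checking the callers: FileNotFoundException is a subclass of IOException, so any caller that already catches IOException keeps working after the change. A small sketch — the method name and message below are stand-ins for the code under review:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class ExceptionCompatDemo {
  // stands in for the failure path of getFilePathExternalFilePath (hypothetical)
  static void inferSchemaLike() throws IOException {
    throw new FileNotFoundException("CarbonData file is not present in the table location");
  }

  public static void main(String[] args) {
    try {
      inferSchemaLike();
    } catch (IOException e) {
      // the broader IOException catch still handles the narrower exception
      System.out.println(e instanceof FileNotFoundException); // prints true
    }
  }
}
```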









[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-07-27 Thread GitBox


MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r460675486



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##
@@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
   Array(new Service("security.indexserver.protocol.acl", 
classOf[ServerInterface]))
 }
   }
+
+  def startAgingFolders(): Unit = {
+val runnable = new Runnable() {
+  def run() {
+val age = System.currentTimeMillis() - agePeriod.toLong
+CarbonUtil.agingTempFolderForIndexServer(age)
+LOGGER.info(s"Complete age temp folder 
${CarbonUtil.getIndexServerTempPath}")
+  }
+}
+val ags: ScheduledExecutorService = 
Executors.newSingleThreadScheduledExecutor
+ags.scheduleAtFixedRate(runnable, 1000, 360, TimeUnit.MICROSECONDS)

Review comment:
   The rate is 3 hours. About the delay time, I think it is OK: a delay of 1 s, 5 min, or 1 hour has almost the same effect.
   The test cases are covered here; if the delay is too long, the execution of the test cases will be affected.
   So, Kunal, is there no need to modify the delay here?
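
For reference, a standalone sketch of the scheduleAtFixedRate pattern under discussion; the TimeUnit applies to both the initial delay and the period. The 1-second delay and 3-hour period here are illustrative assumptions, not necessarily the PR's final values:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AgingSchedulerSketch {
  // period in seconds for a rate given in hours
  static long periodSeconds(long hours) {
    return TimeUnit.HOURS.toSeconds(hours);
  }

  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    Runnable aging = () -> System.out.println("aging index server temp folders");
    // first run after 1 second, then every 3 hours (10800 s)
    ses.scheduleAtFixedRate(aging, 1, periodSeconds(3), TimeUnit.SECONDS);
    Thread.sleep(1200); // let the first run fire; demo only
    ses.shutdownNow();  // a real server would keep the executor alive
  }
}
```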









[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-07-27 Thread GitBox


MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r460667653



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##
@@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
   Array(new Service("security.indexserver.protocol.acl", 
classOf[ServerInterface]))
 }
   }
+
+  def startAgingFolders(): Unit = {

Review comment:
   done









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-27 Thread GitBox


ajantha-bhat commented on a change in pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#discussion_r460667652



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeParallelReadMergeSorterWithColumnRangeImpl.java
##
@@ -99,6 +101,8 @@ public void initialize(SortParameters sortParameters) {
 UnsafeSortDataRows[] sortDataRows = new 
UnsafeSortDataRows[columnRangeInfo.getNumOfRanges()];
 intermediateFileMergers = new 
UnsafeIntermediateMerger[columnRangeInfo.getNumOfRanges()];
 SortParameters[] sortParameterArray = new 
SortParameters[columnRangeInfo.getNumOfRanges()];
+this.writeService = 
Executors.newFixedThreadPool(originSortParameters.getNumberOfCores(),

Review comment:
   If we increase `carbon.number.of.cores.while.loading`, there will be 
more UnsafeSortDataRows and writing temp files can finish faster without any of 
these changes.
   
   Is it necessary to introduce another thread pool here?
   please tell your opinion @kevinjmh @kumarvishal09 









[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-07-27 Thread GitBox


MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r460666952



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##
@@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
   Array(new Service("security.indexserver.protocol.acl", 
classOf[ServerInterface]))
 }
   }
+
+  def startAgingFolders(): Unit = {
+val runnable = new Runnable() {
+  def run() {
+val age = System.currentTimeMillis() - agePeriod.toLong
+CarbonUtil.agingTempFolderForIndexServer(age)
+LOGGER.info(s"Complete age temp folder 
${CarbonUtil.getIndexServerTempPath}")
+  }
+}
+val ags: ScheduledExecutorService = 
Executors.newSingleThreadScheduledExecutor
+ags.scheduleAtFixedRate(runnable, 1000, 360, TimeUnit.MICROSECONDS)
+LOGGER.info("index server temp folders aging thread start")

Review comment:
   The run function already logs:
 def run() {
   val age = System.currentTimeMillis() - agePeriod.toLong
   CarbonUtil.agingTempFolderForIndexServer(age)
   LOGGER.info(s"Complete age temp folder 
${CarbonUtil.getIndexServerTempPath}")
 }









[GitHub] [carbondata] ajantha-bhat commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-27 Thread GitBox


ajantha-bhat commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-664141006


   @shunlean : please handle the comments given by @Zhangshunyu 


