[jira] [Resolved] (CARBONDATA-3924) Should add default dynamic parameters only one time in one JVM process
[ https://issues.apache.org/jira/browse/CARBONDATA-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-3924.
Fix Version/s: 2.1.0
Resolution: Fixed

> Should add default dynamic parameters only one time in one JVM process
>
> Key: CARBONDATA-3924
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3924
> Project: CarbonData
> Issue Type: Bug
> Reporter: David Cai
> Priority: Major
> Fix For: 2.1.0
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Because the ConfigEntry.registerEntry method can't register the same entry twice, default dynamic parameters should be added only one time in one JVM process.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3860: [CARBONDATA-3889] Cleanup duplicated code in carbondata-core module
CarbonDataQA1 commented on pull request #3860: URL: https://github.com/apache/carbondata/pull/3860#issuecomment-664293945 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1761/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3860: [CARBONDATA-3889] Cleanup duplicated code in carbondata-core module
CarbonDataQA1 commented on pull request #3860: URL: https://github.com/apache/carbondata/pull/3860#issuecomment-664286181 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3503/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3837: [CARBONDATA-3927]Remove compressor name from tupleID to make it short to improve store size and performance.
CarbonDataQA1 commented on pull request #3837: URL: https://github.com/apache/carbondata/pull/3837#issuecomment-664283269 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1759/
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3778: [CARBONDATA-3916] Support array with SI
ajantha-bhat commented on a change in pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#discussion_r460795333

## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithComplexArrayType.scala
## @@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.carbondata.spark.testsuite.secondaryindex.TestSecondaryIndexUtils.isFilterPushedDownToSI
+
+class TestSIWithComplexArrayType extends QueryTest with BeforeAndAfterEach {
+
+  override def beforeEach(): Unit = {
+    sql("drop table if exists complextable")
+  }
+
+  override def afterEach(): Unit = {
+    sql("drop index if exists index_1 on complextable")
+    sql("drop table if exists complextable")
+  }
+
+  test("test array on secondary index") {

Review comment: d) If two array_contains() predicates are combined with AND in a query and pushed down to the SI as equals filters, the query returns 0 rows: the SI is flattened, so a single SI row cannot match both values. This case also needs to be handled.
[GitHub] [carbondata] ajantha-bhat commented on pull request #3863: [CARBONDATA-3924] Add default dynamic parameters only one time in a JVM process
ajantha-bhat commented on pull request #3863: URL: https://github.com/apache/carbondata/pull/3863#issuecomment-664257717 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3863: [CARBONDATA-3924] Add default dynamic parameters only one time in a JVM process
CarbonDataQA1 commented on pull request #3863: URL: https://github.com/apache/carbondata/pull/3863#issuecomment-664251374 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3500/
[jira] [Updated] (CARBONDATA-3927) TupleID/Position reference is long, make it short
[ https://issues.apache.org/jira/browse/CARBONDATA-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-3927:
Issue Type: Improvement (was: Bug)

> TupleID/Position reference is long, make it short
>
> Key: CARBONDATA-3927
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3927
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Akash R Nilugal
> Assignee: Akash R Nilugal
> Priority: Minor
>
> The current tuple ID is long; some parts of it can be avoided to improve performance.
[jira] [Created] (CARBONDATA-3929) Improve the CDC merge feature time
Akash R Nilugal created CARBONDATA-3929:
Summary: Improve the CDC merge feature time
Key: CARBONDATA-3929
URL: https://issues.apache.org/jira/browse/CARBONDATA-3929
Project: CarbonData
Issue Type: Improvement
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal

Improve the CDC merge feature time
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
ajantha-bhat commented on a change in pull request #3847: URL: https://github.com/apache/carbondata/pull/3847#discussion_r46076

## File path: processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeParallelReadMergeSorterWithColumnRangeImpl.java
## @@ -99,6 +101,8 @@ public void initialize(SortParameters sortParameters) {
   UnsafeSortDataRows[] sortDataRows = new UnsafeSortDataRows[columnRangeInfo.getNumOfRanges()];
   intermediateFileMergers = new UnsafeIntermediateMerger[columnRangeInfo.getNumOfRanges()];
   SortParameters[] sortParameterArray = new SortParameters[columnRangeInfo.getNumOfRanges()];
+  this.writeService = Executors.newFixedThreadPool(originSortParameters.getNumberOfCores(),

Review comment: @kevinjmh: Yes, if cores are available, adding threads horizontally can speed up not just the sort but other data-loading steps as well. If cores are not available, adding threads vertically is also of no use, since they will end up waiting for CPU. So I feel this PR's changes are not required; the user can instead increase `carbon.number.of.cores.while.loading`.
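For reference, the alternative suggested above — raising the loading core count rather than adding a second thread pool — is a one-line property change. A minimal sketch, assuming the standard CarbonProperties API (the value "8" is illustrative):

```java
import org.apache.carbondata.core.util.CarbonProperties;

public class LoadingCoresExample {
  public static void main(String[] args) {
    // More loading cores mean more UnsafeSortDataRows instances working in
    // parallel, so sort temp files are written faster without an extra pool.
    CarbonProperties.getInstance()
        .addProperty("carbon.number.of.cores.while.loading", "8");
  }
}
```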
[GitHub] [carbondata] marchpure commented on pull request #3864: [HOTFIX] Show Segment with stage returns empty
marchpure commented on pull request #3864: URL: https://github.com/apache/carbondata/pull/3864#issuecomment-664245504 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3863: [CARBONDATA-3924] Add default dynamic parameters only one time in a JVM process
CarbonDataQA1 commented on pull request #3863: URL: https://github.com/apache/carbondata/pull/3863#issuecomment-664244778 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1758/
[jira] [Created] (CARBONDATA-3928) Handle strings whose length is greater than 32000 as bad records.
Nihal kumar ojha created CARBONDATA-3928:
Summary: Handle strings whose length is greater than 32000 as bad records.
Key: CARBONDATA-3928
URL: https://issues.apache.org/jira/browse/CARBONDATA-3928
Project: CarbonData
Issue Type: Task
Reporter: Nihal kumar ojha

Currently, when a string's length exceeds 32000, the load fails. Suggestions:
1. Bad-record handling should cover strings longer than 32000 characters, so the load does not fail when only a few records exceed the limit.
2. Include more information in the log message, such as which record and column have the problem.
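A minimal sketch of what suggestion 1 could look like at the row level — flag the row as a bad record instead of failing the load, and log the offending record and column per suggestion 2. The class and method names here are illustrative, not CarbonData's actual bad-record API:

```java
public class StringLengthBadRecordCheck {
  // CarbonData's per-value string length limit
  private static final int MAX_STRING_LENGTH = 32000;

  /** Returns true if the row is acceptable; false marks it as a bad record. */
  public static boolean checkRow(String[] row, String[] columnNames, long rowNumber) {
    for (int i = 0; i < row.length; i++) {
      if (row[i] != null && row[i].length() > MAX_STRING_LENGTH) {
        // suggestion 2: report exactly which record and column overflowed
        System.err.printf("Bad record: row %d, column '%s' exceeds %d characters%n",
            rowNumber, columnNames[i], MAX_STRING_LENGTH);
        return false;
      }
    }
    return true;
  }
}
```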
[jira] [Created] (CARBONDATA-3927) TupleID/Position reference is long, make it short
Akash R Nilugal created CARBONDATA-3927:
Summary: TupleID/Position reference is long, make it short
Key: CARBONDATA-3927
URL: https://issues.apache.org/jira/browse/CARBONDATA-3927
Project: CarbonData
Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal

The current tuple ID is long; some parts of it can be avoided to improve performance.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data
CarbonDataQA1 commented on pull request #3855: URL: https://github.com/apache/carbondata/pull/3855#issuecomment-664239278 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3499/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data
CarbonDataQA1 commented on pull request #3855: URL: https://github.com/apache/carbondata/pull/3855#issuecomment-664237649 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1757/
[jira] [Commented] (CARBONDATA-3926) flink-integration i find it can't move file to stage_data directory
[ https://issues.apache.org/jira/browse/CARBONDATA-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165566#comment-17165566 ] yutao commented on CARBONDATA-3926: But I think it can be an HDFS directory.

> flink-integration i find it can't move file to stage_data directory
>
> Key: CARBONDATA-3926
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3926
> Project: CarbonData
> Issue Type: Bug
> Components: flink-integration
> Affects Versions: 2.0.0, 2.0.1
> Environment: my hadoop is cdh-5.16.1 and spark 2.3.3, flink 1.10.1, hive 1.1.0
> Reporter: yutao
> Priority: Critical
> Fix For: 2.1.0
>
> https://github.com/apache/carbondata/blob/master/docs/flink-integration-guide.md
> I followed this guide: I used Spark SQL to create a carbondata table, and I can see
> -rw-r--r-- 3 hadoop dc_cbss 2650 2020-07-25 21:06 hdfs://beh/user/dc_cbss/warehouse/testyu.db/userpolicy/Metadata/schema
> Then I wrote a flink app and ran it on YARN; it works, and I can see carbon files in the directory defined in my code:
> val dataTempPath = "hdfs://beh/user/dc_cbss/temp/"
> [dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls hdfs://beh/user/dc_cbss/temp/
> Found 10 items
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:35 hdfs://beh/user/dc_cbss/temp/359a873ec9624623af9beae18b630fde
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:44 hdfs://beh/user/dc_cbss/temp/372f6065515e41a5b1d5e01af0a78d61
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 hdfs://beh/user/dc_cbss/temp/3735b94780484f96b211ff6d6974ce3a
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:38 hdfs://beh/user/dc_cbss/temp/8411793f4c5547dc930aacaeea3177cd
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:29 hdfs://beh/user/dc_cbss/temp/915ff23f0d9e4c2dab699d1dcc5a8b4e
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:32 hdfs://beh/user/dc_cbss/temp/bea0bef07d5f47cd92541c69b16aa64e
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:26 hdfs://beh/user/dc_cbss/temp/c42c760144da4f9d83104af270ed46c1
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:41 hdfs://beh/user/dc_cbss/temp/d8af69e47a5844a3a8ed7090ea13a278
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 hdfs://beh/user/dc_cbss/temp/db6dceb913444c92a3453903fb50f486
> [dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/
> Found 8 items
> -rw-r--r-- 3 dc_cbss dc_cbss 3100 2020-07-27 14:45 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.carbonindex
> -rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.carbonindex
> -rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.carbonindex
> -rw-r--r-- 3 dc_cbss dc_cbss 3110 2020-07-27 14:46 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.carbonindex
> -rw-r--r-- 3 dc_cbss dc_cbss 54526 2020-07-27 14:45 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.snappy.carbondata
> -rw-r--r-- 3 dc_cbss dc_cbss 54710 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.snappy.carbondata
> -rw-r--r-- 3 dc_cbss dc_cbss 38684 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.snappy.carbondata
> -rw-r--r-- 3 dc_cbss dc_cbss 55229 2020-07-27 14:46 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.snappy.carbondata
>
> But there is no stage_data directory, and data is not moved to stage_data when the flink app commits.
> I debugged the code and found that this method in CarbonWriter.java causes it:
> protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
>   if (!this.table.isHivePartitionTable()) {
>     final File[] files = new File(localPath).listFiles();
>     if (files == null) { LOGGER.error("files is null"); return null; }
>     Map fileNameMapLength = new HashMap<>(files.length);
>     for (File file : files) {
>       fileNameMapLength.put(file.getName(), file.length());
>       if (LOGGER.isDebugEnabled())
>       { LOGGER.debug( "Upload file[" +
[jira] [Closed] (CARBONDATA-3926) flink-integration i find it can't move file to stage_data directory
[ https://issues.apache.org/jira/browse/CARBONDATA-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yutao closed CARBONDATA-3926.
Resolution: Not A Bug
The temp directory is a local directory; an HDFS directory is not allowed.

> flink-integration i find it can't move file to stage_data directory
>
> Key: CARBONDATA-3926
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3926
> Project: CarbonData
> Issue Type: Bug
> Components: flink-integration
> Affects Versions: 2.0.0, 2.0.1
> Environment: my hadoop is cdh-5.16.1 and spark 2.3.3, flink 1.10.1, hive 1.1.0
> Reporter: yutao
> Priority: Critical
> Fix For: 2.1.0
>
> https://github.com/apache/carbondata/blob/master/docs/flink-integration-guide.md
> I followed this guide: I used Spark SQL to create a carbondata table, and I can see
> -rw-r--r-- 3 hadoop dc_cbss 2650 2020-07-25 21:06 hdfs://beh/user/dc_cbss/warehouse/testyu.db/userpolicy/Metadata/schema
> Then I wrote a flink app and ran it on YARN; it works, and I can see carbon files in the directory defined in my code:
> val dataTempPath = "hdfs://beh/user/dc_cbss/temp/"
> [dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls hdfs://beh/user/dc_cbss/temp/
> Found 10 items
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:35 hdfs://beh/user/dc_cbss/temp/359a873ec9624623af9beae18b630fde
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:44 hdfs://beh/user/dc_cbss/temp/372f6065515e41a5b1d5e01af0a78d61
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 hdfs://beh/user/dc_cbss/temp/3735b94780484f96b211ff6d6974ce3a
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:38 hdfs://beh/user/dc_cbss/temp/8411793f4c5547dc930aacaeea3177cd
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:29 hdfs://beh/user/dc_cbss/temp/915ff23f0d9e4c2dab699d1dcc5a8b4e
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:32 hdfs://beh/user/dc_cbss/temp/bea0bef07d5f47cd92541c69b16aa64e
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:26 hdfs://beh/user/dc_cbss/temp/c42c760144da4f9d83104af270ed46c1
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:41 hdfs://beh/user/dc_cbss/temp/d8af69e47a5844a3a8ed7090ea13a278
> drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 hdfs://beh/user/dc_cbss/temp/db6dceb913444c92a3453903fb50f486
> [dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/
> Found 8 items
> -rw-r--r-- 3 dc_cbss dc_cbss 3100 2020-07-27 14:45 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.carbonindex
> -rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.carbonindex
> -rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.carbonindex
> -rw-r--r-- 3 dc_cbss dc_cbss 3110 2020-07-27 14:46 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.carbonindex
> -rw-r--r-- 3 dc_cbss dc_cbss 54526 2020-07-27 14:45 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.snappy.carbondata
> -rw-r--r-- 3 dc_cbss dc_cbss 54710 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.snappy.carbondata
> -rw-r--r-- 3 dc_cbss dc_cbss 38684 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.snappy.carbondata
> -rw-r--r-- 3 dc_cbss dc_cbss 55229 2020-07-27 14:46 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.snappy.carbondata
>
> But there is no stage_data directory, and data is not moved to stage_data when the flink app commits.
> I debugged the code and found that this method in CarbonWriter.java causes it:
> protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
>   if (!this.table.isHivePartitionTable()) {
>     final File[] files = new File(localPath).listFiles();
>     if (files == null) { LOGGER.error("files is null"); return null; }
>     Map fileNameMapLength = new HashMap<>(files.length);
>     for (File file : files) {
>       fileNameMapLength.put(file.getName(), file.length());
>       if (LOGGER.isDebugEnabled())
>       { LOGGER.debug( "Upload file[" +
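The root cause is visible in the quoted code: java.io.File resolves only local filesystem paths, so calling listFiles() on an hdfs:// location returns null and the upload is skipped. A small self-contained demonstration (paths illustrative):

```java
import java.io.File;

public class LocalPathOnlyDemo {
  public static void main(String[] args) {
    // java.io.File treats "hdfs://..." as an ordinary local path that does
    // not exist; it does not speak the HDFS protocol, so listFiles() is null.
    File hdfsStylePath = new File("hdfs://beh/user/dc_cbss/temp/");
    System.out.println(hdfsStylePath.listFiles()); // null

    // Only a genuinely local directory yields a listing.
    File localPath = new File(System.getProperty("java.io.tmpdir"));
    System.out.println(localPath.listFiles().length);
  }
}
```

This matches the resolution above: dataTempPath must point at a local directory, not an HDFS URI.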
[GitHub] [carbondata] kevinjmh commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
kevinjmh commented on a change in pull request #3847: URL: https://github.com/apache/carbondata/pull/3847#discussion_r460750438

## File path: processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeParallelReadMergeSorterWithColumnRangeImpl.java
## @@ -99,6 +101,8 @@ public void initialize(SortParameters sortParameters) {
   UnsafeSortDataRows[] sortDataRows = new UnsafeSortDataRows[columnRangeInfo.getNumOfRanges()];
   intermediateFileMergers = new UnsafeIntermediateMerger[columnRangeInfo.getNumOfRanges()];
   SortParameters[] sortParameterArray = new SortParameters[columnRangeInfo.getNumOfRanges()];
+  this.writeService = Executors.newFixedThreadPool(originSortParameters.getNumberOfCores(),

Review comment: @ajantha-bhat Good point. So the only difference is adding threads horizontally or vertically. If each thread takes the same time to process the data and writes at the same time, performance may degrade because of IO preemption. But the difference may not be big when the number of input splits is large enough. @shunlean could you please do some tests to confirm?
[GitHub] [carbondata] shunlean commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
shunlean commented on a change in pull request #3847: URL: https://github.com/apache/carbondata/pull/3847#discussion_r460741367

## File path: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java
## @@ -200,25 +203,44 @@ public void startSorting() {
  * @param file file
  * @throws CarbonSortKeyAndGroupByException
  */
- private void writeDataToFile(UnsafeCarbonRowPage rowPage, File file)
-     throws CarbonSortKeyAndGroupByException {
-   DataOutputStream stream = null;
-   try {
-     // open stream
-     stream = FileFactory.getDataOutputStream(file.getPath(),
-         parameters.getFileWriteBufferSize(), parameters.getSortTempCompressorName());
-     int actualSize = rowPage.getBuffer().getActualSize();
-     // write number of entries to the file
-     stream.writeInt(actualSize);
-     for (int i = 0; i < actualSize; i++) {
-       rowPage.writeRow(
-           rowPage.getBuffer().get(i) + rowPage.getDataBlock().getBaseOffset(), stream);
+ private void writeDataToFile(UnsafeCarbonRowPage rowPage, File file) {
+   writeService.submit(new WriteThread(rowPage, file));
+ }
+
+ public class WriteThread implements Runnable {
+   private File file;
+   private UnsafeCarbonRowPage rowPage;
+
+   public WriteThread(UnsafeCarbonRowPage rowPage, File file) {
+     this.rowPage = rowPage;
+     this.file = file;
+   }
+
+   @Override
+   public void run() {
+     DataOutputStream stream = null;
+     try {
+       // open stream
+       stream = FileFactory.getDataOutputStream(this.file.getPath(),
+           parameters.getFileWriteBufferSize(), parameters.getSortTempCompressorName());
+       int actualSize = rowPage.getBuffer().getActualSize();
+       // write number of entries to the file
+       stream.writeInt(actualSize);
+       for (int i = 0; i < actualSize; i++) {
+         rowPage.writeRow(
+             rowPage.getBuffer().get(i) + rowPage.getDataBlock().getBaseOffset(), stream);
+       }
+       // add sort temp filename to an arrayList. When the list size reaches 20 then
+       // intermediate merging of sort temp files will be triggered
+       unsafeInMemoryIntermediateFileMerger.addFileToMerge(file);
+     } catch (IOException | MemoryException e) {
+       e.printStackTrace();

Review comment: ok, done.

## File path: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortParameters.java
## @@ -37,6 +40,13 @@
 import org.apache.log4j.Logger;

 public class SortParameters implements Serializable {
+
+  private ExecutorService writeService = Executors.newFixedThreadPool(5,

Review comment: ok, done.
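The printStackTrace() review point is conventionally resolved by routing the failure through the module logger and propagating it rather than printing to stderr. A rough sketch (the class name is illustrative; LogServiceFactory is the logging entry point used elsewhere in CarbonData):

```java
import org.apache.carbondata.common.logging.LogServiceFactory;
import org.apache.log4j.Logger;

public class SortTempFileWriteTask implements Runnable {
  private static final Logger LOGGER =
      LogServiceFactory.getLogService(SortTempFileWriteTask.class.getName());

  @Override
  public void run() {
    try {
      // ... write the sort temp file here ...
    } catch (Exception e) {
      // Attach the stack trace to the log and surface the failure,
      // instead of swallowing it with e.printStackTrace().
      LOGGER.error("Failed to write sort temp file", e);
      throw new RuntimeException(e);
    }
  }
}
```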
[GitHub] [carbondata] QiangCai commented on a change in pull request #3860: [CARBONDATA-3889] Cleanup duplicated code in carbondata-core module
QiangCai commented on a change in pull request #3860: URL: https://github.com/apache/carbondata/pull/3860#discussion_r460716343

## File path: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/FixedLengthDimensionColumnPage.java
## @@ -136,15 +131,29 @@ public int fillVector(ColumnVectorInfo[] vectorInfo, int chunkIndex) {
   } else if (dataType == DataTypes.LONG) {
     vector.putLong(vectorOffset++, (long) valueFromSurrogate);
   } else {
-    throw new IllegalArgumentException("unsupported data type: " +
-        columnVectorInfo.directDictionaryGenerator.getReturnType());
+    throw new IllegalArgumentException(
+        "unsupported data type: " + columnVectorInfo.directDictionaryGenerator
+            .getReturnType());

Review comment: reverted
[GitHub] [carbondata] QiangCai commented on a change in pull request #3860: [CARBONDATA-3889] Cleanup duplicated code in carbondata-core module
QiangCai commented on a change in pull request #3860: URL: https://github.com/apache/carbondata/pull/3860#discussion_r460716433

## File path: core/src/main/java/org/apache/carbondata/core/index/dev/expr/AndIndexExprWrapper.java
## @@ -47,25 +47,20 @@ public AndIndexExprWrapper(IndexExprWrapper left, IndexExprWrapper right,
 }

 @Override
- public List prune(List segments, List partitionsToPrune)
-     throws IOException {
-   List leftPrune = left.prune(segments, partitionsToPrune);
-   List rightPrune = right.prune(segments, partitionsToPrune);
-   List andBlocklets = new ArrayList<>();
-   for (ExtendedBlocklet blocklet : leftPrune) {
-     if (rightPrune.contains(blocklet)) {
-       andBlocklets.add(blocklet);
-     }
-   }
-   return andBlocklets;
+ public List prune(List segments,
+     List partitionsToPrune) throws IOException {
+   return and(left.prune(segments, partitionsToPrune), right.prune(segments, partitionsToPrune));
 }

 @Override
 public List prune(IndexInputSplit distributable,
-     List partitionsToPrune)
-     throws IOException {
-   List leftPrune = left.prune(distributable, partitionsToPrune);
-   List rightPrune = right.prune(distributable, partitionsToPrune);
+     List partitionsToPrune) throws IOException {
+   return and(left.prune(distributable, partitionsToPrune),
+       right.prune(distributable, partitionsToPrune));
+ }
+
+ private List and(List leftPrune,
+     List rightPrune) {

Review comment: done
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3864: [HOTFIX] Show Segment with stage returns empty
CarbonDataQA1 commented on pull request #3864: URL: https://github.com/apache/carbondata/pull/3864#issuecomment-664187028 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3502/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3864: [HOTFIX] Show Segment with stage returns empty
CarbonDataQA1 commented on pull request #3864: URL: https://github.com/apache/carbondata/pull/3864#issuecomment-664186370 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1760/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3837: [wip]remove compressor name from tupleID
CarbonDataQA1 commented on pull request #3837: URL: https://github.com/apache/carbondata/pull/3837#issuecomment-664184402 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3501/
[GitHub] [carbondata] ajantha-bhat commented on pull request #3863: [CARBONDATA-3924] Add default dynamic parameters only one time in a JVM process
ajantha-bhat commented on pull request #3863: URL: https://github.com/apache/carbondata/pull/3863#issuecomment-664184155 LGTM. Can merge once the build passes.
[GitHub] [carbondata] marchpure opened a new pull request #3864: [HOTFIX] Show Segment with stage returns empty
marchpure opened a new pull request #3864: URL: https://github.com/apache/carbondata/pull/3864

### Why is this PR needed?
The ListStageFiles function has a bug that causes listing stage files to fail.

### What changes were proposed in this PR?
The code that lists stage files has been modified; the bug is solved.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes
[GitHub] [carbondata] akashrn5 commented on pull request #3837: [wip]remove compressor name from tupleID
akashrn5 commented on pull request #3837: URL: https://github.com/apache/carbondata/pull/3837#issuecomment-664178336 retest this please
[jira] [Updated] (CARBONDATA-3926) flink-integration i find it can't move file to stage_data directory
[ https://issues.apache.org/jira/browse/CARBONDATA-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yutao updated CARBONDATA-3926:
Description:
https://github.com/apache/carbondata/blob/master/docs/flink-integration-guide.md
I followed this guide: I used Spark SQL to create a carbondata table, and I can see
-rw-r--r-- 3 hadoop dc_cbss 2650 2020-07-25 21:06 hdfs://beh/user/dc_cbss/warehouse/testyu.db/userpolicy/Metadata/schema
Then I wrote a flink app and ran it on YARN; it works, and I can see carbon files in the directory defined in my code:
val dataTempPath = "hdfs://beh/user/dc_cbss/temp/"
[dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls hdfs://beh/user/dc_cbss/temp/
Found 10 items
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:35 hdfs://beh/user/dc_cbss/temp/359a873ec9624623af9beae18b630fde
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:44 hdfs://beh/user/dc_cbss/temp/372f6065515e41a5b1d5e01af0a78d61
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 hdfs://beh/user/dc_cbss/temp/3735b94780484f96b211ff6d6974ce3a
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:38 hdfs://beh/user/dc_cbss/temp/8411793f4c5547dc930aacaeea3177cd
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:29 hdfs://beh/user/dc_cbss/temp/915ff23f0d9e4c2dab699d1dcc5a8b4e
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:32 hdfs://beh/user/dc_cbss/temp/bea0bef07d5f47cd92541c69b16aa64e
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:26 hdfs://beh/user/dc_cbss/temp/c42c760144da4f9d83104af270ed46c1
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:41 hdfs://beh/user/dc_cbss/temp/d8af69e47a5844a3a8ed7090ea13a278
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 hdfs://beh/user/dc_cbss/temp/db6dceb913444c92a3453903fb50f486
[dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/
Found 8 items
-rw-r--r-- 3 dc_cbss dc_cbss 3100 2020-07-27 14:45 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 3110 2020-07-27 14:46 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 54526 2020-07-27 14:45 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.snappy.carbondata
-rw-r--r-- 3 dc_cbss dc_cbss 54710 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.snappy.carbondata
-rw-r--r-- 3 dc_cbss dc_cbss 38684 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.snappy.carbondata
-rw-r--r-- 3 dc_cbss dc_cbss 55229 2020-07-27 14:46 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.snappy.carbondata
But there is no stage_data directory, and data is not moved to stage_data when the flink app commits.
I debugged the code and found that this method in CarbonWriter.java causes it:
protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
  if (!this.table.isHivePartitionTable()) {
    final File[] files = new File(localPath).listFiles();
    if (files == null) { LOGGER.error("files is null"); return null; }
    Map fileNameMapLength = new HashMap<>(files.length);
    for (File file : files) {
      fileNameMapLength.put(file.getName(), file.length());
      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
      }
      try {
        CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
      } catch (CarbonDataWriterException exception) {
        LOGGER.error(exception.getMessage(), exception);
        throw exception;
      }
      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
      }
    }
    return new StageInput(remotePath, fileNameMapLength);
  } else {
    final List partitionLocationList = new ArrayList<>();
    final List partitions = new ArrayList<>();
    uploadSegmentDataFiles(new File(localPath), remotePath, partitionLocationList, partitions);
    if (partitionLocationList.isEmpty()) {
      return null;
    } else {
      return new StageInput(remotePath,
[jira] [Created] (CARBONDATA-3926) flink-integration i find it can't move file to stage_data directory
yutao created CARBONDATA-3926:
Summary: flink-integration i find it can't move file to stage_data directory
Key: CARBONDATA-3926
URL: https://issues.apache.org/jira/browse/CARBONDATA-3926
Project: CarbonData
Issue Type: Bug
Components: flink-integration
Affects Versions: 2.0.0, 2.0.1
Environment: my hadoop is cdh-5.16.1 and spark 2.3.3, flink 1.10.1, hive 1.1.0
Reporter: yutao
Fix For: 2.1.0

https://github.com/apache/carbondata/blob/master/docs/flink-integration-guide.md
I followed this guide: I used Spark SQL to create a carbondata table, and I can see
-rw-r--r-- 3 hadoop dc_cbss 2650 2020-07-25 21:06 hdfs://beh/user/dc_cbss/warehouse/testyu.db/userpolicy/Metadata/schema
Then I wrote a flink app and ran it on YARN; it works, and I can see carbon files in the directory defined in my code:
val dataTempPath = "hdfs://beh/user/dc_cbss/temp/"
[dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls hdfs://beh/user/dc_cbss/temp/
Found 10 items
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:35 hdfs://beh/user/dc_cbss/temp/359a873ec9624623af9beae18b630fde
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:44 hdfs://beh/user/dc_cbss/temp/372f6065515e41a5b1d5e01af0a78d61
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 hdfs://beh/user/dc_cbss/temp/3735b94780484f96b211ff6d6974ce3a
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:38 hdfs://beh/user/dc_cbss/temp/8411793f4c5547dc930aacaeea3177cd
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:29 hdfs://beh/user/dc_cbss/temp/915ff23f0d9e4c2dab699d1dcc5a8b4e
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:32 hdfs://beh/user/dc_cbss/temp/bea0bef07d5f47cd92541c69b16aa64e
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:26 hdfs://beh/user/dc_cbss/temp/c42c760144da4f9d83104af270ed46c1
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:41 hdfs://beh/user/dc_cbss/temp/d8af69e47a5844a3a8ed7090ea13a278
drwxr-xr-x - dc_cbss dc_cbss 0 2020-07-27 14:50 hdfs://beh/user/dc_cbss/temp/db6dceb913444c92a3453903fb50f486
[dc_cbss@hive_client_004 yutao]$ hdfs dfs -ls hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/
Found 8 items
-rw-r--r-- 3 dc_cbss dc_cbss 3100 2020-07-27 14:45 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 3104 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 3110 2020-07-27 14:46 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.carbonindex
-rw-r--r-- 3 dc_cbss dc_cbss 54526 2020-07-27 14:45 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-24b93d2ffbc14472b3c0e3d2cd948632_batchno0-0-null-1595831979508.snappy.carbondata
-rw-r--r-- 3 dc_cbss dc_cbss 54710 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-2da284a3beed4c15a3b60c7849d2da92_batchno0-0-null-1595832075416.snappy.carbondata
-rw-r--r-- 3 dc_cbss dc_cbss 38684 2020-07-27 14:47 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-70b01854c2d446889b91d4bc9203587c_batchno0-0-null-1595832123015.snappy.carbondata
-rw-r--r-- 3 dc_cbss dc_cbss 55229 2020-07-27 14:46 hdfs://beh/user/dc_cbss/temp/33976d2f23344768b91c6ba3eadd22c8/part-0-aae80851ef534c9ca6f95669d56ec636_batchno0-0-null-1595832028966.snappy.carbondata
But there is no stage_data directory, and data is not moved to stage_data when the flink app commits.
I debugged the code and found this method in CarbonWriter.java:
protected StageInput uploadSegmentDataFiles(final String localPath, final String remotePath) {
  if (!this.table.isHivePartitionTable()) {
    final File[] files = new File(localPath).listFiles();
    if (files == null) { LOGGER.error("files is null"); return null; }
    Map fileNameMapLength = new HashMap<>(files.length);
    for (File file : files) {
      fileNameMapLength.put(file.getName(), file.length());
      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] start.");
      }
      try {
        CarbonUtil.copyCarbonDataFileToCarbonStorePath(file.getAbsolutePath(), remotePath, 1024);
      } catch (CarbonDataWriterException exception) {
        LOGGER.error(exception.getMessage(), exception);
        throw exception;
      }
      if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Upload file[" + file.getAbsolutePath() + "] to [" + remotePath + "] end.");
      }
    }
    return new StageInput(remotePath, fileNameMapLength);
  } else {
    final List
[GitHub] [carbondata] QiangCai opened a new pull request #3863: [CARBONDATA-3924] Add default dynamic parameters only one time in a JVM process
QiangCai opened a new pull request #3863: URL: https://github.com/apache/carbondata/pull/3863

### Why is this PR needed?
PR #3805 introduced a problem: the system adds the default dynamic parameters multiple times in one JVM process under concurrent queries. If the ConfigEntry.registerEntry method registers an existing entry again, it throws an exception.

### What changes were proposed in this PR?
Invoke CarbonSQLConf.addDefaultParams only one time in a JVM process.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
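A minimal sketch of the once-per-JVM guard this fix describes; the class and member names below are illustrative, not the actual CarbonSQLConf code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public final class DefaultDynamicParams {
  private static final AtomicBoolean ADDED = new AtomicBoolean(false);

  /** Runs the registration body at most once per JVM, even under concurrent queries. */
  public static void addDefaultParamsOnce(Runnable registerAll) {
    // Exactly one thread wins the compare-and-set, so ConfigEntry.registerEntry
    // is never asked to register the same entry twice.
    if (ADDED.compareAndSet(false, true)) {
      registerAll.run();
    }
  }
}
```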
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3774: [CARBONDATA-3833] Make geoID visible
ajantha-bhat commented on a change in pull request #3774: URL: https://github.com/apache/carbondata/pull/3774#discussion_r460688922

## File path: integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala
## @@ -112,6 +238,23 @@ class GeoTest extends QueryTest with BeforeAndAfterAll with BeforeAndAfterEach {
     result)
   }

+  test("test insert into non-geo table select from geo table") {

Review comment: Please add a test case for insert into a geo table where the inserted rows do not include geo data, but select * still shows the geo data.
[jira] [Commented] (CARBONDATA-3925) flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname
[ https://issues.apache.org/jira/browse/CARBONDATA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165483#comment-17165483 ] yutao commented on CARBONDATA-3925: I want to resolve this bug.

> flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname
>
> Key: CARBONDATA-3925
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3925
> Project: CarbonData
> Issue Type: Improvement
> Components: flink-integration
> Affects Versions: 2.0.0
> Reporter: yutao
> Priority: Minor
> Fix For: 2.0.1
>
> In the CarbonWriter.java code you can find this:
> public abstract class CarbonWriter extends ProxyFileWriter {
>   private static final Logger LOGGER =
>       LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
> }
> The log file therefore always prints entries like:
> 2020-07-27 14:19:25,107 DEBUG org.apache.carbon.flink.CarbonS3Writer
> This is puzzling.
[jira] [Updated] (CARBONDATA-3925) flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname
[ https://issues.apache.org/jira/browse/CARBONDATA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yutao updated CARBONDATA-3925:
Description:
In the CarbonWriter.java code you can find this:
public abstract class CarbonWriter extends ProxyFileWriter {
  private static final Logger LOGGER =
      LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
}
The log file therefore always prints entries like:
2020-07-27 14:19:25,107 DEBUG org.apache.carbon.flink.CarbonS3Writer
This is puzzling.
was: in CarbonWriter.java code, you can find this

> flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname
>
> Key: CARBONDATA-3925
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3925
> Project: CarbonData
> Issue Type: Improvement
> Components: flink-integration
> Affects Versions: 2.0.0
> Reporter: yutao
> Priority: Minor
> Fix For: 2.0.1
>
> In the CarbonWriter.java code you can find this:
> public abstract class CarbonWriter extends ProxyFileWriter {
>   private static final Logger LOGGER =
>       LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
> }
> The log file therefore always prints entries like:
> 2020-07-27 14:19:25,107 DEBUG org.apache.carbon.flink.CarbonS3Writer
> This is puzzling.
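The fix the reporter implies is a one-line change: have CarbonWriter log under its own class name. A simplified sketch (the real class extends ProxyFileWriter, omitted here to keep the example self-contained):

```java
import org.apache.carbondata.common.logging.LogServiceFactory;
import org.apache.log4j.Logger;

public abstract class CarbonWriter {
  // Use the enclosing class, so log lines read org.apache.carbon.flink.CarbonWriter
  // instead of misleadingly naming CarbonS3Writer.
  private static final Logger LOGGER =
      LogServiceFactory.getLogService(CarbonWriter.class.getName());
}
```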
[jira] [Created] (CARBONDATA-3925) flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname
yutao created CARBONDATA-3925:
Summary: flink-integration CarbonWriter.java LOG print use CarbonS3Writer's classname
Key: CARBONDATA-3925
URL: https://issues.apache.org/jira/browse/CARBONDATA-3925
Project: CarbonData
Issue Type: Improvement
Components: flink-integration
Affects Versions: 2.0.0
Reporter: yutao
Fix For: 2.0.1

In the CarbonWriter.java code you can find this.
[jira] [Created] (CARBONDATA-3924) Should add default dynamic parameters only one time in one JVM process
David Cai created CARBONDATA-3924:
Summary: Should add default dynamic parameters only one time in one JVM process
Key: CARBONDATA-3924
URL: https://issues.apache.org/jira/browse/CARBONDATA-3924
Project: CarbonData
Issue Type: Bug
Reporter: David Cai

Because the ConfigEntry.registerEntry method can't register the same entry twice, default dynamic parameters should be added only one time in one JVM process.
[GitHub] [carbondata] akkio-97 closed pull request #3859: [CARBONDATA-3921] SI load fails with 'unable to get filestatus error' in concurrent scenario
akkio-97 closed pull request #3859: URL: https://github.com/apache/carbondata/pull/3859
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3857: [CARBONDATA-3914] Fixed issue on reading data from carbon table through hive beeline when no data is present in table.
akashrn5 commented on a change in pull request #3857: URL: https://github.com/apache/carbondata/pull/3857#discussion_r460677188

## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
## @@ -2137,7 +2138,7 @@ public static String getFilePathExternalFilePath(String path, Configuration conf
   if (fistFilePath == null) {
     // Check if we can infer the schema from the hive metastore.
     LOGGER.error("CarbonData file is not present in the table location");
-    throw new IOException("CarbonData file is not present in the table location");
+    throw new FileNotFoundException("CarbonData file is not present in the table location");

Review comment: @Karan980, `inferSchema` is called from many places. Can you check and confirm from the code that, when you throw the `FileNotFoundException`, the exception is properly handled by all the callers (or callers of callers), and confirm that the exception is nowhere hidden due to this change?
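One detail relevant to that concern: FileNotFoundException extends IOException, so every caller that already catches IOException keeps working unchanged; only callers that add the narrower catch change behavior. A self-contained sketch:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class InferSchemaFallbackDemo {
  // Mirrors the changed code path: no carbon file at the table location.
  static void inferSchema() throws IOException {
    throw new FileNotFoundException("CarbonData file is not present in the table location");
  }

  public static void main(String[] args) {
    try {
      inferSchema();
    } catch (FileNotFoundException e) {
      // Empty-table case: fall back to the schema in the hive metastore.
      System.out.println("fallback: " + e.getMessage());
    } catch (IOException e) {
      // Every other IO failure still lands here, exactly as before.
      System.out.println("io error: " + e.getMessage());
    }
  }
}
```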
[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data
MarvinLitt commented on a change in pull request #3855: URL: https://github.com/apache/carbondata/pull/3855#discussion_r460675486

## File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
## @@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
     Array(new Service("security.indexserver.protocol.acl", classOf[ServerInterface]))
   }
 }
+
+  def startAgingFolders(): Unit = {
+    val runnable = new Runnable() {
+      def run() {
+        val age = System.currentTimeMillis() - agePeriod.toLong
+        CarbonUtil.agingTempFolderForIndexServer(age)
+        LOGGER.info(s"Complete age temp folder ${CarbonUtil.getIndexServerTempPath}")
+      }
+    }
+    val ags: ScheduledExecutorService = Executors.newSingleThreadScheduledExecutor
+    ags.scheduleAtFixedRate(runnable, 1000, 360, TimeUnit.MICROSECONDS)

Review comment: The rate is 3 hours. About the delay time, I think it is fine: a delay of 1s, 5min, or 1 hour has almost the same effect. The test cases are covered here; if there is too much delay, the execution of test cases will be affected. So, kunal, is there no need to modify the delay here?
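For reference, a fixed-rate schedule matching the 3-hour rate described in the comment would look like the sketch below. The period and TimeUnit in the quoted snippet appear garbled in the archive, so the numbers here are illustrative:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TempFolderAgingScheduler {
  public static void main(String[] args) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    Runnable ageTask = () -> System.out.println("aging index server temp folders");
    // First run after 1 second, then once every 3 hours.
    scheduler.scheduleAtFixedRate(ageTask, 1, 3 * 60 * 60, TimeUnit.SECONDS);
  }
}
```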
[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data
MarvinLitt commented on a change in pull request #3855: URL: https://github.com/apache/carbondata/pull/3855#discussion_r460667653

## File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
## @@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
     Array(new Service("security.indexserver.protocol.acl", classOf[ServerInterface]))
   }
 }
+
+  def startAgingFolders(): Unit = {

Review comment: done
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
ajantha-bhat commented on a change in pull request #3847: URL: https://github.com/apache/carbondata/pull/3847#discussion_r460667652

## File path: processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeParallelReadMergeSorterWithColumnRangeImpl.java
## @@ -99,6 +101,8 @@ public void initialize(SortParameters sortParameters) {
   UnsafeSortDataRows[] sortDataRows = new UnsafeSortDataRows[columnRangeInfo.getNumOfRanges()];
   intermediateFileMergers = new UnsafeIntermediateMerger[columnRangeInfo.getNumOfRanges()];
   SortParameters[] sortParameterArray = new SortParameters[columnRangeInfo.getNumOfRanges()];
+  this.writeService = Executors.newFixedThreadPool(originSortParameters.getNumberOfCores(),

Review comment: If you increase `carbon.number.of.cores.while.loading`, there will be more UnsafeSortDataRows instances, and writing temp files can finish faster without any of these changes. Is it necessary to introduce another thread pool here? Please share your opinions @kevinjmh @kumarvishal09
[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data
MarvinLitt commented on a change in pull request #3855: URL: https://github.com/apache/carbondata/pull/3855#discussion_r460666952

## File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
## @@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
     Array(new Service("security.indexserver.protocol.acl", classOf[ServerInterface]))
   }
 }
+
+  def startAgingFolders(): Unit = {
+    val runnable = new Runnable() {
+      def run() {
+        val age = System.currentTimeMillis() - agePeriod.toLong
+        CarbonUtil.agingTempFolderForIndexServer(age)
+        LOGGER.info(s"Complete age temp folder ${CarbonUtil.getIndexServerTempPath}")
+      }
+    }
+    val ags: ScheduledExecutorService = Executors.newSingleThreadScheduledExecutor
+    ags.scheduleAtFixedRate(runnable, 1000, 360, TimeUnit.MICROSECONDS)
+    LOGGER.info("index server temp folders aging thread start")

Review comment: The run function already has logs:
def run() {
  val age = System.currentTimeMillis() - agePeriod.toLong
  CarbonUtil.agingTempFolderForIndexServer(age)
  LOGGER.info(s"Complete age temp folder ${CarbonUtil.getIndexServerTempPath}")
}
[GitHub] [carbondata] ajantha-bhat commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
ajantha-bhat commented on pull request #3847: URL: https://github.com/apache/carbondata/pull/3847#issuecomment-664141006 @shunlean : please handle the comments given by @Zhangshunyu