[jira] [Created] (CARBONDATA-3911) NullPointerException is thrown when clean files is executed after two updates
Akash R Nilugal created CARBONDATA-3911:
---
Summary: NullPointerException is thrown when clean files is executed after two updates
Key: CARBONDATA-3911
URL: https://issues.apache.org/jira/browse/CARBONDATA-3911
Project: CarbonData
Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal

* create table
* load data
* load one more data
* update1
* update2
* clean files fails with NullPointer

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] akashrn5 commented on pull request #3838: [CARBONDATA-3910]Fix load failure in cluster when csv present in local file system in case of global sort
akashrn5 commented on pull request #3838: URL: https://github.com/apache/carbondata/pull/3838#issuecomment-659863917 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CARBONDATA-3910) load fails when csv file present in local and loading to cluster
Akash R Nilugal created CARBONDATA-3910: --- Summary: load fails when csv file present in local and loading to cluster Key: CARBONDATA-3910 URL: https://issues.apache.org/jira/browse/CARBONDATA-3910 Project: CarbonData Issue Type: Bug Reporter: Akash R Nilugal Assignee: Akash R Nilugal load fails when csv file present in local and loading to cluster
[GitHub] [carbondata] QiangCai commented on a change in pull request #3842: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow
QiangCai commented on a change in pull request #3842:
URL: https://github.com/apache/carbondata/pull/3842#discussion_r456206506

## File path: integration/spark/src/main/scala/org/apache/spark/rdd/CarbonMergeFilesRDD.scala ##
@@ -157,21 +157,21 @@ object CarbonMergeFilesRDD {
     if (carbonTable.isHivePartitionTable && !StringUtils.isEmpty(tempFolderPath)) {
       // remove all tmp folder of index files
       val startDelete = System.currentTimeMillis()
-      val numThreads = Math.min(Math.max(partitionInfo.size(), 1), 10)
-      val executorService = Executors.newFixedThreadPool(numThreads)
-      val carbonSessionInfo = ThreadLocalSessionInfo.getCarbonSessionInfo
-      partitionInfo
-        .asScala
-        .map { partitionPath =>
-          executorService.submit(new Runnable {
-            override def run(): Unit = {
-              ThreadLocalSessionInfo.setCarbonSessionInfo(carbonSessionInfo)
-              FileFactory.deleteAllCarbonFilesOfDir(
-                FileFactory.getCarbonFile(partitionPath + "/" + tempFolderPath))
-            }
-          })
+      val allTmpDirs = partitionInfo
+        .asScala.map { partitionPath =>
+          partitionPath + CarbonCommonConstants.FILE_SEPARATOR + tempFolderPath
         }
-        .map(_.get())
+      val allTmpFiles = allTmpDirs.map { partitionDir =>
+        FileFactory.getCarbonFile(partitionDir).listFiles()

Review comment:
If the load creates too many partitions, this listFiles call will also take a long time.
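The concern in this review comment (listing many partition directories one after another can be slow) could be addressed by doing the listing inside the worker tasks as well. The following is only a minimal sketch of that idea in plain Java, not the code from the PR: `TempDirCleaner`, `deleteAll` and `deleteRecursively` are hypothetical names, and `java.nio.file` stands in for CarbonData's `FileFactory`. The thread bound mirrors the `Math.min(Math.max(n, 1), 10)` expression from the removed lines.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

// Hypothetical helper: lists and deletes each temp directory inside a pooled task,
// so neither the listing nor the deletion runs serially on the caller thread.
public class TempDirCleaner {

  /** Deletes all given directories in parallel and returns the number of paths removed. */
  public static int deleteAll(List<Path> tmpDirs)
      throws InterruptedException, ExecutionException {
    // Same bound as the removed code: at least 1 thread, at most 10.
    int numThreads = Math.min(Math.max(tmpDirs.size(), 1), 10);
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (Path dir : tmpDirs) {
        futures.add(pool.submit(() -> deleteRecursively(dir)));
      }
      int deleted = 0;
      for (Future<Integer> future : futures) {
        // get() rethrows a task failure wrapped in ExecutionException.
        deleted += future.get();
      }
      return deleted;
    } finally {
      pool.shutdown();
    }
  }

  /** Walks one directory and deletes children before parents. */
  private static int deleteRecursively(Path dir) {
    if (!Files.exists(dir)) {
      return 0;
    }
    try (Stream<Path> walk = Files.walk(dir)) {
      List<Path> paths = new ArrayList<>();
      // Reverse order puts nested entries ahead of the directories that contain them.
      walk.sorted(Comparator.reverseOrder()).forEach(paths::add);
      for (Path path : paths) {
        Files.delete(path);
      }
      return paths.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Whether this is faster than a single bulk listing depends on the file system; on object stores or HDFS the per-directory round trip usually dominates, which is the case the reviewer seems to have in mind.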
[GitHub] [carbondata] QiangCai commented on pull request #3778: [WIP] Support array with SI
QiangCai commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-659830702 please describe the PR and fix the failure.
[GitHub] [carbondata] QiangCai commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version
QiangCai commented on pull request #3807: URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659829122 @ajantha-bhat 1. First remove the CarbonData jars from your local Maven repo. 2. Build with -o; you will get a dependency error (cannot find the dependencies carbondata-spark_2.3 and carbondata-spark_2.4).
[GitHub] [carbondata] QiangCai commented on a change in pull request #3810: [CARBONDATA-3900] [CARBONDATA-3882] [CARBONDATA-3881] Fix multiple concurrent issues in table status lock and segment lock f
QiangCai commented on a change in pull request #3810:
URL: https://github.com/apache/carbondata/pull/3810#discussion_r456203328

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ##
@@ -440,14 +448,22 @@ object SecondaryIndexUtil {
     val loadFolderDetailsArray = SegmentStatusManager.readLoadMetadata(indexTable

Review comment:
The read should happen inside the lock.
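The point of this comment is the classic read-modify-write rule: if the status is read before the lock is taken, another writer can change it in between and its update is then overwritten. A minimal Java sketch of the pattern follows. `StatusFile` and its methods are hypothetical and only stand in for the table status file; CarbonData's own lock classes are not used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a status file that several writers update.
public class StatusFile {
  private final ReentrantLock lock = new ReentrantLock();
  private List<String> entries = new ArrayList<>();

  /** Returns a snapshot copy, the way a file read would. */
  private List<String> read() {
    return new ArrayList<>(entries);
  }

  private void write(List<String> updated) {
    entries = updated;
  }

  /** Read, modify and write all happen while the lock is held. */
  public void append(String entry) {
    lock.lock();
    try {
      List<String> current = read(); // read only after the lock is acquired
      current.add(entry);
      write(current);
    } finally {
      lock.unlock();
    }
  }

  public int size() {
    lock.lock();
    try {
      return entries.size();
    } finally {
      lock.unlock();
    }
  }
}
```

If `read()` were moved above `lock.lock()`, two concurrent callers could both start from the same snapshot and one of the appended entries would be lost.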
[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
Zhangshunyu commented on a change in pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#discussion_r456193999

## File path: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortParameters.java ##
@@ -37,6 +40,13 @@ import org.apache.log4j.Logger;
 public class SortParameters implements Serializable {
+
+  private ExecutorService writeService = Executors.newFixedThreadPool(5,

Review comment:
Suggest making the thread pool's core pool size configurable.
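One way to act on this suggestion is to resolve the pool size from a setting and fall back to the current hard-coded value of 5. The sketch below is an assumption-laden illustration: `WritePoolConfig`, the property key `sort.temp.write.threads`, and the use of a JVM system property are all hypothetical; the actual PR would more likely read the value through CarbonData's own properties mechanism.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper that resolves the write pool size from configuration.
public class WritePoolConfig {
  // Hypothetical property name, chosen only for this sketch.
  public static final String POOL_SIZE_KEY = "sort.temp.write.threads";
  // Matches the hard-coded value in the diff above.
  public static final int DEFAULT_POOL_SIZE = 5;

  /** Parses the configured value; falls back to the default when missing or invalid. */
  public static int resolvePoolSize(String configured) {
    if (configured == null) {
      return DEFAULT_POOL_SIZE;
    }
    try {
      int size = Integer.parseInt(configured.trim());
      return size > 0 ? size : DEFAULT_POOL_SIZE;
    } catch (NumberFormatException e) {
      return DEFAULT_POOL_SIZE;
    }
  }

  /** Builds the pool with the resolved size. */
  public static ExecutorService newWriteService() {
    return Executors.newFixedThreadPool(resolvePoolSize(System.getProperty(POOL_SIZE_KEY)));
  }
}
```

Falling back silently on a bad value keeps a load from failing over a typo in configuration; logging a warning in that branch would be a reasonable addition.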
[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
Zhangshunyu commented on a change in pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#discussion_r456193818

## File path: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java ##
@@ -200,25 +203,44 @@ public void startSorting() {
    * @param file file
    * @throws CarbonSortKeyAndGroupByException
    */
-  private void writeDataToFile(UnsafeCarbonRowPage rowPage, File file)
-      throws CarbonSortKeyAndGroupByException {
-    DataOutputStream stream = null;
-    try {
-      // open stream
-      stream = FileFactory.getDataOutputStream(file.getPath(),
-          parameters.getFileWriteBufferSize(), parameters.getSortTempCompressorName());
-      int actualSize = rowPage.getBuffer().getActualSize();
-      // write number of entries to the file
-      stream.writeInt(actualSize);
-      for (int i = 0; i < actualSize; i++) {
-        rowPage.writeRow(
-            rowPage.getBuffer().get(i) + rowPage.getDataBlock().getBaseOffset(), stream);
+  private void writeDataToFile(UnsafeCarbonRowPage rowPage, File file) {
+    writeService.submit(new WriteThread(rowPage, file));
+  }
+
+  public class WriteThread implements Runnable {
+    private File file;
+    private UnsafeCarbonRowPage rowPage;
+
+    public WriteThread(UnsafeCarbonRowPage rowPage, File file) {
+      this.rowPage = rowPage;
+      this.file = file;
+
+    }
+
+    @Override
+    public void run() {
+      DataOutputStream stream = null;
+      try {
+        // open stream
+        stream = FileFactory.getDataOutputStream(this.file.getPath(),
+            parameters.getFileWriteBufferSize(), parameters.getSortTempCompressorName());
+        int actualSize = rowPage.getBuffer().getActualSize();
+        // write number of entries to the file
+        stream.writeInt(actualSize);
+        for (int i = 0; i < actualSize; i++) {
+          rowPage.writeRow(
+              rowPage.getBuffer().get(i) + rowPage.getDataBlock().getBaseOffset(), stream);
+        }
+        // add sort temp filename to and arrayList. When the list size reaches 20 then
+        // intermediate merging of sort temp files will be triggered
+        unsafeInMemoryIntermediateFileMerger.addFileToMerge(file);
+      } catch (IOException | MemoryException e) {
+        e.printStackTrace();

Review comment:
Use log4j instead of printStackTrace.
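A rough sketch of what the requested change might look like is given below. To keep the example self-contained it uses `java.util.logging` as a stand-in for log4j, which is a deliberate substitution and not what the project uses. `LoggedWriteTask` and its `failure` holder are hypothetical; the holder is an extra assumption, added because an exception thrown inside a pooled `Runnable` would otherwise never reach the thread that started the load.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical write task that logs failures instead of calling printStackTrace().
public class LoggedWriteTask implements Runnable {
  // java.util.logging is used here only as a dependency-free stand-in for log4j.
  private static final Logger LOGGER = Logger.getLogger(LoggedWriteTask.class.getName());

  private final boolean shouldFail;
  private final AtomicReference<Throwable> failure;

  public LoggedWriteTask(boolean shouldFail, AtomicReference<Throwable> failure) {
    this.shouldFail = shouldFail;
    this.failure = failure;
  }

  @Override
  public void run() {
    try {
      writePage();
    } catch (IOException e) {
      // Log with message and stack trace through the logging framework.
      LOGGER.log(Level.SEVERE, "Failed to write sort temp file", e);
      // Remember the first failure so the submitting thread can check it later.
      failure.compareAndSet(null, e);
    }
  }

  /** Placeholder for the real write; fails on demand for illustration. */
  private void writePage() throws IOException {
    if (shouldFail) {
      throw new IOException("simulated write failure");
    }
  }
}
```

With log4j the logging call would take the same two pieces of information, a message and the throwable, so the structure carries over unchanged.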
[GitHub] [carbondata] Zhangshunyu commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
Zhangshunyu commented on pull request #3847: URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659810429 please check the build failure info
[GitHub] [carbondata] tianlileer closed pull request #710: [CARBONDATA-833]load data from dataframe,generater data row may be error when delimiter…
tianlileer closed pull request #710: URL: https://github.com/apache/carbondata/pull/710
[GitHub] [carbondata] QiangCai commented on a change in pull request #3785: [CARBONDATA-3843] Fix merge index issue in streaming table
QiangCai commented on a change in pull request #3785: URL: https://github.com/apache/carbondata/pull/3785#discussion_r456179554 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/events/MergeIndexEventListener.scala ## @@ -104,73 +104,80 @@ class MergeIndexEventListener extends OperationEventListener with Logging { case alterTableMergeIndexEvent: AlterTableMergeIndexEvent => val carbonMainTable = alterTableMergeIndexEvent.carbonTable val sparkSession = alterTableMergeIndexEvent.sparkSession -if (!carbonMainTable.isStreamingSink) { - LOGGER.info(s"Merge Index request received for table " + - s"${ carbonMainTable.getDatabaseName }.${ carbonMainTable.getTableName }") - val lock = CarbonLockFactory.getCarbonLockObj( -carbonMainTable.getAbsoluteTableIdentifier, -LockUsage.COMPACTION_LOCK) +LOGGER.info(s"Merge Index request received for table " + +s"${ carbonMainTable.getDatabaseName }.${ carbonMainTable.getTableName }") +val lock = CarbonLockFactory.getCarbonLockObj( + carbonMainTable.getAbsoluteTableIdentifier, + LockUsage.COMPACTION_LOCK) - try { -if (lock.lockWithRetries()) { - LOGGER.info("Acquired the compaction lock for table" + - s" ${ carbonMainTable.getDatabaseName }.${ -carbonMainTable - .getTableName - }") - val segmentsToMerge = -if (alterTableMergeIndexEvent.alterTableModel.customSegmentIds.isEmpty) { - val validSegments = - CarbonDataMergerUtil.getValidSegmentList(carbonMainTable).asScala - val validSegmentIds: mutable.Buffer[String] = mutable.Buffer[String]() - validSegments.foreach { segment => +try { + if (lock.lockWithRetries()) { +LOGGER.info("Acquired the compaction lock for table" + +s" ${ carbonMainTable.getDatabaseName }.${ + carbonMainTable +.getTableName +}") Review comment: combine these lines to one line ## File path: integration/spark/src/main/scala/org/apache/spark/sql/events/MergeIndexEventListener.scala ## @@ -104,73 +104,80 @@ class MergeIndexEventListener extends OperationEventListener with Logging { case 
alterTableMergeIndexEvent: AlterTableMergeIndexEvent => val carbonMainTable = alterTableMergeIndexEvent.carbonTable val sparkSession = alterTableMergeIndexEvent.sparkSession -if (!carbonMainTable.isStreamingSink) { - LOGGER.info(s"Merge Index request received for table " + - s"${ carbonMainTable.getDatabaseName }.${ carbonMainTable.getTableName }") - val lock = CarbonLockFactory.getCarbonLockObj( -carbonMainTable.getAbsoluteTableIdentifier, -LockUsage.COMPACTION_LOCK) +LOGGER.info(s"Merge Index request received for table " + +s"${ carbonMainTable.getDatabaseName }.${ carbonMainTable.getTableName }") +val lock = CarbonLockFactory.getCarbonLockObj( + carbonMainTable.getAbsoluteTableIdentifier, + LockUsage.COMPACTION_LOCK) - try { -if (lock.lockWithRetries()) { - LOGGER.info("Acquired the compaction lock for table" + - s" ${ carbonMainTable.getDatabaseName }.${ -carbonMainTable - .getTableName - }") - val segmentsToMerge = -if (alterTableMergeIndexEvent.alterTableModel.customSegmentIds.isEmpty) { - val validSegments = - CarbonDataMergerUtil.getValidSegmentList(carbonMainTable).asScala - val validSegmentIds: mutable.Buffer[String] = mutable.Buffer[String]() - validSegments.foreach { segment => +try { + if (lock.lockWithRetries()) { +LOGGER.info("Acquired the compaction lock for table" + +s" ${ carbonMainTable.getDatabaseName }.${ + carbonMainTable +.getTableName +}") +val loadFolderDetailsArray = SegmentStatusManager + .readLoadMetadata(carbonMainTable.getMetadataPath) +val segmentFileNameMap: java.util.Map[String, String] = new util.HashMap[String, + String]() +var streamingSegment: Set[String] = Set[String]() +loadFolderDetailsArray.foreach(loadMetadataDetails => { + if (loadMetadataDetails.getFileFormat.equals(FileFormat.ROW_V1)) { +streamingSegment +=
[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types
QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456176045

## File path: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java ##
@@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
       }
     }
     BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-    for (int i = 0; i < pageNumbers; i++) {
-      BitSet set = new BitSet(numberOfRows[i]);
-      RowIntf row = new RowImpl();
-      BitSet prvBitset = null;
-      // if bitset pipe line is enabled then use rowid from previous bitset
-      // otherwise use older flow
-      if (!useBitsetPipeLine ||
-          null == rawBlockletColumnChunks.getBitSetGroup() ||
-          null == bitSetGroup.getBitSet(i) ||
-          rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
+    if (isDimensionPresentInCurrentBlock.length == 1 && isDimensionPresentInCurrentBlock[0]

Review comment:
The code will be hard to read once we add more if conditions.
[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types
QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456175101

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ##
@@ -865,7 +870,33 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
         Some(CarbonContainsWith(c))
       case c@Literal(v, t) if (v == null) =>
         Some(FalseExpr())
-      case others => None
+      case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+        a.dataType match {
+          case arrayType: ArrayType =>
+            arrayType.elementType match {

Review comment:
How about extracting the match block into a method, isPrimitiveDataType, and moving it into a util class?
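The refactoring suggested here replaces a nested match with a single named predicate. The sketch below shows the shape of such a helper in Java. Everything in it is hypothetical: `DataTypeChecks` and the `ElementType` enum stand in for Spark's data types, and treating every non-nested type as primitive is an assumption, since the excerpt does not show which element types the PR actually pushes down.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical util class holding the extracted check.
public class DataTypeChecks {

  /** Hypothetical stand-in for the engine's array element types. */
  public enum ElementType {
    STRING, BOOLEAN, SHORT, INT, LONG, FLOAT, DOUBLE, DECIMAL, DATE, TIMESTAMP,
    ARRAY, STRUCT, MAP
  }

  // Assumption for this sketch: only nested types are non-primitive.
  private static final Set<ElementType> NESTED_TYPES =
      EnumSet.of(ElementType.ARRAY, ElementType.STRUCT, ElementType.MAP);

  /** Returns true when the element type is a non-null, non-nested type. */
  public static boolean isPrimitiveDataType(ElementType type) {
    return type != null && !NESTED_TYPES.contains(type);
  }
}
```

The calling code then reduces to one condition on the element type, and the list of supported types lives in a single place that other strategies can reuse.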
[GitHub] [carbondata] Zhangshunyu commented on pull request #3849: [WIP] table level timestampformat
Zhangshunyu commented on pull request #3849: URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659774957 Great! This is a useful feature.
[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types
QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456167613

## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestArrayContainsPushDown.scala ##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.complexType
+
+import java.sql.{Date, Timestamp}
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestArrayContainsPushDown extends QueryTest with BeforeAndAfterAll {
+
+  override protected def afterAll(): Unit = {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
+        CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)
+    sql("DROP TABLE IF EXISTS compactComplex")
+  }
+
+  test("test array contains pushdown for array of string") {
+    sql("drop table if exists complex1")
+    sql("create table complex1 (arr array<string>) stored as carbondata")
+    sql("insert into complex1 select array('as') union all " +
+        "select array('sd','df','gh') union all " +
+        "select array('rt','ew','rtyu','jk',null) union all " +
+        "select array('ghsf','dbv','','ty') union all " +
+        "select array('hjsd','fggb','nhj','sd','asd')")
+
+    checkExistence(sql(" explain select * from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkExistence(sql(" explain select count(*) from complex1 where array_contains(arr,'sd')"),
+      true,
+      "PushedFilters: [*EqualTo(arr,sd)]")
+
+    checkAnswer(sql(" select * from complex1 where array_contains(arr,'sd')"),

Review comment:
Can you add a test case like the query below?
select * from complex1 where arr[0] = 'sd'
Can we push down this filter too?
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3851: [WIP]Fix Global sort data load failure issue with Decimal value as NULL
CarbonDataQA1 commented on pull request #3851: URL: https://github.com/apache/carbondata/pull/3851#issuecomment-659592635 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3409/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3851: [WIP]Fix Global sort data load failure issue with Decimal value as NULL
CarbonDataQA1 commented on pull request #3851: URL: https://github.com/apache/carbondata/pull/3851#issuecomment-659591221 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1667/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3851: [WIP]Fix Global sort data load failure issue with Decimal value as NULL
akashrn5 commented on a change in pull request #3851:
URL: https://github.com/apache/carbondata/pull/3851#discussion_r455977271

## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala ##
@@ -234,9 +234,25 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach {
       CarbonCommonConstants.BLOCKLET_SIZE_DEFAULT_VAL)
   }

+  test("test decimal value as null with global sort load") {

Review comment:
@kunal642 had already fixed one issue with null values for string, and now we have hit the same problem for decimal. He added a test case with null values for string, int, double and bigint, and decimal is added now. Please also add a test case that covers null values for all complex types.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3850: [CARBONDATA-3907]Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutio
akashrn5 commented on a change in pull request #3850:
URL: https://github.com/apache/carbondata/pull/3850#discussion_r455974523

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala ##
@@ -228,24 +228,17 @@ case class CarbonAddLoadCommand(
     model.setTableName(carbonTable.getTableName)
     val operationContext = new OperationContext
     operationContext.setProperty("isLoadOrCompaction", false)
-    val loadTablePreExecutionEvent: LoadTablePreExecutionEvent =
-      new LoadTablePreExecutionEvent(
-        carbonTable.getCarbonTableIdentifier,
-        model)
-    operationContext.setProperty("isOverwrite", false)
-    OperationListenerBus.getInstance.fireEvent(loadTablePreExecutionEvent, operationContext)
-    // Add pre event listener for index indexSchema
-    val tableIndexes = IndexStoreManager.getInstance().getAllCGAndFGIndexes(carbonTable)
-    val indexOperationContext = new OperationContext()
-    if (tableIndexes.size() > 0) {
-      val indexNames: mutable.Buffer[String] =
-        tableIndexes.asScala.map(index => index.getIndexSchema.getIndexName)
-      val buildIndexPreExecutionEvent: BuildIndexPreExecutionEvent =
-        BuildIndexPreExecutionEvent(
-          sparkSession, carbonTable.getAbsoluteTableIdentifier, indexNames)
-      OperationListenerBus.getInstance().fireEvent(buildIndexPreExecutionEvent,
-        indexOperationContext)
-    }
+    val (tableIndexes, indexOperationContext) = CommonLoadUtils.firePreLoadEvents(

Review comment:
@VenuReddy2103 can you confirm whether the same function is used in the other places as well? If not, please change it there too.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3849: [WIP] table level timestampformat
CarbonDataQA1 commented on pull request #3849: URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659554650 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1666/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3849: [WIP] table level timestampformat
CarbonDataQA1 commented on pull request #3849: URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659550711 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3408/
[jira] [Created] (CARBONDATA-3909) Insert into select fails after insert decimal value as null and set sort scope to global sort
Chetan Bhat created CARBONDATA-3909:
---
Summary: Insert into select fails after insert decimal value as null and set sort scope to global sort
Key: CARBONDATA-3909
URL: https://issues.apache.org/jira/browse/CARBONDATA-3909
Project: CarbonData
Issue Type: Bug
Components: data-load
Affects Versions: 2.0.1
Environment: Spark 2.3.2, 2.4.5
Reporter: Chetan Bhat

Steps: Insert a decimal value as null, set the sort scope to global sort, and then run insert into select.

Issue: Insert into select fails.

Expected: Insert into select should succeed.
[GitHub] [carbondata] Indhumathi27 opened a new pull request #3851: [WIP]Fix Global sort data load failure issue with Decimal value as NULL
Indhumathi27 opened a new pull request #3851:
URL: https://github.com/apache/carbondata/pull/3851

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
[GitHub] [carbondata] ShreelekhyaG commented on pull request #3849: [WIP] table level timestampformat
ShreelekhyaG commented on pull request #3849: URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659467768 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3849: [WIP] table level timestampformat
CarbonDataQA1 commented on pull request #3849: URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659447560 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1665/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3850: [CARBONDATA-3907]Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent
CarbonDataQA1 commented on pull request #3850: URL: https://github.com/apache/carbondata/pull/3850#issuecomment-659422935
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3849: [WIP] table level timestampformat
CarbonDataQA1 commented on pull request #3849: URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659392712 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3407/
[jira] [Created] (CARBONDATA-3908) When a carbon segment is added through the alter add segments query, then it is not accounting the added carbon segment values.
Prasanna Ravichandran created CARBONDATA-3908:
---
Summary: When a carbon segment is added through the alter add segments query, then it is not accounting the added carbon segment values.
Key: CARBONDATA-3908
URL: https://issues.apache.org/jira/browse/CARBONDATA-3908
Project: CarbonData
Issue Type: Bug
Affects Versions: 2.0.0
Environment: FI cluster and opensource cluster.
Reporter: Prasanna Ravichandran

When a carbon segment is added through the alter add segments query, the values of the added carbon segment are not accounted for. If we do count(*) on the added segment, it always shows 0.

Test queries:
drop table if exists uniqdata;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
load data inpath 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
--hdfs dfs -mkdir /uniqdata-carbon-segment;
--hdfs dfs -cp /user/hive/warehouse/uniqdata/Fact/Part0/Segment_0/* /uniqdata-carbon-segment/
Alter table uniqdata add segment options ('path'='hdfs://hacluster/uniqdata-carbon-segment/','format'='carbon');
select count(*) from uniqdata;--4000 expected as one load of 2000 records happened and same segment is added again;
set carbon.input.segments.default.uniqdata=1;
select count(*) from uniqdata;--2000 expected - it should just show the records count of added segments;

CONSOLE:
/> set carbon.input.segments.default.uniqdata=1;
+----------------------------------------+-------+
| key                                    | value |
+----------------------------------------+-------+
| carbon.input.segments.default.uniqdata | 1     |
+----------------------------------------+-------+
1 row selected (0.192 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1734
+----------+
| count(1) |
+----------+
| 2000     |
+----------+
1 row selected (4.036 seconds)
/> set carbon.input.segments.default.uniqdata=2;
+----------------------------------------+-------+
| key                                    | value |
+----------------------------------------+-------+
| carbon.input.segments.default.uniqdata | 2     |
+----------------------------------------+-------+
1 row selected (0.088 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1745
+----------+
| count(1) |
+----------+
| 2000     |
+----------+
1 row selected (6.056 seconds)
/> set carbon.input.segments.default.uniqdata=3;
+----------------------------------------+-------+
| key                                    | value |
+----------------------------------------+-------+
| carbon.input.segments.default.uniqdata | 3     |
+----------------------------------------+-------+
1 row selected (0.161 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1753
+----------+
| count(1) |
+----------+
| 0        |
+----------+
1 row selected (4.875 seconds)
/> show segments for table uniqdata;
+----+---------+-------------------------+-----------------+-----------+-----------+------------+-------------+
| ID | Status  | Load Start Time         | Load Time Taken | Partition | Data Size | Index Size | File Format |
+----+---------+-------------------------+-----------------+-----------+-----------+------------+-------------+
| 4  | Success | 2020-07-17 16:01:53.673 | 5.579S          | {}        | 269.10KB  | 7.21KB     | columnar_v3 |
| 3  | Success | 2020-07-17 16:00:24.866 | 0.578S          | {}        | 88.55KB   | 1.81KB     | columnar_v3 |
| 2  | Success | 2020-07-17 15:07:54.273 | 0.642S          | {}        | 36.72KB   | NA         | orc         |
| 1  | Success | 2020-07-17 15:03:59.767 | 0.564S          | {}        | 89.26KB   | NA         | parquet     |
| 0  | Success | 2020-07-16 12:44:32.095 | 4.484S          | {}        | 88.55KB   | 1.81KB     | columnar_v3 |
+----+---------+-------------------------+-----------------+-----------+-----------+------------+-------------+

Expected result: Records added by adding a carbon segment should be considered.

Actual result: Records added by adding a carbon segment are not considered.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3848: [CARBONDATA-3891] Fix loading data will update all segments updateDeltaEndTimestamp
CarbonDataQA1 commented on pull request #3848: URL: https://github.com/apache/carbondata/pull/3848#issuecomment-659380516 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3404/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3848: [CARBONDATA-3891] Fix loading data will update all segments updateDeltaEndTimestamp
CarbonDataQA1 commented on pull request #3848: URL: https://github.com/apache/carbondata/pull/3848#issuecomment-659376770 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1662/
[jira] [Updated] (CARBONDATA-3907) Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in alt
[ https://issues.apache.org/jira/browse/CARBONDATA-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venugopal Reddy K updated CARBONDATA-3907: -- Description: *[Issue]* Currently we have 2 different ways of firing LoadTablePreExecutionEvent and LoadTablePostExecutionEvent. We can reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in alter table add segment flow as well. *[Suggestion]* Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in alter table add segment flow. was: *[Issue]* Currently we have 2 different ways of firing LoadTablePreExecutionEvent and LoadTablePostExecutionEvent. We can reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in alter table add segment flow as well. So that we can have single flow to fire these events *[Suggestion]* Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in alter table add segment flow. > Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils > to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent > respectively in alter table add segment flow > -- > > Key: CARBONDATA-3907 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3907 > Project: CarbonData > Issue Type: Improvement > Components: spark-integration >Affects Versions: 2.0.0 >Reporter: Venugopal Reddy K >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 10m > Remaining Estimate: 0h > > *[Issue]* > Currently we have 2 different ways of firing LoadTablePreExecutionEvent and > LoadTablePostExecutionEvent. 
We can reuse firePreLoadEvents and > firePostLoadEvents methods from CommonLoadUtils to trigger > LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in > alter table add segment flow as well. > *[Suggestion]* > Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils > to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent > respectively in alter table add segment flow.
[GitHub] [carbondata] VenuReddy2103 opened a new pull request #3850: [CARBONDATA-3907]Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent
VenuReddy2103 opened a new pull request #3850: URL: https://github.com/apache/carbondata/pull/3850 ### Why is this PR needed? Currently there are two different ways of firing LoadTablePreExecutionEvent and LoadTablePostExecutionEvent. The firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils can be reused to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent, respectively, in the alter table add segment flow as well, so that there is a single flow for firing these events. ### What changes were proposed in this PR? Reuse the firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent, respectively, in the alter table add segment flow. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No
[jira] [Created] (CARBONDATA-3907) Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in alt
Venugopal Reddy K created CARBONDATA-3907: - Summary: Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in alter table add segment flow Key: CARBONDATA-3907 URL: https://issues.apache.org/jira/browse/CARBONDATA-3907 Project: CarbonData Issue Type: Improvement Components: spark-integration Affects Versions: 2.0.0 Reporter: Venugopal Reddy K Fix For: 2.1.0 *[Issue]* Currently there are two different ways of firing LoadTablePreExecutionEvent and LoadTablePostExecutionEvent. The firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils can be reused to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent, respectively, in the alter table add segment flow as well, so that there is a single flow for firing these events. *[Suggestion]* Reuse the firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent, respectively, in the alter table add segment flow.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version
CarbonDataQA1 commented on pull request #3807: URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659339600 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3403/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version
CarbonDataQA1 commented on pull request #3807: URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659338632 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1660/
[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3849: [WIP] table level timestampformat
ShreelekhyaG opened a new pull request #3849: URL: https://github.com/apache/carbondata/pull/3849 ### Why is this PR needed? To support a table-level timestamp format. ### What changes were proposed in this PR? The timestamp format is now resolved in the following priority order: 1) Load command options 2) Table level properties 3) Configurable properties (carbon.timestamp.format) ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - Yes
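The priority order described in this PR can be sketched as a simple lookup chain. This is only an illustration, not code from the PR: the class `TimestampFormatResolver`, the `timestampformat` key used at the load and table levels, and the fallback pattern are assumptions; only `carbon.timestamp.format` is named in the description.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not CarbonData's actual code.
final class TimestampFormatResolver {
  // Assumed option/property name at the load and table levels.
  static final String OPTION_KEY = "timestampformat";
  // Configurable property named in the PR description.
  static final String CONFIG_KEY = "carbon.timestamp.format";
  // Assumed fallback when no level provides a format.
  static final String FALLBACK = "yyyy-MM-dd HH:mm:ss";

  // Returns the first non-blank format in priority order:
  // load command options, then table properties, then configurable properties.
  static String resolve(Map<String, String> loadOptions,
                        Map<String, String> tableProperties,
                        Map<String, String> configProperties) {
    String[] candidates = {
        loadOptions.get(OPTION_KEY),
        tableProperties.get(OPTION_KEY),
        configProperties.get(CONFIG_KEY)
    };
    for (String candidate : candidates) {
      if (candidate != null && !candidate.trim().isEmpty()) {
        return candidate;
      }
    }
    return FALLBACK;
  }

  public static void main(String[] args) {
    Map<String, String> load = new HashMap<>();
    Map<String, String> table = Collections.singletonMap(OPTION_KEY, "dd-MM-yyyy");
    Map<String, String> config = Collections.singletonMap(CONFIG_KEY, "yyyy/MM/dd HH:mm:ss");
    // No load option is set, so the table-level value is used.
    System.out.println(resolve(load, table, config));
    // A load option overrides both lower levels.
    load.put(OPTION_KEY, "yyyy-MM-dd'T'HH:mm:ss");
    System.out.println(resolve(load, table, config));
  }
}
```

Treating blank values as unset is a design choice of this sketch; the PR description does not say how empty values are handled.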
[GitHub] [carbondata] ajantha-bhat commented on pull request #3787: [WIP] support sort_scope for index creation
ajantha-bhat commented on pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#issuecomment-659312911 @QiangCai : Yes, it is WIP. I will support SI global sort in this PR.
[GitHub] [carbondata] IceMimosa commented on pull request #3848: [CARBONDATA-3891] Fix loading data will update all segments updateDeltaEndTimestamp
IceMimosa commented on pull request #3848: URL: https://github.com/apache/carbondata/pull/3848#issuecomment-659312101 reset please
[GitHub] [carbondata] QiangCai commented on pull request #3787: [WIP] support sort_scope for index creation
QiangCai commented on pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#issuecomment-659312148 during SI loading, it should use this sort_scope.
[GitHub] [carbondata] IceMimosa opened a new pull request #3848: [CARBONDATA-3891] Fix loading data will update all segments updateDeltaEndTimestamp
IceMimosa opened a new pull request #3848: URL: https://github.com/apache/carbondata/pull/3848 ### Why is this PR needed? Loading data into a partitioned table updates updateDeltaEndTimestamp for all segments, which causes the driver to clear the cache of all segments when a query runs. ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - Yes TODO
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
CarbonDataQA1 commented on pull request #3847: URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659311646 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1661/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
CarbonDataQA1 commented on pull request #3847: URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659309836 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3402/
[GitHub] [carbondata] ajantha-bhat commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
ajantha-bhat commented on pull request #3847: URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659307892 retest this please
[GitHub] [carbondata] ajantha-bhat commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
ajantha-bhat commented on pull request #3847: URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659307713 Add to whitelist
[GitHub] [carbondata] ajantha-bhat commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version
ajantha-bhat commented on pull request #3807: URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659307068 retest this please
[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version
ajantha-bhat edited a comment on pull request #3807: URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659305724 @QiangCai : Developers should not have to modify the pom manually to make it work for Spark 2.4. After this PR, both 2.4 and 2.5 work without any manual change, and the jar names also include the binary version. That is why I fixed it as above. However, some test cases failed to find the CSV file after this change, so I stopped. I need to analyze why the CSV files cannot be found after my change.
[GitHub] [carbondata] ajantha-bhat commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version
ajantha-bhat commented on pull request #3807: URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659305724 @QiangCai : Developers should not have to modify the pom manually to make it work for Spark 2.4. That is why I fixed it as above. However, some test cases failed to find the CSV file after this change, so I stopped. I need to analyze why the CSV files cannot be found after my change.
[GitHub] [carbondata] QiangCai commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version
QiangCai commented on pull request #3807: URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659304203 finalName is ${artifactId}-${version} by default; this change does not affect artifactId or version. Other modules will not be able to find the dependencies carbondata-spark_2.3 and carbondata-spark_2.4. Actually, if you change spark.binary.version to 2.4 in the pom.xml of the parent module, IDEA will work again for Spark 2.4.
[GitHub] [carbondata] shunlean opened a new pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
shunlean opened a new pull request #3847: URL: https://github.com/apache/carbondata/pull/3847 ### Why is this PR needed? The write (sortTemp file) operation can run only after the sort has finished. For better performance, we want to run the writeDataToFile and SortDataRows operations in parallel. ### What changes were proposed in this PR? In (Unsafe)SortDataRows, we add new threads to run the file write operation. In one case, the parallel operation reduced the time by about 10%. ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes
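The idea of overlapping the sort and the file write can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the class `OverlappedSortWriter` and its methods are hypothetical, rows are plain `int` values, and `write` is an in-memory stand-in for writing a sort temp file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names are not taken from CarbonData.
final class OverlappedSortWriter {

  // Sorts each batch on the calling thread and hands the sorted batch to a
  // writer thread, so sorting batch N+1 can overlap with writing batch N.
  static List<int[]> sortAndWrite(List<int[]> batches, int writerThreads) {
    ExecutorService writers = Executors.newFixedThreadPool(writerThreads);
    List<Future<int[]>> pending = new ArrayList<>();
    try {
      for (int[] batch : batches) {
        final int[] sorted = batch.clone();
        Arrays.sort(sorted);
        Callable<int[]> task = () -> write(sorted);
        pending.add(writers.submit(task));
      }
      // Wait for every write to finish, preserving batch order.
      List<int[]> written = new ArrayList<>();
      for (Future<int[]> future : pending) {
        written.add(future.get());
      }
      return written;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for writes", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("write failed", e.getCause());
    } finally {
      writers.shutdown();
    }
  }

  // Stand-in for writing a sort temp file; returns the rows it "wrote".
  static int[] write(int[] sortedRows) {
    return sortedRows;
  }

  public static void main(String[] args) {
    List<int[]> out = sortAndWrite(
        Arrays.asList(new int[] {3, 1, 2}, new int[] {9, 7}), 2);
    for (int[] rows : out) {
      System.out.println(Arrays.toString(rows));
    }
  }
}
```

Collecting the futures at the end keeps the sketch simple; a real implementation would likely also bound the number of in-flight batches to limit memory use.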
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file
CarbonDataQA1 commented on pull request #3847: URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659300018 Can one of the admins verify this patch?
[jira] [Created] (CARBONDATA-3906) Optimize sort performance in writting file
bishunli created CARBONDATA-3906: Summary: Optimize sort performance in writting file Key: CARBONDATA-3906 URL: https://issues.apache.org/jira/browse/CARBONDATA-3906 Project: CarbonData Issue Type: Improvement Reporter: bishunli
[jira] [Commented] (CARBONDATA-3904) insert into data got Failed to create directory path /d
[ https://issues.apache.org/jira/browse/CARBONDATA-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158972#comment-17158972 ] Kunal Kapoor commented on CARBONDATA-3904: -- What is the warehouse location? HDFS/S3? > insert into data got Failed to create directory path /d > --- > > Key: CARBONDATA-3904 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3904 > Project: CarbonData > Issue Type: Improvement > Components: core >Affects Versions: 2.0.0 > Environment: spark-2.4.5 > hadoop 2.7.3 > carbondata2.0.1 >Reporter: XiaoWen >Priority: Minor > > insert data > {code:java} > spark.sql("INSERT OVERWRITE TABLE ods.test_table SELECT * FROM > ods.socol_cmdinfo") > {code} > check logs from spark application on yarn > $ yarn logs -applicationId application_1592787941917_4116 > found a lot this error messages > {code:java} > 20/07/15 16:59:45 ERROR FileFactory: Failed to create directory path /d > 20/07/15 16:59:45 ERROR FileFactory: Failed to create directory path /d > 20/07/15 16:59:51 ERROR FileFactory: Failed to create directory path /d > 20/07/15 16:59:51 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:00:00 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:00:00 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:00:00 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:00:00 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:00:00 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:00:35 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:00:35 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:00:35 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:02:47 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:02:47 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:02:47 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:03:36 ERROR 
FileFactory: Failed to create directory path /d > 20/07/15 17:03:36 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:09:55 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:09:55 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:10:05 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:10:05 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:10:05 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:11:08 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:11:08 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:11:08 ERROR FileFactory: Failed to create directory path /d > 20/07/15 17:12:45 ERROR FileFactory: Failed to create directory path /d > {code} > {code:java} > core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java > {code} > {code:java} > public static void createDirectoryAndSetPermission(String directoryPath, > FsPermission permission) > throws IOException { > FileFactory.FileType fileType = FileFactory.getFileType(directoryPath); > switch (fileType) { > case S3: > case HDFS: > case ALLUXIO: > case VIEWFS: > case CUSTOM: > case HDFS_LOCAL: > try { > Path path = new Path(directoryPath); > FileSystem fs = path.getFileSystem(getConfiguration()); > if (!fs.exists(path)) { > fs.mkdirs(path); > fs.setPermission(path, permission); > } > } catch (IOException e) { > LOGGER.error("Exception occurred : " + e.getMessage(), e); > throw e; > } > return; > case LOCAL: > default: > directoryPath = FileFactory.getUpdatedFilePath(directoryPath); > File file = new File(directoryPath); > if (!file.mkdirs()) { > LOGGER.error(" Failed to create directory path " + directoryPath); > }} > } > {code} > > I output the variable directoryPath and fileType > {code:java} > if (!file.mkdirs()) { > // check variables > LOGGER.info("directoryPath = [" + directoryPath + "], fileType = [" > + fileType.toString() + "]"); > 
LOGGER.error(" Failed to create directory path " + directoryPath); > } > {code} > add line > LOGGER.info("directoryPath = [" + directoryPath + "], fileType = [" + > fileType.toString() + "]"); > got echo on yarn logs > 2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL] > 2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL] > 2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL] > 2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL] > 2020-07-15 10:48:56
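A note on the LOCAL branch quoted in this report: it logs an error whenever `File.mkdirs()` returns false. `java.io.File.mkdirs()` also returns false when the directory already exists, so the log message alone does not show whether creation really failed. Whether that is what happens for `/d` here is not established in this thread. The sketch below only demonstrates the `mkdirs()` behavior and one possible guard; the class `LocalDirs` and the method `ensureDirectory` are hypothetical names, not part of FileFactory.

```java
import java.io.File;

// Hypothetical helper for illustration; not part of CarbonData's FileFactory.
final class LocalDirs {

  // Returns true if the directory exists when the call returns, whether or
  // not this call created it. The final isDirectory() check covers the case
  // where another thread or process created the directory concurrently.
  static boolean ensureDirectory(File dir) {
    return dir.isDirectory() || dir.mkdirs() || dir.isDirectory();
  }

  public static void main(String[] args) {
    File dir = new File(System.getProperty("java.io.tmpdir"),
        "mkdirs-demo-" + System.nanoTime());
    System.out.println("first mkdirs:    " + dir.mkdirs());        // directory is created
    System.out.println("second mkdirs:   " + dir.mkdirs());        // false: already exists
    System.out.println("ensureDirectory: " + ensureDirectory(dir)); // true: it exists
    dir.delete();
  }
}
```

With a guard of this kind, an error would be logged only when the directory is still missing after the attempt.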
[jira] [Updated] (CARBONDATA-3905) When there are many segment files presto query fail
[ https://issues.apache.org/jira/browse/CARBONDATA-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiaoWen updated CARBONDATA-3905: Description: test case1 insert data in: {code:java} df.writeStream.foreachBatch{ (batchDF: DataFrame, batchId: Long) => { ... val cond = $"B.id".isin(df.select(col = "id").as[Int].collect: _*) target.as("A") .merge(df.as("B"), "A.id = B.id") .whenMatched(cond) .updateExpr(Map("name" -> "B.name", "city" -> "B.city", "age" -> "B.age")) .whenNotMatched(cond) .insertExpr(Map("id" -> "B.id", "name" -> "B.name", "city" -> "B.city", "age" -> "B.age")) .execute() ... }).outputMode("update").trigger(Trigger.ProcessingTime("3600 seconds")).start() {code} a lot of segment files will be generated after a few hours when i try to use presto to query. single condition can be queried, but cannot be queried when there are multiple conditions. select name from test_table // ok select name from test_table where name = 'joe' // ok select name from test_table where name='joe' AND age > 25;// query failed select name from test_table where name='joe' AND age > 25 AND city ='shenzhen';// query failed i have also tried to compact 'major' the segment files to reduce the segment quantity, and I still cannot query successfully. 
presto server logs java.lang.IllegalArgumentException: Invalid position 0 in block with 0 positions at io.prestosql.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:62) at io.prestosql.spi.block.AbstractVariableWidthBlock.checkReadablePosition(AbstractVariableWidthBlock.java:160) at io.prestosql.spi.block.AbstractVariableWidthBlock.isNull(AbstractVariableWidthBlock.java:154) at io.prestosql.spi.block.LazyBlock.isNull(LazyBlock.java:248) at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source) at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source) at io.prestosql.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:115) at io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:254) at io.prestosql.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:246) at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221) at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:278) at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320) at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221) at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221) at io.prestosql.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133) at io.prestosql.operator.Driver.processInternal(Driver.java:379) at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283) at io.prestosql.operator.Driver.tryWithLock(Driver.java:675) at io.prestosql.operator.Driver.processFor(Driver.java:276) at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075) at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163) at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484) at io.prestosql.$gen.Presto_31620200623_163219_1.run(Unknown Source) at
[jira] [Created] (CARBONDATA-3905) When there are many segment files presto query fail
XiaoWen created CARBONDATA-3905: --- Summary: When there are many segment files presto query fail Key: CARBONDATA-3905 URL: https://issues.apache.org/jira/browse/CARBONDATA-3905 Project: CarbonData Issue Type: Bug Components: presto-integration Affects Versions: 2.0.0 Reporter: XiaoWen test case1 insert data in: df.writeStream.foreachBatch{ (batchDF: DataFrame, batchId: Long) => { ... val cond = $"B.id".isin(df.select(col = "id").as[Int].collect: _*) target.as("A") .merge(df.as("B"), "A.id = B.id") .whenMatched(cond) .updateExpr(Map("name" -> "B.name", "city" -> "B.city", "age" -> "B.age")) .whenNotMatched(cond) .insertExpr(Map("id" -> "B.id", "name" -> "B.name", "city" -> "B.city", "age" -> "B.age")) .execute() ... }).outputMode("update").trigger(Trigger.ProcessingTime("3600 seconds")).start() a lot of segment files will be generated after a few hours when i try to use presto to query. single condition can be queried, but cannot be queried when there are multiple conditions. select name from test_table // ok select name from test_table where name = 'joe' // ok select name from test_table where name='joe' AND age > 25;// query failed select name from test_table where name='joe' AND age > 25 AND city ='shenzhen';// query failed i have also tried to compact 'major' the segment files to reduce the segment quantity, and I still cannot query successfully. 
presto server logs java.lang.IllegalArgumentException: Invalid position 0 in block with 0 positions at io.prestosql.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:62) at io.prestosql.spi.block.AbstractVariableWidthBlock.checkReadablePosition(AbstractVariableWidthBlock.java:160) at io.prestosql.spi.block.AbstractVariableWidthBlock.isNull(AbstractVariableWidthBlock.java:154) at io.prestosql.spi.block.LazyBlock.isNull(LazyBlock.java:248) at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source) at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source) at io.prestosql.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:115) at io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:254) at io.prestosql.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:246) at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221) at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:278) at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320) at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221) at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221) at io.prestosql.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215) at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373) at io.prestosql.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133) at io.prestosql.operator.Driver.processInternal(Driver.java:379) at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283) at io.prestosql.operator.Driver.tryWithLock(Driver.java:675) at io.prestosql.operator.Driver.processFor(Driver.java:276) at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075) at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163) at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484) at