[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-670646477 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3659/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-670645884 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1920/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670622887 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1918/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670617768 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3657/
[GitHub] [carbondata] ajantha-bhat opened a new pull request #3887: [WIP] Refactor #3773 and support struct type
ajantha-bhat opened a new pull request #3887: URL: https://github.com/apache/carbondata/pull/3887

This PR depends on #3773.

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r467095602
## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java ##
@@ -246,7 +239,29 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi
       vector = ColumnarVectorWrapperDirectFactory
           .getDirectVectorWrapperFactory(vector, vectorInfo.invertedIndex, nullBits, deletedRows,
               true, false);
-      fillVector(pageData, vector, vectorDataType, pageDataType, pageSize, vectorInfo, nullBits);
+      Deque<CarbonColumnVectorImpl> vectorStack = vectorInfo.getVectorStack();
+      // Only if vectorStack is null, it is initialized with the parent vector
+      if (vectorStack == null && vectorInfo.vector.getColumnVector() != null) {
+        vectorStack = new ArrayDeque<>();
+        // pushing the parent vector
+        vectorStack.push((CarbonColumnVectorImpl) vectorInfo.vector.getColumnVector());
+        vectorInfo.setVectorStack(vectorStack);
+      }
+      /*
+       * if top of vector stack is a complex vector then
+       * add their children into the stack and load them too.
+       * TODO: If there are multiple children push them into stack and load them iteratively
+       */
+      if (vectorStack != null && vectorStack.peek().isComplex()) {
+        vectorStack.peek().setChildrenElements(pageData);

Review comment: Please pass pageSize as an argument here and stop reading once the number of elements equals pageSize. This buffer is reusable, so it can be much larger than the data that actually belongs to the page.
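The bounded read requested in this comment could look roughly like the sketch below. It is an illustration only, not the CarbonData implementation: `ChildCountReader` and `readChildCounts` are hypothetical names, and the layout of one 4-byte big-endian child count per row is an assumption made for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the name and the buffer layout are assumptions.
final class ChildCountReader {

  /**
   * Reads at most pageSize 4-byte child counts from a buffer that may be
   * larger than the current page (for example, a reused buffer).
   */
  static List<Integer> readChildCounts(byte[] pageData, int pageSize) {
    ByteBuffer buffer = ByteBuffer.wrap(pageData);
    List<Integer> counts = new ArrayList<>(pageSize);
    // Stop at pageSize even if more bytes are available in the buffer.
    while (counts.size() < pageSize && buffer.remaining() >= Integer.BYTES) {
      counts.add(buffer.getInt());
    }
    return counts;
  }

  public static void main(String[] args) {
    // The buffer has room for 8 ints, but only the first 3 belong to this page.
    ByteBuffer reusable = ByteBuffer.allocate(8 * Integer.BYTES);
    reusable.putInt(2).putInt(0).putInt(5);
    System.out.println(readChildCounts(reusable.array(), 3)); // prints [2, 0, 5]
  }
}
```

Because the loop is bounded by `pageSize`, bytes left behind in the buffer by an earlier, larger page are never interpreted as child counts.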
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r467095038
## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java ##
@@ -246,7 +239,29 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi
       vector = ColumnarVectorWrapperDirectFactory
           .getDirectVectorWrapperFactory(vector, vectorInfo.invertedIndex, nullBits, deletedRows,
               true, false);
-      fillVector(pageData, vector, vectorDataType, pageDataType, pageSize, vectorInfo, nullBits);
+      Deque<CarbonColumnVectorImpl> vectorStack = vectorInfo.getVectorStack();
+      // Only if vectorStack is null, it is initialized with the parent vector
+      if (vectorStack == null && vectorInfo.vector.getColumnVector() != null) {
+        vectorStack = new ArrayDeque<>();
+        // pushing the parent vector
+        vectorStack.push((CarbonColumnVectorImpl) vectorInfo.vector.getColumnVector());
+        vectorInfo.setVectorStack(vectorStack);
+      }
+      /*
+       * if top of vector stack is a complex vector then
+       * add their children into the stack and load them too.
+       * TODO: If there are multiple children push them into stack and load them iteratively
+       */
+      if (vectorStack != null && vectorStack.peek().isComplex()) {
+        vectorStack.peek().setChildrenElements(pageData);
+        vectorStack.push(vectorStack.peek().getChildrenVector().get(0));
+        vectorStack.peek().loadPage();
+        return;
+      }
+
+      FillVector fill = new FillVector(pageData, vectorInfo, nullBits);
+      fill.basedOnType(vector, vectorDataType, pageSize, pageDataType);
+

Review comment: Pop the child from the stack here, since it has been processed.
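The push-then-pop discipline behind this comment can be sketched in isolation as below. This is a simplified, iterative stand-in rather than the code in the PR: `VectorStackSketch`, `Node`, and `load` are hypothetical names, and each vector is assumed to have at most one child, as in the array-only case discussed here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a vector stack; not the CarbonData classes.
final class VectorStackSketch {

  /** Stand-in for a column vector with at most one child. */
  static final class Node {
    final String name;
    final Node child; // null for a primitive vector

    Node(String name, Node child) {
      this.name = name;
      this.child = child;
    }

    boolean isComplex() {
      return child != null;
    }
  }

  /** Returns the order in which vectors are filled and popped. */
  static List<String> load(Node parent) {
    Deque<Node> stack = new ArrayDeque<>();
    List<String> filled = new ArrayList<>();
    stack.push(parent);
    // Descend: while the top is complex, push its child.
    while (stack.peek().isComplex()) {
      stack.push(stack.peek().child);
    }
    // Unwind: each vector is popped as soon as it has been processed,
    // so nothing stale is left on the stack for the next page.
    while (!stack.isEmpty()) {
      filled.add(stack.pop().name);
    }
    return filled;
  }

  public static void main(String[] args) {
    Node leaf = new Node("int", null);
    Node inner = new Node("array(int)", leaf);
    Node outer = new Node("array(array(int))", inner);
    System.out.println(load(outer)); // prints [int, array(int), array(array(int))]
  }
}
```

If a processed child stayed on the stack, the next page would find a primitive vector on top and skip the parent entirely, which is the situation the comment is guarding against.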
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r467090464
## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/FillVector.java ##
@@ -0,0 +1,346 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.page.encoding;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+import org.apache.carbondata.core.util.ByteUtil;
+
+public class FillVector {
+  private byte[] pageData;
+  private float floatFactor = 0;
+  private double factor = 0;
+  private ColumnVectorInfo vectorInfo;
+  private BitSet nullBits;
+
+  public FillVector(byte[] pageData, ColumnVectorInfo vectorInfo, BitSet nullBits) {
+    this.pageData = pageData;
+    this.vectorInfo = vectorInfo;
+    this.nullBits = nullBits;
+  }
+
+  public void setFactor(double factor) {
+    this.factor = factor;
+  }
+
+  public void setFloatFactor(float floatFactor) {
+    this.floatFactor = floatFactor;
+  }
+
+  public void basedOnType(CarbonColumnVector vector, DataType vectorDataType, int pageSize,
+      DataType pageDataType) {
+    if (vectorInfo.vector.getColumnVector() != null && ((CarbonColumnVectorImpl) vectorInfo.vector
+        .getColumnVector()).isComplex()) {
+      fillComplexType(vector.getColumnVector(), pageDataType);
+    } else {
+      fillPrimitiveType(vector, vectorDataType, pageSize, pageDataType);
+      vector.setIndex(0);
+    }
+  }
+
+  private void fillComplexType(CarbonColumnVector vector, DataType pageDataType) {
+    CarbonColumnVectorImpl vectorImpl = (CarbonColumnVectorImpl) vector;
+    if (vector != null && vector.getChildrenVector() != null) {
+      ArrayList<Integer> childElements = ((CarbonColumnVectorImpl) vector).getChildrenElements();
+      for (int i = 0; i < childElements.size(); i++) {
+        int count = childElements.get(i);
+        typeComplexObject(vectorImpl.getChildrenVector().get(0), count, pageDataType);
+        vector.putArrayObject();
+      }
+    }

Review comment: Reset the index of the child vector here, since this page has now been processed.
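Why the reset matters can be shown with a toy model. The sketch below is hypothetical and does not use the CarbonData classes: `ChildVector` is an assumed stand-in with a fixed capacity and a write cursor, and `fillPage` mimics a loop that writes `count` child elements per row.

```java
// Hypothetical model of a reusable child vector; not the CarbonData classes.
final class ChildIndexResetSketch {

  /** Stand-in for a child vector with a fixed capacity and a write cursor. */
  static final class ChildVector {
    private final int[] values;
    private int index;

    ChildVector(int capacity) {
      this.values = new int[capacity];
    }

    void put(int value) {
      values[index++] = value;
    }

    int getIndex() {
      return index;
    }

    void setIndex(int index) {
      this.index = index;
    }
  }

  /** Writes one page of flattened array elements, then rewinds the cursor. */
  static void fillPage(ChildVector child, int[] childCounts, int[] pageValues) {
    int cursor = 0;
    for (int count : childCounts) {
      for (int i = 0; i < count; i++) {
        child.put(pageValues[cursor++]);
      }
    }
    // Rewind so the next page starts at slot 0 instead of past the end.
    child.setIndex(0);
  }

  public static void main(String[] args) {
    ChildVector child = new ChildVector(3);
    fillPage(child, new int[] {2, 1}, new int[] {10, 20, 30});
    // Without the reset, this second page would overflow the 3-slot vector.
    fillPage(child, new int[] {1, 2}, new int[] {40, 50, 60});
    System.out.println(child.getIndex()); // prints 0
  }
}
```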
[jira] [Created] (CARBONDATA-3946) Support IndexServer with Presto Engine
Indhumathi Muthumurugesh created CARBONDATA-3946: Summary: Support IndexServer with Presto Engine Key: CARBONDATA-3946 URL: https://issues.apache.org/jira/browse/CARBONDATA-3946 Project: CarbonData Issue Type: New Feature Reporter: Indhumathi Muthumurugesh -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3881: [CARBONDATA-3945] NPE While Data Loading
CarbonDataQA1 commented on pull request #3881: URL: https://github.com/apache/carbondata/pull/3881#issuecomment-670515677 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3656/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3881: [CARBONDATA-3945] NPE While Data Loading
CarbonDataQA1 commented on pull request #3881: URL: https://github.com/apache/carbondata/pull/3881#issuecomment-670508231 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1917/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3886: [CARBONDATA-3944] Delete stage files was interrupted when IOException…
CarbonDataQA1 commented on pull request #3886: URL: https://github.com/apache/carbondata/pull/3886#issuecomment-67050 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1915/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3883: [CARBONDATA-3940] CommitTask fails due to Rename IOException during L…
CarbonDataQA1 commented on pull request #3883: URL: https://github.com/apache/carbondata/pull/3883#issuecomment-670503871 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1916/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3883: [CARBONDATA-3940] CommitTask fails due to Rename IOException during L…
CarbonDataQA1 commented on pull request #3883: URL: https://github.com/apache/carbondata/pull/3883#issuecomment-670502706 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3655/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3886: [CARBONDATA-3944] Delete stage files was interrupted when IOException…
CarbonDataQA1 commented on pull request #3886: URL: https://github.com/apache/carbondata/pull/3886#issuecomment-670501876 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3654/
[GitHub] [carbondata] asfgit closed pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
asfgit closed pull request #3882: URL: https://github.com/apache/carbondata/pull/3882
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support
CarbonDataQA1 commented on pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#issuecomment-670494726
[GitHub] [carbondata] QiangCai commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
QiangCai commented on pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#issuecomment-670493189 LGTM
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
akkio-97 commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r466964195
## File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ArrayStreamReader.java ##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+
+/**
+ * Class to read the Array Stream
+ */
+
+public class ArrayStreamReader extends CarbonColumnVectorImpl implements PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+  Block childBlock = null;
+  private int index = 0;
+
+  public ArrayStreamReader(int batchSize, DataType dataType, StructField field) {
+    super(batchSize, dataType);
+    this.batchSize = batchSize;
+    this.type = getArrayOfType(field, dataType);
+    ArrayList childrenList= new ArrayList<>();
+    childrenList.add(CarbonVectorBatch.createDirectStreamReader(this.batchSize, field.getDataType(), field));
+    setChildrenVector(childrenList);
+    this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  public int getIndex() {
+    return index;
+  }
+
+  public void setIndex(int index) {
+    this.index = index;
+  }
+
+  public String getDataTypeName() {
+    return "ARRAY";
+  }
+
+  Type getArrayOfType(StructField field, DataType dataType) {
+    if (dataType == DataTypes.STRING) {
+      return new ArrayType(VarcharType.VARCHAR);
+    } else if (dataType == DataTypes.BYTE) {
+      return new ArrayType(TinyintType.TINYINT);
+    } else if (dataType == DataTypes.SHORT) {
+      return new ArrayType(SmallintType.SMALLINT);
+    } else if (dataType == DataTypes.INT) {
+      return new ArrayType(IntegerType.INTEGER);
+    } else if (dataType == DataTypes.LONG) {
+      return new ArrayType(BigintType.BIGINT);
+    } else if (dataType == DataTypes.DOUBLE) {
+      return new ArrayType(DoubleType.DOUBLE);
+    } else if (dataType == DataTypes.FLOAT) {
+      return new ArrayType(RealType.REAL);
+    } else if (dataType == DataTypes.BOOLEAN) {
+      return new ArrayType(BooleanType.BOOLEAN);
+    } else if (dataType == DataTypes.TIMESTAMP) {
+      return new ArrayType(TimestampType.TIMESTAMP);
+    } else if (DataTypes.isArrayType(dataType)) {
+      StructField childField = field.getChildren().get(0);
+      return new ArrayType(getArrayOfType(childField, childField.getDataType()));
+    } else {
+      throw new UnsupportedOperationException("Unsupported type: " + dataType);
+    }
+  }
+
+  @Override
+  public Block buildBlock() {
+    return builder.build();
+  }
+
+  public boolean isComplex() {
+    return true;
+  }
+
+  @Override
+  public void setBatchSize(int batchSize) {
+    this.batchSize = batchSize;
+  }
+
+  @Override
+  public void putObject(int rowId, Object value) {
+    if (value == null) {
+      putNull(rowId);
+    } else {
+      getChildrenVector().get(0).putObject(rowId, value);
+    }
+  }
+
+  public void putArrayObject() {
+    if (DataTypes.isArrayType(this.getType())) {
+      childBlock = ((ArrayStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.STRING) {
+      childBlock = ((SliceStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.INT) {
+      childBlock = ((IntegerStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.LONG) {
+      childBlock = ((LongStreamReader) getChildrenVector().get(0)).buildBlock();
+    } else if (this.getType() == DataTypes.DOUBLE) {
+      childBlock = ((DoubleStreamReader) getChildrenVector().get(0)).buildBlock();
[GitHub] [carbondata] marchpure commented on a change in pull request #3881: [CARBONDATA-3945] NPE While Data Loading
marchpure commented on a change in pull request #3881: URL: https://github.com/apache/carbondata/pull/3881#discussion_r466962667
## File path: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala ##
@@ -207,7 +207,7 @@ case class CarbonRelation(
             null != validSeg.getLoadMetadataDetails.getIndexSize) {
           size = size + validSeg.getLoadMetadataDetails.getDataSize.toLong +
                  validSeg.getLoadMetadataDetails.getIndexSize.toLong
-        } else {
+        } else if (!carbonTable.isHivePartitionTable) {

Review comment: modified

## File path: core/src/main/java/org/apache/carbondata/core/readcommitter/TableStatusReadCommittedScope.java ##
@@ -87,7 +87,9 @@ public TableStatusReadCommittedScope(AbsoluteTableIdentifier identifier,
       SegmentFileStore fileStore =
           new SegmentFileStore(identifier.getTablePath(), segment.getSegmentFileName());
       indexFiles = fileStore.getIndexOrMergeFiles();
-      segment.setSegmentMetaDataInfo(fileStore.getSegmentFile().getSegmentMetaDataInfo());
+      if (fileStore != null && fileStore.getSegmentFile() != null) {

Review comment: modified
[GitHub] [carbondata] marchpure commented on pull request #3881: [CARBONDATA-3945] NPE While Data Loading
marchpure commented on pull request #3881: URL: https://github.com/apache/carbondata/pull/3881#issuecomment-670454261 issue created
[jira] [Created] (CARBONDATA-3945) NPE while Data Loading
Xingjun Hao created CARBONDATA-3945: Summary: NPE while Data Loading Key: CARBONDATA-3945 URL: https://issues.apache.org/jira/browse/CARBONDATA-3945 Project: CarbonData Issue Type: Bug Reporter: Xingjun Hao
# getLastModifiedTime of LoadMetadataDetails fails because updateDeltaEndTimestamp is an empty string.
# In the getCommittedIndexFile function, an NPE occurs in unusual cases because the segment file is null.
# Cleaning temp files fails in unusual cases because partitionInfo is null.
# When calculating sizeInBytes of CarbonRelation, the directory size has to be collected in unusual cases. The directory path only works for non-partition tables; for partition tables, a FileNotFoundException is thrown.
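The first item can be illustrated with a small guard. This is only a sketch under stated assumptions and not the change made in the PR: `LoadingGuardsSketch` and `lastModifiedTime` are hypothetical names, the timestamp is assumed to be a decimal string of epoch milliseconds, and falling back to a caller-supplied value is an assumption chosen for the example.

```java
// Hypothetical guard; names, timestamp format, and fallback are assumptions.
final class LoadingGuardsSketch {

  /**
   * Parses the update timestamp, falling back to the given value when the
   * string is null or empty instead of failing on the parse.
   */
  static long lastModifiedTime(String updateDeltaEndTimestamp, long fallbackTime) {
    if (updateDeltaEndTimestamp == null || updateDeltaEndTimestamp.isEmpty()) {
      return fallbackTime;
    }
    return Long.parseLong(updateDeltaEndTimestamp);
  }

  public static void main(String[] args) {
    System.out.println(lastModifiedTime("", 7L));              // prints 7
    System.out.println(lastModifiedTime("1596800000000", 7L)); // prints 1596800000000
  }
}
```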
[GitHub] [carbondata] marchpure commented on a change in pull request #3883: [CARBONDATA-3940] CommitTask fails due to Rename IOException during L…
marchpure commented on a change in pull request #3883: URL: https://github.com/apache/carbondata/pull/3883#discussion_r466957812
## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoHadoopFsRelationCommand.scala ##
@@ -104,11 +104,13 @@ case class CarbonInsertIntoHadoopFsRelationCommand(
     val dynamicPartitionOverwrite = enableDynamicOverwrite && mode == SaveMode.Overwrite &&
                                     staticPartitions.size < partitionColumns.length

-    val committer = FileCommitProtocol.instantiate(
-      sparkSession.sessionState.conf.fileCommitProtocolClass,
-      jobId = java.util.UUID.randomUUID().toString,
-      outputPath = outputPath.toString,
-      dynamicPartitionOverwrite = dynamicPartitionOverwrite)
+    val committer = fileFormat match {

Review comment: modified
[GitHub] [carbondata] marchpure opened a new pull request #3886: [CARBONDATA-3944] Delete stage files was interrupted when IOException…
marchpure opened a new pull request #3886: URL: https://github.com/apache/carbondata/pull/3886

… happen

### Why is this PR needed?
In the insertstage flow, stage files are deleted with a retry mechanism. However, when an IOException occurs (for example, because of a network failure), the delete flow is interrupted, which is unexpected.

### What changes were proposed in this PR?
When an exception is caught while deleting stage files, continue to retry.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
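The proposed behaviour, where an IOException counts as a failed attempt rather than aborting the loop, can be sketched as follows. This is a minimal illustration and not the code in the PR: `RetryDeleteSketch`, `DeleteAction`, and `deleteWithRetry` are hypothetical names, and the fixed attempt limit is an assumption.

```java
import java.io.IOException;

// Hypothetical retry helper; names and the attempt limit are assumptions.
final class RetryDeleteSketch {

  /** Stand-in for a file delete that may fail with an IOException. */
  @FunctionalInterface
  interface DeleteAction {
    boolean delete() throws IOException;
  }

  /** Returns true once the delete succeeds, false when all attempts fail. */
  static boolean deleteWithRetry(DeleteAction action, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        if (action.delete()) {
          return true;
        }
      } catch (IOException e) {
        // Treat the exception as a failed attempt and keep retrying
        // instead of letting it interrupt the whole delete flow.
      }
    }
    return false;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    boolean deleted = deleteWithRetry(() -> {
      if (calls[0]++ < 2) {
        throw new IOException("simulated network failure");
      }
      return true;
    }, 3);
    // prints: true after 3 attempts
    System.out.println(deleted + " after " + calls[0] + " attempts");
  }
}
```

A real implementation would likely log the swallowed exception and wait between attempts; both are left out here to keep the sketch short.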
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexSserver
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670444134 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1913/
[jira] [Created] (CARBONDATA-3944) Delete stage files was interrupted when IOException happen
Xingjun Hao created CARBONDATA-3944: Summary: Delete stage files was interrupted when IOException happen Key: CARBONDATA-3944 URL: https://issues.apache.org/jira/browse/CARBONDATA-3944 Project: CarbonData Issue Type: Bug Reporter: Xingjun Hao
In the insertstage flow, stage files are deleted with a retry mechanism. However, when an IOException occurs (for example, because of a network failure), the delete flow is interrupted, which is unexpected.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexSserver
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670443439 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3652/
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
akkio-97 commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r466927976
## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoReadTableFilesTest.scala ##
@@ -0,0 +1,443 @@
+package org.apache.carbondata.presto.integrationtest
+
+import java.io.File
+import java.sql.{SQLException, Timestamp}
+import java.util
+import java.util.Arrays.asList
+
+import io.prestosql.jdbc.PrestoArray
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.metadata.datatype.{DataTypes, Field}
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
+import org.apache.carbondata.presto.server.PrestoServer
+import org.apache.carbondata.sdk.file.{CarbonWriter, Schema}
+import org.apache.commons.io.FileUtils
+import org.apache.commons.lang.RandomStringUtils
+import org.apache.spark.sql.Row
+import org.scalatest.{BeforeAndAfterAll, FunSuiteLike, BeforeAndAfterEach}
+
+import scala.collection.mutable
+import scala.collection.JavaConverters._
+class PrestoReadTableFilesTest extends FunSuiteLike with BeforeAndAfterAll with BeforeAndAfterEach{
+  private val logger = LogServiceFactory
+    .getLogService(classOf[PrestoTestNonTransactionalTableFiles].getCanonicalName)
+
+  private val rootPath = new File(this.getClass.getResource("/").getPath
+    + "../../../..").getCanonicalPath
+  private val storePath = s"$rootPath/integration/presto/target/store"
+  private val systemPath = s"$rootPath/integration/presto/target/system"
+  private var writerPath = storePath + "/sdk_output/files"
+  private val prestoServer = new PrestoServer
+  private var varcharString = new String
+
+  override def beforeAll: Unit = {
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
+      "Presto")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
+      "Presto")
+    val map = new util.HashMap[String, String]()
+    map.put("hive.metastore", "file")
+    map.put("hive.metastore.catalog.dir", s"file://$storePath")
+
+    prestoServer.startServer("sdk_output", map)
+  }
+
+  override def afterAll(): Unit = {
+    prestoServer.stopServer()
+    CarbonUtil.deleteFoldersAndFiles(FileFactory.getCarbonFile(storePath))
+  }
+
+  private def createComplexTableForSingleLevelArray = {
+    prestoServer.execute("drop table if exists sdk_output.files")
+    prestoServer.execute("drop schema if exists sdk_output")
+    prestoServer.execute("create schema sdk_output")
+    prestoServer
+      .execute(
+        "create table sdk_output.files(stringCol varchar, intCol int, doubleCol double, realCol real, boolCol boolean, arrayStringCol1 array(varchar), arrayStringcol2 array(varchar), arrayIntCol array(int), arrayBigIntCol array(bigint), arrayRealCol array(real), arrayDoubleCol array(double), arrayBooleanCol array(boolean)) with(format='CARBON') ")
+  }
+
+  private def createComplexTableFor2LevelArray = {
+    prestoServer.execute("drop table if exists sdk_output.files2")
+    prestoServer.execute("drop schema if exists sdk_output")
+    prestoServer.execute("create schema sdk_output")
+    prestoServer
+      .execute(
+        "create table sdk_output.files2(arrayArrayInt array(array(int)), arrayArrayBigInt array(array(bigint)), arrayArrayReal array(array(real)), arrayArrayDouble array(array(double)), arrayArrayString array(array(varchar)), arrayArrayBoolean array(array(boolean))) with(format='CARBON') ")
+  }
+
+  private def createComplexTableFor3LevelArray = {
+    prestoServer.execute("drop table if exists sdk_output.files3")
+    prestoServer.execute("drop schema if exists sdk_output")
+    prestoServer.execute("create schema sdk_output")
+    prestoServer
+      .execute(
+        "create table sdk_output.files3(array3_Int array(array(array(int))), array3_BigInt array(array(array(bigint))), array3_Real array(array(array(real))), array3_Double array(array(array(double))), array3_String array(array(array(varchar))), array3_Boolean array(array(array(boolean))) ) with(format='CARBON') ")
+  }
+
+  def buildComplexTestForSingleLevelArray(): Any = {
+    FileUtils.deleteDirectory(new File(writerPath))
+    createComplexTableForSingleLevelArray
+    import java.io.IOException
+    val source = new File(this.getClass.getResource("/").getPath + "../../" + "/temp/table1").getCanonicalPath
+    val srcDir = new File(source)
+    val destination = new File(this.getClass.getResource("/").getPath + "../../" + "/target/store/sdk_output/files/").getCanonicalPath
+    val destDir = new File(destination)
+    try FileUtils.copyDirectory(srcDir, destDir)
+    catch {
+      case e: IOException =>
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466927654

## File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ArrayStreamReader.java ##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+
+/**
+ * Class to read the Array Stream
+ */
+
+public class ArrayStreamReader extends CarbonColumnVectorImpl implements PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+  Block childBlock = null;
+  private int index = 0;
+
+  public ArrayStreamReader(int batchSize, DataType dataType, StructField field) {
+    super(batchSize, dataType);
+    this.batchSize = batchSize;
+    this.type = getArrayOfType(field, dataType);
+    ArrayList childrenList= new ArrayList<>();
+    childrenList.add(CarbonVectorBatch.createDirectStreamReader(this.batchSize, field.getDataType(), field));
+    setChildrenVector(childrenList);
+    this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  public int getIndex() {
+    return index;
+  }
+
+  public void setIndex(int index) {
+    this.index = index;
+  }
+
+  public String getDataTypeName() {
+    return "ARRAY";
+  }
+
+  Type getArrayOfType(StructField field, DataType dataType) {
+    if (dataType == DataTypes.STRING) {
+      return new ArrayType(VarcharType.VARCHAR);
+    } else if (dataType == DataTypes.BYTE) {
+      return new ArrayType(TinyintType.TINYINT);
+    } else if (dataType == DataTypes.SHORT) {
+      return new ArrayType(SmallintType.SMALLINT);
+    } else if (dataType == DataTypes.INT) {
+      return new ArrayType(IntegerType.INTEGER);
+    } else if (dataType == DataTypes.LONG) {
+      return new ArrayType(BigintType.BIGINT);
+    } else if (dataType == DataTypes.DOUBLE) {
+      return new ArrayType(DoubleType.DOUBLE);
+    } else if (dataType == DataTypes.FLOAT) {
+      return new ArrayType(RealType.REAL);
+    } else if (dataType == DataTypes.BOOLEAN) {
+      return new ArrayType(BooleanType.BOOLEAN);
+    } else if (dataType == DataTypes.TIMESTAMP) {
+      return new ArrayType(TimestampType.TIMESTAMP);
+    } else if (DataTypes.isArrayType(dataType)) {
+      StructField childField = field.getChildren().get(0);
+      return new ArrayType(getArrayOfType(childField, childField.getDataType()));
+    } else {
+      throw new UnsupportedOperationException("Unsupported type: " + dataType);
+    }
+  }
+
+  @Override
+  public Block buildBlock() {
+    return builder.build();
+  }
+
+  public boolean isComplex() {
+    return true;
+  }
+
+  @Override
+  public void setBatchSize(int batchSize) {
+    this.batchSize = batchSize;
+  }
+
+  @Override
+  public void putObject(int rowId, Object value) {
+    if (value == null) {

Review comment:
putObject is used only for primitive types. Once the entire row has been put, putArrayObject() is used to put it into the array.
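The division of labour described in the review comment above (primitive values go through putObject, and putArrayObject() then closes the row as one array) can be illustrated with a simplified, standalone sketch. It is not the code from the pull request: the classes `ChildReader` and `ArrayReader`, the `drain()` helper, and the use of plain `java.util` lists in place of Presto's `BlockBuilder` are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayReaderSketch {

    /** Hypothetical stand-in for a primitive child reader: buffers one value per putObject call. */
    static class ChildReader {
        private final List<Object> values = new ArrayList<>();

        void putObject(int rowId, Object value) {
            // rowId mirrors the signature in the diff; this sketch simply appends in call order.
            values.add(value);
        }

        /** Hypothetical helper: hands back the buffered values and resets the buffer. */
        List<Object> drain() {
            List<Object> copy = new ArrayList<>(values);
            values.clear();
            return copy;
        }
    }

    /** Hypothetical stand-in for the array reader: turns the child's buffered values into one array row. */
    static class ArrayReader {
        private final ChildReader child = new ChildReader();
        private final List<List<Object>> rows = new ArrayList<>();

        ChildReader getChild() {
            return child;
        }

        void putArrayObject() {
            // Called once per row, after every element of that row has been put into the child.
            rows.add(child.drain());
        }

        List<List<Object>> build() {
            return rows;
        }
    }

    static List<List<Object>> demo() {
        ArrayReader reader = new ArrayReader();
        int[][] input = {{1, 2, 3}, {7, 6}};
        for (int[] row : input) {
            for (int i = 0; i < row.length; i++) {
                reader.getChild().putObject(i, row[i]);
            }
            reader.putArrayObject();
        }
        return reader.build();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[1, 2, 3], [7, 6]]
    }
}
```

In this sketch the array reader never receives element values directly; it only decides where one row ends and the next begins, which is the split of responsibilities the comment appears to describe.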
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-670417657 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3650/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-670416939 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1911/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexServer
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670369483 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1909/
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466858210

## File path: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java ##
@@ -102,6 +126,58 @@ public CarbonColumnVectorImpl(int batchSize, DataType dataType) {
   }
+  @Override
+  public CarbonColumnVector getColumnVector() {
+    return null;
+  }
+
+  @Override
+  public List getChildrenVector() {
+    return childrenVector;
+  }
+
+  @Override
+  public void putArrayObject() {
+    return;
+  }
+
+  public void setChildrenVector(ArrayList childrenVector) {
+    this.childrenVector = childrenVector;
+  }
+
+  public ArrayList getChildrenElements() {
+    return childrenElements;
+  }
+
+  public void setChildrenElements(ArrayList childrenElements) {
+    this.childrenElements = childrenElements;
+  }
+
+  public ArrayList getChildrenOffset() {
+    return childrenOffset;
+  }
+
+  public void setChildrenOffset(ArrayList childrenOffset) {
+    this.childrenOffset = childrenOffset;
+  }
+
+  public void setChildrenElementsAndOffset(byte[] childPageData) {
+    ByteBuffer childInfoBuffer = ByteBuffer.wrap(childPageData);
+    ArrayList childElements = new ArrayList<>();
+    ArrayList childOffset = new ArrayList<>();

Review comment:
Okay, removed.
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857970

## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala ##
@@ -0,0 +1,667 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.integrationtest
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, File, InputStream}
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.avro
+import org.apache.avro.file.DataFileWriter
+import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord}
+import org.apache.avro.io.{DecoderFactory, Encoder}
+import org.junit.Assert
+
+import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.block.TableBlockInfo
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk
+import org.apache.carbondata.core.datastore.chunk.reader.CarbonDataReaderFactory
+import org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.DimensionChunkReaderV3
+import org.apache.carbondata.core.datastore.compression.CompressorFactory
+import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.datastore.page.encoding.DefaultEncodingFactory
+import org.apache.carbondata.core.metadata.ColumnarFormatVersion
+import org.apache.carbondata.core.util.{CarbonMetadataUtil, DataFileFooterConverterV3}
+import org.apache.carbondata.sdk.file.CarbonWriter
+
+class GenerateFiles {
+
+  def singleLevelArrayFile() = {
+    val json1: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true,"arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[7,6],"arrayRealCol":[1.111,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json2: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[7,6,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[1.1,2.2,4.45,3.3],
+        |"arrayBooleanCol": [true, true, true]} """.stripMargin
+    val json3: String =
+      """ {"stringCol": "Rio","intCol": 16,"doubleCol": 12.5,"realCol": 14.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2","Street3"],
+        |"arrayStringCol2": ["China", "Brazil", "Paris", "France"],"arrayIntCol": [1,2,3,4,5],
+        |"arrayBigIntCol":[7,6,8000,91],"arrayRealCol":[1.1,2.2,3.3,4.45],
+        |"arrayDoubleCol":[1.1,2.2,4.45,5.5,3.3], "arrayBooleanCol": [true, false, true]} """
+        .stripMargin
+    val json4: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true, "arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[7,6],"arrayRealCol":[1.1,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json5: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[7,6,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[4,1,21.222,15.231],
+        |"arrayBooleanCol": [false, false, false]} """.stripMargin
+
+
+    val mySchema =
+      """ {
+        | "name": "address",
+        | "type": "record",
+        | "fields": [
+        | {
+        | "name": "stringCol",
+        | "type": "string"
+
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexServer
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670362767 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3648/