[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-670646477


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3659/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-670645884


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1920/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670622887


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1918/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670617768


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3657/
   







[GitHub] [carbondata] ajantha-bhat opened a new pull request #3887: [WIP] Refactor #3773 and support struct type

2020-08-07 Thread GitBox


ajantha-bhat opened a new pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887


   This PR depends on #3773.
   
### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


ajantha-bhat commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r467095602



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##
@@ -246,7 +239,29 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi
   vector = ColumnarVectorWrapperDirectFactory
   .getDirectVectorWrapperFactory(vector, vectorInfo.invertedIndex, nullBits, deletedRows,
   true, false);
-  fillVector(pageData, vector, vectorDataType, pageDataType, pageSize, vectorInfo, nullBits);
+  Deque vectorStack = vectorInfo.getVectorStack();
+  // Only if vectorStack is null, it is initialized with the parent vector
+  if (vectorStack == null && vectorInfo.vector.getColumnVector() != null) {
+vectorStack = new ArrayDeque<>();
+// pushing the parent vector
+vectorStack.push((CarbonColumnVectorImpl) vectorInfo.vector.getColumnVector());
+vectorInfo.setVectorStack(vectorStack);
+  }
+  /*
+   * if top of vector stack is a complex vector then
+   * add their children into the stack and load them too.
+   * TODO: If there are multiple children push them into stack and load them iteratively
+   */
+  if (vectorStack != null && vectorStack.peek().isComplex()) {
+vectorStack.peek().setChildrenElements(pageData);

Review comment:
   Please pass pageSize as an argument here and stop once the number of 
elements read equals pageSize. This buffer is reusable, so its length can be 
much larger than the actual data size.
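The concern in this comment can be illustrated with a minimal sketch. It is not the PR's code: the class name `ChildElementCounts`, the method `read`, and the buffer layout (one 4-byte big-endian int per row holding that row's child count) are assumptions made only for illustration. The point is that the loop is bounded by `pageSize`, not by the length of the reusable buffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class ChildElementCounts {
  // Hypothetical layout: one 4-byte big-endian int per row, each holding the
  // number of child elements for that row. The real page layout may differ.
  static List<Integer> read(byte[] pageData, int pageSize) {
    List<Integer> counts = new ArrayList<>(pageSize);
    ByteBuffer buffer = ByteBuffer.wrap(pageData);
    // Stop after pageSize rows: a reusable buffer can be longer than the
    // data that was actually written for this page.
    while (counts.size() < pageSize && buffer.remaining() >= Integer.BYTES) {
      counts.add(buffer.getInt());
    }
    return counts;
  }
}
```

With a 16-byte buffer whose first two ints are valid and whose tail is stale data, a call with `pageSize = 2` reads only the first two counts.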









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


ajantha-bhat commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r467095038



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##
@@ -246,7 +239,29 @@ public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, Bi
   vector = ColumnarVectorWrapperDirectFactory
   .getDirectVectorWrapperFactory(vector, vectorInfo.invertedIndex, nullBits, deletedRows,
   true, false);
-  fillVector(pageData, vector, vectorDataType, pageDataType, pageSize, vectorInfo, nullBits);
+  Deque vectorStack = vectorInfo.getVectorStack();
+  // Only if vectorStack is null, it is initialized with the parent vector
+  if (vectorStack == null && vectorInfo.vector.getColumnVector() != null) {
+vectorStack = new ArrayDeque<>();
+// pushing the parent vector
+vectorStack.push((CarbonColumnVectorImpl) vectorInfo.vector.getColumnVector());
+vectorInfo.setVectorStack(vectorStack);
+  }
+  /*
+   * if top of vector stack is a complex vector then
+   * add their children into the stack and load them too.
+   * TODO: If there are multiple children push them into stack and load them iteratively
+   */
+  if (vectorStack != null && vectorStack.peek().isComplex()) {
+vectorStack.peek().setChildrenElements(pageData);
+vectorStack.push(vectorStack.peek().getChildrenVector().get(0));
+vectorStack.peek().loadPage();
+return;
+  }
+
+  FillVector fill = new FillVector(pageData, vectorInfo, nullBits);
+  fill.basedOnType(vector, vectorDataType, pageSize, pageDataType);
+

Review comment:
   Pop the child from the stack here, since it has been processed.
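A rough sketch of the push, load, pop sequence this comment asks for is shown below. The `Vector` class and `loadTop` method are hypothetical stand-ins, not CarbonData types; recording the child's name stands in for the real `loadPage()` call.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class VectorStackSketch {
  // Hypothetical stand-in for a column vector with at most one child.
  static final class Vector {
    final String name;
    final Vector child; // null for a primitive vector

    Vector(String name, Vector child) {
      this.name = name;
      this.child = child;
    }

    boolean isComplex() {
      return child != null;
    }
  }

  // If the top of the stack is complex, push its child, "load" it,
  // and pop it again so the stack is back at the parent afterwards.
  static List<String> loadTop(Deque<Vector> stack) {
    List<String> loaded = new ArrayList<>();
    Vector top = stack.peek();
    if (top != null && top.isComplex()) {
      stack.push(top.child);
      loaded.add(stack.peek().name); // stands in for loadPage()
      stack.pop(); // child is processed, so remove it from the stack
    }
    return loaded;
  }
}
```

Without the final `pop()`, the child would stay on top of the stack and the next page would be decoded against the wrong vector.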









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


ajantha-bhat commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r467090464



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/FillVector.java
##
@@ -0,0 +1,346 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.page.encoding;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+import org.apache.carbondata.core.util.ByteUtil;
+
+public class FillVector {
+  private byte[] pageData;
+  private float floatFactor = 0;
+  private double factor = 0;
+  private ColumnVectorInfo vectorInfo;
+  private BitSet nullBits;
+
+  public FillVector(byte[] pageData, ColumnVectorInfo vectorInfo, BitSet nullBits) {
+this.pageData = pageData;
+this.vectorInfo = vectorInfo;
+this.nullBits = nullBits;
+  }
+
+  public void setFactor(double factor) {
+this.factor = factor;
+  }
+
+  public void setFloatFactor(float floatFactor) {
+this.floatFactor = floatFactor;
+  }
+
+  public void basedOnType(CarbonColumnVector vector, DataType vectorDataType, int pageSize,
+  DataType pageDataType) {
+if (vectorInfo.vector.getColumnVector() != null && ((CarbonColumnVectorImpl) vectorInfo.vector
+.getColumnVector()).isComplex()) {
+  fillComplexType(vector.getColumnVector(), pageDataType);
+} else {
+  fillPrimitiveType(vector, vectorDataType, pageSize, pageDataType);
+  vector.setIndex(0);
+}
+  }
+
+  private void fillComplexType(CarbonColumnVector vector, DataType pageDataType) {
+CarbonColumnVectorImpl vectorImpl = (CarbonColumnVectorImpl) vector;
+if (vector != null && vector.getChildrenVector() != null) {
+  ArrayList childElements = ((CarbonColumnVectorImpl) vector).getChildrenElements();
+  for (int i = 0; i < childElements.size(); i++) {
+int count = childElements.get(i);
+typeComplexObject(vectorImpl.getChildrenVector().get(0), count, pageDataType);
+vector.putArrayObject();
+  }
+}

Review comment:
   Reset the index of the child vector here, since this page has been processed.









[jira] [Created] (CARBONDATA-3946) Support IndexServer with Presto Engine

2020-08-07 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3946:


 Summary: Support IndexServer with Presto Engine
 Key: CARBONDATA-3946
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3946
 Project: CarbonData
  Issue Type: New Feature
Reporter: Indhumathi Muthumurugesh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3881: [CARBONDATA-3945] NPE While Data Loading

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3881:
URL: https://github.com/apache/carbondata/pull/3881#issuecomment-670515677


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3656/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3881: [CARBONDATA-3945] NPE While Data Loading

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3881:
URL: https://github.com/apache/carbondata/pull/3881#issuecomment-670508231


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1917/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3886: [CARBONDATA-3944] Delete stage files was interrupted when IOException…

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3886:
URL: https://github.com/apache/carbondata/pull/3886#issuecomment-67050


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1915/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3883: [CARBONDATA-3940] CommitTask fails due to Rename IOException during L…

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3883:
URL: https://github.com/apache/carbondata/pull/3883#issuecomment-670503871


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1916/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3883: [CARBONDATA-3940] CommitTask fails due to Rename IOException during L…

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3883:
URL: https://github.com/apache/carbondata/pull/3883#issuecomment-670502706


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3655/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3886: [CARBONDATA-3944] Delete stage files was interrupted when IOException…

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3886:
URL: https://github.com/apache/carbondata/pull/3886#issuecomment-670501876


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3654/
   







[GitHub] [carbondata] asfgit closed pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto

2020-08-07 Thread GitBox


asfgit closed pull request #3882:
URL: https://github.com/apache/carbondata/pull/3882


   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-670494726











[GitHub] [carbondata] QiangCai commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto

2020-08-07 Thread GitBox


QiangCai commented on pull request #3882:
URL: https://github.com/apache/carbondata/pull/3882#issuecomment-670493189


   LGTM







[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466964195



##
File path: 
integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ArrayStreamReader.java
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+
+/**
+ * Class to read the Array Stream
+ */
+
+public class ArrayStreamReader extends CarbonColumnVectorImpl implements PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+  Block childBlock = null;
+  private int index = 0;
+
+  public ArrayStreamReader(int batchSize, DataType dataType, StructField field) {
+super(batchSize, dataType);
+this.batchSize = batchSize;
+this.type = getArrayOfType(field, dataType);
+ArrayList childrenList= new ArrayList<>();
+childrenList.add(CarbonVectorBatch.createDirectStreamReader(this.batchSize, field.getDataType(), field));
+setChildrenVector(childrenList);
+this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  public int getIndex() {
+return index;
+  }
+
+  public void setIndex(int index) {
+this.index = index;
+  }
+
+  public String getDataTypeName() {
+return "ARRAY";
+  }
+
+  Type getArrayOfType(StructField field, DataType dataType) {
+if (dataType == DataTypes.STRING) {
+  return new ArrayType(VarcharType.VARCHAR);
+} else if (dataType == DataTypes.BYTE) {
+  return new ArrayType(TinyintType.TINYINT);
+} else if (dataType == DataTypes.SHORT) {
+  return new ArrayType(SmallintType.SMALLINT);
+} else if (dataType == DataTypes.INT) {
+  return new ArrayType(IntegerType.INTEGER);
+} else if (dataType == DataTypes.LONG) {
+  return new ArrayType(BigintType.BIGINT);
+} else if (dataType == DataTypes.DOUBLE) {
+  return new ArrayType(DoubleType.DOUBLE);
+} else if (dataType == DataTypes.FLOAT) {
+  return new ArrayType(RealType.REAL);
+} else if (dataType == DataTypes.BOOLEAN) {
+  return new ArrayType(BooleanType.BOOLEAN);
+} else if (dataType == DataTypes.TIMESTAMP) {
+  return new ArrayType(TimestampType.TIMESTAMP);
+} else if (DataTypes.isArrayType(dataType)) {
+  StructField childField = field.getChildren().get(0);
+  return new ArrayType(getArrayOfType(childField, childField.getDataType()));
+} else {
+  throw new UnsupportedOperationException("Unsupported type: " + dataType);
+}
+  }
+
+  @Override
+  public Block buildBlock() {
+return builder.build();
+  }
+
+  public boolean isComplex() {
+return true;
+  }
+
+  @Override
+  public void setBatchSize(int batchSize) {
+this.batchSize = batchSize;
+  }
+
+  @Override
+  public void putObject(int rowId, Object value) {
+if (value == null) {
+  putNull(rowId);
+} else {
+  getChildrenVector().get(0).putObject(rowId, value);
+}
+  }
+
+  public void putArrayObject() {
+if (DataTypes.isArrayType(this.getType())) {
+  childBlock = ((ArrayStreamReader) getChildrenVector().get(0)).buildBlock();
+} else if (this.getType() == DataTypes.STRING) {
+  childBlock = ((SliceStreamReader) getChildrenVector().get(0)).buildBlock();
+} else if (this.getType() == DataTypes.INT) {
+  childBlock = ((IntegerStreamReader) getChildrenVector().get(0)).buildBlock();
+} else if (this.getType() == DataTypes.LONG) {
+  childBlock = ((LongStreamReader) getChildrenVector().get(0)).buildBlock();
+} else if (this.getType() == DataTypes.DOUBLE) {
+  childBlock = ((DoubleStreamReader) 

[GitHub] [carbondata] marchpure commented on a change in pull request #3881: [CARBONDATA-3945] NPE While Data Loading

2020-08-07 Thread GitBox


marchpure commented on a change in pull request #3881:
URL: https://github.com/apache/carbondata/pull/3881#discussion_r466962667



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala
##
@@ -207,7 +207,7 @@ case class CarbonRelation(
   null != validSeg.getLoadMetadataDetails.getIndexSize) {
 size = size + 
validSeg.getLoadMetadataDetails.getDataSize.toLong +
validSeg.getLoadMetadataDetails.getIndexSize.toLong
-  } else {
+  } else if (!carbonTable.isHivePartitionTable) {

Review comment:
   modified

##
File path: 
core/src/main/java/org/apache/carbondata/core/readcommitter/TableStatusReadCommittedScope.java
##
@@ -87,7 +87,9 @@ public TableStatusReadCommittedScope(AbsoluteTableIdentifier identifier,
   SegmentFileStore fileStore =
   new SegmentFileStore(identifier.getTablePath(), segment.getSegmentFileName());
   indexFiles = fileStore.getIndexOrMergeFiles();
-  segment.setSegmentMetaDataInfo(fileStore.getSegmentFile().getSegmentMetaDataInfo());
+  if (fileStore != null && fileStore.getSegmentFile() != null) {

Review comment:
   modified









[GitHub] [carbondata] marchpure commented on pull request #3881: [CARBONDATA-3945] NPE While Data Loading

2020-08-07 Thread GitBox


marchpure commented on pull request #3881:
URL: https://github.com/apache/carbondata/pull/3881#issuecomment-670454261


   issue created







[jira] [Created] (CARBONDATA-3945) NPE while Data Loading

2020-08-07 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-3945:
---

 Summary: NPE while Data Loading 
 Key: CARBONDATA-3945
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3945
 Project: CarbonData
  Issue Type: Bug
Reporter: Xingjun Hao


 # getLastModifiedTime of LoadMetadataDetails fails because 
updateDeltaEndTimestamp is an empty string.
 # In the getCommittedIndexFile function, an NPE occurs in unusual cases 
because the segment file is null.
 # Cleaning temp files fails in unusual cases because partitionInfo is null.
 # When calculating sizeInBytes of CarbonRelation, unusual cases require 
collecting the directory size. The directory path is valid only for 
non-partition tables; for partition tables, a FileNotFoundException is thrown.
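The first item is a parse failure on an empty string. A minimal sketch of the kind of guard that avoids it follows, assuming the timestamp is stored as a decimal epoch-millisecond string; `TimestampGuard` and `parseTimestampOrDefault` are hypothetical names, and the actual fix in CarbonData may look different.

```java
final class TimestampGuard {
  // Hypothetical helper: treat a null or blank timestamp as "not set" and
  // return the caller's fallback instead of failing on the parse.
  // Assumes the stored value is a decimal epoch-millisecond string.
  static long parseTimestampOrDefault(String raw, long fallback) {
    if (raw == null || raw.trim().isEmpty()) {
      return fallback;
    }
    return Long.parseLong(raw.trim());
  }
}
```

The same pattern (check for the unusual value first, then fall back) applies to the null segment file and null partitionInfo cases in items 2 and 3.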





[GitHub] [carbondata] marchpure commented on a change in pull request #3883: [CARBONDATA-3940] CommitTask fails due to Rename IOException during L…

2020-08-07 Thread GitBox


marchpure commented on a change in pull request #3883:
URL: https://github.com/apache/carbondata/pull/3883#discussion_r466957812



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoHadoopFsRelationCommand.scala
##
@@ -104,11 +104,13 @@ case class CarbonInsertIntoHadoopFsRelationCommand(
 val dynamicPartitionOverwrite = enableDynamicOverwrite && mode == SaveMode.Overwrite &&
 staticPartitions.size < partitionColumns.length
 
-val committer = FileCommitProtocol.instantiate(
-  sparkSession.sessionState.conf.fileCommitProtocolClass,
-  jobId = java.util.UUID.randomUUID().toString,
-  outputPath = outputPath.toString,
-  dynamicPartitionOverwrite = dynamicPartitionOverwrite)
+val committer = fileFormat match {

Review comment:
   modified









[GitHub] [carbondata] marchpure opened a new pull request #3886: [CARBONDATA-3944] Delete stage files was interrupted when IOException…

2020-08-07 Thread GitBox


marchpure opened a new pull request #3886:
URL: https://github.com/apache/carbondata/pull/3886


   … happen
   
### Why is this PR needed?
In the insertstage flow, stage files are deleted with a retry mechanism. 
However, when an IOException occurs (for example, because of a network fault), 
the delete-stage flow is interrupted, which is unexpected.

### What changes were proposed in this PR?
   When an exception is caught while deleting stage files, continue to retry.
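
The behaviour described above can be sketched roughly as follows. This is not the PR's implementation: `RetryDelete`, `IoAction`, and `deleteWithRetry` are hypothetical names, and the retry limit is an assumed parameter. The sketch only shows that an IOException counts as a failed attempt and does not abort the loop.

```java
import java.io.IOException;

final class RetryDelete {
  // Hypothetical callback: returns true when the delete succeeded.
  interface IoAction {
    boolean run() throws IOException;
  }

  // Try the delete up to maxAttempts times. An IOException is treated like
  // an unsuccessful attempt, so the loop continues instead of aborting.
  static boolean deleteWithRetry(IoAction delete, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        if (delete.run()) {
          return true;
        }
      } catch (IOException e) {
        // Transient failure (for example a network fault): fall through
        // and retry on the next iteration.
      }
    }
    return false;
  }
}
```

A caller would wrap the real file delete in the callback and decide what to do when every attempt fails, for example logging the stage file that could not be removed.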
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexSserver

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670444134


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1913/
   







[jira] [Created] (CARBONDATA-3944) Delete stage files was interrupted when IOException happen

2020-08-07 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-3944:
---

 Summary: Delete stage files was interrupted when IOException happen
 Key: CARBONDATA-3944
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3944
 Project: CarbonData
  Issue Type: Bug
Reporter: Xingjun Hao


In the insertstage flow, stage files are deleted with a retry mechanism. 
However, when an IOException occurs (for example, because of a network fault), 
the delete-stage flow is interrupted, which is unexpected.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexSserver

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670443439


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3652/
   







[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466927654



##
File path: 
integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ArrayStreamReader.java
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+
+/**
+ * Class to read the Array Stream
+ */
+
+public class ArrayStreamReader extends CarbonColumnVectorImpl implements PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+  Block childBlock = null;
+  private int index = 0;
+
+  public ArrayStreamReader(int batchSize, DataType dataType, StructField field) {
+super(batchSize, dataType);
+this.batchSize = batchSize;
+this.type = getArrayOfType(field, dataType);
+ArrayList childrenList= new ArrayList<>();
+childrenList.add(CarbonVectorBatch.createDirectStreamReader(this.batchSize, field.getDataType(), field));
+setChildrenVector(childrenList);
+this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  public int getIndex() {
+return index;
+  }
+
+  public void setIndex(int index) {
+this.index = index;
+  }
+
+  public String getDataTypeName() {
+return "ARRAY";
+  }
+
+  Type getArrayOfType(StructField field, DataType dataType) {
+if (dataType == DataTypes.STRING) {
+  return new ArrayType(VarcharType.VARCHAR);
+} else if (dataType == DataTypes.BYTE) {
+  return new ArrayType(TinyintType.TINYINT);
+} else if (dataType == DataTypes.SHORT) {
+  return new ArrayType(SmallintType.SMALLINT);
+} else if (dataType == DataTypes.INT) {
+  return new ArrayType(IntegerType.INTEGER);
+} else if (dataType == DataTypes.LONG) {
+  return new ArrayType(BigintType.BIGINT);
+} else if (dataType == DataTypes.DOUBLE) {
+  return new ArrayType(DoubleType.DOUBLE);
+} else if (dataType == DataTypes.FLOAT) {
+  return new ArrayType(RealType.REAL);
+} else if (dataType == DataTypes.BOOLEAN) {
+  return new ArrayType(BooleanType.BOOLEAN);
+} else if (dataType == DataTypes.TIMESTAMP) {
+  return new ArrayType(TimestampType.TIMESTAMP);
+} else if (DataTypes.isArrayType(dataType)) {
+  StructField childField = field.getChildren().get(0);
+  return new ArrayType(getArrayOfType(childField, 
childField.getDataType()));
+} else {
+  throw new UnsupportedOperationException("Unsupported type: " + dataType);
+}
+  }
+
+  @Override
+  public Block buildBlock() {
+return builder.build();
+  }
+
+  public boolean isComplex() {
+return true;
+  }
+
+  @Override
+  public void setBatchSize(int batchSize) {
+this.batchSize = batchSize;
+  }
+
+  @Override
+  public void putObject(int rowId, Object value) {
+if (value == null) {

Review comment:
   putObject is used only for primitive types. Once the entire row has been
put, putArrayObject() is used to put it into the array.
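
The division of labour described in this comment can be sketched roughly as follows. This is a minimal, self-contained Java illustration, not the code from the PR: `ArrayReaderSketch`, `PrimitiveReader`, `ArrayReader` and `drain()` are hypothetical names, and plain Java lists stand in for Presto's `BlockBuilder`. Only the method names `putObject` and `putArrayObject` are taken from the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these classes exist in CarbonData or Presto.
class ArrayReaderSketch {

  // Stand-in for a primitive child reader: putObject appends one element.
  static class PrimitiveReader {
    private final List<Object> pending = new ArrayList<>();

    // rowId only mirrors the signature in the diff; the sketch appends in call order.
    void putObject(int rowId, Object value) {
      pending.add(value);
    }

    // Hands over the buffered elements and resets the buffer for the next row.
    List<Object> drain() {
      List<Object> out = new ArrayList<>(pending);
      pending.clear();
      return out;
    }
  }

  // Stand-in for the array reader: putArrayObject closes the current row.
  static class ArrayReader {
    final PrimitiveReader child = new PrimitiveReader();
    private final List<List<Object>> rows = new ArrayList<>();

    void putArrayObject() {
      rows.add(child.drain());
    }

    List<List<Object>> build() {
      return rows;
    }
  }

  public static void main(String[] args) {
    ArrayReader reader = new ArrayReader();
    // First row: three elements, then close the row.
    reader.child.putObject(0, 1);
    reader.child.putObject(1, 2);
    reader.child.putObject(2, 3);
    reader.putArrayObject();
    // Second row: two elements.
    reader.child.putObject(0, 7);
    reader.child.putObject(1, 6);
    reader.putArrayObject();
    System.out.println(reader.build()); // prints [[1, 2, 3], [7, 6]]
  }
}
```

In this sketch the child never knows about row boundaries; the parent decides when a row is complete, which is the split between the two methods that the comment describes.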









[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466927976



##
File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoReadTableFilesTest.scala
##
@@ -0,0 +1,443 @@
+package org.apache.carbondata.presto.integrationtest
+
+import java.io.File
+import java.sql.{SQLException, Timestamp}
+import java.util
+import java.util.Arrays.asList
+
+import io.prestosql.jdbc.PrestoArray
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.metadata.datatype.{DataTypes, Field}
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
+import org.apache.carbondata.presto.server.PrestoServer
+import org.apache.carbondata.sdk.file.{CarbonWriter, Schema}
+import org.apache.commons.io.FileUtils
+import org.apache.commons.lang.RandomStringUtils
+import org.apache.spark.sql.Row
+import org.scalatest.{BeforeAndAfterAll, FunSuiteLike, BeforeAndAfterEach}
+
+import scala.collection.mutable
+import scala.collection.JavaConverters._
+class PrestoReadTableFilesTest extends FunSuiteLike with BeforeAndAfterAll with BeforeAndAfterEach {
+  private val logger = LogServiceFactory
+    .getLogService(classOf[PrestoTestNonTransactionalTableFiles].getCanonicalName)
+
+  private val rootPath = new File(this.getClass.getResource("/").getPath
+    + "../../../..").getCanonicalPath
+  private val storePath = s"$rootPath/integration/presto/target/store"
+  private val systemPath = s"$rootPath/integration/presto/target/system"
+  private var writerPath = storePath + "/sdk_output/files"
+  private val prestoServer = new PrestoServer
+  private var varcharString = new String
+
+  override def beforeAll: Unit = {
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
+      "Presto")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
+      "Presto")
+    val map = new util.HashMap[String, String]()
+    map.put("hive.metastore", "file")
+    map.put("hive.metastore.catalog.dir", s"file://$storePath")
+
+    prestoServer.startServer("sdk_output", map)
+  }
+
+  override def afterAll(): Unit = {
+    prestoServer.stopServer()
+    CarbonUtil.deleteFoldersAndFiles(FileFactory.getCarbonFile(storePath))
+  }
+
+  private def createComplexTableForSingleLevelArray = {
+    prestoServer.execute("drop table if exists sdk_output.files")
+    prestoServer.execute("drop schema if exists sdk_output")
+    prestoServer.execute("create schema sdk_output")
+    prestoServer
+      .execute(
+        "create table sdk_output.files(stringCol varchar, intCol int, doubleCol double, realCol real, boolCol boolean, arrayStringCol1 array(varchar), arrayStringcol2 array(varchar), arrayIntCol array(int), arrayBigIntCol array(bigint), arrayRealCol array(real), arrayDoubleCol array(double), arrayBooleanCol array(boolean)) with(format='CARBON') ")
+  }
+
+  private def createComplexTableFor2LevelArray = {
+    prestoServer.execute("drop table if exists sdk_output.files2")
+    prestoServer.execute("drop schema if exists sdk_output")
+    prestoServer.execute("create schema sdk_output")
+    prestoServer
+      .execute(
+        "create table sdk_output.files2(arrayArrayInt array(array(int)), arrayArrayBigInt array(array(bigint)), arrayArrayReal array(array(real)), arrayArrayDouble array(array(double)), arrayArrayString array(array(varchar)), arrayArrayBoolean array(array(boolean))) with(format='CARBON') ")
+  }
+
+  private def createComplexTableFor3LevelArray = {
+    prestoServer.execute("drop table if exists sdk_output.files3")
+    prestoServer.execute("drop schema if exists sdk_output")
+    prestoServer.execute("create schema sdk_output")
+    prestoServer
+      .execute(
+        "create table sdk_output.files3(array3_Int array(array(array(int))), array3_BigInt array(array(array(bigint))), array3_Real array(array(array(real))), array3_Double array(array(array(double))), array3_String array(array(array(varchar))), array3_Boolean array(array(array(boolean))) ) with(format='CARBON') ")
+  }
+
+  def buildComplexTestForSingleLevelArray(): Any = {
+    FileUtils.deleteDirectory(new File(writerPath))
+    createComplexTableForSingleLevelArray
+    import java.io.IOException
+    val source = new File(this.getClass.getResource("/").getPath + "../../" + "/temp/table1").getCanonicalPath
+    val srcDir = new File(source)
+    val destination = new File(this.getClass.getResource("/").getPath + "../../" + "/target/store/sdk_output/files/").getCanonicalPath
+    val destDir = new File(destination)
+    try FileUtils.copyDirectory(srcDir, destDir)
+    catch {
+      case e: IOException =>
+

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#issuecomment-670417657


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3650/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#issuecomment-670416939


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1911/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexServer

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670369483


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1909/
   







[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466858210



##
File path: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java
##
@@ -102,6 +126,58 @@ public CarbonColumnVectorImpl(int batchSize, DataType dataType) {
 
   }
 
+  @Override
+  public CarbonColumnVector getColumnVector() {
+    return null;
+  }
+
+  @Override
+  public List getChildrenVector() {
+    return childrenVector;
+  }
+
+  @Override
+  public void putArrayObject() {
+    return;
+  }
+
+  public void setChildrenVector(ArrayList childrenVector) {
+    this.childrenVector = childrenVector;
+  }
+
+  public ArrayList getChildrenElements() {
+    return childrenElements;
+  }
+
+  public void setChildrenElements(ArrayList childrenElements) {
+    this.childrenElements = childrenElements;
+  }
+
+  public ArrayList getChildrenOffset() {
+    return childrenOffset;
+  }
+
+  public void setChildrenOffset(ArrayList childrenOffset) {
+    this.childrenOffset = childrenOffset;
+  }
+
+  public void setChildrenElementsAndOffset(byte[] childPageData) {
+    ByteBuffer childInfoBuffer = ByteBuffer.wrap(childPageData);
+    ArrayList childElements = new ArrayList<>();
+    ArrayList childOffset = new ArrayList<>();

Review comment:
   Okay, removed.









[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857970



##
File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
##
@@ -0,0 +1,667 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.integrationtest
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, File, InputStream}
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.avro
+import org.apache.avro.file.DataFileWriter
+import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord}
+import org.apache.avro.io.{DecoderFactory, Encoder}
+import org.junit.Assert
+
+import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.block.TableBlockInfo
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk
+import org.apache.carbondata.core.datastore.chunk.reader.CarbonDataReaderFactory
+import org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.DimensionChunkReaderV3
+import org.apache.carbondata.core.datastore.compression.CompressorFactory
+import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.datastore.page.encoding.DefaultEncodingFactory
+import org.apache.carbondata.core.metadata.ColumnarFormatVersion
+import org.apache.carbondata.core.util.{CarbonMetadataUtil, DataFileFooterConverterV3}
+import org.apache.carbondata.sdk.file.CarbonWriter
+
+class GenerateFiles {
+
+  def singleLevelArrayFile() = {
+    val json1: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true,"arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[7,6],"arrayRealCol":[1.111,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json2: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[7,6,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[1.1,2.2,4.45,3.3],
+        |"arrayBooleanCol": [true, true, true]} """.stripMargin
+    val json3: String =
+      """ {"stringCol": "Rio","intCol": 16,"doubleCol": 12.5,"realCol": 14.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2","Street3"],
+        |"arrayStringCol2": ["China", "Brazil", "Paris", "France"],"arrayIntCol": [1,2,3,4,5],
+        |"arrayBigIntCol":[7,6,8000,91],"arrayRealCol":[1.1,2.2,3.3,4.45],
+        |"arrayDoubleCol":[1.1,2.2,4.45,5.5,3.3], "arrayBooleanCol": [true, false, true]} """
+        .stripMargin
+    val json4: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true, "arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[7,6],"arrayRealCol":[1.1,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json5: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[7,6,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[4,1,21.222,15.231],
+        |"arrayBooleanCol": [false, false, false]} """.stripMargin
+
+
+    val mySchema =
+      """ {
+        |  "name": "address",
+        |  "type": "record",
+        |  "fields": [
+        |  {
+        |  "name": "stringCol",
+        |  "type": "string"
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexServer

2020-08-07 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670362767


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3648/
   







[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857547



##
File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
##

[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857370



##
File path: 
integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
##

[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-07 Thread GitBox


akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857297



##
File path: 
integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
##