[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-12 Thread GitBox


ajantha-bhat commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r469704722



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveIntegralCodec.java
##
@@ -23,6 +23,7 @@
 import java.util.BitSet;
 import java.util.List;
 import java.util.Map;
+import java.util.Stack;

Review comment:
   The adaptive **delta** flows have not been handled yet:
   AdaptiveDeltaIntegralCodec
   AdaptiveDeltaFloatingCodec
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-12 Thread GitBox


ajantha-bhat commented on a change in pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#discussion_r469701326



##
File path: 
integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##
@@ -281,7 +287,11 @@ private CarbonTableCacheModel 
getValidCacheBySchemaTableName(SchemaTableName sch
   createInputFormat(jobConf, carbonTable.getAbsoluteTableIdentifier(),
   new IndexFilter(carbonTable, filters, true), filteredPartitions);
   Job job = Job.getInstance(jobConf);
+  CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.IS_QUERY_FROM_PRESTO, "true");

Review comment:
   @kunal642, the current Carbon and Presto integration covers only the query path; 
load and insert are not supported. 
   So setting it only in the query flow should be enough, I guess.









[GitHub] [carbondata] kunal642 commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-08-12 Thread GitBox


kunal642 commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r469700368



##
File path: 
integration/spark/src/test/scala/org/apache/indexserver/IndexServerTest.scala
##
@@ -0,0 +1,56 @@
+package org.apache.indexserver

Review comment:
   Please add header









[GitHub] [carbondata] kunal642 commented on a change in pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-12 Thread GitBox


kunal642 commented on a change in pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#discussion_r469700182



##
File path: 
integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##
@@ -281,7 +287,11 @@ private CarbonTableCacheModel 
getValidCacheBySchemaTableName(SchemaTableName sch
   createInputFormat(jobConf, carbonTable.getAbsoluteTableIdentifier(),
   new IndexFilter(carbonTable, filters, true), filteredPartitions);
   Job job = Job.getInstance(jobConf);
+  CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.IS_QUERY_FROM_PRESTO, "true");

Review comment:
   Can we move this initialization to a more common class? Currently it 
will be set only in the read case.









[GitHub] [carbondata] kevinjmh commented on pull request #3877: [CARBONDATA-3889] Cleanup duplicated code in carbondata-spark module

2020-08-12 Thread GitBox


kevinjmh commented on pull request #3877:
URL: https://github.com/apache/carbondata/pull/3877#issuecomment-673255380


   retest this please







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-673095043


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3706/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-673087338


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1967/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-673085997


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1966/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-673084876


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3705/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-672997883


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3704/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-672997258


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1965/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-672936291


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1964/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-672935234


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3703/
   







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-12 Thread GitBox


Indhumathi27 commented on a change in pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#discussion_r469307444



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexJobs.scala
##
@@ -55,14 +57,27 @@ class DistributedIndexJob extends AbstractIndexJob {
 val (response, time) = logTime {
   try {
 val spark = SparkSQLUtil.getSparkSession
-indexFormat.setTaskGroupId(SparkSQLUtil.getTaskGroupId(spark))
-indexFormat.setTaskGroupDesc(SparkSQLUtil.getTaskGroupDesc(spark))
+// In case of presto with index server flow, sparksession will be null
+if (null != spark) {

Review comment:
   handled









[GitHub] [carbondata] kunal642 commented on a change in pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-12 Thread GitBox


kunal642 commented on a change in pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#discussion_r469287811



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexJobs.scala
##
@@ -55,14 +57,27 @@ class DistributedIndexJob extends AbstractIndexJob {
 val (response, time) = logTime {
   try {
 val spark = SparkSQLUtil.getSparkSession
-indexFormat.setTaskGroupId(SparkSQLUtil.getTaskGroupId(spark))
-indexFormat.setTaskGroupDesc(SparkSQLUtil.getTaskGroupDesc(spark))
+// In case of presto with index server flow, sparksession will be null
+if (null != spark) {

Review comment:
   Is it possible to add a configuration to CarbonProperties that tells 
us whether it is a Presto flow or not? The property should not be exposed to the user; 
it is just for internal purposes.
   
   This null check looks dirty.
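
A minimal sketch of the idea in this comment, assuming an internal flag is read instead of testing the Spark session for null. `Props` is a hypothetical stand-in for `CarbonProperties`, the key string and the `"presto_query"` placeholder are assumptions, and only the constant name `IS_QUERY_FROM_PRESTO` comes from the `CarbonTableReader` hunk quoted elsewhere in this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class EngineFlagSketch {
  // Hypothetical stand-in for CarbonProperties; not the real class.
  static final class Props {
    private final Map<String, String> values = new ConcurrentHashMap<>();

    void addProperty(String key, String value) {
      values.put(key, value);
    }

    String getProperty(String key, String defaultValue) {
      return values.getOrDefault(key, defaultValue);
    }
  }

  // Constant name taken from the quoted diff; the key string itself is an assumption.
  static final String IS_QUERY_FROM_PRESTO = "is_query_from_presto";

  // Internal-only check: true when the query path has marked itself as Presto.
  static boolean isPrestoFlow(Props props) {
    return Boolean.parseBoolean(props.getProperty(IS_QUERY_FROM_PRESTO, "false"));
  }

  // Replaces the "null != spark" test: the caller asks about the engine,
  // not about a side effect of the engine (a missing Spark session).
  static String taskGroupId(Props props, String sparkTaskGroupId) {
    if (isPrestoFlow(props)) {
      return "presto_query"; // hypothetical placeholder value
    }
    return sparkTaskGroupId;
  }
}
```

The flag would be set once on the query path (as the `CarbonTableReader` hunk does) and read wherever the Spark-specific calls are made, so the intent is visible at the call site.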









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3856: [CARBONDATA-3929]Improve CDC performance

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3856:
URL: https://github.com/apache/carbondata/pull/3856#issuecomment-672878188


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3702/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3856: [CARBONDATA-3929]Improve CDC performance

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3856:
URL: https://github.com/apache/carbondata/pull/3856#issuecomment-672876442


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1963/
   







[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-08-12 Thread GitBox


MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r469242044



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##
@@ -316,4 +324,17 @@ object IndexServer extends ServerInterface {
   Array(new Service("security.indexserver.protocol.acl", 
classOf[ServerInterface]))
 }
   }
+
+  def startAgingFolders(): Unit = {
+val runnable = new Runnable() {
+  def run() {
+val age = System.currentTimeMillis() - agePeriod.toLong
+CarbonUtil.agingTempFolderForIndexServer(age)
+LOGGER.info(s"Complete age temp folder 
${CarbonUtil.getIndexServerTempPath}")
+  }
+}
+val ags: ScheduledExecutorService = 
Executors.newSingleThreadScheduledExecutor
+ags.scheduleAtFixedRate(runnable, 1000, 360, TimeUnit.MICROSECONDS)

Review comment:
   done
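
The reviewer's original remark is not quoted in this message, so the following is only a sketch of one point worth checking in the hunk above: `scheduleAtFixedRate` interprets both the initial delay and the period in the given `TimeUnit`, so `1000, 360, TimeUnit.MICROSECONDS` means a period of 360 microseconds. The class and method names below are hypothetical, and the choice of seconds is an assumption.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class AgingScheduleSketch {
  // 360 microseconds is less than one millisecond, so this truncates to 0.
  static long microsToMillis(long micros) {
    return TimeUnit.MICROSECONDS.toMillis(micros);
  }

  // What a one-hour period would be if it had to be expressed in microseconds.
  static long hoursToMicros(long hours) {
    return TimeUnit.HOURS.toMicros(hours);
  }

  // Passing the unit that matches the intended magnitude avoids the conversion entirely.
  static ScheduledFuture<?> scheduleEvery(ScheduledExecutorService pool, Runnable task,
      long initialDelaySeconds, long periodSeconds) {
    return pool.scheduleAtFixedRate(task, initialDelaySeconds, periodSeconds, TimeUnit.SECONDS);
  }
}
```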









[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-08-12 Thread GitBox


MarvinLitt commented on a change in pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#discussion_r469241896



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java
##
@@ -485,4 +486,30 @@ public boolean equals(Object o) {
   public int hashCode() {
 return Objects.hash(file.getAbsolutePath());
   }
+
+  @Override
+  public List listDirs() throws IOException {
+if (!file.isDirectory()) {
+  return new ArrayList();
+}
+Collection fileCollection = FileUtils.listFilesAndDirs(file,
+DirectoryFileFilter.DIRECTORY, null);
+if (fileCollection.isEmpty()) {
+  return new ArrayList();
+}
+List carbonFiles = new ArrayList();
+for (File file : fileCollection) {
+  if (file.isDirectory()) {
+File[] files = file.listFiles();

Review comment:
   Okay, done.









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-672853296


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3701/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-12 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-672844127


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1962/
   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-12 Thread GitBox


ajantha-bhat commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r469161474



##
File path: 
integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ArrayStreamReader.java
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import 
org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+
+/**
+ * Class to read the Array Stream
+ */
+
+public class ArrayStreamReader extends CarbonColumnVectorImpl implements 
PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+  Block childBlock = null;
+  private int index = 0;
+
+  public ArrayStreamReader(int batchSize, DataType dataType, StructField 
field) {
+super(batchSize, dataType);
+this.batchSize = batchSize;
+this.type = getArrayOfType(field, dataType);
+ArrayList childrenList= new ArrayList<>();
+
childrenList.add(CarbonVectorBatch.createDirectStreamReader(this.batchSize, 
field.getDataType(), field));
+setChildrenVector(childrenList);
+this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  public int getIndex() {
+return index;
+  }
+
+  public void setIndex(int index) {
+this.index = index;
+  }
+
+  public String getDataTypeName() {
+return "ARRAY";
+  }
+
+  Type getArrayOfType(StructField field, DataType dataType) {
+if (dataType == DataTypes.STRING) {
+  return new ArrayType(VarcharType.VARCHAR);
+} else if (dataType == DataTypes.BYTE) {
+  return new ArrayType(TinyintType.TINYINT);
+} else if (dataType == DataTypes.SHORT) {
+  return new ArrayType(SmallintType.SMALLINT);
+} else if (dataType == DataTypes.INT) {

Review comment:
   VARCHAR is also missing. Please rebase the PR so that binary is handled as well.
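
A rough sketch of what a complete element-type mapping could look like, covering the VARCHAR and binary cases raised here and the decimal case raised in the related comment on the same hunk. It is not the PR's code: `CarbonType` is a hypothetical stand-in for the CarbonData `DataTypes` constants, and the method returns Presto type-name strings rather than the real `io.prestosql.spi.type` objects so that it stays self-contained.

```java
final class ArrayElementTypeSketch {
  // Hypothetical stand-in for the CarbonData DataTypes constants.
  enum CarbonType { STRING, VARCHAR, BYTE, SHORT, INT, LONG, DOUBLE, BOOLEAN, BINARY, DECIMAL }

  // Maps an element type to a Presto type name; precision and scale are used only for DECIMAL.
  static String elementTypeName(CarbonType type, int precision, int scale) {
    switch (type) {
      case STRING:
      case VARCHAR:
        return "varchar";
      case BYTE:
        return "tinyint";
      case SHORT:
        return "smallint";
      case INT:
        return "integer";
      case LONG:
        return "bigint";
      case DOUBLE:
        return "double";
      case BOOLEAN:
        return "boolean";
      case BINARY:
        return "varbinary";
      case DECIMAL:
        return "decimal(" + precision + "," + scale + ")";
      default:
        // Fail loudly instead of silently falling through for an unmapped type.
        throw new IllegalArgumentException("Unsupported array element type: " + type);
    }
  }

  static String arrayTypeName(CarbonType type, int precision, int scale) {
    return "array(" + elementTypeName(type, precision, scale) + ")";
  }
}
```

Ending the chain with an explicit failure makes a missing case show up as an error at read time rather than as a wrong or null type.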









[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI

2020-08-12 Thread GitBox


Karan980 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-672783133


   retest this please







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-12 Thread GitBox


akashrn5 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r469029220



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -154,13 +153,44 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterEach {
 sql("CREATE TABLE load32000chardata(dim1 String, dim2 String, mes1 int) 
STORED AS carbondata")
 sql("CREATE TABLE load32000chardata_dup(dim1 String, dim2 String, mes1 
int) STORED AS carbondata")
 sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata 
OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
-intercept[Exception] {
-  sql("insert into load32000chardata_dup select 
dim1,concat(load32000chardata.dim2,''),mes1 from load32000chardata").show()
-}
+checkAnswer(sql("select count(*) from load32000chardata"), Seq(Row(3)))
+// String whilch length greater than 32000 will be considered as bad 
record and will be inserted as null in table

Review comment:
   ```suggestion
   // String which length greater than 32000 will be considered as bad 
record and will be inserted as null in table
   ```

##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -2456,4 +2456,10 @@ private CarbonCommonConstants() {
* property which defines the insert stage flow
*/
   public static final String IS_INSERT_STAGE = "is_insert_stage";
+
+  public static final String STRING_LENGTH_EXCEEDED_MESSAGE =
+  "Record %s of column %s exceeded " + MAX_CHARS_PER_COLUMN_DEFAULT +
+  " bytes. Please consider long string data type.";

Review comment:
   ```suggestion
 " characters. Please consider long string data type.";
   ```
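
A small illustration of why the wording matters, assuming UTF-8 encoding: `String.length()` counts UTF-16 code units, while the length of the encoded byte array counts bytes, and the two diverge for non-ASCII text. In the `PrimitiveDataType` hunk quoted further down, one branch compares `((String) input).length()` against the limit and the other compares `value.length` of a byte array, so the two checks do not measure the same thing. The class name and helper methods are hypothetical; `LIMIT` mirrors the 32000 used in the test above.

```java
import java.nio.charset.StandardCharsets;

final class CharsVsBytes {
  // Mirrors the 32000 limit mentioned in the test; stand-in for MAX_CHARS_PER_COLUMN_DEFAULT.
  static final int LIMIT = 32000;

  // Builds a string of n copies of unit (kept Java 8 compatible).
  static String repeat(String unit, int n) {
    StringBuilder sb = new StringBuilder(unit.length() * n);
    for (int i = 0; i < n; i++) {
      sb.append(unit);
    }
    return sb.toString();
  }

  // Number of UTF-16 code units, which is what String.length() reports.
  static int charCount(String s) {
    return s.length();
  }

  // Number of bytes after UTF-8 encoding (the encoding is an assumption).
  static int utf8ByteCount(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  static boolean exceedsCharLimit(String s) {
    return charCount(s) > LIMIT;
  }

  static boolean exceedsByteLimit(String s) {
    return utf8ByteCount(s) > LIMIT;
  }
}
```

For example, 20000 copies of `"\u00e9"` stay under the limit when counted as characters but exceed it when counted as UTF-8 bytes, since that character encodes to two bytes.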

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -20,17 +20,16 @@ package 
org.apache.carbondata.integration.spark.testsuite.dataload
 import java.math.BigDecimal
 
 import scala.collection.mutable.ArrayBuffer
-

Review comment:
   Please remove these changes if they are not required.

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -20,17 +20,16 @@ package 
org.apache.carbondata.integration.spark.testsuite.dataload
 import java.math.BigDecimal
 
 import scala.collection.mutable.ArrayBuffer
-
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.test.util.QueryTest
 import org.scalatest.BeforeAndAfterEach
-

Review comment:
   same as above

##
File path: 
processing/src/main/java/org/apache/carbondata/processing/datatypes/PrimitiveDataType.java
##
@@ -330,21 +331,35 @@ public void writeByteArray(Object input, DataOutputStream 
dataOutputStream,
 }
   }
 
+  private byte[] getNullForBytes(byte[] value) {

Review comment:
   I think this is not required; `updateNullValue` can be reused. Please check.

##
File path: 
processing/src/main/java/org/apache/carbondata/processing/datatypes/PrimitiveDataType.java
##
@@ -330,21 +331,35 @@ public void writeByteArray(Object input, DataOutputStream 
dataOutputStream,
 }
   }
 
+  private byte[] getNullForBytes(byte[] value) {
+String badRecordAction = CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION);
+if 
(badRecordAction.equalsIgnoreCase(CarbonCommonConstants.FORCE_BAD_RECORD_ACTION))
 {
+  if (this.carbonDimension.getDataType() == DataTypes.STRING) {
+return CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY;
+  } else {
+return CarbonCommonConstants.EMPTY_BYTE_ARRAY;
+  }
+}
+return value;
+  }
+
   private void checkAndWriteByteArray(Object input, DataOutputStream 
dataOutputStream,
   BadRecordLogHolder logHolder, Boolean isWithoutConverter, String 
parsedValue, byte[] value)
   throws IOException {
 if (isWithoutConverter) {
   if (this.carbonDimension.getDataType() == DataTypes.STRING && input 
instanceof String
   && ((String)input).length() > 
CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-throw new CarbonDataLoadingException("Dataload failed, String size 
cannot exceed "
-+ CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT + " bytes");
+
logHolder.setReason(String.format(CarbonCommonConstants.STRING_LENGTH_EXCEEDED_MESSAGE,
+input.toString(), this.carbonDimension.getColName()));
+value = getNullForBytes(value);
   }
   updateValueToByteStream(dataOutputStream, value);
 } else {
   if (this.carbonDimension.getDataType() == DataTypes.STRING
   && value.length > 
CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-throw new CarbonDataLoadingException("Dataload failed, String size 
cannot 

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-12 Thread GitBox


ajantha-bhat commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r469026786



##
File path: 
integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ArrayStreamReader.java
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.readers;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.prestosql.spi.type.*;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import 
org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.block.BlockBuilder;
+
+import org.apache.carbondata.presto.CarbonVectorBatch;
+
+/**
+ * Class to read the Array Stream
+ */
+
+public class ArrayStreamReader extends CarbonColumnVectorImpl implements 
PrestoVectorBlockBuilder {
+
+  protected int batchSize;
+
+  protected Type type;
+  protected BlockBuilder builder;
+  Block childBlock = null;
+  private int index = 0;
+
+  public ArrayStreamReader(int batchSize, DataType dataType, StructField 
field) {
+super(batchSize, dataType);
+this.batchSize = batchSize;
+this.type = getArrayOfType(field, dataType);
+ArrayList childrenList= new ArrayList<>();
+
childrenList.add(CarbonVectorBatch.createDirectStreamReader(this.batchSize, 
field.getDataType(), field));
+setChildrenVector(childrenList);
+this.builder = type.createBlockBuilder(null, batchSize);
+  }
+
+  public int getIndex() {
+return index;
+  }
+
+  public void setIndex(int index) {
+this.index = index;
+  }
+
+  public String getDataTypeName() {
+return "ARRAY";
+  }
+
+  Type getArrayOfType(StructField field, DataType dataType) {
+if (dataType == DataTypes.STRING) {
+  return new ArrayType(VarcharType.VARCHAR);
+} else if (dataType == DataTypes.BYTE) {
+  return new ArrayType(TinyintType.TINYINT);
+} else if (dataType == DataTypes.SHORT) {
+  return new ArrayType(SmallintType.SMALLINT);
+} else if (dataType == DataTypes.INT) {

Review comment:
   Handling for the decimal data type is also missing.




