[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857547

## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala

## @@ -0,0 +1,667 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto.integrationtest
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, File, InputStream}
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.avro
+import org.apache.avro.file.DataFileWriter
+import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord}
+import org.apache.avro.io.{DecoderFactory, Encoder}
+import org.junit.Assert
+
+import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.block.TableBlockInfo
+import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk
+import org.apache.carbondata.core.datastore.chunk.reader.CarbonDataReaderFactory
+import org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.DimensionChunkReaderV3
+import org.apache.carbondata.core.datastore.compression.CompressorFactory
+import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.datastore.page.encoding.DefaultEncodingFactory
+import org.apache.carbondata.core.metadata.ColumnarFormatVersion
+import org.apache.carbondata.core.util.{CarbonMetadataUtil, DataFileFooterConverterV3}
+import org.apache.carbondata.sdk.file.CarbonWriter
+
+class GenerateFiles {
+
+  def singleLevelArrayFile() = {
+    val json1: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true,"arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[7,6],"arrayRealCol":[1.111,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json2: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[7,6,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[1.1,2.2,4.45,3.3],
+        |"arrayBooleanCol": [true, true, true]} """.stripMargin
+    val json3: String =
+      """ {"stringCol": "Rio","intCol": 16,"doubleCol": 12.5,"realCol": 14.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2","Street3"],
+        |"arrayStringCol2": ["China", "Brazil", "Paris", "France"],"arrayIntCol": [1,2,3,4,5],
+        |"arrayBigIntCol":[7,6,8000,91],"arrayRealCol":[1.1,2.2,3.3,4.45],
+        |"arrayDoubleCol":[1.1,2.2,4.45,5.5,3.3], "arrayBooleanCol": [true, false, true]} """
+      .stripMargin
+    val json4: String =
+      """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7,
+        |"boolCol": true, "arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"],
+        |"arrayIntCol": [1,2,3],"arrayBigIntCol":[7,6],"arrayRealCol":[1.1,2.2],
+        |"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin
+    val json5: String =
+      """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7,
+        |"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan",
+        |"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[7,6,8000],
+        |"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[4,1,21.222,15.231],
+        |"arrayBooleanCol": [false, false, false]} """.stripMargin
+
+
+    val mySchema =
+      """ {
+        | "name": "address",
+        | "type": "record",
+        | "fields": [
+        | {
+        | "name": "stringCol",
+        | "type": "string"
+
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857370

## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
akkio-97 commented on a change in pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#discussion_r466857297

## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat edited a comment on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-669844507

@akkio-97 : please update the limitations and TODOs clearly:
a. with local dictionary, arrays cannot be read now
b. arrays containing other complex types are not supported yet
c. currently, array filling is row by row, not really vector processing; an offset vector can be used, like ORC's https://github.com/prestosql/presto/blob/master/presto-orc/src/main/java/io/prestosql/orc/reader/ListColumnReader.java

I also feel ArrayStreamReader and some interfaces need to be cleaned up [I will do it with struct support].

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
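The offset-vector layout the comment points to (the approach used by ORC's ListColumnReader) can be sketched in a few lines of plain Java. The class and method names below are illustrative only, not CarbonData or Presto APIs:

```java
import java.util.Arrays;

// Illustrative sketch of an offset-vector array layout: all child elements sit
// in one flat values vector, and offsets[i]..offsets[i+1] delimits row i's array,
// instead of filling each row's array object one element at a time.
public class OffsetVectorSketch {

  // Slice out the array belonging to one row.
  static int[] rowArray(int[] values, int[] offsets, int row) {
    return Arrays.copyOfRange(values, offsets[row], offsets[row + 1]);
  }

  public static void main(String[] args) {
    int[] values = {1, 2, 3, 4, 5, 6};   // flat child vector for 3 rows
    int[] offsets = {0, 3, 4, 6};        // row boundaries into values
    for (int row = 0; row < offsets.length - 1; row++) {
      // rows decode to [1, 2, 3], [4], [5, 6]
      System.out.println("row " + row + " -> " + Arrays.toString(rowArray(values, offsets, row)));
    }
  }
}
```

With this layout a reader materializes the child vector once per page and only computes per-row offsets, which is what "real vector processing" buys over row-by-row filling.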
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-670327844 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3649/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-670327499 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1910/
[GitHub] [carbondata] QiangCai commented on a change in pull request #3883: [CARBONDATA-3940] CommitTask fails due to Rename IOException during L…
QiangCai commented on a change in pull request #3883:
URL: https://github.com/apache/carbondata/pull/3883#discussion_r466786922

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoHadoopFsRelationCommand.scala

## @@ -104,11 +104,13 @@ case class CarbonInsertIntoHadoopFsRelationCommand(
     val dynamicPartitionOverwrite = enableDynamicOverwrite && mode == SaveMode.Overwrite &&
       staticPartitions.size < partitionColumns.length
-    val committer = FileCommitProtocol.instantiate(
-      sparkSession.sessionState.conf.fileCommitProtocolClass,
-      jobId = java.util.UUID.randomUUID().toString,
-      outputPath = outputPath.toString,
-      dynamicPartitionOverwrite = dynamicPartitionOverwrite)
+    val committer = fileFormat match {

Review comment: better to check whether it is a carbondata or carbonfile table in DDLStrategy. If the table is a carbonfile table, it should not go through the CarbonInsertIntoHadoopFsRelationCommand flow.
[GitHub] [carbondata] QiangCai commented on a change in pull request #3874: [CARBONDATA-3931]Fix Secondary index with index column as DateType giving wrong results
QiangCai commented on a change in pull request #3874:
URL: https://github.com/apache/carbondata/pull/3874#discussion_r466781511

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/query/SecondaryIndexQueryResultProcessor.java

## @@ -249,10 +249,17 @@ private void processResult(List> detailQueryResultItera
   private Object[] prepareRowObjectForSorting(Object[] row) {
     ByteArrayWrapper wrapper = (ByteArrayWrapper) row[0];
     // ByteBuffer[] noDictionaryBuffer = new ByteBuffer[noDictionaryCount];
-    List dimensions = segmentProperties.getDimensions();
     Object[] preparedRow = new Object[dimensions.size() + measureCount];
+    // get dictionary values for date type
+    byte[] dictionaryKey = wrapper.getDictionaryKey();
+    int[] keyArray = ByteUtil.convertBytesToIntArray(dictionaryKey);
+    Object[] dictionaryValues = new Object[dimensionColumnCount + measureCount];
+    for (int i = 0; i < keyArray.length; i++) {
+      dictionaryValues[i] = keyArray[i];

Review comment: why do this copy?
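For context on the ByteUtil.convertBytesToIntArray call in the hunk above: a minimal stand-alone sketch of that kind of decoding, assuming fixed-width 4-byte big-endian surrogate keys (the actual CarbonData ByteUtil may use a different key width, so treat this as a hypothetical stand-in):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for ByteUtil.convertBytesToIntArray: decode a packed
// byte[] of fixed-width big-endian dictionary surrogate keys into an int[].
public class DictionaryKeyDecode {

  static int[] convertBytesToIntArray(byte[] dictionaryKey) {
    int[] keys = new int[dictionaryKey.length / Integer.BYTES];
    ByteBuffer buffer = ByteBuffer.wrap(dictionaryKey); // big-endian by default
    for (int i = 0; i < keys.length; i++) {
      keys[i] = buffer.getInt();
    }
    return keys;
  }

  public static void main(String[] args) {
    // Pack two keys (e.g. a date surrogate and another dimension's surrogate).
    byte[] packed = ByteBuffer.allocate(8).putInt(18262).putInt(3).array();
    System.out.println(Arrays.toString(convertBytesToIntArray(packed))); // [18262, 3]
  }
}
```

Decoding the whole key once yields the surrogate values that the review hunk then copies element by element into the prepared row, which is what the "why do this copy" question is about.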
[GitHub] [carbondata] QiangCai commented on pull request #3881: [HOTFIX] NPE While Data Loading
QiangCai commented on pull request #3881: URL: https://github.com/apache/carbondata/pull/3881#issuecomment-670270759 please create an issue in JIRA to describe the problem.
[GitHub] [carbondata] QiangCai commented on a change in pull request #3881: [HOTFIX] NPE While Data Loading
QiangCai commented on a change in pull request #3881:
URL: https://github.com/apache/carbondata/pull/3881#discussion_r466769303

## File path: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala

## @@ -207,7 +207,7 @@ case class CarbonRelation(
         null != validSeg.getLoadMetadataDetails.getIndexSize) {
         size = size + validSeg.getLoadMetadataDetails.getDataSize.toLong +
           validSeg.getLoadMetadataDetails.getIndexSize.toLong
-      } else {
+      } else if (!carbonTable.isHivePartitionTable) {

Review comment: better to find the root cause and fix it. LoadMetadataDetail should have data/index size, apart from the old store.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
CarbonDataQA1 commented on pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#issuecomment-670150976 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3647/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
CarbonDataQA1 commented on pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#issuecomment-670145319 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1908/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexServer
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670143028 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3646/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexServer
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-670095485 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1907/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-670082507 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3643/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-670079892 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1904/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support
CarbonDataQA1 commented on pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#issuecomment-670072418 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3645/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support
CarbonDataQA1 commented on pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#issuecomment-670071253 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1906/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3877: [CARBONDATA-3889] Cleanup duplicated code in carbondata-spark module
CarbonDataQA1 commented on pull request #3877: URL: https://github.com/apache/carbondata/pull/3877#issuecomment-670062362 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3641/
[GitHub] [carbondata] ajantha-bhat commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
ajantha-bhat commented on pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#issuecomment-670056335 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
CarbonDataQA1 commented on pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#issuecomment-670055404 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1901/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3877: [CARBONDATA-3889] Cleanup duplicated code in carbondata-spark module
CarbonDataQA1 commented on pull request #3877: URL: https://github.com/apache/carbondata/pull/3877#issuecomment-670053317 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1902/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
CarbonDataQA1 commented on pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#issuecomment-670051650 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3640/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3879: [WIP] Handling the addition of geo column to hive at the time of table creation.
CarbonDataQA1 commented on pull request #3879: URL: https://github.com/apache/carbondata/pull/3879#issuecomment-670005097 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3638/
[jira] [Created] (CARBONDATA-3943) Handling the addition of geo column to hive at the time of table creation
SHREELEKHYA GAMPA created CARBONDATA-3943:

Summary: Handling the addition of geo column to hive at the time of table creation
Key: CARBONDATA-3943
URL: https://issues.apache.org/jira/browse/CARBONDATA-3943
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

Handling the addition of geo column to hive at the time of table creation

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3943) Handling the addition of geo column to hive at the time of table creation
[ https://issues.apache.org/jira/browse/CARBONDATA-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SHREELEKHYA GAMPA updated CARBONDATA-3943:
Priority: Minor (was: Major)

> Handling the addition of geo column to hive at the time of table creation
>
> Key: CARBONDATA-3943
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3943
> Project: CarbonData
> Issue Type: Bug
> Reporter: SHREELEKHYA GAMPA
> Priority: Minor
>
> Handling the addition of geo column to hive at the time of table creation
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3879: [WIP] Handling the addition of geo column to hive at the time of table creation.
CarbonDataQA1 commented on pull request #3879: URL: https://github.com/apache/carbondata/pull/3879#issuecomment-669984068 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1899/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexServer
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-669968186 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1898/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [WIP] Support Presto with IndexServer
CarbonDataQA1 commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-669965675 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3637/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-669964959 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1897/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-669960797 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3636/
[GitHub] [carbondata] ajantha-bhat commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
ajantha-bhat commented on pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#issuecomment-669948010 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
CarbonDataQA1 commented on pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#issuecomment-669945651 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3635/
[GitHub] [carbondata] dependabot[bot] commented on pull request #3456: Bump solr.version from 6.3.0 to 8.3.0 in /datamap/lucene
dependabot[bot] commented on pull request #3456: URL: https://github.com/apache/carbondata/pull/3456#issuecomment-669932089 Dependabot tried to update this pull request, but something went wrong. We're looking into it, but in the meantime you can retry the update by commenting `@dependabot rebase`.
[GitHub] [carbondata] dependabot[bot] commented on pull request #3447: Bump dep.jackson.version from 2.6.5 to 2.10.1 in /store/sdk
dependabot[bot] commented on pull request #3447: URL: https://github.com/apache/carbondata/pull/3447#issuecomment-669932215 Dependabot tried to update this pull request, but something went wrong. We're looking into it, but in the meantime you can retry the update by commenting `@dependabot rebase`.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r466417493 ## File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ArrayStreamReader.java ## @@ -0,0 +1,163 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.presto.readers; + +import java.util.ArrayList; +import java.util.List; + +import io.prestosql.spi.type.*; + +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.metadata.datatype.StructField; +import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; + +import io.prestosql.spi.block.Block; +import io.prestosql.spi.block.BlockBuilder; + +import org.apache.carbondata.presto.CarbonVectorBatch; + +/** + * Class to read the Array Stream + */ + +public class ArrayStreamReader extends CarbonColumnVectorImpl implements PrestoVectorBlockBuilder { + + protected int batchSize; + + protected Type type; + protected BlockBuilder builder; + Block childBlock = null; + private int index = 0; + + public ArrayStreamReader(int batchSize, DataType dataType, StructField field) { +super(batchSize, dataType); +this.batchSize = batchSize; +this.type = getArrayOfType(field, dataType); +ArrayList childrenList= new ArrayList<>(); + childrenList.add(CarbonVectorBatch.createDirectStreamReader(this.batchSize, field.getDataType(), field)); +setChildrenVector(childrenList); +this.builder = type.createBlockBuilder(null, batchSize); + } + + public int getIndex() { +return index; + } + + public void setIndex(int index) { +this.index = index; + } + + public String getDataTypeName() { +return "ARRAY"; + } + + Type getArrayOfType(StructField field, DataType dataType) { +if (dataType == DataTypes.STRING) { + return new ArrayType(VarcharType.VARCHAR); +} else if (dataType == DataTypes.BYTE) { + return new ArrayType(TinyintType.TINYINT); +} else if (dataType == DataTypes.SHORT) { + return new ArrayType(SmallintType.SMALLINT); +} else if (dataType == DataTypes.INT) { + return new ArrayType(IntegerType.INTEGER); +} else if (dataType == DataTypes.LONG) { + return new ArrayType(BigintType.BIGINT); +} else if (dataType == DataTypes.DOUBLE) { + 
return new ArrayType(DoubleType.DOUBLE); +} else if (dataType == DataTypes.FLOAT) { + return new ArrayType(RealType.REAL); +} else if (dataType == DataTypes.BOOLEAN) { + return new ArrayType(BooleanType.BOOLEAN); +} else if (dataType == DataTypes.TIMESTAMP) { + return new ArrayType(TimestampType.TIMESTAMP); +} else if (DataTypes.isArrayType(dataType)) { + StructField childField = field.getChildren().get(0); + return new ArrayType(getArrayOfType(childField, childField.getDataType())); +} else { + throw new UnsupportedOperationException("Unsupported type: " + dataType); +} + } + + @Override + public Block buildBlock() { +return builder.build(); + } + + public boolean isComplex() { +return true; + } + + @Override + public void setBatchSize(int batchSize) { +this.batchSize = batchSize; + } + + @Override + public void putObject(int rowId, Object value) { +if (value == null) { + putNull(rowId); +} else { + getChildrenVector().get(0).putObject(rowId, value); +} + } + + public void putArrayObject() { +if (DataTypes.isArrayType(this.getType())) { + childBlock = ((ArrayStreamReader) getChildrenVector().get(0)).buildBlock(); +} else if (this.getType() == DataTypes.STRING) { + childBlock = ((SliceStreamReader) getChildrenVector().get(0)).buildBlock(); +} else if (this.getType() == DataTypes.INT) { + childBlock = ((IntegerStreamReader) getChildrenVector().get(0)).buildBlock(); +} else if (this.getType() == DataTypes.LONG) { + childBlock = ((LongStreamReader) getChildrenVector().get(0)).buildBlock(); +} else if (this.getType() == DataTypes.DOUBLE) { + childBlock = ((DoubleStreamReader) getChildrenVector().get(0)).buildBloc
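The core of the reader above is `getArrayOfType`, which maps each CarbonData element type to a Presto `ArrayType` of the equivalent Presto type, recursing once per nesting level for nested arrays. The shape of that recursion can be sketched without Presto on the classpath; `ArrayTypeMappingSketch`, the `CarbonType` enum, and the string type names below are our stand-ins for illustration, not CarbonData or Presto API:

```java
// Illustrative sketch of the recursive type mapping in getArrayOfType.
// Presto type objects are replaced by plain type-name strings so the
// sketch stays self-contained.
public class ArrayTypeMappingSketch {
  // Simplified stand-in for CarbonData's DataTypes constants.
  enum CarbonType { STRING, INT, LONG, DOUBLE, BOOLEAN, ARRAY }

  // Wrap the element type in "array(...)"; recurse one level when the
  // element is itself an array. childType is only consulted for ARRAY.
  static String arrayOf(CarbonType type, CarbonType childType) {
    switch (type) {
      case STRING:  return "array(varchar)";
      case INT:     return "array(integer)";
      case LONG:    return "array(bigint)";
      case DOUBLE:  return "array(double)";
      case BOOLEAN: return "array(boolean)";
      case ARRAY:   return "array(" + arrayOf(childType, null) + ")";
      default: throw new UnsupportedOperationException("Unsupported type: " + type);
    }
  }

  public static void main(String[] args) {
    System.out.println(arrayOf(CarbonType.INT, null));                // array(integer)
    System.out.println(arrayOf(CarbonType.ARRAY, CarbonType.STRING)); // array(array(varchar))
  }
}
```

The real method instead walks `field.getChildren().get(0)` to discover the element type of a nested array, so it handles arbitrary nesting depth rather than the single extra level this sketch takes as a parameter.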
[GitHub] [carbondata] asfgit closed pull request #3872: [CARBONDATA-3889] Enable java check style for all java modules
asfgit closed pull request #3872: URL: https://github.com/apache/carbondata/pull/3872
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
CarbonDataQA1 commented on pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#issuecomment-669915582 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1896/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3881: [HOTFIX] NPE While Data Loading
CarbonDataQA1 commented on pull request #3881: URL: https://github.com/apache/carbondata/pull/3881#issuecomment-66990 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3634/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3881: [HOTFIX] NPE While Data Loading
CarbonDataQA1 commented on pull request #3881: URL: https://github.com/apache/carbondata/pull/3881#issuecomment-669907190 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1895/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3883: [CARBONDATA-3940] CommitTask fails due to Rename IOException during L…
CarbonDataQA1 commented on pull request #3883: URL: https://github.com/apache/carbondata/pull/3883#issuecomment-669894078 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3631/
[GitHub] [carbondata] Indhumathi27 opened a new pull request #3885: [WIP] Support Presto with IndexSserver
Indhumathi27 opened a new pull request #3885: URL: https://github.com/apache/carbondata/pull/3885
### Why is this PR needed?
### What changes were proposed in this PR?
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
### Is any new testcase added?
- No
- Yes
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3884: [CARBONDATA-3942] Fix type cast when loading data into partitioned table
CarbonDataQA1 commented on pull request #3884: URL: https://github.com/apache/carbondata/pull/3884#issuecomment-669887552 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3633/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3884: [CARBONDATA-3942] Fix type cast when loading data into partitioned table
CarbonDataQA1 commented on pull request #3884: URL: https://github.com/apache/carbondata/pull/3884#issuecomment-669886202 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1894/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3872: [CARBONDATA-3889] Enable java check style for all java modules
CarbonDataQA1 commented on pull request #3872: URL: https://github.com/apache/carbondata/pull/3872#issuecomment-669885501 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3632/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3872: [CARBONDATA-3889] Enable java check style for all java modules
CarbonDataQA1 commented on pull request #3872: URL: https://github.com/apache/carbondata/pull/3872#issuecomment-669880833 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1893/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-669860500 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3630/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3883: [CARBONDATA-3940] CommitTask fails due to Rename IOException during L…
CarbonDataQA1 commented on pull request #3883: URL: https://github.com/apache/carbondata/pull/3883#issuecomment-669858937 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1892/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-669856755 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1891/
[GitHub] [carbondata] ajantha-bhat commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#issuecomment-669844507 @akkio-97: please state the limitations and TODOs clearly: a. arrays cannot be read with local dictionary yet; b. arrays of other complex types are not supported yet. I also feel ArrayStreamReader and some interfaces need to be cleaned up [I will do it with struct support].
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r466313217 ## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala ## @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.presto.integrationtest + +import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, File, InputStream} +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.avro +import org.apache.avro.file.DataFileWriter +import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord} +import org.apache.avro.io.{DecoderFactory, Encoder} +import org.junit.Assert + +import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.block.TableBlockInfo +import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk +import org.apache.carbondata.core.datastore.chunk.reader.CarbonDataReaderFactory +import org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.DimensionChunkReaderV3 +import org.apache.carbondata.core.datastore.compression.CompressorFactory +import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter} +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.datastore.page.encoding.DefaultEncodingFactory +import org.apache.carbondata.core.metadata.ColumnarFormatVersion +import org.apache.carbondata.core.util.{CarbonMetadataUtil, DataFileFooterConverterV3} +import org.apache.carbondata.sdk.file.CarbonWriter + +class GenerateFiles { + + def singleLevelArrayFile() = { +val json1: String = + """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7, +|"boolCol": true,"arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"], +|"arrayIntCol": [1,2,3],"arrayBigIntCol":[7,6],"arrayRealCol":[1.111,2.2], +|"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin +val json2: String = + """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7, +|"boolCol": true, "arrayStringCol1": ["Street1", 
"Street2"],"arrayStringCol2": ["Japan", +|"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[7,6,8000], +|"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[1.1,2.2,4.45,3.3], +|"arrayBooleanCol": [true, true, true]} """.stripMargin +val json3: String = + """ {"stringCol": "Rio","intCol": 16,"doubleCol": 12.5,"realCol": 14.7, +|"boolCol": true, "arrayStringCol1": ["Street1", "Street2","Street3"], +|"arrayStringCol2": ["China", "Brazil", "Paris", "France"],"arrayIntCol": [1,2,3,4,5], + |"arrayBigIntCol":[7,6,8000,91],"arrayRealCol":[1.1,2.2,3.3,4.45], +|"arrayDoubleCol":[1.1,2.2,4.45,5.5,3.3], "arrayBooleanCol": [true, false, true]} """ +.stripMargin +val json4: String = + """ {"stringCol": "bob","intCol": 14,"doubleCol": 10.5,"realCol": 12.7, +|"boolCol": true, "arrayStringCol1":["Street1"],"arrayStringCol2": ["India", "Egypt"], +|"arrayIntCol": [1,2,3],"arrayBigIntCol":[7,6],"arrayRealCol":[1.1,2.2], +|"arrayDoubleCol":[1.1,2.2,3.3], "arrayBooleanCol": [true, false, true]} """.stripMargin +val json5: String = + """ {"stringCol": "Alex","intCol": 15,"doubleCol": 11.5,"realCol": 13.7, +|"boolCol": true, "arrayStringCol1": ["Street1", "Street2"],"arrayStringCol2": ["Japan", +|"China", "India"],"arrayIntCol": [1,2,3,4],"arrayBigIntCol":[7,6,8000], +|"arrayRealCol":[1.1,2.2,3.3],"arrayDoubleCol":[4,1,21.222,15.231], +|"arrayBooleanCol": [false, false, false]} """.stripMargin + + +val mySchema = + """ { +| "name": "address", +| "type": "record", +| "fields": [ +| { +| "name": "stringCol", +| "type": "strin
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r466313146 ## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala ##
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r466312957 ## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala ##
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r466312027 ## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r466311639 ## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/GenerateFiles.scala
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r466310201 ## File path: integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java ## @@ -102,6 +89,12 @@ public static CarbonColumnVectorImpl createDirectStreamReader(int batchSize, Dat } else { return null; } +} else if (DataTypes.isArrayType(field.getDataType())) { + if (field.getChildren().size() > 1) { Review comment: remove this assert, array can never have more than one child This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3773: [CARBONDATA-3830]Presto array columns read support
ajantha-bhat commented on a change in pull request #3773: URL: https://github.com/apache/carbondata/pull/3773#discussion_r466309537 ## File path: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java ## @@ -102,6 +126,58 @@ public CarbonColumnVectorImpl(int batchSize, DataType dataType) { } + @Override + public CarbonColumnVector getColumnVector() { +return null; + } + + @Override + public List getChildrenVector() { +return childrenVector; + } + + @Override + public void putArrayObject() { +return; + } + + public void setChildrenVector(ArrayList childrenVector) { +this.childrenVector = childrenVector; + } + + public ArrayList getChildrenElements() { +return childrenElements; + } + + public void setChildrenElements(ArrayList childrenElements) { +this.childrenElements = childrenElements; + } + + public ArrayList getChildrenOffset() { +return childrenOffset; + } + + public void setChildrenOffset(ArrayList childrenOffset) { +this.childrenOffset = childrenOffset; + } + + public void setChildrenElementsAndOffset(byte[] childPageData) { +ByteBuffer childInfoBuffer = ByteBuffer.wrap(childPageData); +ArrayList childElements = new ArrayList<>(); +ArrayList childOffset = new ArrayList<>(); Review comment: offset is not required, even for struct type. so, please remove it
[GitHub] [carbondata] marchpure commented on a change in pull request #3881: [HOTFIX] NPE While Data Loading
marchpure commented on a change in pull request #3881: URL: https://github.com/apache/carbondata/pull/3881#discussion_r466303536 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala ## @@ -207,7 +207,7 @@ case class CarbonRelation( null != validSeg.getLoadMetadataDetails.getIndexSize) { size = size + validSeg.getLoadMetadataDetails.getDataSize.toLong + validSeg.getLoadMetadataDetails.getIndexSize.toLong - } else { + } else if (!carbonTable.isHivePartitionTable) { Review comment: Here it aims to collect the data size of the segment path, but the generated segment path has the format "Fact/Part0/Segment_0". For a partition table, this will throw a FileNotFound exception.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
CarbonDataQA1 commented on pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#issuecomment-669832416 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1888/
[GitHub] [carbondata] QiangCai commented on a change in pull request #3881: [HOTFIX] NPE While Data Loading
QiangCai commented on a change in pull request #3881: URL: https://github.com/apache/carbondata/pull/3881#discussion_r466295972 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala ## @@ -207,7 +207,7 @@ case class CarbonRelation( null != validSeg.getLoadMetadataDetails.getIndexSize) { size = size + validSeg.getLoadMetadataDetails.getDataSize.toLong + validSeg.getLoadMetadataDetails.getIndexSize.toLong - } else { + } else if (!carbonTable.isHivePartitionTable) { Review comment: why add this check? ## File path: core/src/main/java/org/apache/carbondata/core/readcommitter/TableStatusReadCommittedScope.java ## @@ -87,7 +87,9 @@ public TableStatusReadCommittedScope(AbsoluteTableIdentifier identifier, SegmentFileStore fileStore = new SegmentFileStore(identifier.getTablePath(), segment.getSegmentFileName()); indexFiles = fileStore.getIndexOrMergeFiles(); - segment.setSegmentMetaDataInfo(fileStore.getSegmentFile().getSegmentMetaDataInfo()); + if (fileStore != null && fileStore.getSegmentFile() != null) { Review comment: no need to check "fileStore != null"
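The second remark above can be shown in miniature: a reference just assigned from a constructor can never be null, so guarding it is dead code; only the getter's result needs a check. The classes below are simplified stand-ins for illustration, not CarbonData's real `SegmentFileStore`:

```java
public class NullCheckSketch {
    static class SegmentFile { }

    static class SegmentFileStore {
        private final SegmentFile file;
        SegmentFileStore(SegmentFile file) { this.file = file; }
        SegmentFile getSegmentFile() { return file; }
    }

    // A reference assigned from 'new' is never null, so "fileStore != null"
    // is dead code; only the value the getter returns can legitimately be null.
    public static boolean hasSegmentFile(SegmentFile maybeNull) {
        SegmentFileStore fileStore = new SegmentFileStore(maybeNull);
        return fileStore.getSegmentFile() != null; // the only check needed
    }
}
```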
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-669829422 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
CarbonDataQA1 commented on pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#issuecomment-669828122 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3627/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3881: [HOTFIX] NPE While Data Loading
CarbonDataQA1 commented on pull request #3881: URL: https://github.com/apache/carbondata/pull/3881#issuecomment-66982
[GitHub] [carbondata] asfgit closed pull request #3880: [CARBONDATA-3879] Filtering Segmets Optimazation
asfgit closed pull request #3880: URL: https://github.com/apache/carbondata/pull/3880
[jira] [Resolved] (CARBONDATA-3879) Filtering Segmets Optimazation
[ https://issues.apache.org/jira/browse/CARBONDATA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ajantha Bhat resolved CARBONDATA-3879.
Fix Version/s: (was: 2.0.2) 2.1.0
Resolution: Fixed

> Filtering Segmets Optimazation
>
> Key: CARBONDATA-3879
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3879
> Project: CarbonData
> Issue Type: Improvement
> Components: data-query
> Affects Versions: 2.0.0
> Reporter: Xingjun Hao
> Priority: Major
> Fix For: 2.1.0
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> During the filter segments flow, there are a lot of LIST.CONTAINS calls, which have heavy time overhead when there are tens of thousands of segments. For example, if there are 5 segments, it will trigger LIST.CONTAINS for each segment; the LIST also has about 5 elements, so the time complexity will be O(5 * 5).

-- This message was sent by Atlassian Jira (v8.3.4#803005)
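The LIST.CONTAINS cost described in the issue is the classic linear-scan-inside-a-loop pattern. A minimal, self-contained sketch of the optimization, using plain strings as stand-ins for CarbonData's Segment objects (this is an illustration, not the PR's code):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SegmentFilterSketch {
    // O(n*m): one List.contains (a linear scan) per candidate segment
    public static int countValidSlow(List<String> candidates, List<String> valid) {
        int hits = 0;
        for (String s : candidates) {
            if (valid.contains(s)) {  // O(m) scan on every iteration
                hits++;
            }
        }
        return hits;
    }

    // O(n + m): build a hash set once, then constant-time membership tests
    public static int countValidFast(List<String> candidates, List<String> valid) {
        Set<String> validSet = new HashSet<>(valid);  // O(m), done once
        int hits = 0;
        for (String s : candidates) {
            if (validSet.contains(s)) {  // O(1) expected per lookup
                hits++;
            }
        }
        return hits;
    }
}
```

Both methods return the same counts; only the asymptotic cost differs, which is what matters once segment counts reach the tens of thousands.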
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-669820094 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3625/
[GitHub] [carbondata] ajantha-bhat commented on pull request #3880: [CARBONDATA-3879] Filtering Segmets Optimazation
ajantha-bhat commented on pull request #3880: URL: https://github.com/apache/carbondata/pull/3880#issuecomment-669819275 LGTM
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3880: [CARBONDATA-3879] Filtering Segmets Optimazation
ajantha-bhat commented on a change in pull request #3880: URL: https://github.com/apache/carbondata/pull/3880#discussion_r466277616 ## File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ## @@ -389,14 +389,16 @@ protected FileSplit makeSplit(String segmentId, String filePath, long start, lon public void updateLoadMetaDataDetailsToSegments(List validSegments, List prunedSplits) { +Map validSegmentsMap = validSegments.stream() Review comment: oh, I got what you mean, we still need to get the element from valid segment to read its LoadMetadataDetails. So, SET can help only for contains, but not for getting the valid segment.
[GitHub] [carbondata] marchpure commented on a change in pull request #3880: [CARBONDATA-3879] Filtering Segmets Optimazation
marchpure commented on a change in pull request #3880: URL: https://github.com/apache/carbondata/pull/3880#discussion_r466275337 ## File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ## @@ -389,14 +389,16 @@ protected FileSplit makeSplit(String segmentId, String filePath, long start, lon public void updateLoadMetaDataDetailsToSegments(List validSegments, List prunedSplits) { +Map validSegmentsMap = validSegments.stream() Review comment: Yeah. But with a SET it's hard to read an element with a specified segment number. In this code, we need to read a segment with a specified segment number from validSegments.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3880: [CARBONDATA-3879] Filtering Segmets Optimazation
ajantha-bhat commented on a change in pull request #3880: URL: https://github.com/apache/carbondata/pull/3880#discussion_r466270144 ## File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ## @@ -389,14 +389,16 @@ protected FileSplit makeSplit(String segmentId, String filePath, long start, lon public void updateLoadMetaDataDetailsToSegments(List validSegments, List prunedSplits) { +Map validSegmentsMap = validSegments.stream() Review comment: if you see the `equals` implementation of `Segment.java`, it is based on only segment number comparison. so, I think you can still use `SET`
[GitHub] [carbondata] marchpure commented on a change in pull request #3880: [CARBONDATA-3879] Filtering Segmets Optimazation
marchpure commented on a change in pull request #3880: URL: https://github.com/apache/carbondata/pull/3880#discussion_r466265360 ## File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ## @@ -389,14 +389,16 @@ protected FileSplit makeSplit(String segmentId, String filePath, long start, lon public void updateLoadMetaDataDetailsToSegments(List validSegments, List prunedSplits) { +Map validSegmentsMap = validSegments.stream() Review comment: Appreciate that good suggestion, but here we can only use a Map instead of a Set. Reason: the pseudo-code for this function is shown below.
// 0. A segment's hashcode is its segment number, so when we compare two segments, only the segment number is compared
if (validSegments.contains(segmentInSplit)) {
  // 1. fetch the matching segment from validSegments
  Segment segmentInValidSegment <- fetch the segment from validSegments
  // 2. copy its metadata onto the split
  segmentInSplit.setLoadMetadataDetails(segmentInValidSegment.getLoadMetadataDetails);
}
With a SET, it's hard to fetch that segment back out of validSegments.
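The reason the reviewers settle on a Map keyed by segment number can be sketched directly: one hash lookup answers both "is this segment valid?" and "which stored Segment carries the LoadMetadataDetails to copy onto the split?". The `Segment` class below is a simplified stand-in for CarbonData's (whose `equals` compares segment numbers only), not the real implementation:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SegmentLookupSketch {
    // Simplified stand-in for CarbonData's Segment
    public static class Segment {
        final String segmentNo;
        final String loadMetadata;  // stand-in for LoadMetadataDetails
        public Segment(String segmentNo, String loadMetadata) {
            this.segmentNo = segmentNo;
            this.loadMetadata = loadMetadata;
        }
    }

    // A Set only answers membership; a Map also hands back the stored element.
    public static Map<String, Segment> index(List<Segment> validSegments) {
        return validSegments.stream()
            .collect(Collectors.toMap(s -> s.segmentNo, Function.identity()));
    }

    public static String metadataFor(Map<String, Segment> index, String segmentNo) {
        Segment match = index.get(segmentNo);  // one lookup: membership + retrieval
        return match == null ? null : match.loadMetadata;
    }
}
```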
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-669812497 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1886/
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
ajantha-bhat commented on a change in pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#discussion_r466254229 ## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoTestNonTransactionalTableFiles.scala ## @@ -176,24 +188,35 @@ class PrestoTestNonTransactionalTableFiles extends FunSuiteLike with BeforeAndAf def buildTestDataOtherDataType(rows: Int, sortColumns: Array[String]): Any = { val fields: Array[Field] = new Array[Field](5) // same column name, but name as boolean type -fields(0) = new Field("name", DataTypes.BOOLEAN) +fields(0) = new Field("name", DataTypes.VARCHAR) fields(1) = new Field("age", DataTypes.INT) -fields(2) = new Field("id", DataTypes.BYTE) +fields(2) = new Field("id", DataTypes.BINARY) fields(3) = new Field("height", DataTypes.DOUBLE) fields(4) = new Field("salary", DataTypes.FLOAT) +val imagePath = "../../sdk/sdk/src/test/resources/image/carbondatalogo.jpg" try { val builder = CarbonWriter.builder() val writer = builder.outputPath(writerPath) .uniqueIdentifier(System.currentTimeMillis()).withBlockSize(2).sortBy(sortColumns) .withCsvInput(new Schema(fields)).writtenBy("TestNonTransactionalCarbonTable").build() var i = 0 + val bis = new BufferedInputStream(new FileInputStream(imagePath)) + var hexValue: Array[Char] = null + val originBinary = new Array[Byte](bis.available) + while (bis.read(originBinary) != -1) { +hexValue = Hex.encodeHex(originBinary) + } + bis.close() + val binaryValue = String.valueOf(hexValue) + Review comment: ok ## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoTestNonTransactionalTableFiles.scala ## @@ -176,24 +188,35 @@ class PrestoTestNonTransactionalTableFiles extends FunSuiteLike with BeforeAndAf def buildTestDataOtherDataType(rows: Int, sortColumns: Array[String]): Any = { val fields: Array[Field] = new Array[Field](5) // same column name, but name as boolean type -fields(0) = new Field("name", 
DataTypes.BOOLEAN) +fields(0) = new Field("name", DataTypes.VARCHAR) fields(1) = new Field("age", DataTypes.INT) -fields(2) = new Field("id", DataTypes.BYTE) +fields(2) = new Field("id", DataTypes.BINARY) fields(3) = new Field("height", DataTypes.DOUBLE) fields(4) = new Field("salary", DataTypes.FLOAT) +val imagePath = "../../sdk/sdk/src/test/resources/image/carbondatalogo.jpg" Review comment: ok, used root path
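The test data in the hunk above hex-encodes an image file with commons-codec's `Hex.encodeHex` before writing it into the binary column. A dependency-free sketch of that encoding step; the helper below is a stand-in for `Hex.encodeHex`, which likewise emits two lowercase hex characters per input byte:

```java
public class HexSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Equivalent of org.apache.commons.codec.binary.Hex.encodeHex(byte[]):
    // each byte becomes two lowercase hex characters (high nibble first)
    public static String encodeHex(byte[] data) {
        char[] out = new char[data.length * 2];
        for (int i = 0; i < data.length; i++) {
            out[2 * i] = DIGITS[(data[i] >> 4) & 0x0F];
            out[2 * i + 1] = DIGITS[data[i] & 0x0F];
        }
        return String.valueOf(out);
    }
}
```

Reading with `bis.available()` as the buffer size, as the quoted test does, assumes the whole file is available in one call; encoding the resulting byte array is the step shown here.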
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
ajantha-bhat commented on a change in pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#discussion_r466252790 ## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoTestNonTransactionalTableFiles.scala ## @@ -176,24 +188,35 @@ class PrestoTestNonTransactionalTableFiles extends FunSuiteLike with BeforeAndAf def buildTestDataOtherDataType(rows: Int, sortColumns: Array[String]): Any = { val fields: Array[Field] = new Array[Field](5) // same column name, but name as boolean type -fields(0) = new Field("name", DataTypes.BOOLEAN) +fields(0) = new Field("name", DataTypes.VARCHAR) fields(1) = new Field("age", DataTypes.INT) -fields(2) = new Field("id", DataTypes.BYTE) +fields(2) = new Field("id", DataTypes.BINARY) fields(3) = new Field("height", DataTypes.DOUBLE) fields(4) = new Field("salary", DataTypes.FLOAT) +val imagePath = "../../sdk/sdk/src/test/resources/image/carbondatalogo.jpg" Review comment: > another question: why presto module need scala? we can add spark as test dependency and create store from spark and query from presto
[GitHub] [carbondata] QiangCai commented on pull request #3877: [CARBONDATA-3889] Cleanup duplicated code in carbondata-spark module
QiangCai commented on pull request #3877: URL: https://github.com/apache/carbondata/pull/3877#issuecomment-669804784 @kevinjmh @ajantha-bhat please help to review this PR.
[GitHub] [carbondata] QiangCai commented on a change in pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
QiangCai commented on a change in pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#discussion_r466249308 ## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoTestNonTransactionalTableFiles.scala ## @@ -176,24 +188,35 @@ class PrestoTestNonTransactionalTableFiles extends FunSuiteLike with BeforeAndAf def buildTestDataOtherDataType(rows: Int, sortColumns: Array[String]): Any = { val fields: Array[Field] = new Array[Field](5) // same column name, but name as boolean type -fields(0) = new Field("name", DataTypes.BOOLEAN) +fields(0) = new Field("name", DataTypes.VARCHAR) fields(1) = new Field("age", DataTypes.INT) -fields(2) = new Field("id", DataTypes.BYTE) +fields(2) = new Field("id", DataTypes.BINARY) fields(3) = new Field("height", DataTypes.DOUBLE) fields(4) = new Field("salary", DataTypes.FLOAT) +val imagePath = "../../sdk/sdk/src/test/resources/image/carbondatalogo.jpg" try { val builder = CarbonWriter.builder() val writer = builder.outputPath(writerPath) .uniqueIdentifier(System.currentTimeMillis()).withBlockSize(2).sortBy(sortColumns) .withCsvInput(new Schema(fields)).writtenBy("TestNonTransactionalCarbonTable").build() var i = 0 + val bis = new BufferedInputStream(new FileInputStream(imagePath)) + var hexValue: Array[Char] = null + val originBinary = new Array[Byte](bis.available) + while (bis.read(originBinary) != -1) { +hexValue = Hex.encodeHex(originBinary) + } + bis.close() + val binaryValue = String.valueOf(hexValue) + Review comment: keep only one blank line.
[GitHub] [carbondata] QiangCai commented on a change in pull request #3882: [CARBONDATA-3941] Support binary data type reading from presto
QiangCai commented on a change in pull request #3882: URL: https://github.com/apache/carbondata/pull/3882#discussion_r466249621

## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoTestNonTransactionalTableFiles.scala
## @@ -176,24 +188,35 @@ class PrestoTestNonTransactionalTableFiles extends FunSuiteLike with BeforeAndAf
 def buildTestDataOtherDataType(rows: Int, sortColumns: Array[String]): Any = {
 val fields: Array[Field] = new Array[Field](5)
 // same column name, but name as boolean type
-fields(0) = new Field("name", DataTypes.BOOLEAN)
+fields(0) = new Field("name", DataTypes.VARCHAR)
 fields(1) = new Field("age", DataTypes.INT)
-fields(2) = new Field("id", DataTypes.BYTE)
+fields(2) = new Field("id", DataTypes.BINARY)
 fields(3) = new Field("height", DataTypes.DOUBLE)
 fields(4) = new Field("salary", DataTypes.FLOAT)
+val imagePath = "../../sdk/sdk/src/test/resources/image/carbondatalogo.jpg"
Review comment: better to base it on rootPath:
val imagePath = s"$rootPath/sdk/sdk/src/test/resources/image/carbondatalogo.jpg"
Another question: why does the presto module need Scala?
[GitHub] [carbondata] ajantha-bhat commented on pull request #3872: [CARBONDATA-3889] Enable java check style for all java modules
ajantha-bhat commented on pull request #3872: URL: https://github.com/apache/carbondata/pull/3872#issuecomment-669801705 LGTM. Can merge once the build passes.
[GitHub] [carbondata] IceMimosa commented on pull request #3884: [CARBONDATA-3942] Fix type cast when loading data into partitioned table
IceMimosa commented on pull request #3884: URL: https://github.com/apache/carbondata/pull/3884#issuecomment-669798694 reset this please
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3880: [CARBONDATA-3879] Filtering Segmets Optimazation
ajantha-bhat commented on a change in pull request #3880: URL: https://github.com/apache/carbondata/pull/3880#discussion_r466243016

## File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
## @@ -389,14 +389,16 @@ protected FileSplit makeSplit(String segmentId, String filePath, long start, lon
 public void updateLoadMetaDataDetailsToSegments(List validSegments,
 List prunedSplits) {
+Map validSegmentsMap = validSegments.stream()
Review comment: Creating a map on every query is also an overhead when the valid segments number in the thousands. I suggest using a `Set` for `validSegments` when it is formed originally, instead of a `List`.
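The reviewer's point is that a `HashSet` gives O(1) membership checks, whereas a `List` lookup (or rebuilding a map per query) scans all elements. A minimal sketch of the idea, with plain string segment ids standing in for CarbonData's `Segment` objects:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SegmentLookupSketch {
    // Count how many pruned-split segment ids are still valid.
    // With a HashSet this is O(splits); a List would make it O(splits * segments).
    static int countValid(List<String> splitSegmentIds, Set<String> validSegments) {
        int matches = 0;
        for (String id : splitSegmentIds) {
            if (validSegments.contains(id)) { // O(1) lookup in a HashSet
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Set<String> valid = new HashSet<>(List.of("0", "1", "3"));
        List<String> splits = List.of("0", "2", "3", "3");
        System.out.println(countValid(splits, valid)); // prints 3
    }
}
```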
[jira] [Updated] (CARBONDATA-3942) Fix type cast when loading data into partitioned table
[ https://issues.apache.org/jira/browse/CARBONDATA-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenKai updated CARBONDATA-3942:
--------------------------------
    Summary: Fix type cast when loading data into partitioned table  (was: Fix type cast when doing data load into partitioned table)

> Fix type cast when loading data into partitioned table
> ------------------------------------------------------
>
>                 Key: CARBONDATA-3942
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3942
>             Project: CarbonData
>          Issue Type: Bug
>          Components: spark-integration
>    Affects Versions: 2.1.0
>            Reporter: ChenKai
>            Priority: Major
>
> Loading Int type data to carbondata double type, the value will be broken like this:
> +--------+----+----+
> |cnt     |name|time|
> +--------+----+----+
> |4.9E-323|a   |2020|
> |1.0E-322|b   |2020|
> +--------+----+----+
> original cnt is: 10, 20
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] IceMimosa opened a new pull request #3884: [CARBONDATA-3942] Fix type cast when loading data into partitioned table
IceMimosa opened a new pull request #3884: URL: https://github.com/apache/carbondata/pull/3884

### Why is this PR needed?
Loading Int type data to carbondata double type, the value will be broken like this:
+--------+----+
|cnt     |time|
+--------+----+
|4.9E-323|2020|
|1.0E-322|2020|
+--------+----+
original cnt value is: 10, 20

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes
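The corrupted values in the report are consistent with a bit-level reinterpretation rather than a numeric widening cast: the double whose raw IEEE-754 bit pattern is the long 10 is the subnormal printed as 4.9E-323. This is an assumption about the root cause for illustration, not a statement of what the PR found:

```java
public class IntToDoubleCastSketch {
    public static void main(String[] args) {
        long cnt = 10L;
        // Numeric cast: the value is preserved.
        double widened = (double) cnt;
        // Bit reinterpretation: long bit pattern 10 decodes as a tiny
        // subnormal double, matching the broken value in the report.
        double reinterpreted = Double.longBitsToDouble(cnt);
        System.out.println(widened);       // 10.0
        System.out.println(reinterpreted); // 4.9E-323
    }
}
```

The second reported value fits the same pattern: the double with raw bit pattern 20 rounds to the decimal 1.0E-322 shown in the table.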
[GitHub] [carbondata] QiangCai commented on a change in pull request #3872: [CARBONDATA-3889] Enable java check style for all java modules
QiangCai commented on a change in pull request #3872: URL: https://github.com/apache/carbondata/pull/3872#discussion_r466236037

## File path: integration/spark/src/main/java/org/apache/spark/sql/CarbonVectorProxy.java
## @@ -0,0 +1,554 @@
+/*
Review comment: @ajantha-bhat I tried the following two ways in one commit, but got the same result:
1. git mv and then modify the new file
2. modify the old file and then git mv
Finally, I raised two commits; you can review them one by one in this PR >_<
[GitHub] [carbondata] marchpure opened a new pull request #3883: [CARBONDATA-3940] CommitTask fails due to Rename IOException during L…
marchpure opened a new pull request #3883: URL: https://github.com/apache/carbondata/pull/3883 …oading

### Why is this PR needed?
During the load process, commitTask fails with high probability. The exception stack shows that it was thrown by HadoopMapReduceCommitProtocol, not CarbonSQLHadoopMapMapReduceCommitProtocol, implying a class initialization error while initializing the "Committer": it should have been initialized as CarbonSQLHadoopMapMapReduceCommitProtocol, but was incorrectly initialized to HadoopMapReduceCommitProtocol.

### What changes were proposed in this PR?
Initialize the committer as CarbonSQLHadoopMapMapReduceCommitProtocol directly.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
[jira] [Created] (CARBONDATA-3942) Fix type cast when doing data load into partitioned table
ChenKai created CARBONDATA-3942:
-----------------------------------
             Summary: Fix type cast when doing data load into partitioned table
                 Key: CARBONDATA-3942
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3942
             Project: CarbonData
          Issue Type: Bug
          Components: spark-integration
    Affects Versions: 2.1.0
            Reporter: ChenKai

Loading Int type data to carbondata double type, the value will be broken like this:
+--------+----+----+
|cnt     |name|time|
+--------+----+----+
|4.9E-323|a   |2020|
|1.0E-322|b   |2020|
+--------+----+----+
original cnt is: 10, 20
[GitHub] [carbondata] QiangCai commented on a change in pull request #3872: [CARBONDATA-3889] Enable java check style for all java modules
QiangCai commented on a change in pull request #3872: URL: https://github.com/apache/carbondata/pull/3872#discussion_r466218295

## File path: integration/spark/src/main/java/org/apache/spark/sql/CarbonVectorProxy.java
## @@ -0,0 +1,554 @@
+/*
Review comment: CarbonVectorProxy.java has an indentation issue, which led to many changes (about 400 lines). I will try again using "git mv".
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-669767240 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3628/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-669766599 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1889/
[jira] [Created] (CARBONDATA-3941) Presto cannot query binary datatype store
Ajantha Bhat created CARBONDATA-3941:
------------------------------------
             Summary: Presto cannot query binary datatype store
                 Key: CARBONDATA-3941
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3941
             Project: CarbonData
          Issue Type: Bug
            Reporter: Ajantha Bhat
            Assignee: Ajantha Bhat

Presto cannot query binary datatype store
[jira] [Resolved] (CARBONDATA-3930) MVExample is throwing DataLoadingException
[ https://issues.apache.org/jira/browse/CARBONDATA-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajantha Bhat resolved CARBONDATA-3930.
--------------------------------------
    Fix Version/s: 2.1.0
       Resolution: Fixed

> MVExample is throwing DataLoadingException
> ------------------------------------------
>
>                 Key: CARBONDATA-3930
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3930
>             Project: CarbonData
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>            Reporter: David Cai
>            Priority: Minor
>             Fix For: 2.1.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> [Reproduce]
> Run examples/spark/src/main/scala/org/apache/carbondata/examples/MVExample.scala in IDEA
> [LOG]
> Exception in thread "main" org.apache.carbondata.processing.exception.DataLoadingException: The input file does not exist: /***/carbondata/integration/spark-common-test/src/test/resources/sample.csv
> Exception in thread "main" org.apache.carbondata.processing.exception.DataLoadingException: The input file does not exist: /home/david/Documents/code/carbondata/integration/spark-common-test/src/test/resources/sample.csv
>   at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:81)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at org.apache.spark.util.FileUtils$.getPaths(FileUtils.scala:77)
>   at org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:97)
>   at org.apache.spark.sql.execution.command.AtomicRunnableCommand$$anonfun$run$3.apply(package.scala:148)
>   at org.apache.spark.sql.execution.command.AtomicRunnableCommand$$anonfun$run$3.apply(package.scala:145)
>   at org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:104)
>   at org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:141)
>   at org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:145)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
>   at org.apache.spark.sql.Dataset$$anonfun$51.apply(Dataset.scala:3265)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3264)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
>   at org.apache.carbondata.examples.MVExample$.exampleBody(MVExample.scala:67)
>   at org.apache.carbondata.examples.MVExample$.main(MVExample.scala:37)
>   at org.apache.carbondata.examples.MVExample.main(MVExample.scala)
[GitHub] [carbondata] asfgit closed pull request #3870: [CARBONDATA-3930] Fix DataLoadingException in MVExample
asfgit closed pull request #3870: URL: https://github.com/apache/carbondata/pull/3870
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3872: [CARBONDATA-3889] Enable java check style for all java modules
ajantha-bhat commented on a change in pull request #3872: URL: https://github.com/apache/carbondata/pull/3872#discussion_r466201420

## File path: integration/spark/src/main/java/org/apache/spark/sql/CarbonVectorProxy.java
## @@ -0,0 +1,554 @@
+/*
Review comment: I can see that the other files are moved, but why does this file show as added instead of moved?
[GitHub] [carbondata] ajantha-bhat commented on pull request #3870: [CARBONDATA-3930] Fix DataLoadingException in MVExample
ajantha-bhat commented on pull request #3870: URL: https://github.com/apache/carbondata/pull/3870#issuecomment-669755870 LGTM
[GitHub] [carbondata] ajantha-bhat opened a new pull request #3882: [WIP] Support binary data type reading from presto
ajantha-bhat opened a new pull request #3882: URL: https://github.com/apache/carbondata/pull/3882

### Why is this PR needed?
When a binary store is queried from Presto, it currently returns 0 rows.

### What changes were proposed in this PR?
Presto can support reading the binary (VarBinary) data type by using the SliceStreamReader, which can put binary byte[] values via its putByteArray() method.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes