[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655625 --- Diff: store/search/src/main/scala/org/apache/spark/rpc/Master.scala --- @@ -81,7 +81,7 @@ class Master(sparkConf: SparkConf) { do { try { LOG.info(s"starting registry-service on $hostAddress:$port") - val config = RpcEnvConfig( + val config = RpcUtil.getRpcEnvConfig( --- End diff -- After analyzing #2372, these changes are not required, so they were reverted. ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655227 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala --- @@ -247,6 +252,32 @@ object CarbonReflectionUtils { isFormatted } + + def getRowDataSourceScanExecObj(relation: LogicalRelation, --- End diff -- fixed ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655176 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala --- @@ -247,6 +252,32 @@ object CarbonReflectionUtils { isFormatted } + --- End diff -- Fixed ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655245 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala --- @@ -38,9 +39,17 @@ import org.apache.carbondata.common.logging.{LogService, LogServiceFactory} import org.apache.carbondata.core.features.TableOperation import org.apache.carbondata.core.util.CarbonProperties -/** - * Carbon strategies for ddl commands - */ + /** Carbon strategies for ddl commands --- End diff -- fixed ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655128 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala --- @@ -1787,20 +1839,23 @@ case class CarbonPreAggregateDataLoadingRules(sparkSession: SparkSession) // named expression list otherwise update the list and add it to set if (!validExpressionsMap.contains(AggExpToColumnMappingModel(sumExp))) { namedExpressionList += -Alias(expressions.head, name + "_ sum")(NamedExpression.newExprId, +CarbonCompilerUtil.createAliasRef(expressions.head, + name + "_ sum", + NamedExpression.newExprId, alias.qualifier, Some(alias.metadata), - alias.isGenerated) + Some(alias)) validExpressionsMap += AggExpToColumnMappingModel(sumExp) } // check with same expression already count is present then do not add to // named expression list otherwise update the list and add it to set if (!validExpressionsMap.contains(AggExpToColumnMappingModel(countExp))) { namedExpressionList += -Alias(expressions.last, name + "_ count")(NamedExpression.newExprId, - alias.qualifier, - Some(alias.metadata), - alias.isGenerated) + CarbonCompilerUtil.createAliasRef(expressions.last, name + "_ count", --- End diff -- Fixed. Changed the name from CarbonCompilerUtil to CarbonToSparkAdapter. ---
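The adapter class itself is not quoted in the thread, so here is a toy, self-contained sketch of the version-adapter idea behind a `createAliasRef`-style factory: one entry point hides a constructor difference between Spark versions (for example, `Alias` dropping its `isGenerated` flag in 2.3). The types below are stand-ins, not Spark's real classes.

```java
// Toy sketch of the version-adapter pattern: stand-in types, hypothetical shapes.
public class AdapterSketch {
    // "2.2" shape keeps the isGenerated flag; the "2.3" shape drops it.
    static class AliasRef22 {
        final String name;
        final boolean isGenerated;
        AliasRef22(String name, boolean isGenerated) {
            this.name = name;
            this.isGenerated = isGenerated;
        }
    }

    static class AliasRef23 {
        final String name;
        AliasRef23(String name) {
            this.name = name;
        }
    }

    // Single factory: callers never branch on the Spark version themselves.
    static Object createAliasRef(String name, String sparkVersion) {
        return sparkVersion.startsWith("2.3")
            ? new AliasRef23(name)
            : new AliasRef22(name, false);
    }

    public static void main(String[] args) {
        // prints AliasRef23
        System.out.println(createAliasRef("salary_sum", "2.3.0").getClass().getSimpleName());
    }
}
```

The point of the rename discussed above is exactly this: all version-dependent construction funnels through one adapter object instead of being scattered across the rules.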
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655020 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/sql/commands/StoredAsCarbondataSuite.scala --- @@ -87,7 +87,7 @@ class StoredAsCarbondataSuite extends QueryTest with BeforeAndAfterEach { sql("CREATE TABLE carbon_table(key INT, value STRING) STORED AS ") } catch { case e: Exception => -assert(e.getMessage.contains("no viable alternative at input")) +assert(true) --- End diff -- Fixed. Added an OR condition with the expected message as per Spark 2.3.0. ---
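The described fix (an OR condition over the two parser error messages, instead of the meaningless `assert(true)`) can be sketched as follows. The Spark 2.3 message text used here is an assumption for illustration, not the verified string.

```java
// Sketch of the described fix: accept the parser error from either Spark
// version. The 2.3 message text is an ASSUMPTION, not the verified string.
public class ParserMessageCheck {
    static boolean messageMatches(String msg) {
        return msg.contains("no viable alternative at input")  // Spark 2.1/2.2 parser
            || msg.contains("mismatched input");               // Spark 2.3 parser (assumed)
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(messageMatches("no viable alternative at input 'STORED AS'"));
    }
}
```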
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196654906 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala --- @@ -140,6 +142,13 @@ object CarbonReflectionUtils { relation, expectedOutputAttributes, catalogTable)._1.asInstanceOf[LogicalRelation] +} else if (SPARK_VERSION.startsWith("2.3")) { --- End diff -- Fixed. Added a utility method for Spark version comparison in SparkUtil.scala. ---
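SparkUtil.scala itself is not quoted in this thread; a minimal sketch of such a version-comparison helper, with hypothetical method names, might look like the following. It collapses chained checks like `SPARK_VERSION.startsWith("2.2") || SPARK_VERSION.startsWith("2.3")` into one numeric comparison.

```java
// Hypothetical sketch of the Spark-version helper described for
// SparkUtil.scala; method names and exact semantics are assumptions.
public class SparkVersionUtil {
    // True when sparkVersion's "major.minor" is >= xVersion's.
    public static boolean isSparkVersionXAndAbove(String xVersion, String sparkVersion) {
        int[] x = majorMinor(xVersion);
        int[] s = majorMinor(sparkVersion);
        return s[0] > x[0] || (s[0] == x[0] && s[1] >= x[1]);
    }

    private static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return new int[] { Integer.parseInt(parts[0]), minor };
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(isSparkVersionXAndAbove("2.2", "2.3.0"));
    }
}
```

Comparing major/minor numerically rather than with `startsWith` also means a future "2.10" would compare correctly.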
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196654926 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala --- @@ -355,18 +362,19 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { } private def getDataSourceScan(relation: LogicalRelation, - output: Seq[Attribute], - partitions: Seq[PartitionSpec], - scanBuilder: (Seq[Attribute], Seq[Expression], Seq[Filter], -ArrayBuffer[AttributeReference], Seq[PartitionSpec]) => RDD[InternalRow], - candidatePredicates: Seq[Expression], - pushedFilters: Seq[Filter], - metadata: Map[String, String], - needDecoder: ArrayBuffer[AttributeReference], - updateRequestedColumns: Seq[Attribute]): DataSourceScanExec = { +output: Seq[Attribute], --- End diff -- fixed ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196654954 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/bigdecimal/TestBigDecimal.scala --- @@ -149,8 +149,9 @@ class TestBigDecimal extends QueryTest with BeforeAndAfterAll { } test("test sum*10 aggregation on big decimal column with high precision") { -checkAnswer(sql("select sum(salary)*10 from carbonBigDecimal_2"), - sql("select sum(salary)*10 from hiveBigDecimal")) +val carbonSeq = sql("select sum(salary)*10 from carbonBigDecimal_2").collect --- End diff -- fixed ---
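The collect-based comparison in the diff above can be sketched as follows. The idea is that `BigDecimal.compareTo` ignores trailing scale (1.20 equals 1.2), which is what a high-precision sum comparison needs, whereas strict row equality does not. Names here are illustrative, not CarbonData's actual test helpers.

```java
import java.math.BigDecimal;
import java.util.List;

// Sketch of comparing two collected query results value-by-value so that
// decimals differing only in scale still compare equal. Illustrative names.
public class DecimalCompare {
    static boolean resultsMatch(List<BigDecimal> carbonRows, List<BigDecimal> hiveRows) {
        if (carbonRows.size() != hiveRows.size()) {
            return false;
        }
        for (int i = 0; i < carbonRows.size(); i++) {
            // compareTo == 0 treats 12345.6789000 and 12345.6789 as equal;
            // equals() would not.
            if (carbonRows.get(i).compareTo(hiveRows.get(i)) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(resultsMatch(
            List.of(new BigDecimal("12345.6789000")),
            List.of(new BigDecimal("12345.6789"))));
    }
}
```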
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196654884 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala --- @@ -65,7 +66,7 @@ object CarbonReflectionUtils { className, tableIdentifier, tableAlias)._1.asInstanceOf[UnresolvedRelation] -} else if (SPARK_VERSION.startsWith("2.2")) { +} else if (SPARK_VERSION.startsWith("2.2") || SPARK_VERSION.startsWith("2.3")) { --- End diff -- Fixed. Added a utility method for Spark version comparison in SparkUtil.scala. ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196654276 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/EncodingFactory.java --- @@ -71,7 +72,7 @@ public ColumnPageDecoder createDecoder(List encodings, List
[GitHub] carbondata issue #2380: [CARBONDATA-2509][CARBONDATA-2510][CARBONDATA-2511][...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2380 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5231/ ---
[GitHub] carbondata issue #2379: [CARBONDATA-2420][32K] Support string longer than 32...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2379 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5342/ ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196650804 --- Diff: integration/spark-common/pom.xml --- @@ -65,6 +65,11 @@ scalatest_${scala.binary.version} provided + + org.apache.zookeeper --- End diff -- Not an intentional change, I guess :) ---
[GitHub] carbondata issue #2382: [CARBONDATA-2513][32K] Support write long string fro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2382 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6397/ ---
[GitHub] carbondata issue #2379: [CARBONDATA-2420][32K] Support string longer than 32...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2379 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6396/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5229/ ---
[GitHub] carbondata issue #2379: [CARBONDATA-2420][32K] Support string longer than 32...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2379 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5230/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2384 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5341/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6395/ ---
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2374 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5340/ ---
[GitHub] carbondata issue #2379: [CARBONDATA-2420][32K] Support string longer than 32...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2379 Rebased with the latest master branch. The second commit is to fix the review comments. ---
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2374 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5228/ ---
[jira] [Assigned] (CARBONDATA-2608) SDK Support JSON data loading directly without AVRO conversion
[ https://issues.apache.org/jira/browse/CARBONDATA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reassigned CARBONDATA-2608: Assignee: Ajantha Bhat > SDK Support JSON data loading directly without AVRO conversion > -- > > Key: CARBONDATA-2608 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2608 > Project: CarbonData > Issue Type: Sub-task >Reporter: sounak chakraborty >Assignee: Ajantha Bhat >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Support JSON data loading directly into Carbon table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196637400 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/LVLongStringStatsCollector.java --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.page.statistics; + +/** + * This class is for the columns with varchar data type, + * a string type which can hold more than 32000 characters + */ +public class LVLongStringStatsCollector extends LVStringStatsCollector { + + public static LVLongStringStatsCollector newInstance() { +return new LVLongStringStatsCollector(); + } + + private LVLongStringStatsCollector() { + + } + + @Override + protected byte[] getActualValue(byte[] value) { +byte[] actualValue; +assert (value.length >= 4); +if (value.length == 4) { + assert (value[0] == 0 && value[1] == 0); + actualValue = new byte[0]; +} else { + // todo: what does this mean? + // int length = (value[0] << 8) + (value[1] & 0xff); --- End diff -- yeah, I find a more readable way to fix it. ---
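The "more readable way" is not quoted in the thread, but one common option is to read the length prefix with `ByteBuffer` instead of manual bit shifts like `(value[0] << 8) + (value[1] & 0xff)`. The layout assumed below, a 4-byte big-endian length followed by the value bytes, is inferred from the surrounding assertions (`value.length >= 4`, empty value when length is exactly 4), not confirmed by the source.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of LV (length-value) decoding with ByteBuffer instead of manual
// shifts. Layout ASSUMED: [4-byte big-endian length][value bytes].
public class LvDecode {
    static byte[] actualValue(byte[] lv) {
        assert lv.length >= 4;
        int length = ByteBuffer.wrap(lv, 0, 4).getInt();  // reads big-endian
        assert length == lv.length - 4 : "corrupt LV value";
        return Arrays.copyOfRange(lv, 4, 4 + length);
    }

    static byte[] encode(byte[] value) {
        return ByteBuffer.allocate(4 + value.length)
            .putInt(value.length)
            .put(value)
            .array();
    }

    public static void main(String[] args) {
        // prints abc
        System.out.println(new String(actualValue(encode("abc".getBytes()))));
    }
}
```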
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2374 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6394/ ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196636059 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java --- @@ -289,6 +289,12 @@ public void putDouble(int rowId, double value) { @Override public void putBytes(int rowId, byte[] bytes) { +// rowId * 4 represents the length of L in LV +if (bytes.length > (Integer.MAX_VALUE - totalLength - rowId * 4)) { --- End diff -- I came across a new idea: during parsing/converting, we can calculate #numberOfRowsPerPage * #currentCharacterLength; if it is larger than 2GB, the data load will fail. Notice that #numberOfRowsPerPage is specified by the user through configuration. If this is OK, I'll implement it in a future PR, not this one. ---
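The proposed guard is simple arithmetic; a minimal sketch (names are assumptions, matching the #numberOfRowsPerPage * #currentCharacterLength idea above) could be:

```java
// Sketch of the proposed load-time guard: estimate rowsPerPage times the
// largest value length (plus the 4-byte L of each LV entry) and fail the
// load early if a column page could exceed 2GB. Names are assumptions.
public class PageOverflowGuard {
    static boolean pageMightOverflow(int rowsPerPage, long maxBytesPerValue) {
        long lengthPrefix = 4L; // the L in each LV entry, as in the diff above
        return (long) rowsPerPage * (maxBytesPerValue + lengthPrefix) > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // the classic 32000-char limit always fits in one page: prints false
        System.out.println(pageMightOverflow(32000, 32000));
        // 32000 rows of 1MB strings would blow past 2GB: prints true
        System.out.println(pageMightOverflow(32000, 1 << 20));
    }
}
```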
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196635615 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1601,6 +1602,8 @@ // As Short data type is used for storing the length of a column during data processing hence // the maximum characters that can be supported should be less than Short max value public static final int MAX_CHARS_PER_COLUMN_DEFAULT = 32000; + // todo: use infinity first, will switch later + public static final int MAX_CHARS_PER_COLUMN_INFINITY = -1; --- End diff -- As I mentioned in another PR, better not to introduce this limit. -1 means that the parser can parse infinitely many characters. ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196635418 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -279,7 +279,7 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { fields.zipWithIndex.foreach { case (field, index) => field.schemaOrdinal = index } -val (dims, msrs, noDictionaryDims, sortKeyDims) = extractDimAndMsrFields( +val (dims, msrs, noDictionaryDims, sortKeyDims, varcharColumns) = extractDimAndMsrFields( --- End diff -- Just like the other results of `extractDimAndMsrFields`, we validate and get the sort_column, dictionaries, and the varcharColumns (longStringColumns). For the varcharColumns, we change their data type from string to varchar later. ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196634986 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/blocklet/BlockletInfo.java --- @@ -268,7 +268,7 @@ private DataChunk deserializeDataChunk(byte[] bytes) throws IOException { @Override public void readFields(DataInput input) throws IOException { dimensionOffset = input.readLong(); measureOffsets = input.readLong(); -short dimensionChunkOffsetsSize = input.readShort(); +int dimensionChunkOffsetsSize = input.readInt(); --- End diff -- OK~ ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196634976 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/blocklet/BlockletInfo.java --- @@ -205,7 +205,7 @@ public void setNumberOfPages(int numberOfPages) { output.writeLong(dimensionOffset); output.writeLong(measureOffsets); int dsize = dimensionChunkOffsets != null ? dimensionChunkOffsets.size() : 0; -output.writeShort(dsize); +output.writeInt(dsize); --- End diff -- OK~ ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196634573 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java --- @@ -64,7 +64,7 @@ public ColumnPageDecoder createDecoder(ColumnPageEncoderMeta meta) { return new DirectDecompressor(meta); } - private static class DirectCompressor extends ColumnPageEncoder { --- End diff -- Yeah, it is required because in the method `getEncodingList`, we want to use the member `datatype` from the outer class. If it is a static inner class, we cannot access that member. ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196634217 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/EncodingFactory.java --- @@ -71,7 +72,7 @@ public ColumnPageDecoder createDecoder(List encodings, List
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196633906 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java --- @@ -103,6 +103,7 @@ private ColumnPageEncoder createEncoderForDimensionLegacy(TableSpec.DimensionSpe return new HighCardDictDimensionIndexCodec( dimensionSpec.isInSortColumns(), --- End diff -- Emm, better not to do this in this PR. All the parameters for *IndexCodec look alike; changing all of them would introduce unrelated changes. ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2384 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5339/ ---
[GitHub] carbondata pull request #2372: [CARBONDATA-2609] Change RPC implementation t...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2372#discussion_r196633489 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java --- @@ -80,7 +80,7 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext context) } // It should use the exists tableBlockInfos if tableBlockInfos of queryModel is not empty // otherwise the prune is no use before this method -if (!queryModel.isFG()) { +if (queryModel.getTableBlockInfos().isEmpty()) { --- End diff -- If we use queryModel.getTableBlockInfos().isEmpty(), then when the FG prune result is empty in search mode, it will fall back to the original TableBlockInfos and execute again, which means FG pruning has no effect in this scenario. So we cannot change it to queryModel.getTableBlockInfos().isEmpty(). ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196633177 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RawColumnChunkUtil.java --- @@ -0,0 +1,65 @@ +/* --- End diff -- OK ---
[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2383 @kumarvishal09 If the string is too long, the user has to adjust the page size manually. We cannot do it dynamically for now. ---
[GitHub] carbondata pull request #2383: [CARBONDATA-2615][32K] Support page size less...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2383#discussion_r196631555 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java --- @@ -371,8 +371,13 @@ private void setWritingConfiguration() throws CarbonDataWriterException { this.pageSize = Integer.parseInt(CarbonProperties.getInstance() .getProperty(CarbonCommonConstants.BLOCKLET_SIZE, CarbonCommonConstants.BLOCKLET_SIZE_DEFAULT_VAL)); +// support less than 32000 rows in one page, because we support super long string, +// if it is long enough, a clomun page with 32000 rows will exceed 2GB if (version == ColumnarFormatVersion.V3) { - this.pageSize = CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; + this.pageSize = --- End diff -- In V3, it is 32000 by default. Here we use the min(32000, user_specified) ---
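The V3 page-size selection described above amounts to taking the minimum of the 32000-row default and the user-configured value, rather than always forcing the default:

```java
// Sketch of the described V3 page-size selection: cap the user-configured
// value at the 32000-row V3 default instead of overriding it outright.
public class PageSize {
    static final int ROWS_PER_PAGE_DEFAULT_V3 = 32000;

    static int effectivePageSize(int userConfigured) {
        return Math.min(ROWS_PER_PAGE_DEFAULT_V3, userConfigured);
    }

    public static void main(String[] args) {
        // user shrinks the page for very long strings: prints 8000
        System.out.println(effectivePageSize(8000));
        // but can never exceed the V3 default: prints 32000
        System.out.println(effectivePageSize(50000));
    }
}
```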
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196627999 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala --- @@ -403,6 +403,17 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser { partition = partitionSpec) } + /** + * The syntax of + * ALTER TABLE [dbName.]tableName ADD SEGMENT LOCATION 'path/to/data' + */ + protected lazy val addSegment: Parser[LogicalPlan] = +ALTER ~> TABLE ~> (ident <~ ".").? ~ ident ~ +ADD ~ SEGMENT ~ LOCATION ~ stringLit <~ opt(";") ^^ { + case dbName ~ tableName ~ add ~ segment ~ location ~ filePath => --- End diff -- OK ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2384 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5338/ ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196627213 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddSegmentCommand.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.management + +import java.util.UUID + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException +import org.apache.spark.sql.execution.command.AtomicRunnableCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.util.FileUtils + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.datamap.status.DataMapStatusManager +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.mutate.CarbonUpdateUtil +import org.apache.carbondata.core.statusmanager.{FileFormat, LoadMetadataDetails, SegmentStatus, SegmentStatusManager} +import org.apache.carbondata.core.util.CarbonUtil +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.events.{OperationContext, OperationListenerBus} +import org.apache.carbondata.processing.loading.events.LoadEvents.LoadMetadataEvent +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} +import org.apache.carbondata.processing.util.CarbonLoaderUtil + +/** + * support `alter table tableName add segment location 'path'` command. 
+ * It will create a segment and map the path of datafile to segment's storage + */ +case class CarbonAddSegmentCommand( +dbNameOp: Option[String], +tableName: String, +filePathFromUser: String, +var operationContext: OperationContext = new OperationContext) extends AtomicRunnableCommand { + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + var carbonTable: CarbonTable = _ + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { +val dbName = CarbonEnv.getDatabaseName(dbNameOp)(sparkSession) +carbonTable = { + val relation = CarbonEnv.getInstance(sparkSession).carbonMetastore +.lookupRelation(Option(dbName), tableName)(sparkSession).asInstanceOf[CarbonRelation] + if (relation == null) { +LOGGER.error(s"Add segment failed due to table $dbName.$tableName not found") +throw new NoSuchTableException(dbName, tableName) + } + relation.carbonTable +} + +if (carbonTable.isHivePartitionTable) { + LOGGER.error("Ignore hive partition table for now") +} + +operationContext.setProperty("isOverwrite", false) +if (CarbonUtil.hasAggregationDataMap(carbonTable)) { + val loadMetadataEvent = new LoadMetadataEvent(carbonTable, false) + OperationListenerBus.getInstance().fireEvent(loadMetadataEvent, operationContext) +} +Seq.empty + } + + // will just mapping external files to segment metadata + override def processData(sparkSession: SparkSession): Seq[Row] = { --- End diff -- In my opinion, creating the segment and updating the tablestatus both belong to `processData`. And in other command such as LoadData, these operation are in `processData` too. ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196626592 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala --- @@ -700,6 +700,13 @@ class TableNewProcessor(cm: TableModel) { cm.tableName)) tableInfo.setLastUpdatedTime(System.currentTimeMillis()) tableInfo.setFactTable(tableSchema) +val format = cm.tableProperties.get(CarbonCommonConstants.FORMAT) --- End diff -- OK ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196626156 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala --- @@ -426,6 +439,22 @@ class CarbonScanRDD[T: ClassTag]( CarbonTimeStatisticsFactory.createExecutorRecorder(model.getQueryId)) streamReader.setQueryModel(model) streamReader +case FileFormat.EXTERNAL => + assert(storageFormat.equals("csv"), --- End diff -- OK~ ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196625677 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java --- @@ -515,12 +574,73 @@ private CarbonInputSplit convertToCarbonInputSplit(ExtendedBlocklet blocklet) th return split; } + private List convertToInputSplit4ExternalFormat(JobContext jobContext, + ExtendedBlocklet extendedBlocklet) throws IOException { +List splits = new ArrayList(); +String factFilePath = extendedBlocklet.getFilePath(); +Path path = new Path(factFilePath); +FileSystem fs = FileFactory.getFileSystem(path); +FileStatus fileStatus = fs.getFileStatus(path); +long length = fileStatus.getLen(); +if (length != 0) { + BlockLocation[] blkLocations = fs.getFileBlockLocations(path, 0, length); + long blkSize = fileStatus.getBlockSize(); + long minSplitSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(jobContext)); + long maxSplitSize = getMaxSplitSize(jobContext); + long splitSize = computeSplitSize(blkSize, minSplitSize, maxSplitSize); + long bytesRemaining = fileStatus.getLen(); + while (((double) bytesRemaining) / splitSize > 1.1) { +int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining); +splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path, +length - bytesRemaining, +splitSize, blkLocations[blkIndex].getHosts(), +blkLocations[blkIndex].getCachedHosts(), FileFormat.EXTERNAL)); +bytesRemaining -= splitSize; + } + if (bytesRemaining != 0) { +int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining); +splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path, +length - bytesRemaining, +bytesRemaining, blkLocations[blkIndex].getHosts(), +blkLocations[blkIndex].getCachedHosts(), FileFormat.EXTERNAL)); + } +} else { + splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path, 0, length, + new String[0], FileFormat.EXTERNAL)); +} +return splits; + } + @Override public 
RecordReader createRecordReader(InputSplit inputSplit, TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException { Configuration configuration = taskAttemptContext.getConfiguration(); QueryModel queryModel = createQueryModel(inputSplit, taskAttemptContext); CarbonReadSupport readSupport = getReadSupportClass(configuration); -return new CarbonRecordReader(queryModel, readSupport); +if (inputSplit instanceof CarbonMultiBlockSplit +&& ((CarbonMultiBlockSplit) inputSplit).getFileFormat() == FileFormat.EXTERNAL) { + return createRecordReaderForExternalFormat(queryModel, readSupport, + configuration.get(CarbonCommonConstants.CARBON_EXTERNAL_FORMAT_CONF_KEY)); +} else if (inputSplit instanceof CarbonInputSplit +&& ((CarbonInputSplit) inputSplit).getFileFormat() == FileFormat.EXTERNAL) { + return createRecordReaderForExternalFormat(queryModel, readSupport, + configuration.get(CarbonCommonConstants.CARBON_EXTERNAL_FORMAT_CONF_KEY)); +} else { + return new CarbonRecordReader(queryModel, readSupport); +} + } + + @Since("1.4.1") --- End diff -- OK ---
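The split loop in the `convertToInputSplit4ExternalFormat` diff above follows the Hadoop FileInputFormat convention: the split size is `max(minSize, min(maxSize, blockSize))`, and the 1.1 slack factor merges a small trailing remainder into the last split instead of emitting a tiny extra one. A self-contained sketch of just that arithmetic:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the split-size arithmetic used in the diff above, following
// Hadoop FileInputFormat's convention. Returns (offset, length) pairs.
public class SplitSketch {
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static List<long[]> splitOffsets(long fileLen, long splitSize) {
        final double SLOP = 1.1; // allow the last split to run up to 10% over
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLen;
        while ((double) remaining / splitSize > SLOP) {
            splits.add(new long[] { fileLen - remaining, splitSize });
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(new long[] { fileLen - remaining, remaining });
        }
        return splits;
    }

    public static void main(String[] args) {
        // 250 bytes at split size 100 -> 100 + 100 + 50: prints 3
        System.out.println(splitOffsets(250, 100).size());
        // 105 bytes is within the 1.1 slop of one split: prints 1
        System.out.println(splitOffsets(105, 100).size());
    }
}
```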
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196625604

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java ---

@@ -174,9 +174,15 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
     List<InputSplit> result = new LinkedList<>();
     // for each segment fetch blocks matching filter in Driver BTree
-    List<CarbonInputSplit> dataBlocksOfSegment =
-        getDataBlocksOfSegment(job, carbonTable, filterResolver, matchedPartitions,
-            validSegments, partitionInfo, oldPartitionIdList);
+    List<CarbonInputSplit> dataBlocksOfSegment;
+    if (carbonTable.getTableInfo().getFormat().equals("")
--- End diff --

The default value of format is 'carbondata', so there is no need to handle empty. Will remove it

---
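The point being made above is that a property with a guaranteed default never needs an empty-string branch at each call site. A hedged sketch of that normalization pattern follows; `FormatProperty` and `normalize` are hypothetical names for illustration, not CarbonData APIs.

```java
public class FormatProperty {
    static final String DEFAULT_FORMAT = "carbondata";

    // Normalize once at read time: fall back to the default for null/empty,
    // and lowercase so later comparisons ("csv", "carbondata") are simple equals.
    static String normalize(String format) {
        return (format == null || format.isEmpty())
            ? DEFAULT_FORMAT
            : format.toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(normalize(""));    // carbondata
        System.out.println(normalize("CSV")); // csv
    }
}
```

With the default applied when the table property is first read, the `getFormat().equals("")` check in the hunk above becomes dead code, which is why the author agrees to remove it.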
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196625258 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CsvRecordReader.java --- @@ -0,0 +1,510 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.hadoop; + +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.Reader; +import java.io.UnsupportedEncodingException; +import java.math.BigDecimal; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.FilterUtil; +import org.apache.carbondata.core.scan.filter.GenericQueryType; +import org.apache.carbondata.core.scan.filter.executer.FilterExecuter; +import org.apache.carbondata.core.scan.filter.intf.RowImpl; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.statusmanager.FileFormatProperties; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.DataTypeUtil; +import org.apache.carbondata.hadoop.api.CarbonTableInputFormat; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; 
+import org.apache.carbondata.processing.loading.csvinput.CSVInputFormat;
+
+import com.univocity.parsers.csv.CsvParser;
+import com.univocity.parsers.csv.CsvParserSettings;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+
+/**
+ * scan csv file and filter on it
+ */
+@InterfaceStability.Evolving
+@InterfaceAudience.Internal
+public class CsvRecordReader<T> extends AbstractRecordReader<T> {
--- End diff --

The procedure is alike, but the implementation is quite different. The most important parts are converting the origin data to an internal row and converting the origin data to an output row. In StreamRecordReader the origin source is the ROW_V1 format, while in CsvRecordReader the origin source is the CSV format. Besides, StreamRecordReader has more details, such as 'syncMark' and 'rawRow', which we do not need for CSV. Maybe we can extract the common code into utils or create a new abstraction for ReadSupport or RecordReader.

---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196624366 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CsvRecordReader.java --- @@ -0,0 +1,510 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.hadoop; + +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.Reader; +import java.io.UnsupportedEncodingException; +import java.math.BigDecimal; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.FilterUtil; +import org.apache.carbondata.core.scan.filter.GenericQueryType; +import org.apache.carbondata.core.scan.filter.executer.FilterExecuter; +import org.apache.carbondata.core.scan.filter.intf.RowImpl; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.statusmanager.FileFormatProperties; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.DataTypeUtil; +import org.apache.carbondata.hadoop.api.CarbonTableInputFormat; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; 
+import org.apache.carbondata.processing.loading.csvinput.CSVInputFormat; + +import com.univocity.parsers.csv.CsvParser; +import com.univocity.parsers.csv.CsvParserSettings; +import org.apache.commons.lang3.StringUtils; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; + +/** + * scan csv file and filter on it + */ +@InterfaceStability.Evolving +@InterfaceAudience.Internal +public class CsvRecordReader extends AbstractRecordReader { + private static final LogService LOGGER = LogServiceFactory.getLogService( + CsvRecordReader.class.getName()); + private static final int MAX_BATCH_SIZE = + CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; + // vector reader + private boolean isVectorReader; + private T columnarBatch; + + // metadata + private CarbonTable carbonTable; + private CarbonColumn[] carbonColumns; + // input + private QueryModel queryModel; + private CarbonReadSupport readSupport; + private FileSplit fileSplit; + private Configuration hadoopConf; + // the index is schema ordinal, the value is the csv ordinal + private int[] schema2csvIdx; + + // filter + private FilterExecuter filter; + // the index is the dimension ordinal, the value is the schema ordinal + private int[]
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2328 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5337/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2377 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5336/ ---
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2374 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5335/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2377 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5334/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2377 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5333/ ---
[GitHub] carbondata issue #2265: Added Performance Optimization for Presto by using M...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2265 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5332/ ---
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2328 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5331/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5227/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2377 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5226/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6393/ ---
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5330/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2377 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6392/ ---
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2328 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5224/ ---
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5329/ ---
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2328 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6390/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user praveenmeenakshi56 commented on the issue: https://github.com/apache/carbondata/pull/2377 retest this please ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2377 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5223/ ---
[GitHub] carbondata pull request #2384: [CARBONDATA-2608] SDK Support JSON data loadi...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2384

[CARBONDATA-2608] SDK Support JSON data loading directly (without AVRO conversion)

What changes were proposed in this pull request? Currently the SDK supports JSON data loading only via AVRO, so converting JSON to an Avro record and then Avro to a Carbon object is a two-step process. Hence there is a need for a new CarbonWriter that works with JSON without AVRO. This PR implements that. Highlights: works with just the JSON data and a Carbon schema; supports reading multiple JSON files in a folder; supports single-row JSON write.

How was this patch tested? Manual testing, and UTs are added in another PR.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? NA
- [ ] Any backward compatibility impacted? NA
- [ ] Document update required? Yes, will be handled in separate PR
- [ ] Testing done? Yes, updated the UT.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajantha-bhat/carbondata issue_fix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2384.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2384

commit 0c99d11c68d681f15c051d8c8e3ded5ced8b1708
Author: ajantha-bhat
Date: 2018-06-15T10:21:16Z

JsonCarbonWrtier

---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2377 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6389/ ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196512608

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala ---

@@ -403,6 +403,17 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
       partition = partitionSpec)
   }

+  /**
+   * The syntax of
+   * ALTER TABLE [dbName.]tableName ADD SEGMENT LOCATION 'path/to/data'
+   */
+  protected lazy val addSegment: Parser[LogicalPlan] =
+    ALTER ~> TABLE ~> (ident <~ ".").? ~ ident ~
+    ADD ~ SEGMENT ~ LOCATION ~ stringLit <~ opt(";") ^^ {
+      case dbName ~ tableName ~ add ~ segment ~ location ~ filePath =>
--- End diff --

I think it should be `case dbName ~ tableName ~ filePath =>`

---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196512126 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddSegmentCommand.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.management + +import java.util.UUID + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException +import org.apache.spark.sql.execution.command.AtomicRunnableCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.util.FileUtils + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.datamap.status.DataMapStatusManager +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.mutate.CarbonUpdateUtil +import org.apache.carbondata.core.statusmanager.{FileFormat, LoadMetadataDetails, SegmentStatus, SegmentStatusManager} +import org.apache.carbondata.core.util.CarbonUtil +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.events.{OperationContext, OperationListenerBus} +import org.apache.carbondata.processing.loading.events.LoadEvents.LoadMetadataEvent +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} +import org.apache.carbondata.processing.util.CarbonLoaderUtil + +/** + * support `alter table tableName add segment location 'path'` command. 
+ * It will create a segment and map the path of datafile to segment's storage + */ +case class CarbonAddSegmentCommand( +dbNameOp: Option[String], +tableName: String, +filePathFromUser: String, +var operationContext: OperationContext = new OperationContext) extends AtomicRunnableCommand { + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + var carbonTable: CarbonTable = _ + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { +val dbName = CarbonEnv.getDatabaseName(dbNameOp)(sparkSession) +carbonTable = { + val relation = CarbonEnv.getInstance(sparkSession).carbonMetastore +.lookupRelation(Option(dbName), tableName)(sparkSession).asInstanceOf[CarbonRelation] + if (relation == null) { +LOGGER.error(s"Add segment failed due to table $dbName.$tableName not found") +throw new NoSuchTableException(dbName, tableName) + } + relation.carbonTable +} + +if (carbonTable.isHivePartitionTable) { + LOGGER.error("Ignore hive partition table for now") +} + +operationContext.setProperty("isOverwrite", false) +if (CarbonUtil.hasAggregationDataMap(carbonTable)) { + val loadMetadataEvent = new LoadMetadataEvent(carbonTable, false) + OperationListenerBus.getInstance().fireEvent(loadMetadataEvent, operationContext) +} +Seq.empty + } + + // will just mapping external files to segment metadata + override def processData(sparkSession: SparkSession): Seq[Row] = { --- End diff -- All these operations are metadata only, so I think this class should extend `MetadataProcessOpeation` instead ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196511544 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala --- @@ -700,6 +700,13 @@ class TableNewProcessor(cm: TableModel) { cm.tableName)) tableInfo.setLastUpdatedTime(System.currentTimeMillis()) tableInfo.setFactTable(tableSchema) +val format = cm.tableProperties.get(CarbonCommonConstants.FORMAT) --- End diff -- `format` table property should also be checked, now only csv is supported ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196510839 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala --- @@ -426,6 +439,22 @@ class CarbonScanRDD[T: ClassTag]( CarbonTimeStatisticsFactory.createExecutorRecorder(model.getQueryId)) streamReader.setQueryModel(model) streamReader +case FileFormat.EXTERNAL => + assert(storageFormat.equals("csv"), --- End diff -- should use if check instead of assert ---
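The review comment above touches a general principle: assertions can be disabled or compiled out (in Java, `assert` is a no-op unless the JVM runs with `-ea`; Scala's `assert` can likewise be elided with `-Xdisable-assertions`), so validating external input should throw unconditionally. A small illustrative sketch, shown in Java rather than the Scala of the hunk; `FormatCheck` and `requireCsv` are hypothetical names, not CarbonData APIs.

```java
public class FormatCheck {
    // Explicit precondition: unlike `assert`, this check always runs and
    // fails with a descriptive exception instead of being silently skipped.
    static void requireCsv(String storageFormat) {
        if (!"csv".equals(storageFormat)) {
            throw new UnsupportedOperationException(
                "unsupported external format: " + storageFormat
                + ", only csv is supported");
        }
    }

    public static void main(String[] args) {
        requireCsv("csv"); // passes silently
        try {
            requireCsv("parquet");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Using `"csv".equals(storageFormat)` (constant first) also makes the check null-safe, which a bare `assert storageFormat.equals("csv")` is not.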
[jira] [Updated] (CARBONDATA-2608) SDK Support JSON data loading directly without AVRO conversion
[ https://issues.apache.org/jira/browse/CARBONDATA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2608: - Summary: SDK Support JSON data loading directly without AVRO conversion (was: Support JSON data loading directly into Carbon table.) > SDK Support JSON data loading directly without AVRO conversion > -- > > Key: CARBONDATA-2608 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2608 > Project: CarbonData > Issue Type: Sub-task >Reporter: sounak chakraborty >Priority: Major > > Support JSON data loading directly into Carbon table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196510278

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---

@@ -515,12 +574,73 @@ private CarbonInputSplit convertToCarbonInputSplit(ExtendedBlocklet blocklet) th
     return split;
   }

+  private List<InputSplit> convertToInputSplit4ExternalFormat(JobContext jobContext,
+      ExtendedBlocklet extendedBlocklet) throws IOException {
+    List<InputSplit> splits = new ArrayList<>();
+    String factFilePath = extendedBlocklet.getFilePath();
+    Path path = new Path(factFilePath);
+    FileSystem fs = FileFactory.getFileSystem(path);
+    FileStatus fileStatus = fs.getFileStatus(path);
+    long length = fileStatus.getLen();
+    if (length != 0) {
+      BlockLocation[] blkLocations = fs.getFileBlockLocations(path, 0, length);
+      long blkSize = fileStatus.getBlockSize();
+      long minSplitSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(jobContext));
+      long maxSplitSize = getMaxSplitSize(jobContext);
+      long splitSize = computeSplitSize(blkSize, minSplitSize, maxSplitSize);
+      long bytesRemaining = fileStatus.getLen();
+      while (((double) bytesRemaining) / splitSize > 1.1) {
+        int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
+        splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path,
+            length - bytesRemaining, splitSize, blkLocations[blkIndex].getHosts(),
+            blkLocations[blkIndex].getCachedHosts(), FileFormat.EXTERNAL));
+        bytesRemaining -= splitSize;
+      }
+      if (bytesRemaining != 0) {
+        int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
+        splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path,
+            length - bytesRemaining, bytesRemaining, blkLocations[blkIndex].getHosts(),
+            blkLocations[blkIndex].getCachedHosts(), FileFormat.EXTERNAL));
+      }
+    } else {
+      splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path, 0, length,
+          new String[0], FileFormat.EXTERNAL));
+    }
+    return splits;
+  }
+
   @Override
   public RecordReader<Void, T> createRecordReader(InputSplit inputSplit,
       TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
     Configuration configuration = taskAttemptContext.getConfiguration();
     QueryModel queryModel = createQueryModel(inputSplit, taskAttemptContext);
     CarbonReadSupport<T> readSupport = getReadSupportClass(configuration);
-    return new CarbonRecordReader<>(queryModel, readSupport);
+    if (inputSplit instanceof CarbonMultiBlockSplit
+        && ((CarbonMultiBlockSplit) inputSplit).getFileFormat() == FileFormat.EXTERNAL) {
+      return createRecordReaderForExternalFormat(queryModel, readSupport,
+          configuration.get(CarbonCommonConstants.CARBON_EXTERNAL_FORMAT_CONF_KEY));
+    } else if (inputSplit instanceof CarbonInputSplit
+        && ((CarbonInputSplit) inputSplit).getFileFormat() == FileFormat.EXTERNAL) {
+      return createRecordReaderForExternalFormat(queryModel, readSupport,
+          configuration.get(CarbonCommonConstants.CARBON_EXTERNAL_FORMAT_CONF_KEY));
+    } else {
+      return new CarbonRecordReader<>(queryModel, readSupport);
+    }
+  }
+
+  @Since("1.4.1")
--- End diff --

I think for private method, this annotation is not required

---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196509935

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java ---

@@ -174,9 +174,15 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
     List<InputSplit> result = new LinkedList<>();
     // for each segment fetch blocks matching filter in Driver BTree
-    List<CarbonInputSplit> dataBlocksOfSegment =
-        getDataBlocksOfSegment(job, carbonTable, filterResolver, matchedPartitions,
-            validSegments, partitionInfo, oldPartitionIdList);
+    List<CarbonInputSplit> dataBlocksOfSegment;
+    if (carbonTable.getTableInfo().getFormat().equals("")
--- End diff --

why support empty string?

---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196509716 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CsvRecordReader.java --- @@ -0,0 +1,510 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.hadoop; + +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.Reader; +import java.io.UnsupportedEncodingException; +import java.math.BigDecimal; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.FilterUtil; +import org.apache.carbondata.core.scan.filter.GenericQueryType; +import org.apache.carbondata.core.scan.filter.executer.FilterExecuter; +import org.apache.carbondata.core.scan.filter.intf.RowImpl; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.statusmanager.FileFormatProperties; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.DataTypeUtil; +import org.apache.carbondata.hadoop.api.CarbonTableInputFormat; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; 
+import org.apache.carbondata.processing.loading.csvinput.CSVInputFormat; + +import com.univocity.parsers.csv.CsvParser; +import com.univocity.parsers.csv.CsvParserSettings; +import org.apache.commons.lang3.StringUtils; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; + +/** + * scan csv file and filter on it + */ +@InterfaceStability.Evolving +@InterfaceAudience.Internal +public class CsvRecordReader extends AbstractRecordReader { --- End diff -- This class is much like StreamRecordReader, and it implements filter execution on internal row, can you extract common code to a parent class? ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196508522 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CsvRecordReader.java --- @@ -0,0 +1,510 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.hadoop; + +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.Reader; +import java.io.UnsupportedEncodingException; +import java.math.BigDecimal; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.FilterUtil; +import org.apache.carbondata.core.scan.filter.GenericQueryType; +import org.apache.carbondata.core.scan.filter.executer.FilterExecuter; +import org.apache.carbondata.core.scan.filter.intf.RowImpl; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.statusmanager.FileFormatProperties; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.DataTypeUtil; +import org.apache.carbondata.hadoop.api.CarbonTableInputFormat; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; 
+import org.apache.carbondata.processing.loading.csvinput.CSVInputFormat; + +import com.univocity.parsers.csv.CsvParser; +import com.univocity.parsers.csv.CsvParserSettings; +import org.apache.commons.lang3.StringUtils; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; + +/** + * scan csv file and filter on it + */ +@InterfaceStability.Evolving +@InterfaceAudience.Internal +public class CsvRecordReader extends AbstractRecordReader { + private static final LogService LOGGER = LogServiceFactory.getLogService( + CsvRecordReader.class.getName()); + private static final int MAX_BATCH_SIZE = + CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; + // vector reader + private boolean isVectorReader; + private T columnarBatch; + + // metadata + private CarbonTable carbonTable; + private CarbonColumn[] carbonColumns; + // input + private QueryModel queryModel; + private CarbonReadSupport readSupport; + private FileSplit fileSplit; + private Configuration hadoopConf; + // the index is schema ordinal, the value is the csv ordinal + private int[] schema2csvIdx; + + // filter + private FilterExecuter filter; + // the index is the dimension ordinal, the value is the schema ordinal + private int[]
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2374 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5222/ ---
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5328/ ---
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2374 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6387/ ---
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5327/ ---
[GitHub] carbondata issue #2265: Added Performance Optimization for Presto by using M...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2265 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5219/ ---
[GitHub] carbondata issue #2265: Added Performance Optimization for Presto by using M...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2265 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6384/ ---
[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...
Github user kumarvishal09 commented on the issue: https://github.com/apache/carbondata/pull/2383 @xuchuanyin then the number of rows will depend on the number of characters in the long string columns, right? ---
[GitHub] carbondata pull request #2383: [CARBONDATA-2615][32K] Support page size less...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2383#discussion_r196487039 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java --- @@ -371,8 +371,13 @@ private void setWritingConfiguration() throws CarbonDataWriterException { this.pageSize = Integer.parseInt(CarbonProperties.getInstance() .getProperty(CarbonCommonConstants.BLOCKLET_SIZE, CarbonCommonConstants.BLOCKLET_SIZE_DEFAULT_VAL)); +// support fewer than 32000 rows in one page, because we support super long strings; +// if they are long enough, a column page with 32000 rows will exceed 2GB if (version == ColumnarFormatVersion.V3) { - this.pageSize = CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; + this.pageSize = --- End diff -- What is the default value for the page size? ---
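To the question above: the V3 default is NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT, i.e. 32000 rows. A minimal sketch of the pattern the diff introduces — a user-configurable page size clamped to that ceiling — with illustrative names rather than the real CarbonProperties API:

```python
# Hypothetical helper illustrating the configurable-page-size pattern;
# not the actual CarbonData code.
DEFAULT_ROWS_PER_PAGE = 32000  # NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT

def resolve_page_size(configured):
    """Parse a user-configured page size, fall back to the default,
    and never exceed the format's 32000-row ceiling."""
    try:
        size = int(configured) if configured is not None else DEFAULT_ROWS_PER_PAGE
    except ValueError:
        size = DEFAULT_ROWS_PER_PAGE
    return min(max(size, 1), DEFAULT_ROWS_PER_PAGE)

print(resolve_page_size("8000"))   # 8000
print(resolve_page_size(None))     # 32000
print(resolve_page_size("99999"))  # 32000 (clamped to the ceiling)
```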
[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2383 @kumarvishal09 I asked someone who has the long-string requirement and got the response that the strings are about 100K characters long. Since we don't want to change the internal implementation of the column page, decreasing the number of rows in a page may be the only way to solve the problem. ---
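A back-of-envelope check of the numbers above (assumptions: the 2 GB figure is the 2^31-byte limit, and "100K" is read as roughly 100 KB per value):

```python
rows_per_page = 32000     # V3 default rows per column page
value_bytes = 100 * 1024  # ~100K-character strings, assumed ~100 KB each

page_bytes = rows_per_page * value_bytes
print(page_bytes)          # 3276800000, i.e. ~3.3 GB
print(page_bytes > 2**31)  # True: a full 32000-row page overflows 2 GB

# Largest row count that keeps one page under 2 GB at this value size,
# which is why a smaller configurable page size is needed
print(2**31 // value_bytes)  # 20971
```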
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user praveenmeenakshi56 commented on the issue: https://github.com/apache/carbondata/pull/2377 retest this please ---
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2328 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5218/ ---
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2374 @jackylk All the comments have been resolved except https://github.com/apache/carbondata/pull/2374#discussion_r195684966 ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2377 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6386/ ---
[jira] [Resolved] (CARBONDATA-2585) Support Adding Local Dictionary configuration in Create table statement
[ https://issues.apache.org/jira/browse/CARBONDATA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2585. -- Resolution: Fixed > Support Adding Local Dictionary configuration in Create table statement > --- > > Key: CARBONDATA-2585 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2585 > Project: CarbonData > Issue Type: Sub-task > Reporter: kumar vishal > Assignee: Akash R Nilugal > Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Allow the user to pass local dictionary configuration in the Create table statement. > *LOCAL_DICTIONARY_ENABLE*: enable or disable local dictionary generation for a table (local dictionary generation defaults to true) > *CARBON_LOCALDICT_THRESHOLD*: configures the threshold value for local dictionary generation (default 1000) > *LOCAL_DICTIONARY_INCLUDE*: list of columns for which the user wants to generate a local dictionary (by default all no-dictionary string columns are considered) > *LOCAL_DICTIONARY_EXCLUDE*: list of columns for which the user does not want to generate a local dictionary (by default no string no-dictionary columns are excluded unless configured) > CREATE TABLE carbontable( > column1 string, > column2 string, > column3 LONG ) > STORED BY 'carbondata' > TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', > 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2586) Support Showing local dictionary configuration in desc formatted command
[ https://issues.apache.org/jira/browse/CARBONDATA-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2586. -- Resolution: Fixed > Support Showing local dictionary configuration in desc formatted command > > > Key: CARBONDATA-2586 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2586 > Project: CarbonData > Issue Type: Sub-task > Reporter: kumar vishal > Assignee: Akash R Nilugal > Priority: Major > > Support showing the local dictionary parameters in the Desc formatted command: > # *LOCAL_DICTIONARY_ENABLE* > # *CARBON_LOCALDICT_THRESHOLD* > # *LOCAL_DICTIONARY_INCLUDE* > # *LOCAL_DICTIONARY_EXCLUDE* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2586) Support Showing local dictionary configuration in desc formatted command
[ https://issues.apache.org/jira/browse/CARBONDATA-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2586: Description: Support Showing local dictionary parameter in Desc formatted command # *LOCAL_DICTIONARY_ENABLE* # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} # {color:#00}*LOCAL_DICTIONARY_EXCLUDE***{color} was: Support Showing local dictionary parameter in Desc formatted command # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # {color:#00}*ENABLE_LOCAL_DICT*{color} > Support Showing local dictionary configuration in desc formatted command > > > Key: CARBONDATA-2586 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2586 > Project: CarbonData > Issue Type: Sub-task >Reporter: kumar vishal >Assignee: Akash R Nilugal >Priority: Major > > Support Showing local dictionary parameter in Desc formatted command > # *LOCAL_DICTIONARY_ENABLE* > # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} > # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} > # > {color:#00}*LOCAL_DICTIONARY_EXCLUDE***{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2586) Support Showing local dictionary configuration in desc formatted command
[ https://issues.apache.org/jira/browse/CARBONDATA-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2586: Description: Support Showing local dictionary parameter in Desc formatted command # *LOCAL_DICTIONARY_ENABLE* # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # {color:#00}*LOCAL_DICTIONARY_INCLUDE*{color} # *LOCAL_DICTIONARY_EXCLUDE* was: Support Showing local dictionary parameter in Desc formatted command # *LOCAL_DICTIONARY_ENABLE* # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} # *LOCAL_DICTIONARY_EXCLUDE*** > Support Showing local dictionary configuration in desc formatted command > > > Key: CARBONDATA-2586 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2586 > Project: CarbonData > Issue Type: Sub-task >Reporter: kumar vishal >Assignee: Akash R Nilugal >Priority: Major > > Support Showing local dictionary parameter in Desc formatted command > # *LOCAL_DICTIONARY_ENABLE* > # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} > # {color:#00}*LOCAL_DICTIONARY_INCLUDE*{color} > # *LOCAL_DICTIONARY_EXCLUDE* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2586) Support Showing local dictionary configuration in desc formatted command
[ https://issues.apache.org/jira/browse/CARBONDATA-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2586: Description: Support Showing local dictionary parameter in Desc formatted command # *LOCAL_DICTIONARY_ENABLE* # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} # *LOCAL_DICTIONARY_EXCLUDE*** was: Support Showing local dictionary parameter in Desc formatted command # *LOCAL_DICTIONARY_ENABLE* # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} # {color:#00}*LOCAL_DICTIONARY_EXCLUDE***{color} > Support Showing local dictionary configuration in desc formatted command > > > Key: CARBONDATA-2586 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2586 > Project: CarbonData > Issue Type: Sub-task >Reporter: kumar vishal >Assignee: Akash R Nilugal >Priority: Major > > Support Showing local dictionary parameter in Desc formatted command > # *LOCAL_DICTIONARY_ENABLE* > # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} > # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} > # *LOCAL_DICTIONARY_EXCLUDE*** -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...
Github user kumarvishal09 commented on the issue: https://github.com/apache/carbondata/pull/2383 @xuchuanyin I think it is better to restrict each column value to 67104 bytes, as the user may not know how many characters will be present, so it is hard for the user to configure the blocklet size. ---
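One plausible derivation of the 67104-byte figure (an assumption — the comment does not say how it was chosen): it is the per-value byte budget that keeps a full 32000-row column page under the 2 GB (2^31-byte) limit, rounded down to a multiple of 32:

```python
page_limit = 2**31  # 2 GB page ceiling
rows = 32000        # rows per column page

budget = page_limit // rows  # 67108 bytes per value, unaligned
print(budget)                # 67108
print(budget // 32 * 32)     # 67104, i.e. the suggested cap
```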
[jira] [Updated] (CARBONDATA-2585) Support Adding Local Dictionary configuration in Create table statement
[ https://issues.apache.org/jira/browse/CARBONDATA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2585: Description: Allow the user to pass local dictionary configuration in the Create table statement. *LOCAL_DICTIONARY_ENABLE*: enable or disable local dictionary generation for a table (local dictionary generation defaults to true) *CARBON_LOCALDICT_THRESHOLD*: configures the threshold value for local dictionary generation (default 1000) *LOCAL_DICTIONARY_INCLUDE*: list of columns for which the user wants to generate a local dictionary (by default all no-dictionary string columns are considered) *LOCAL_DICTIONARY_EXCLUDE*: list of columns for which the user does not want to generate a local dictionary (by default no string no-dictionary columns are excluded unless configured) CREATE TABLE carbontable( column1 string, column2 string, column3 LONG ) STORED BY 'carbondata' TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') was: Allow the user to pass local dictionary configuration in the Create table statement.
*LOCAL_DICTIONARY_ENABLE* *CARBON_LOCALDICT_THRESHOLD* *LOCAL_DICTIONARY_INCLUDE* *LOCAL_DICTIONARY_EXCLUDE* CREATE TABLE carbontable( column1 string, column2 string, column3 LONG ) STORED BY 'carbondata' TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') > Support Adding Local Dictionary configuration in Create table statement > --- > > Key: CARBONDATA-2585 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2585 > Project: CarbonData > Issue Type: Sub-task > Reporter: kumar vishal > Assignee: Akash R Nilugal > Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Allow the user to pass local dictionary configuration in the Create table statement. > *LOCAL_DICTIONARY_ENABLE*: enable or disable local dictionary generation for a table (local dictionary generation defaults to true) > *CARBON_LOCALDICT_THRESHOLD*: configures the threshold value for local dictionary generation (default 1000) > *LOCAL_DICTIONARY_INCLUDE*: list of columns for which the user wants to generate a local dictionary (by default all no-dictionary string columns are considered) > *LOCAL_DICTIONARY_EXCLUDE*: list of columns for which the user does not want to generate a local dictionary (by default no string no-dictionary columns are excluded unless configured) > CREATE TABLE carbontable( > column1 string, > column2 string, > column3 LONG ) > STORED BY 'carbondata' > TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', > 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2585) Support Adding Local Dictionary configuration in Create table statement
[ https://issues.apache.org/jira/browse/CARBONDATA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2585: Description: Allow the user to pass local dictionary configuration in the Create table statement. *LOCAL_DICTIONARY_ENABLE* *CARBON_LOCALDICT_THRESHOLD* *LOCAL_DICTIONARY_INCLUDE* *LOCAL_DICTIONARY_EXCLUDE* CREATE TABLE carbontable( column1 string, column2 string, column3 LONG ) STORED BY 'carbondata' TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') was: Allow the user to pass local dictionary configuration in the Create table statement. *ENABLE_LOCAL_DICT* *CARBON_LOCALDICT_THRESHOLD* CREATE TABLE carbontable( column1 string, column2 string, column3 LONG ) STORED BY 'carbondata' TBLPROPERTIES('ENABLE_LOCAL_DICT'='true', 'CARBON_LOCALDICT_THRESHOLD'='1000')
> Support Adding Local Dictionary configuration in Create table statement > --- > > Key: CARBONDATA-2585 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2585 > Project: CarbonData > Issue Type: Sub-task > Reporter: kumar vishal > Assignee: Akash R Nilugal > Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Allow the user to pass local dictionary configuration in the Create table statement. > *LOCAL_DICTIONARY_ENABLE* > *CARBON_LOCALDICT_THRESHOLD* > *LOCAL_DICTIONARY_INCLUDE* > *LOCAL_DICTIONARY_EXCLUDE* > CREATE TABLE carbontable( > column1 string, > column2 string, > column3 LONG ) > STORED BY 'carbondata' > TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', > 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user kumarvishal09 commented on the issue: https://github.com/apache/carbondata/pull/2375 LGTM ---
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5326/ ---
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2328 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6383/ ---