[GitHub] carbondata pull request #3044: [CARBONDATA-3149]Documentation for alter tabl...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3044#discussion_r245224509 --- Diff: docs/ddl-of-carbondata.md --- @@ -681,24 +682,24 @@ Users can specify which columns to include and exclude for local dictionary gene **NOTE:** Drop Complex child column is not supported. - - # CHANGE DATA TYPE + - # CHANGE COLUMN NAME/TYPE - This command is used to change the data type from INT to BIGINT or decimal precision from lower to higher. + This command is used to change the column's name and the data type from INT to BIGINT or decimal precision from lower to higher and rename column. --- End diff -- Write this as below `This command is used to change column name and the data type from INT to BIGINT or decimal precision from lower to higher.` ---
[GitHub] carbondata pull request #3044: [CARBONDATA-3149]Documentation for alter tabl...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3044#discussion_r245224386 --- Diff: docs/ddl-of-carbondata.md --- @@ -47,7 +47,8 @@ CarbonData DDL statements are documented here,which includes: * [RENAME TABLE](#rename-table) * [ADD COLUMNS](#add-columns) * [DROP COLUMNS](#drop-columns) -* [CHANGE DATA TYPE](#change-data-type) +* [RENAME COLUMN](#change-column-name-/-type) +* [CHANGE DATA TYPE](#change-column-name-/-type) --- End diff -- check for linking. With this text it will not link ---
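Context for the linking concern above: GitHub builds heading anchors by lowercasing the heading text, dropping punctuation such as `/`, and turning spaces into hyphens, so a heading `CHANGE COLUMN NAME/TYPE` resolves to `#change-column-nametype`, not `#change-column-name-/-type` — which is why the TOC entries as written would not link.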
[GitHub] carbondata pull request #3044: [CARBONDATA-3149]Documentation for alter tabl...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3044#discussion_r245224565 --- Diff: docs/ddl-of-carbondata.md --- @@ -681,24 +682,24 @@ Users can specify which columns to include and exclude for local dictionary gene **NOTE:** Drop Complex child column is not supported. - - # CHANGE DATA TYPE + - # CHANGE COLUMN NAME/TYPE - This command is used to change the data type from INT to BIGINT or decimal precision from lower to higher. + This command is used to change the column's name and the data type from INT to BIGINT or decimal precision from lower to higher and rename column. Change of decimal data type from lower precision to higher precision will only be supported for cases where there is no data loss. ``` - ALTER TABLE [db_name.]table_name CHANGE col_name col_name changed_column_type + ALTER TABLE [db_name.]table_name CHANGE old_col_name new_col_name column_data_type --- End diff -- Change this as below `ALTER TABLE [db_name.]table_name CHANGE col_old_name col_new_name column_type` ---
[GitHub] carbondata pull request #2971: [CARBONDATA-3219] Support range partition the...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2971#discussion_r245224484 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala --- @@ -156,4 +161,206 @@ object DataLoadProcessBuilderOnSpark { Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors))) } } + + /** + * 1. range partition the whole input data + * 2. for each range, sort the data and write it to CarbonData files + */ + def loadDataUsingRangeSort( + sparkSession: SparkSession, + model: CarbonLoadModel, + hadoopConf: Configuration): Array[(String, (LoadMetadataDetails, ExecutionErrors))] = { +// initialize and prepare row counter +val sc = sparkSession.sparkContext +val modelBroadcast = sc.broadcast(model) +val partialSuccessAccum = sc.accumulator(0, "Partial Success Accumulator") +val inputStepRowCounter = sc.accumulator(0, "Input Processor Accumulator") +val convertStepRowCounter = sc.accumulator(0, "Convert Processor Accumulator") +val sortStepRowCounter = sc.accumulator(0, "Sort Processor Accumulator") +val writeStepRowCounter = sc.accumulator(0, "Write Processor Accumulator") + +// 1. Input +hadoopConf + .set(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME, sparkSession.sparkContext.appName) +val inputRDD = CsvRDDHelper + .csvFileScanRDD(sparkSession, model, hadoopConf) + .mapPartitionsWithIndex { case (index, rows) => +DataLoadProcessorStepOnSpark + .internalInputFunc(rows, index, modelBroadcast, inputStepRowCounter) + } + +// 2. Convert +val conf = SparkSQLUtil.broadCastHadoopConf(sc, hadoopConf) +val convertRDD = inputRDD + .mapPartitionsWithIndex { case (index, rows) => + ThreadLocalSessionInfo.setConfigurationToCurrentThread(conf.value.value) +DataLoadProcessorStepOnSpark + .convertFunc(rows, index, modelBroadcast, partialSuccessAccum, convertStepRowCounter) + } + .filter(_ != null) + +// 3. Range partition by range_column +val configuration = DataLoadProcessBuilder.createConfiguration(model) +val rangeColumnIndex = + indexOfColumn(model.getRangePartitionColumn, configuration.getDataFields) +// convert RDD[CarbonRow] to RDD[(rangeColumn, CarbonRow)] +val keyRDD = convertRDD.keyBy(_.getObject(rangeColumnIndex)) +// range partition by key +val numPartitions = getNumPartitions(configuration, model, convertRDD) +val objectOrdering: Ordering[Object] = createOrderingForColumn(model.getRangePartitionColumn) +import scala.reflect.classTag +val sampleRDD = getSampleRDD(sparkSession, model, hadoopConf, configuration, modelBroadcast) +val rangeRDD = keyRDD + .partitionBy( +new DataSkewRangePartitioner(numPartitions, sampleRDD)(objectOrdering, classTag[Object])) + .map(_._2) + +// 4. Sort and Write data +sc.runJob(rangeRDD, (context: TaskContext, rows: Iterator[CarbonRow]) => + DataLoadProcessorStepOnSpark.sortAndWriteFunc(rows, context.partitionId, modelBroadcast, +writeStepRowCounter, conf.value.value)) + +// Log the number of rows in each step +LOGGER.info("Total rows processed in step Input Processor: " + inputStepRowCounter.value) +LOGGER.info("Total rows processed in step Data Converter: " + convertStepRowCounter.value) +LOGGER.info("Total rows processed in step Sort Processor: " + sortStepRowCounter.value) +LOGGER.info("Total rows processed in step Data Writer: " + writeStepRowCounter.value) + +// Update status +if (partialSuccessAccum.value != 0) { + val uniqueLoadStatusId = model.getTableName + CarbonCommonConstants.UNDERSCORE + + "Partial_Success" + val loadMetadataDetails = new LoadMetadataDetails() + loadMetadataDetails.setSegmentStatus(SegmentStatus.LOAD_PARTIAL_SUCCESS) + val executionErrors = new ExecutionErrors(FailureCauses.NONE, "") + executionErrors.failureCauses = FailureCauses.BAD_RECORDS + Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors))) +} else { + val uniqueLoadStatusId = model.getTableName + CarbonCommonConstants.UNDERSCORE + "Success" + val loadMetadataDetails = new LoadMetadataDetails() + loadMetadataDetails.setSegmentStatus(SegmentStatus.SUCCESS) + val executionErrors = new ExecutionErrors(FailureCauses.NONE, "") + Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors))) +} + } + + /** + * provide RDD
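As context for the per-step counters in the quoted code, here is a minimal, self-contained sketch of accumulator-based row counting (using the newer `longAccumulator` API, whereas the quoted code uses the older `sc.accumulator`; all data and names are illustrative):

```
import org.apache.spark.sql.SparkSession

object AccumulatorCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("count-sketch").getOrCreate()
    val sc = spark.sparkContext
    // one counter per pipeline step, mirroring inputStepRowCounter etc. above
    val inputRows = sc.longAccumulator("Input Processor Accumulator")
    val parsed = sc.parallelize(Seq("a,1", "b,2", "c,3"))
      .map { line => inputRows.add(1); line.split(",") }   // count rows as they flow through
    parsed.count()   // an action forces the lazy map, so the accumulator gets populated
    println(s"Total rows processed in step Input Processor: ${inputRows.value}")
    spark.stop()
  }
}
```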
[GitHub] carbondata pull request #2971: [CARBONDATA-3219] Support range partition the...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2971#discussion_r245224217 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala --- @@ -156,4 +161,206 @@ object DataLoadProcessBuilderOnSpark { Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors))) } } + + /** + * 1. range partition the whole input data + * 2. for each range, sort the data and write it to CarbonData files + */ + def loadDataUsingRangeSort( + sparkSession: SparkSession, + model: CarbonLoadModel, + hadoopConf: Configuration): Array[(String, (LoadMetadataDetails, ExecutionErrors))] = { +// initialize and prepare row counter +val sc = sparkSession.sparkContext +val modelBroadcast = sc.broadcast(model) +val partialSuccessAccum = sc.accumulator(0, "Partial Success Accumulator") +val inputStepRowCounter = sc.accumulator(0, "Input Processor Accumulator") +val convertStepRowCounter = sc.accumulator(0, "Convert Processor Accumulator") +val sortStepRowCounter = sc.accumulator(0, "Sort Processor Accumulator") +val writeStepRowCounter = sc.accumulator(0, "Write Processor Accumulator") + +// 1. Input +hadoopConf + .set(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME, sparkSession.sparkContext.appName) +val inputRDD = CsvRDDHelper + .csvFileScanRDD(sparkSession, model, hadoopConf) + .mapPartitionsWithIndex { case (index, rows) => +DataLoadProcessorStepOnSpark + .internalInputFunc(rows, index, modelBroadcast, inputStepRowCounter) + } + +// 2. Convert +val conf = SparkSQLUtil.broadCastHadoopConf(sc, hadoopConf) +val convertRDD = inputRDD + .mapPartitionsWithIndex { case (index, rows) => + ThreadLocalSessionInfo.setConfigurationToCurrentThread(conf.value.value) +DataLoadProcessorStepOnSpark + .convertFunc(rows, index, modelBroadcast, partialSuccessAccum, convertStepRowCounter) + } + .filter(_ != null) + +// 3. Range partition by range_column +val configuration = DataLoadProcessBuilder.createConfiguration(model) +val rangeColumnIndex = + indexOfColumn(model.getRangePartitionColumn, configuration.getDataFields) +// convert RDD[CarbonRow] to RDD[(rangeColumn, CarbonRow)] +val keyRDD = convertRDD.keyBy(_.getObject(rangeColumnIndex)) +// range partition by key +val numPartitions = getNumPartitions(configuration, model, convertRDD) +val objectOrdering: Ordering[Object] = createOrderingForColumn(model.getRangePartitionColumn) +import scala.reflect.classTag +val sampleRDD = getSampleRDD(sparkSession, model, hadoopConf, configuration, modelBroadcast) +val rangeRDD = keyRDD + .partitionBy( +new DataSkewRangePartitioner(numPartitions, sampleRDD)(objectOrdering, classTag[Object])) + .map(_._2) + +// 4. Sort and Write data +sc.runJob(rangeRDD, (context: TaskContext, rows: Iterator[CarbonRow]) => + DataLoadProcessorStepOnSpark.sortAndWriteFunc(rows, context.partitionId, modelBroadcast, +writeStepRowCounter, conf.value.value)) + +// Log the number of rows in each step +LOGGER.info("Total rows processed in step Input Processor: " + inputStepRowCounter.value) +LOGGER.info("Total rows processed in step Data Converter: " + convertStepRowCounter.value) +LOGGER.info("Total rows processed in step Sort Processor: " + sortStepRowCounter.value) +LOGGER.info("Total rows processed in step Data Writer: " + writeStepRowCounter.value) + +// Update status +if (partialSuccessAccum.value != 0) { + val uniqueLoadStatusId = model.getTableName + CarbonCommonConstants.UNDERSCORE + + "Partial_Success" + val loadMetadataDetails = new LoadMetadataDetails() + loadMetadataDetails.setSegmentStatus(SegmentStatus.LOAD_PARTIAL_SUCCESS) + val executionErrors = new ExecutionErrors(FailureCauses.NONE, "") + executionErrors.failureCauses = FailureCauses.BAD_RECORDS + Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors))) +} else { + val uniqueLoadStatusId = model.getTableName + CarbonCommonConstants.UNDERSCORE + "Success" + val loadMetadataDetails = new LoadMetadataDetails() + loadMetadataDetails.setSegmentStatus(SegmentStatus.SUCCESS) + val executionErrors = new ExecutionErrors(FailureCauses.NONE, "") + Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors))) +} + } + + /** + * provide RDD
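A minimal sketch of the range-partition step itself (step 3 above), using Spark's stock `RangePartitioner` rather than the PR's skew-aware `DataSkewRangePartitioner`; the data and column index are made up for illustration:

```
import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}

object RangePartitionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("range-sketch"))
    val rangeColumnIndex = 0
    val rows = sc.parallelize(Seq(Array("k3", "v3"), Array("k1", "v1"), Array("k2", "v2")))
    // key each row by its range column, as convertRDD.keyBy(...) does above
    val keyed = rows.keyBy(row => row(rangeColumnIndex))
    // after partitionBy, each partition holds a contiguous key range, so a
    // per-partition sort (step 4) yields globally range-sorted output files
    val ranged = keyed.partitionBy(new RangePartitioner(2, keyed)).map(_._2)
    ranged.foreachPartition(it => println(it.map(_.mkString(",")).mkString(" | ")))
    sc.stop()
  }
}
```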
[GitHub] carbondata pull request #2971: [CARBONDATA-3219] Support range partition the...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2971#discussion_r245223174 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessorStepOnSpark.scala --- @@ -95,6 +96,67 @@ object DataLoadProcessorStepOnSpark { } } + def internalInputFunc( + rows: Iterator[InternalRow], + index: Int, + modelBroadcast: Broadcast[CarbonLoadModel], + rowCounter: Accumulator[Int]): Iterator[CarbonRow] = { +val model: CarbonLoadModel = modelBroadcast.value.getCopyWithTaskNo(index.toString) +val conf = DataLoadProcessBuilder.createConfiguration(model) +val rowParser = new RowParserImpl(conf.getDataFields, conf) +val isRawDataRequired = CarbonDataProcessorUtil.isRawDataRequired(conf) +TaskContext.get().addTaskFailureListener { (t: TaskContext, e: Throwable) => + wrapException(e, model) +} + +new Iterator[CarbonRow] { + override def hasNext: Boolean = rows.hasNext + + override def next(): CarbonRow = { +var row : CarbonRow = null +val rawRow = + rows.next().asInstanceOf[GenericInternalRow].values.asInstanceOf[Array[Object]] +if(isRawDataRequired) { + row = new CarbonRow(rowParser.parseRow(rawRow), rawRow) +} else { + row = new CarbonRow(rowParser.parseRow(rawRow)) +} +rowCounter.add(1) +row + } +} + } + + def internalSampleInputFunc( --- End diff -- ok ---
[GitHub] carbondata pull request #2971: [CARBONDATA-3219] Support range partition the...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2971#discussion_r24542 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessorStepOnSpark.scala --- @@ -95,6 +96,67 @@ object DataLoadProcessorStepOnSpark { } } + def internalInputFunc( + rows: Iterator[InternalRow], + index: Int, + modelBroadcast: Broadcast[CarbonLoadModel], + rowCounter: Accumulator[Int]): Iterator[CarbonRow] = { +val model: CarbonLoadModel = modelBroadcast.value.getCopyWithTaskNo(index.toString) +val conf = DataLoadProcessBuilder.createConfiguration(model) +val rowParser = new RowParserImpl(conf.getDataFields, conf) +val isRawDataRequired = CarbonDataProcessorUtil.isRawDataRequired(conf) +TaskContext.get().addTaskFailureListener { (t: TaskContext, e: Throwable) => + wrapException(e, model) +} + +new Iterator[CarbonRow] { + override def hasNext: Boolean = rows.hasNext + + override def next(): CarbonRow = { +var row : CarbonRow = null +val rawRow = + rows.next().asInstanceOf[GenericInternalRow].values.asInstanceOf[Array[Object]] +if(isRawDataRequired) { + row = new CarbonRow(rowParser.parseRow(rawRow), rawRow) +} else { + row = new CarbonRow(rowParser.parseRow(rawRow)) +} +rowCounter.add(1) +row + } +} + } + + def internalSampleInputFunc( --- End diff -- the "rows" parameter of the two functions has a different type. ---
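The pattern in the quoted `internalInputFunc` — wrap a row iterator, parse each raw row on `next()`, and bump an accumulator — in a minimal, self-contained form (types are simplified, and `parse` stands in for `rowParser.parseRow`):

```
import org.apache.spark.util.LongAccumulator

// lazily parses and counts rows as they are consumed, like internalInputFunc's iterator
class CountingParseIterator[A, B](rows: Iterator[A], parse: A => B, counter: LongAccumulator)
  extends Iterator[B] {
  override def hasNext: Boolean = rows.hasNext
  override def next(): B = {
    val row = parse(rows.next())
    counter.add(1)   // mirrors rowCounter.add(1) in the quoted code
    row
  }
}
```

Because the wrapper is lazy, rows are parsed only as downstream steps pull them, so the partition is never materialized in memory all at once.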
[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2971 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10412/ ---
[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2971 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2371/ ---
[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2971 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2158/ ---
[GitHub] carbondata pull request #2971: [CARBONDATA-3219] Support range partition the...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2971#discussion_r245198317 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala --- @@ -106,6 +106,24 @@ class TestGlobalSortDataLoad extends QueryTest with BeforeAndAfterEach with Befo sql("SELECT * FROM carbon_localsort_once ORDER BY name")) } + test("Make sure the result is right and sorted in global level for range_sort") { --- End diff -- ok ---
[GitHub] carbondata pull request #2971: [CARBONDATA-3219] Support range partition the...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2971#discussion_r245198367 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/DataSkewRangePartitioner.scala --- @@ -0,0 +1,319 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark + +import java.io.{IOException, ObjectInputStream, ObjectOutputStream} + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer +import scala.reflect.ClassTag +import scala.util.hashing.byteswap32 + +import org.apache.spark.rdd.{PartitionPruningRDD, RDD} +import org.apache.spark.serializer.JavaSerializer +import org.apache.spark.util.{CollectionsUtils, Utils} + +/** + * support data skew scenario + * copy from spark: RangePartiitoner + */ +class DataSkewRangePartitioner[K: Ordering : ClassTag, V]( --- End diff -- ok ---
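The skew handling that distinguishes this class from Spark's `RangePartitioner` can be pictured roughly as follows (a simplified, hypothetical illustration of the idea, not the actual implementation): a bound value whose sampled weight exceeds the average partition weight is allocated several consecutive partitions, and rows carrying that key are then spread across them.

```
// rough sketch: decide how many partitions each heavily skewed key should get
def extraPartitionsForSkewedKeys(
    sampledWeights: Seq[(String, Long)],   // (key, sampled row count)
    numPartitions: Int): Map[String, Int] = {
  val total = sampledWeights.map(_._2).sum
  val avgPerPartition = math.max(1L, total / numPartitions)
  sampledWeights.collect {
    // a key heavier than one partition's share is split over ceil(weight/avg) partitions
    case (key, weight) if weight > avgPerPartition =>
      key -> math.ceil(weight.toDouble / avgPerPartition).toInt
  }.toMap
}
```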
[jira] [Updated] (CARBONDATA-3229) Validate the true/false for all boolean parameters
[ https://issues.apache.org/jira/browse/CARBONDATA-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated CARBONDATA-3229: --- Component/s: file-format core > Validate the true/false for all boolean parameters > -- > > Key: CARBONDATA-3229 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3229 > Project: CarbonData > Issue Type: Improvement > Components: core, file-format >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > > Validate the true/false for all boolean parameters when input, like in > carbonsession, sdk, spark carbon file format, in beeline and so on. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3229) Validate the true/false for all boolean parameters
Nicholas Jiang created CARBONDATA-3229: -- Summary: Validate the true/false for all boolean parameters Key: CARBONDATA-3229 URL: https://issues.apache.org/jira/browse/CARBONDATA-3229 Project: CarbonData Issue Type: Improvement Reporter: Nicholas Jiang Assignee: Nicholas Jiang Validate the true/false values for all boolean parameters at input time, for example in CarbonSession, the SDK, the Spark carbon file format, Beeline and so on. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
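A minimal sketch of the validation this issue asks for (a hypothetical helper, not existing CarbonData code):

```
// accept only the literal strings "true"/"false" (case-insensitive); fail fast otherwise
def validateBooleanParameter(name: String, value: String): Boolean =
  value.trim.toLowerCase match {
    case "true"  => true
    case "false" => false
    case _ => throw new IllegalArgumentException(
      s"Invalid value '$value' for parameter '$name': expected 'true' or 'false'")
  }
```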
[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2971 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2370/ ---
[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2971 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10411/ ---
[jira] [Created] (CARBONDATA-3228) Optimize import and fix some spell errors
xubo245 created CARBONDATA-3228: --- Summary: Optimize import and fix some spell errors Key: CARBONDATA-3228 URL: https://issues.apache.org/jira/browse/CARBONDATA-3228 Project: CarbonData Issue Type: Bug Affects Versions: 1.5.1 Reporter: xubo245 Optimize imports. Unused imports: {code:java} org.apache.carbondata.spark.rdd.AddColumnPartition org.apache.carbondata.spark.testsuite.dataretention.DataRetentionTestCase CarbonDatasourceHadoopRelation {code} Fix some spelling errors. Misspellings found: {code:java} databseName dimenisons dictfolderPath {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2971 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2157/ ---
[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2991#discussion_r245183568 --- Diff: docs/csdk-guide.md --- @@ -29,6 +29,32 @@ code and without CarbonSession. In the carbon jars package, there exist a carbondata-sdk.jar, including SDK reader for C++ SDK. + +##Compile/Build CSDK +CSDK supports cmake based compilation and has dependency list in CMakeLists.txt. + Prerequisites +GCC >=4.8.5 +Cmake >3.13 +Make >=4.1 + +Steps +1. Go to CSDK folder(/opt/.../CSDK/) +2. Create build folder . (/opt/.../CSDK/build) +3. Run Command from build folder `cmake ../` +4. `make` --- End diff -- I mean, how do we run the code after `make`? ---
[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2991#discussion_r245183433 --- Diff: docs/csdk-guide.md --- @@ -29,6 +29,32 @@ code and without CarbonSession. In the carbon jars package, there exist a carbondata-sdk.jar, including SDK reader for C++ SDK. + +# Compile/Build CSDK --- End diff -- Please use ## Compile/Build CSDK, it's a sub-title of C++ SDK Reader ---
[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2991#discussion_r245183464 --- Diff: docs/csdk-guide.md --- @@ -29,6 +29,32 @@ code and without CarbonSession. In the carbon jars package, there exist a carbondata-sdk.jar, including SDK reader for C++ SDK. + +##Compile/Build CSDK --- End diff -- Please use ## Compile/Build CSDK, it's a sub-title of C++ SDK Reader ---
[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2991#discussion_r245183255 --- Diff: store/CSDK/test/main_ft.cpp --- @@ -0,0 +1,1172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include --- End diff -- ok ---
[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2991#discussion_r245183196 --- Diff: docs/csdk-guide.md --- @@ -29,6 +29,32 @@ code and without CarbonSession. In the carbon jars package, there exist a carbondata-sdk.jar, including SDK reader for C++ SDK. + +##Compile/Build CSDK +CSDK supports cmake based compilation and has dependency list in CMakeLists.txt. + Prerequisites +GCC >=4.8.5 +Cmake >3.13 +Make >=4.1 + +Steps +1. Go to CSDK folder(/opt/.../CSDK/) +2. Create build folder . (/opt/.../CSDK/build) +3. Run Command from build folder `cmake ../` +4. `make` + +Test Cases are written in [main.cpp](https://github.com/apache/carbondata/blob/master/store/CSDK/test/main.cpp) with GoogleTest C++ Framework. +if GoogleTest LIBRARY is not added then compilation of example code will fail. Please follow below steps to solve the same +1. Remove test/main.cpp from SOURCE_FILES of CMakeLists.txt and compile/build again. +2. Follow below Steps to configure GoogleTest Framework +* Download googleTest release (CI is complied with 1.8) https://github.com/google/googletest/releases +* Extract to folder like /opt/googletest/googletest-release-1.8.1/ and create build folder inside this like /opt/googletest/googletest-release-1.8.1/googletest/build) --- End diff -- @BJangir Please optimize it. ---
[GitHub] carbondata pull request #3030: [HOTFIX] Optimize the code style in csdk/sdk ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/3030 ---
[GitHub] carbondata issue #3030: [HOTFIX] Optimize the code style in csdk/sdk markdow...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/3030 LGTM! Thanks for your contribution! ---
[GitHub] carbondata issue #3035: [CARBONDATA-3216] Fix some bugs in CSDK
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/3035 @ajantha-bhat @KanakaKumar Please review it. ---
[jira] [Created] (CARBONDATA-3227) There are some spell errors in the project
xubo245 created CARBONDATA-3227: --- Summary: There are some spell errors in the project Key: CARBONDATA-3227 URL: https://issues.apache.org/jira/browse/CARBONDATA-3227 Project: CarbonData Issue Type: Bug Affects Versions: 1.5.1 Reporter: xubo245 There are some spelling errors in the project: escapechar optionlist hivedefaultpartition pvalue errormsg isDetectAsDimentionDatatype Please also fix any other spelling errors that turn up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #3050: Optimize the upper/lower case problem
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/3050 @bbinwang Please optimize the title and content of this PR. For example: [CARBONDATA-3226] Remove duplicated and useless files ---
[GitHub] carbondata issue #3049: [CARBONDATA-3226] Remove duplicated and useless file...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/3049 @runzhliu Are there any other similar files in the project? Can you help to check? ---
[GitHub] carbondata issue #3049: [CARBONDATA-3226] Remove duplicated and useless file...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3049 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10410/ ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2368/ ---
[GitHub] carbondata issue #3049: [CARBONDATA-3226] Remove duplicated and useless file...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3049 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2369/ ---
[GitHub] carbondata issue #3050: Optimize the upper/lower case problem
Github user bbinwang commented on the issue: https://github.com/apache/carbondata/pull/3050 > Can one of the admins verify this patch? Are you a machine? ---
[GitHub] carbondata issue #3050: Optimize the upper/lower case problem
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3050 Can one of the admins verify this patch? ---
[GitHub] carbondata pull request #3050: Optimize the upper/lower case problem
GitHub user bbinwang opened a pull request: https://github.com/apache/carbondata/pull/3050 Optimize the upper/lower case problem Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ no] Any interfaces changed? - [ no] Any backward compatibility impacted? - [ no] Document update required? - [ no] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [yes ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bbinwang/carbondata master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3050.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3050 commit d53a178dea88fc28b3944c443cc3868a9027005b Author: bbinwang Date: 2019-01-02T14:42:10Z Merge pull request #1 from apache/master pull commit ea3b157638f2dcd0d88c2aa34246a210fbb54d00 Author: binw...@163.com <513338github> Date: 2019-01-03T15:22:51Z Optimize the upper/lower case problem commit b4bb5fba4d9b37a5797d2030e9e916574d4795dd Author: binw...@163.com <513338github> Date: 2019-01-03T15:29:01Z Optimize the upper/lower case problem ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10409/ ---
[GitHub] carbondata issue #3044: [CARBONDATA-3149]Documentation for alter table colum...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3044 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10408/ ---
[GitHub] carbondata issue #3049: [CARBONDATA-3226] Remove duplicated and useless file...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3049 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2156/ ---
[GitHub] carbondata pull request #3049: [CARBONDATA-3226] Remove duplicated and usele...
GitHub user runzhliu opened a pull request: https://github.com/apache/carbondata/pull/3049 [CARBONDATA-3226] Remove duplicated and useless files Remove duplicated and useless files from the project. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? - [x] Any backward compatibility impacted? - [x] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/runzhliu/carbondata dev Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3049.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3049 commit 1e20d6a217eddd93bd70e8f6be1547e362f2cd7f Author: Oscar Date: 2019-01-03T14:20:16Z [CARBONDATA-3226] Remove duplicated and useless files Remove duplicated and useless files from the project. ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2155/ ---
[GitHub] carbondata issue #3044: [CARBONDATA-3149]Documentation for alter table colum...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3044 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2367/ ---
[GitHub] carbondata issue #3026: [CARBONDATA-3193] Added support to compile carbon CD...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3026 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10406/ ---
[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2971 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10407/ ---
[GitHub] carbondata pull request #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap e...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2963#discussion_r245013088 --- Diff: pom.xml --- @@ -527,6 +526,7 @@ examples/spark2 datamap/lucene datamap/bloom +datamap/example --- End diff -- I think it is better not to add this, since it will make the assembly bigger ---
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3047#discussion_r245003004 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala --- @@ -101,14 +102,23 @@ object CarbonStore { val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) { // for streaming segment, we should get the actual size from the index file // since it is continuously inserting data -val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName) +val segmentDir = CarbonTablePath + .getSegmentPath(carbonTable.getTablePath, load.getLoadName) val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir) val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath)) (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize) } else { // for batch segment, we can get the data size from table status file directly -(if (load.getDataSize == null) 0L else load.getDataSize.toLong, - if (load.getIndexSize == null) 0L else load.getIndexSize.toLong) +if (null == load.getDataSize || null == load.getIndexSize) { + // If either of datasize or indexsize comes to be null the we calculate the correct + // size and assign + val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true) --- End diff -- As it is a metadata function, we compute the size just once and persist it by passing TRUE to the 'calculateDataIndexSize' function, so the computed value can also be reused afterwards. ---
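The fix under discussion can be pictured with a small sketch (simplified types; `computeSizes` stands in for `CarbonUtil.calculateDataIndexSize`, and the boolean flag mirrors passing TRUE so the recomputed sizes are persisted back to the table status):

```
// fall back to an on-demand computation only when table status lacks the sizes
def resolveSizes(
    storedDataSize: Option[Long],
    storedIndexSize: Option[Long],
    computeSizes: Boolean => (Long, Long)): (Long, Long) =
  (storedDataSize, storedIndexSize) match {
    case (Some(d), Some(i)) => (d, i)   // fast path: read from table status directly
    case _ => computeSizes(true)        // compute once and persist for the next query
  }
```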
[GitHub] carbondata issue #3044: [CARBONDATA-3149]Documentation for alter table colum...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3044 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2154/ ---
[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2971 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2153/ ---
[GitHub] carbondata issue #3010: [CARBONDATA-3189] Fix PreAggregate Datamap Issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3010 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10404/ ---
[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2971 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2366/ ---
[jira] [Updated] (CARBONDATA-3226) Remove duplicated and useless files
[ https://issues.apache.org/jira/browse/CARBONDATA-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Runzhong updated CARBONDATA-3226: - Priority: Minor (was: Major) > Remove duplicated and useless files > --- > > Key: CARBONDATA-3226 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3226 > Project: CarbonData > Issue Type: Bug >Reporter: Liu Runzhong >Assignee: Liu Runzhong >Priority: Minor > > Remove duplicated and useless files from the project. > For example, org/apache/carbondata/spark/rdd/CarbonMergeFilesRDD.scala has > duplication of name with org/apache/spark/rdd/CarbonMergeFilesRDD.scala, but > without any content at all. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3226) Remove duplicated and useless files
Liu Runzhong created CARBONDATA-3226: Summary: Remove duplicated and useless files Key: CARBONDATA-3226 URL: https://issues.apache.org/jira/browse/CARBONDATA-3226 Project: CarbonData Issue Type: Bug Reporter: Liu Runzhong Assignee: Liu Runzhong Remove duplicated and useless files from the project. For example, org/apache/carbondata/spark/rdd/CarbonMergeFilesRDD.scala has duplication of name with org/apache/spark/rdd/CarbonMergeFilesRDD.scala, but without any content at all. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3045#discussion_r244992105 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala --- @@ -126,6 +126,12 @@ case class PreAggregateTableHelper( newLongStringColumn.mkString(",")) } +//Add long_string_columns properties in child table from the parent. +tableProperties --- End diff -- Done. @kumarvishal09 please review. ---
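The fix being reviewed — carrying the parent's `LONG_STRING_COLUMNS` into the pre-aggregate child's table properties — reduces to a conditional property copy; a minimal stand-alone sketch with plain maps standing in for the real table-property objects:

```
import scala.collection.JavaConverters._

// hypothetical stand-ins for the parent and child table properties
val parentProperties = new java.util.HashMap[String, String]()
parentProperties.put("long_string_columns", "description,note")
val childProperties = new java.util.HashMap[String, String]()

// copy the property only if the parent defines it, so varchar (long string)
// columns keep their data type in the child datamap table
parentProperties.asScala.get("long_string_columns")
  .foreach(cols => childProperties.put("long_string_columns", cols))
```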
[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3045#discussion_r244992035 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala --- @@ -333,6 +333,36 @@ class VarcharDataTypesBasicTestCase extends QueryTest with BeforeAndAfterEach wi sql(s"DROP DATAMAP IF EXISTS $datamapName ON TABLE $longStringTable") } + test("creating datamap with long string column selected and loading data should be success") { + +sql(s"drop table if exists $longStringTable") +val datamapName = "pre_agg_dm" +sql( + s""" + | CREATE TABLE if not exists $longStringTable( + | id INT, name STRING, description STRING, address STRING, note STRING + | ) STORED BY 'carbondata' + | TBLPROPERTIES('LONG_STRING_COLUMNS'='description, note', 'SORT_COLUMNS'='name') + |""".stripMargin) + +sql( + s""" + | CREATE DATAMAP $datamapName ON TABLE $longStringTable + | USING 'preaggregate' + | AS SELECT id,description,note,count(*) FROM $longStringTable + | GROUP BY id,description,note + |""". +stripMargin) + +sql( + s""" + | LOAD DATA LOCAL INPATH '$inputFile' INTO TABLE $longStringTable + | OPTIONS('header'='false') + """.stripMargin) + +sql(s"drop table if exists $longStringTable") --- End diff -- Added! ---
[GitHub] carbondata issue #3010: [CARBONDATA-3189] Fix PreAggregate Datamap Issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3010 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2152/ ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10405/ ---
[GitHub] carbondata issue #3026: [CARBONDATA-3193] Added support to compile carbon CD...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3026 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2151/ ---
[GitHub] carbondata issue #3035: [CARBONDATA-3216] Fix some bugs in CSDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3035 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2149/ ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2150/ ---
[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3047 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2148/ ---
[GitHub] carbondata issue #3035: [CARBONDATA-3216] Fix some bugs in CSDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3035 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10402/ ---
[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3047 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10401/ ---
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3047#discussion_r244980360 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala --- @@ -101,14 +102,23 @@ object CarbonStore { val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) { // for streaming segment, we should get the actual size from the index file // since it is continuously inserting data -val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName) +val segmentDir = CarbonTablePath + .getSegmentPath(carbonTable.getTablePath, load.getLoadName) val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir) val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath)) (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize) } else { // for batch segment, we can get the data size from table status file directly -(if (load.getDataSize == null) 0L else load.getDataSize.toLong, - if (load.getIndexSize == null) 0L else load.getIndexSize.toLong) +if (null == load.getDataSize || null == load.getIndexSize) { + // If either of datasize or indexsize comes to be null the we calculate the correct + // size and assign + val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true) --- End diff -- Show segments is a read-only query. I think we should not perform a write operation in a query. So, I feel it's better to calculate every time and show, OR just display as not available. ---
[GitHub] carbondata issue #3026: [CARBONDATA-3193] Added support to compile carbon CD...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3026 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2365/ ---
[GitHub] carbondata issue #3010: [CARBONDATA-3189] Fix PreAggregate Datamap Issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3010 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2358/ ---
[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3047 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2361/ ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10403/ ---
[GitHub] carbondata issue #3048: [CARBONDATA-3224] Support SDK/CSDK validate the impr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3048 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10400/ ---
[GitHub] carbondata issue #3035: [CARBONDATA-3216] Fix some bugs in CSDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3035 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2359/ ---
[GitHub] carbondata issue #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3014 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2147/ ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2364/ ---
[GitHub] carbondata issue #3045: [CARBONDATA-3222]Fix dataload failure after creation...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3045 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10398/ ---
[GitHub] carbondata issue #3048: [CARBONDATA-3224] Support SDK/CSDK validate the impr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3048 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2356/ ---
[GitHub] carbondata issue #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3014 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10399/ ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2146/ ---
[GitHub] carbondata issue #3048: [CARBONDATA-3224] Support SDK/CSDK validate the impr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3048 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2145/ ---
[GitHub] carbondata pull request #3029: [CARBONDATA-3200] No-Sort compaction
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3029#discussion_r244959393 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SingleThreadFinalSortFilesMerger.java --- @@ -114,6 +113,31 @@ public void startFinalMerge() throws CarbonDataWriterException { startSorting(filesToMerge); } + /** + * Below method will be used to add in memory raw result iterator to priority queue. + * This will be called in case of compaction, when it is compacting sorted and unsorted + * both type of carbon data file + * This method will add sorted file's RawResultIterator to priority queue using + * InMemorySortTempChunkHolder as wrapper + * + * @param sortedRawResultMergerList + * @param segmentProperties + * @param noDicAndComplexColumns + * @throws CarbonSortKeyAndGroupByException + */ + public void addInMemoryRawResultIterator(List sortedRawResultMergerList, + SegmentProperties segmentProperties, CarbonColumn[] noDicAndComplexColumns, + DataType[] measureDataType) + throws CarbonSortKeyAndGroupByException { +for (RawResultIterator rawResultIterator : sortedRawResultMergerList) { + InMemorySortTempChunkHolder inMemorySortTempChunkHolder = + new InMemorySortTempChunkHolder(rawResultIterator, segmentProperties, + noDicAndComplexColumns, sortParameters, measureDataType); + inMemorySortTempChunkHolder.readRow(); --- End diff -- Don't need to check hasNext here before reading the row first time? ---
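The reviewer's concern — calling `readRow()` on a holder before confirming it has data — matters because an empty sorted iterator would otherwise break the merge. A minimal k-way-merge sketch with the guard in place (rows simplified to `Int`; all names are illustrative):

```
import scala.collection.mutable

class Holder(it: Iterator[Int]) {
  var current: Int = _
  // returns false for an empty/exhausted iterator instead of failing,
  // i.e. the hasNext check asked about above
  def advance(): Boolean = if (it.hasNext) { current = it.next(); true } else false
}

val queue = mutable.PriorityQueue.empty[Holder](Ordering.by((h: Holder) => -h.current)) // min-heap
Seq(Iterator(1, 4, 7), Iterator(2, 5), Iterator.empty[Int]).foreach { it =>
  val holder = new Holder(it)
  if (holder.advance()) queue.enqueue(holder)   // only enqueue holders that actually hold a row
}
val merged = mutable.ArrayBuffer[Int]()
while (queue.nonEmpty) {
  val h = queue.dequeue()
  merged += h.current
  if (h.advance()) queue.enqueue(h)
}
// merged now contains 1, 2, 4, 5, 7 in sorted order
```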
[GitHub] carbondata issue #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3014 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2353/ ---
[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...
Github user manishnalla1994 commented on the issue: https://github.com/apache/carbondata/pull/3047 retest this please ---
[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3047 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2349/ ---
[GitHub] carbondata issue #3045: [CARBONDATA-3222]Fix dataload failure after creation...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3045 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2352/ ---
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3047#discussion_r244957746 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala --- @@ -101,14 +102,23 @@ object CarbonStore { val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) { // for streaming segment, we should get the actual size from the index file // since it is continuously inserting data -val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName) +val segmentDir = CarbonTablePath + .getSegmentPath(carbonTable.getTablePath, load.getLoadName) val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir) val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath)) (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize) } else { // for batch segment, we can get the data size from table status file directly -(if (load.getDataSize == null) 0L else load.getDataSize.toLong, - if (load.getIndexSize == null) 0L else load.getIndexSize.toLong) +if (null == load.getDataSize || null == load.getIndexSize) { + // If either of datasize or indexsize comes to be null the we calculate the correct + // size and assign + val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false) --- End diff -- Fixed. ---
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3047#discussion_r244957693 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala --- @@ -46,9 +47,9 @@ object CarbonStore { def showSegments( limit: Option[String], - tablePath: String, + carbonTable: CarbonTable, --- End diff -- Done. ---
[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3047 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10397/ ---
[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user NamanRastogi commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3014#discussion_r244947621 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -191,10 +191,17 @@ case class CarbonLoadDataCommand( optionsFinal .put("complex_delimiter_level_4", ComplexDelimitersEnum.COMPLEX_DELIMITERS_LEVEL_4.value()) -optionsFinal.put("sort_scope", tableProperties.asScala.getOrElse("sort_scope", - carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, -carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, - CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT +optionsFinal.put( + "sort_scope", + options.getOrElse( +"sort_scope", +tableProperties.asScala.getOrElse( + "sort_scope", + carbonProperty.getProperty( +CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, +carbonProperty.getProperty( + CarbonCommonConstants.LOAD_SORT_SCOPE, + CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT) --- End diff -- No need to handle for SDK. Done for PreAgg. ---
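The nesting in the quoted diff encodes a precedence chain for `sort_scope`; a minimal sketch of that resolution order (the property keys are abbreviated, and the final default is assumed here for illustration rather than quoted from CarbonCommonConstants):

```
// load-time OPTIONS('sort_scope'=...) wins, then table properties, then the
// session-level carbon.options.sort.scope, then carbon.load.sort.scope, then a default
def resolveSortScope(
    loadOptions: Map[String, String],
    tableProperties: Map[String, String],
    sessionProperties: Map[String, String]): String =
  loadOptions.get("sort_scope")
    .orElse(tableProperties.get("sort_scope"))
    .orElse(sessionProperties.get("carbon.options.sort.scope"))
    .orElse(sessionProperties.get("carbon.load.sort.scope"))
    .getOrElse("LOCAL_SORT")   // assumed default, for illustration only
```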
[GitHub] carbondata pull request #3035: [CARBONDATA-3216] Fix some bugs in CSDK
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3035#discussion_r244947464 --- Diff: store/CSDK/test/main.cpp --- @@ -709,6 +709,7 @@ bool testWithTableProperty(JNIEnv *env, char *path, int argc, char **argv) { writer.outputPath(path); writer.withCsvInput(jsonSchema); writer.withTableProperty("sort_columns", "shortField"); +writer.enableLocalDictionary(false); --- End diff -- add test code for it. In CPP, it will convert NULL to false when the user invokes CarbonWriter::enableLocalDictionary. ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2355/ ---
[GitHub] carbondata pull request #3035: [CARBONDATA-3216] Fix some bugs in CSDK
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3035#discussion_r244946154 --- Diff: store/CSDK/test/main.cpp --- @@ -546,7 +546,7 @@ bool testWriteData(JNIEnv *env, char *path, int argc, char *argv[]) { writer.withCsvInput(jsonSchema); writer.withLoadOption("complex_delimiter_level_1", "#"); writer.writtenBy("CSDK"); -writer.taskNo(185); +writer.taskNo(15541554.81); --- End diff -- There are some differences between CPP and Java: the Java taskNo doesn't support double, but CPP will convert double to long. ---
[GitHub] carbondata issue #3045: [CARBONDATA-3222]Fix dataload failure after creation...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3045 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2144/ ---
[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2971 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10396/ ---
[GitHub] carbondata pull request #3035: [CARBONDATA-3216] Fix some bugs in CSDK
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3035#discussion_r244945691 --- Diff: store/CSDK/test/main.cpp --- @@ -853,7 +854,7 @@ int main(int argc, char *argv[]) { } else { int batch = 32000; int printNum = 32000; - +// --- End diff -- ok, done ---
[GitHub] carbondata issue #3048: [CARBONDATA-3224] Support SDK/CSDK validate the impr...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/3048 @ajantha-bhat @KanakaKumar Please review it. ---
[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/3047 LGTM...can be merged once build passes ---
[GitHub] carbondata issue #3048: [CARBONDATA-3224] Support SDK/CSDK validate the impr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3048 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10395/ ---
[jira] [Created] (CARBONDATA-3225) Can't save spark dataframe as carbon format In Windows system
Zhang Mei created CARBONDATA-3225: - Summary: Can't save spark dataframe as carbon format In Windows system Key: CARBONDATA-3225 URL: https://issues.apache.org/jira/browse/CARBONDATA-3225 Project: CarbonData Issue Type: New Feature Components: core, hadoop-integration, spark-integration Affects Versions: 1.5.1 Environment: Spark2.1.0, hadoop2.7.2, Carbon1.5.1 Reporter: Zhang Mei When I try to save a dataframe as carbon format on a Windows 7 system this way: carbonSession.createDataFrame(outputData.rdd, outputData.schema) .write.format("carbon").mode(SaveMode.Overwrite).save(datapath) where datapath is an s3a path, I get this error: 2019-01-03 16:30:45 ERROR CarbonUtil:167 - Error while closing stream:java.io.IOException: Stream Closed java.io.IOException: Stream Closed at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.DataOutputStream.flush(DataOutputStream.java:123) at java.io.FilterOutputStream.close(FilterOutputStream.java:158) at org.apache.carbondata.core.util.CarbonUtil.closeStream(CarbonUtil.java:181) at org.apache.carbondata.core.util.CarbonUtil.closeStreams(CarbonUtil.java:165) at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.commitCurrentFile(AbstractFactDataWriter.java:272) at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:383) at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:395) at org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.close(DataWriterProcessorStepImpl.java:251) at org.apache.carbondata.processing.loading.DataLoadExecutor.close(DataLoadExecutor.java:90) at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$1.run(CarbonTableOutputFormat.java:275) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Suppressed: java.io.IOException: Stream Closed at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.FilterOutputStream.close(FilterOutputStream.java:158) at java.io.FilterOutputStream.close(FilterOutputStream.java:159) ... 13 more 2019-01-03 16:30:45 INFO CarbonUtil:2733 - Copying \Temp\/390621183953547_attempt_20190103163036_0004_m_00_0\Fact\Part0\Segment_null\390621178232428\part-0-390621178232428_batchno0-0-null-390619717738707.carbondata to s3a://obs-test/zzz/obsCarbon_3/_temporary/0/_temporary/attempt_20190103163036_0004_m_00_0, operation id 1546504245269 2019-01-03 16:30:45 ERROR CarbonTableOutputFormat:458 - Error while loading data java.util.concurrent.ExecutionException: java.lang.NullPointerException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$CarbonRecordWriter.close(CarbonTableOutputFormat.java:456) at org.apache.spark.sql.carbondata.execution.datasources.SparkCarbonFileFormat$CarbonOutputWriter.close(SparkCarbonFileFormat.scala:297) at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.releaseResources(FileFormatWriter.scala:252) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:191) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at
[GitHub] carbondata issue #3048: [CARBONDATA-3224] Support SDK/CSDK validate the impr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3048 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2348/ ---
[GitHub] carbondata issue #2971: [CARBONDATA-3219] Support range partition the input ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2971 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2347/ ---
[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3047 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2143/ ---
[GitHub] carbondata issue #3048: [CARBONDATA-3224] Support SDK/CSDK validate the impr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3048 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2142/ ---
[GitHub] carbondata issue #3026: [CARBONDATA-3193] Added support to compile carbon CD...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3026 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2345/ ---
[GitHub] carbondata issue #3026: [CARBONDATA-3193] Added support to compile carbon CD...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3026 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10393/ ---