[GitHub] carbondata pull request #3064: [WIP] Updated DOC for No-Sort Compaction and ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3064#discussion_r246777849 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/execution/command/CarbonHiveCommands.scala --- @@ -127,6 +127,9 @@ object CarbonSetCommand { else if (isCarbonProperty) { sessionParams.addProperty(key, value) } +else { --- End diff -- Remove this; if a Spark property is set, it should not be validated by Carbon. ---
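For reference, a minimal sketch of the routing this review asks for. SessionParams and addProperty come from the quoted diff; the wrapper object and its signature are illustrative, not CarbonData code.

```scala
import org.apache.carbondata.core.util.SessionParams

// Illustrative routing: Carbon validates and stores only its own properties;
// anything else is assumed to be a Spark property and falls through untouched.
object SetCommandRouting {
  def route(sessionParams: SessionParams, key: String, value: String,
      isCarbonProperty: Boolean): Unit = {
    if (isCarbonProperty) {
      // addProperty validates the key/value pair internally
      sessionParams.addProperty(key, value)
    }
    // deliberately no else branch: Spark properties are not Carbon's to validate
  }
}
```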
[GitHub] carbondata issue #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/3014 LGTM ---
[GitHub] carbondata issue #2996: [CARBONDATA-3235] Fix Rename-Fail & Datamap-creation...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2996 LGTM ---
[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3014#discussion_r245884441 --- Diff: integration/spark2/src/test/scala/org/apache/spark/carbondata/commands/SetCommandTestCase.scala --- @@ -128,6 +128,34 @@ class SetCommandTestCase extends Spark2QueryTest with BeforeAndAfterAll{ sql(s"set ${CarbonLoadOptionConstants.CARBON_OPTIONS_SINGLE_PASS}")) } } + + test(s"test set ${ CarbonLoadOptionConstants.CARBON_TABLE_LOAD_SORT_SCOPE } for valid options") { +checkAnswer( + sql(s"set ${ CarbonLoadOptionConstants.CARBON_TABLE_LOAD_SORT_SCOPE }=no_sort"), --- End diff -- Because this property is validated in CarbonHiveCommands.scala to carry a dbname and tablename, this test case should have failed. There is a problem in https://github.com/apache/carbondata/pull/3014/files#diff-eeaa616740e33846839055324b58aa4bR87: all the properties with @CarbonProperty annotations are being added blindly. Please fix this. ---
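For context, a hedged sketch of the behaviour the review expects, written against the test DSL visible in the quoted diff (test, sql, ScalaTest's intercept) and assumed to run inside a suite like SetCommandTestCase; the exception type follows the SessionParams validation added in this PR.

```scala
// Hypothetical test: setting the bare sort-scope key, without the
// dbName.tableName suffix the validation requires, should be rejected
// rather than silently accepted.
test("bare CARBON_TABLE_LOAD_SORT_SCOPE key should fail validation") {
  intercept[InvalidConfigurationException] {
    sql(s"set ${CarbonLoadOptionConstants.CARBON_TABLE_LOAD_SORT_SCOPE}=no_sort")
  }
}
```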
[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3014#discussion_r245879143 --- Diff: core/src/main/java/org/apache/carbondata/core/util/SessionParams.java --- @@ -229,6 +229,12 @@ private boolean validateKeyValue(String key, String value) throws InvalidConfigurationException if (!isValid) { throw new InvalidConfigurationException("Invalid value " + value + " for key " + key); } +} else if (key.startsWith(CarbonLoadOptionConstants.CARBON_TABLE_LOAD_SORT_SCOPE)) { + isValid = CarbonUtil.isValidSortOption(value); + if (!isValid) { +throw new InvalidConfigurationException("The sort scope " + key ++ " can have only either BATCH_SORT or LOCAL_SORT or NO_SORT."); --- End diff -- Sort scope can be GLOBAL_SORT as well; add it to the exception message. ---
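A self-contained sketch of the corrected message. The valid set mirrors what CarbonUtil.isValidSortOption accepts per this thread; the helper itself and the exception type (a plain IllegalArgumentException here, InvalidConfigurationException in the real code) are illustrative.

```scala
object SortScopeCheck {
  // Mirrors CarbonUtil.isValidSortOption per this review thread.
  private val validScopes = Set("BATCH_SORT", "LOCAL_SORT", "NO_SORT", "GLOBAL_SORT")

  def validate(key: String, value: String): Unit = {
    if (!validScopes.contains(value.toUpperCase)) {
      // the real code throws InvalidConfigurationException; a standard
      // exception keeps this sketch self-contained
      throw new IllegalArgumentException(
        s"The sort scope $key can have only either " +
          "BATCH_SORT, LOCAL_SORT, NO_SORT or GLOBAL_SORT.")
    }
  }
}
```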
[GitHub] carbondata pull request #3046: [CARBONDATA-3231] Fix OOM exception when dict...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3046#discussion_r245877242 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java --- @@ -1491,6 +1491,27 @@ private void validateSortMemorySpillPercentage() { } } + public int getMaxDictionaryThreshold() { +int localDictionaryMaxThreshold = Integer.parseInt(carbonProperties + .getProperty(CarbonCommonConstants.CARBON_LOCAL_DICTIONARY_MAX_SIZE_THRESHOLD, + CarbonCommonConstants.CARBON_LOCAL_DICTIONARY_MAX_SIZE_THRESHOLD_DEFAULT)); +if (localDictionaryMaxThreshold --- End diff -- ok ---
[GitHub] carbondata pull request #3046: [CARBONDATA-3231] Fix OOM exception when dict...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3046#discussion_r245653091 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/DecoderBasedFallbackEncoder.java --- @@ -57,10 +57,7 @@ public DecoderBasedFallbackEncoder(EncodedColumnPage encodedColumnPage, int page int pageSize = encodedColumnPage.getActualPage().getPageSize(); int offset = 0; -int[] reverseInvertedIndex = new int[pageSize]; --- End diff -- In the NO_SORT case, where there is no inverted index, this variable was being created unnecessarily. ---
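An illustrative sketch of the allocation pattern being fixed: the reverse inverted index is built only when a page actually carries an inverted index, so NO_SORT pages allocate nothing. Names are illustrative, not the encoder's API.

```scala
object InvertedIndexUtil {
  // Build the reverse mapping only when an inverted index exists; for
  // NO_SORT pages (None) no array is allocated at all.
  def reverseInvertedIndexIfPresent(
      invertedIndex: Option[Array[Int]], pageSize: Int): Option[Array[Int]] = {
    invertedIndex.map { index =>
      val reverse = new Array[Int](pageSize)
      for (sortedPos <- index.indices) {
        // the row at sorted position sortedPos came from original row index(sortedPos)
        reverse(index(sortedPos)) = sortedPos
      }
      reverse
    }
  }
}
```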
[GitHub] carbondata issue #3045: [CARBONDATA-3222]Fix dataload failure after creation...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/3045 LGTM ---
[GitHub] carbondata pull request #3046: [WIP] Added check to start fallback based on ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3046#discussion_r24497 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -2076,4 +2076,15 @@ private CarbonCommonConstants() { */ public static final String CARBON_QUERY_DATAMAP_BLOOM_CACHE_SIZE_DEFAULT_VAL = "512"; + public static final String CARBON_LOCAL_DICTIONARY_MAX_THRESHOLD = --- End diff -- changed to CARBON_LOCAL_DICTIONARY_MAX_SIZE_THRESHOLD ---
[GitHub] carbondata issue #3046: [WIP] Added check to start fallback based on size
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/3046 @xuchuanyin The problem was that when using a varchar column with email data, the keys for the dictionary map are very large. When fallback happens, the same data is kept in memory twice, which causes the application to throw an OOM exception. As this is a minor version, we do not want to expose this property to the user; we can take it up in the next major version. Parquet also has a size-based limitation mechanism which ensures the size does not grow beyond what the system can handle. This PR just adds the size-based limitation so that the map size can be controlled. ---
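A hedged sketch of the size-based limitation described here; the class name and the byte accounting are assumptions for illustration, not the CarbonData API.

```scala
// Illustrative guard: approximate the heap footprint of local-dictionary keys
// and signal fallback once the configured threshold is crossed, so the map
// cannot grow unbounded on wide varchar values such as email strings.
class LocalDictionarySizeGuard(maxSizeInBytes: Long) {
  private var usedBytes: Long = 0L

  /** Returns false when adding this key would cross the limit, i.e. fall back. */
  def tryAccount(keyLengthInBytes: Int): Boolean = {
    if (usedBytes + keyLengthInBytes > maxSizeInBytes) {
      false
    } else {
      usedBytes += keyLengthInBytes
      true
    }
  }
}
```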
[GitHub] carbondata pull request #3046: [WIP] Added check to start fallback based on ...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/3046 [WIP] Added check to start fallback based on size Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata oom_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3046.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3046 commit cfb870789eaa93ebe03393323456cdf2ed4daf17 Author: kunal642 Date: 2019-01-02T10:09:20Z fixed ---
[GitHub] carbondata issue #3010: [CARBONDATA-3189] Fix PreAggregate Datamap Issue
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/3010 @Shubh18s Please fix 2.3 build ---
[GitHub] carbondata pull request #3010: [CARBONDATA-3189] Fix PreAggregate Datamap Is...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3010#discussion_r244085456 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonLateDecodeRule.scala --- @@ -106,22 +106,29 @@ class CarbonLateDecodeRule extends Rule[LogicalPlan] with PredicateHelper { def validateQueryDirectlyOnDataMap(relations: Seq[CarbonDecoderRelation]): Unit = { var isPreAggDataMapExists = false // first check if pre aggregate data map exists or not -relations.foreach{relation => +relations.foreach { relation => if (relation.carbonRelation.carbonTable.isChildDataMap) { isPreAggDataMapExists = true } } val validateQuery = CarbonProperties.getInstance - .getProperty(CarbonCommonConstants.VALIDATE_DIRECT_QUERY_ON_DATAMAP, - CarbonCommonConstants.VALIDATE_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE).toBoolean + .getProperty(CarbonCommonConstants.VALIDATE_DIRECT_QUERY_ON_DATAMAP) var isThrowException = false // if validate query is enabled and relation contains pre aggregate data map --- End diff -- Change the comment according to the changes done ---
[GitHub] carbondata pull request #3010: [CARBONDATA-3189] Fix PreAggregate Datamap Is...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3010#discussion_r244084957 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1449,12 +1449,9 @@ private CarbonCommonConstants() { public static final String SUPPORT_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE = "false"; - @CarbonProperty public static final String VALIDATE_DIRECT_QUERY_ON_DATAMAP = "carbon.query.validate.direct.query.on.datamap"; --- End diff -- I think we can remove this property and use carbon.query.directQueryOnDataMap.enabled instead. It also needs to be removed from configuration-parameters.md. ---
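A short sketch of the suggested consolidation, assuming SUPPORT_DIRECT_QUERY_ON_DATAMAP (quoted in the diff above, with its "false" default) is the constant behind carbon.query.directQueryOnDataMap.enabled; the wrapper object is illustrative.

```scala
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties

object DirectQueryFlag {
  // Read the single enabled-flag with its default instead of keeping a
  // separate validate property around.
  def isDirectQueryAllowed: Boolean = {
    CarbonProperties.getInstance
      .getProperty(CarbonCommonConstants.SUPPORT_DIRECT_QUERY_ON_DATAMAP,
        CarbonCommonConstants.SUPPORT_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE)
      .toBoolean
  }
}
```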
[GitHub] carbondata issue #2988: [CARBONDATA-3174] Fix trailing space issue with varc...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2988 LGTM ---
[GitHub] carbondata issue #2951: [SDV] Add datasource testcases for Spark File Format
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2951 @shivamasn Please add test cases for map type too ---
[GitHub] carbondata issue #2983: [CARBONDATA-3119] Fixed SDK Write for Complex Array ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2983 LGTM ---
[GitHub] carbondata issue #2899: [CARBONDATA-3073][CARBONDATA-3044] Support configure...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2899 LGTM ---
[GitHub] carbondata pull request #2983: [CARBONDATA-3119] Fixed SDK Write for Complex...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2983#discussion_r242403727 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/ArrayParserImpl.java --- @@ -56,6 +56,10 @@ public ArrayObject parse(Object data) { } return new ArrayObject(array); } + } else if (value.isEmpty()) { +Object[] array = new Object[1]; +array[0] = child.parse(value); --- End diff -- Why not use value directly instead of child.parse(value)? ---
[GitHub] carbondata pull request #2994: [WIP][CARBONDATA-2670] changed the impl of s3...
Github user kunal642 closed the pull request at: https://github.com/apache/carbondata/pull/2994 ---
[GitHub] carbondata pull request #2994: [WIP][CARBONDATA-2670] changed the impl of s3...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2994 [WIP][CARBONDATA-2670] changed the impl of s3 renameforce to rewrite Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata bug/CARBONDATA-2670 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2994.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2994 commit b96fa42ee1fa0d3ba2d2081574d3a71eadb26ac2 Author: kunal642 Date: 2018-12-17T13:53:14Z [CARBONDATA-2670] changed the impl of s3 renameforce to rewrite ---
[GitHub] carbondata issue #2989: [CARBONDATA-3175]Fix Testcase failures in complex de...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2989 LGTM ---
[GitHub] carbondata issue #2925: [CARBONDATA-3102] Fix NoClassDefFoundError when use ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2925 @xubo245 I don't think we support running the thriftServer through IntelliJ locally. In cluster mode this jar would already be available as part of the Spark dependency. For the SDK, because it depends on all modules, httpclient is required. @manishgupta88 @kumarvishal09 @jackylk @ravipesala What are your opinions? ---
[GitHub] carbondata issue #2968: [CARBONDATA-3141] Removed Carbon Table Detail Comman...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2968 LGTM ---
[GitHub] carbondata pull request #2977: [CARBONDATA-3147] Fixed concurrent load issue
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2977#discussion_r240473257 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala --- @@ -111,6 +113,29 @@ trait CommitHelper { } } + def mergeTableStatusContents(uuidTableStatusPath: String, + tableStatusPath: String): Boolean = { +try { --- End diff -- moved lock acquisition inside this method ---
[GitHub] carbondata pull request #2977: [CARBONDATA-3147] Fixed concurrent load issue
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2977#discussion_r240473230 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala --- @@ -111,6 +113,29 @@ trait CommitHelper { } } + protected def mergeTableStatusContents(uuidTableStatusPath: String, --- End diff -- done ---
[GitHub] carbondata issue #2972: [CARBONDATA-3143] Fixed local dictionary in presto
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2972 LGTM ---
[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2977#discussion_r240180385 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/AggregateDataMapCompactor.scala --- @@ -79,9 +79,20 @@ class AggregateDataMapCompactor(carbonLoadModel: CarbonLoadModel, CarbonSession.threadSet(CarbonCommonConstants.SUPPORT_DIRECT_QUERY_ON_DATAMAP, "true") loadCommand.processData(sqlContext.sparkSession) -val newLoadMetaDataDetails = SegmentStatusManager.readLoadMetadata( +val oldMetadataDetails = SegmentStatusManager.readLoadMetadata( + carbonTable.getMetadataPath, "") +val newMetadataDetails = SegmentStatusManager.readLoadMetadata( carbonTable.getMetadataPath, uuid) -val updatedLoadMetaDataDetails = newLoadMetaDataDetails collect { +val mergedContent = oldMetadataDetails.collect { + case content => +val contentIndex = newMetadataDetails.indexOf(content) --- End diff -- done ---
[GitHub] carbondata issue #2968: [CARBONDATA-3141] Removed Carbon Table Detail Comman...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2968 retest this please ---
[GitHub] carbondata issue #2968: [CARBONDATA-3141] Removed Carbon Table Detail Comman...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2968 LGTM ---
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2981 LGTM ---
[GitHub] carbondata pull request #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2981#discussion_r240026697 --- Diff: integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceTest.scala --- @@ -998,9 +999,19 @@ class SparkCarbonDataSourceTest extends FunSuite with BeforeAndAfterAll { i += 1 } writer.close() - spark.sql("create table complextable (stringfield string, structfield struct) " + -s"using carbon location '$path'") + if (SparkUtil.isSparkVersionEqualTo("2.1")) { +if (!FileFactory.isFileExist(path)) { + FileFactory.createDirectoryAndSetPermission(path, +new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL)) +} +spark.sql("create table complextable (stringfield string, structfield struct) " + + s"using carbon options(path '$path')") + } else if (SparkUtil.isSparkVersionXandAbove("2.2")) { --- End diff -- yeah, right ---
[GitHub] carbondata pull request #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2981#discussion_r240026585 --- Diff: integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceTest.scala --- @@ -998,9 +999,19 @@ class SparkCarbonDataSourceTest extends FunSuite with BeforeAndAfterAll { i += 1 } writer.close() - spark.sql("create table complextable (stringfield string, structfield struct) " + -s"using carbon location '$path'") + if (SparkUtil.isSparkVersionEqualTo("2.1")) { +if (!FileFactory.isFileExist(path)) { + FileFactory.createDirectoryAndSetPermission(path, +new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL)) +} +spark.sql("create table complextable (stringfield string, structfield struct) " + + s"using carbon options(path '$path')") + } else if (SparkUtil.isSparkVersionXandAbove("2.2")) { --- End diff -- Change the else if to a plain else so that the same command runs for 2.3 as well. Make the same change in the other tests. ---
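A sketch of the requested restructuring; SparkUtil and the DDL strings follow the quoted test, while the wrapper function is illustrative and the column list is trimmed for brevity.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.SparkUtil

object ComplexTableDDL {
  def createComplexTable(spark: SparkSession, path: String): Unit = {
    if (SparkUtil.isSparkVersionEqualTo("2.1")) {
      // 2.1 needs the path passed as an option
      spark.sql("create table complextable (stringfield string) " +
        s"using carbon options(path '$path')")
    } else {
      // plain else instead of else-if: 2.2, 2.3 and any later version
      // all take this path without being enumerated
      spark.sql("create table complextable (stringfield string) " +
        s"using carbon location '$path'")
    }
  }
}
```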
[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2977#discussion_r239331535 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala --- @@ -156,20 +177,20 @@ object AlterTableDropPartitionPostStatusListener extends OperationEventListener // Generate table status file name without UUID, forExample: tablestatus val newTableSchemaPath = CarbonTablePath.getTableStatusFilePath( childCarbonTable.getTablePath) -renameDataMapTableStatusFiles(oldTableSchemaPath, newTableSchemaPath, uuid) +mergeTableStatusContents(oldTableSchemaPath, newTableSchemaPath, uuid) } // if true then the commit for one of the child tables has failed val commitFailed = renamedDataMaps.lengthCompare(childCommands.length) != 0 if (commitFailed) { LOGGER.info("Reverting table status file to original state") - renamedDataMaps.foreach { -command => - val carbonTable = command.table - // rename the backup tablestatus i.e tablestatus_backup_UUID to tablestatus - val backupTableSchemaPath = - CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) + "_backup_" + uuid - val tableSchemaPath = CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) - renameDataMapTableStatusFiles(backupTableSchemaPath, tableSchemaPath, "") + LOGGER.warn("Reverting table status file to original state") --- End diff -- removed ---
[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2977 [WIP] [CARBONDATA-3147] Fixed concurrent load issue Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata bug/CARBONDATA-3147 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2977.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2977 commit d48374164a2b8ff129973e0e15f48c867ccddc1a Author: kunal642 Date: 2018-12-05T10:30:41Z Fixed concurrent load issue ---
[GitHub] carbondata pull request #2940: [CARBONDATA-3116] Support set carbon.query.di...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2940#discussion_r238161863 --- Diff: integration/spark2/pom.xml --- @@ -105,6 +105,11 @@ + + org.apache.httpcomponents --- End diff -- @xubo245 This change is already there in #2925; we can discuss it on that PR. Please remove the changes from this PR. @jackylk Let us discuss whether this is actually required. Please refer to PR #2925 and give your inputs. ---
[GitHub] carbondata pull request #2940: [CARBONDATA-3116] Support set carbon.query.di...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2940#discussion_r238142863 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala --- @@ -463,6 +464,39 @@ class TestPreAggCreateCommand extends QueryTest with BeforeAndAfterAll { executorService.shutdown() } + test("support set carbon.query.directQueryOnDataMap.enabled=true") { +val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath +val testData = s"$rootPath/integration/spark-common-test/src/test/resources/sample.csv" +sql("drop table if exists mainTable") +sql( + s""" + | CREATE TABLE mainTable + | (id Int, + | name String, + | city String, + | age Int) + | STORED BY 'org.apache.carbondata.format' + """.stripMargin); + + +sql( + s""" + | LOAD DATA LOCAL INPATH '$testData' + | into table mainTable + """.stripMargin); + +sql( + s""" + | create datamap preagg_sum on table mainTable + | using 'preaggregate' + | as select id,sum(age) from mainTable group by id + """.stripMargin); + +sql("set carbon.query.directQueryOnDataMap.enabled=true"); --- End diff -- Setting this property here would not be of use as this property is never validated in tests. Check https://github.com/apache/carbondata/blob/master/integration/spark-common/src/main/scala/org/apache/spark/sql/test/util/QueryTest.scala#L45 for reference ---
[GitHub] carbondata issue #2964: [HOTFIX] Fix ArrayOutOfBound exception when duplicat...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2964 LGTM ---
[GitHub] carbondata issue #2965: [Documentation] Editorial review
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2965 LGTM ---
[GitHub] carbondata issue #2951: [SDV] Add datasource testcases for Spark File Format
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2951 @shivamasn Please add a description for the PR, and attach the test report in the description. ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r237781038 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/datasource/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.datasource + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} +import scala.collection.JavaConverters._ +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + var writerPath = new File(this.getClass.getResource("/").getPath ++ +"../." + + "./src/test/resources/SparkCarbonFileFormat/WriterOutput/") +.getCanonicalPath + //getCanonicalPath gives path with \, but the code expects /. 
+ writerPath = writerPath.replace("\\", "/"); + + def buildTestData(): Any = { + +FileUtils.deleteDirectory(new File(writerPath)) + +val schema = new StringBuilder() + .append("[ \n") + .append(" {\"name\":\"string\"},\n") + .append(" {\"age\":\"int\"},\n") + .append(" {\"height\":\"double\"}\n") + .append("]") + .toString() + +try { + val builder = CarbonWriter.builder() + val writer = + builder.outputPath(writerPath).withCsvInput(Schema.parseJson(schema)).writtenBy("CreateTableUsingSparkCarbonFileFormatTestCase").build() + var i = 0 + while (i < 100) { +writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2))) +i += 1 + } + writer.close() +} catch { + case _: Throwable => None +} + } + + def cleanTestData() = { +FileUtils.deleteDirectory(new File(writerPath)) + } + + def deleteIndexFile(path: String, extension: String) : Unit = { +val file: CarbonFile = FileFactory + .getCarbonFile(path, FileFactory.getFileType(path)) + +for (eachDir <- file.listFiles) { + if (!eachDir.isDirectory) { +if (eachDir.getName.endsWith(extension)) { + CarbonUtil.deleteFoldersAndFilesSilent(eachDir) +} + } else { +deleteIndexFile(eachDir.getPath, extension) + } +} + } + + test("Running SQL directly and read carbondata files (sdk Writer Output) using the SparkCarbonFileFormat ")
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r237779541 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/datasource/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.datasource + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} +import scala.collection.JavaConverters._ +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + var writerPath = new File(this.getClass.getResource("/").getPath ++ +"../." + + "./src/test/resources/SparkCarbonFileFormat/WriterOutput/") +.getCanonicalPath + //getCanonicalPath gives path with \, but the code expects /. 
+ writerPath = writerPath.replace("\\", "/"); + + def buildTestData(): Any = { + +FileUtils.deleteDirectory(new File(writerPath)) + +val schema = new StringBuilder() + .append("[ \n") + .append(" {\"name\":\"string\"},\n") + .append(" {\"age\":\"int\"},\n") + .append(" {\"height\":\"double\"}\n") + .append("]") + .toString() + +try { + val builder = CarbonWriter.builder() + val writer = + builder.outputPath(writerPath).withCsvInput(Schema.parseJson(schema)).writtenBy("CreateTableUsingSparkCarbonFileFormatTestCase").build() + var i = 0 + while (i < 100) { +writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2))) +i += 1 + } + writer.close() +} catch { + case _: Throwable => None +} + } + + def cleanTestData() = { +FileUtils.deleteDirectory(new File(writerPath)) + } + + def deleteIndexFile(path: String, extension: String) : Unit = { +val file: CarbonFile = FileFactory + .getCarbonFile(path, FileFactory.getFileType(path)) + +for (eachDir <- file.listFiles) { + if (!eachDir.isDirectory) { +if (eachDir.getName.endsWith(extension)) { + CarbonUtil.deleteFoldersAndFilesSilent(eachDir) +} + } else { +deleteIndexFile(eachDir.getPath, extension) + } +} + } + + test("Running SQL directly and read carbondata files (sdk Writer Output) using the SparkCarbonFileFormat ")
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r237779383 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/datasource/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.datasource + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} +import scala.collection.JavaConverters._ +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + var writerPath = new File(this.getClass.getResource("/").getPath ++ +"../." + + "./src/test/resources/SparkCarbonFileFormat/WriterOutput/") +.getCanonicalPath + //getCanonicalPath gives path with \, but the code expects /. 
+ writerPath = writerPath.replace("\\", "/"); + + def buildTestData(): Any = { + +FileUtils.deleteDirectory(new File(writerPath)) + +val schema = new StringBuilder() + .append("[ \n") + .append(" {\"name\":\"string\"},\n") + .append(" {\"age\":\"int\"},\n") + .append(" {\"height\":\"double\"}\n") + .append("]") + .toString() + +try { + val builder = CarbonWriter.builder() + val writer = + builder.outputPath(writerPath).withCsvInput(Schema.parseJson(schema)).writtenBy("CreateTableUsingSparkCarbonFileFormatTestCase").build() + var i = 0 + while (i < 100) { +writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2))) +i += 1 + } + writer.close() +} catch { + case _: Throwable => None +} + } + + def cleanTestData() = { +FileUtils.deleteDirectory(new File(writerPath)) + } + + def deleteIndexFile(path: String, extension: String) : Unit = { +val file: CarbonFile = FileFactory + .getCarbonFile(path, FileFactory.getFileType(path)) + +for (eachDir <- file.listFiles) { + if (!eachDir.isDirectory) { +if (eachDir.getName.endsWith(extension)) { + CarbonUtil.deleteFoldersAndFilesSilent(eachDir) +} + } else { +deleteIndexFile(eachDir.getPath, extension) + } +} + } + + test("Running SQL directly and read carbondata files (sdk Writer Output) using the SparkCarbonFileFormat ")
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r237780471 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/datasource/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.datasource --- End diff -- Please format all the newly added code ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r237779458 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/datasource/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.datasource + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} +import scala.collection.JavaConverters._ +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + var writerPath = new File(this.getClass.getResource("/").getPath ++ +"../." + + "./src/test/resources/SparkCarbonFileFormat/WriterOutput/") +.getCanonicalPath + //getCanonicalPath gives path with \, but the code expects /. 
+ writerPath = writerPath.replace("\\", "/"); + + def buildTestData(): Any = { + +FileUtils.deleteDirectory(new File(writerPath)) + +val schema = new StringBuilder() + .append("[ \n") + .append(" {\"name\":\"string\"},\n") + .append(" {\"age\":\"int\"},\n") + .append(" {\"height\":\"double\"}\n") + .append("]") + .toString() + +try { + val builder = CarbonWriter.builder() + val writer = + builder.outputPath(writerPath).withCsvInput(Schema.parseJson(schema)).writtenBy("CreateTableUsingSparkCarbonFileFormatTestCase").build() + var i = 0 + while (i < 100) { +writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2))) +i += 1 + } + writer.close() +} catch { + case _: Throwable => None +} + } + + def cleanTestData() = { +FileUtils.deleteDirectory(new File(writerPath)) + } + + def deleteIndexFile(path: String, extension: String) : Unit = { +val file: CarbonFile = FileFactory + .getCarbonFile(path, FileFactory.getFileType(path)) + +for (eachDir <- file.listFiles) { + if (!eachDir.isDirectory) { +if (eachDir.getName.endsWith(extension)) { + CarbonUtil.deleteFoldersAndFilesSilent(eachDir) +} + } else { +deleteIndexFile(eachDir.getPath, extension) + } +} + } + + test("Running SQL directly and read carbondata files (sdk Writer Output) using the SparkCarbonFileFormat ")
[GitHub] carbondata issue #2956: [CARBONDATA-3134] fixed null values when cachelevel ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2956 @manishgupta88 Please review ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r236647791 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/dli/SparkCarbonDataSourceTestCase.scala --- @@ -0,0 +1,1267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.cluster.sdv.generated.dli + + +import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, File, InputStream} + + + +import scala.collection.mutable + +import org.apache.avro +import org.apache.avro.file.DataFileWriter +import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord} +import org.apache.avro.io.{DecoderFactory, Encoder} +import org.apache.spark.sql.{AnalysisException, Row} +import org.apache.spark.sql.common.util.QueryTest +import org.junit.Assert +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.datatype.{DataTypes, StructField} +import org.apache.carbondata.hadoop.testutil.StoreCreator +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} + +class SparkCarbonDataSourceTestCase extends QueryTest with BeforeAndAfterAll { + + val rootPath = new File(this.getClass.getResource("/").getPath ++ "../../../..").getCanonicalPath + + val warehouse1 = FileFactory.getPath(s"$rootPath/integration/spark-datasource/target/warehouse").toString + + test("test write using dataframe") { +import sqlContext.implicits._ +val df = sqlContext.sparkContext.parallelize(1 to 10) + .map(x => ("a" + x % 10, "b", x)) + .toDF("c1", "c2", "number") +sql("drop table if exists testformat") +// Saves dataframe to carbon file +df.write + .format("carbon").saveAsTable("testformat") +assert(sql("select * from testformat").count() == 10) +assert(sql("select * from testformat where c1='a0'").count() == 1) +assert(sql("select * from testformat").count() == 10) +sql("drop table if exists testformat") + } + + test("test write using ddl") { +import sqlContext.implicits._ +val df = sqlContext.sparkContext.parallelize(1 to 10) + .map(x => ("a" + x % 10, "b", x)) + .toDF("c1", "c2", "number") +sql("drop table if exists testparquet") +sql("drop table if exists testformat") +// Saves dataframe to carbon file +df.write + .format("parquet").saveAsTable("testparquet") +sql("create table carbon_table(c1 string, c2 string, number int) using carbon") +sql("insert into carbon_table select * from testparquet") +checkAnswer(sql("select * from carbon_table where c1='a1'"), sql("select * from testparquet where c1='a1'")) +if 
(!sqlContext.sparkContext.version.startsWith("2.1")) { + val mapSize = DataMapStoreManager.getInstance().getAllDataMaps.size() + DataMapStoreManager.getInstance() +.clearDataMaps(AbsoluteTableIdentifier.from(warehouse1 + "/carbon_table")) + assert(mapSize >= DataMapStoreManager.getInstance().getAllDataMaps.size()) +} +sql("drop table if exists testparquet") +sql("drop table if exists testformat") + } + + test("test read with df write") { + FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(warehouse1 + "/
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r236647192 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/dli/SparkCarbonDataSourceTestCase.scala --- @@ -0,0 +1,1267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.cluster.sdv.generated.dli + + +import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, File, InputStream} + --- End diff -- Remove extra lines ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r236646305 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/dli/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,484 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.dli + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} + +import scala.collection.JavaConverters._ + +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil + +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + --- End diff -- Don't leave blank lines ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r236646395 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/dli/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,484 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.dli + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} + +import scala.collection.JavaConverters._ + +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil + +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +CarbonProperties.getInstance() --- End diff -- Remove this line ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r236646951 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/dli/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,484 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.dli + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} + +import scala.collection.JavaConverters._ + +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil + +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_MINMAX_ALLOWED_BYTE_COUNT, +CarbonCommonConstants.CARBON_MINMAX_ALLOWED_BYTE_COUNT_DEFAULT) +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + var writerPath = new File(this.getClass.getResource("/").getPath ++ +"../." + + "./src/test/resources/SparkCarbonFileFormat/WriterOutput/") +.getCanonicalPath + //getCanonicalPath gives path with \, but the code expects /. 
+ writerPath = writerPath.replace("\\", "/"); + + def buildTestData(): Any = { + +FileUtils.deleteDirectory(new File(writerPath)) + +val schema = new StringBuilder() + .append("[ \n") + .append(" {\"name\":\"string\"},\n") + .append(" {\"age\":\"int\"},\n") + .append(" {\"height\":\"double\"}\n") + .append("]") + .toString() + +try { + val builder = CarbonWriter.builder() + val writer = + builder.outputPath(writerPath).withCsvInput(Schema.parseJson(schema)).writtenBy("CreateTableUsingSparkCarbonFileFormatTestCase").build() + var i = 0 + while (i < 100) { +writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2))) +i += 1 + } + writer.close() +} catch { + case _: Throwable => None +} + } + + def cleanTestData() = { +FileUtils.deleteDirectory(new File(writerPath)) + } + + def deleteIndexFile(path: String, extension: String) : Unit = { +val file: CarbonFile = FileFactory + .getCarbonFile(path, FileFactory.getFileType(path)) + +for (eachDir <- file.listFiles) { + if (!eachDir.isDirectory) { +if (eachDir.getName.endsWith(extension)) { + CarbonUtil.deleteFoldersAndFilesSilent(eachDir) +} + } else {
[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2956#discussion_r236768667 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java --- @@ -332,13 +334,42 @@ public void clear() { } SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper other = (SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper) obj; - return tableIdentifier.equals(other.tableIdentifier) && columnsInTable - .equals(other.columnsInTable) && Arrays + return tableIdentifier.equals(other.tableIdentifier) && checkColumnSchemaEquality( + columnsInTable, other.columnsInTable) && Arrays .equals(columnCardinality, other.columnCardinality); } +private boolean checkColumnSchemaEquality(List<ColumnSchema> obj1, List<ColumnSchema> obj2) { + List<ColumnSchema> clonedObj1 = new ArrayList<>(obj1); --- End diff -- done ---
[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2956#discussion_r236768636

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java ---
@@ -332,13 +334,42 @@ public void clear() {
       }
       SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper other =
           (SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper) obj;
-      return tableIdentifier.equals(other.tableIdentifier) && columnsInTable
-          .equals(other.columnsInTable) && Arrays
+      return tableIdentifier.equals(other.tableIdentifier) && checkColumnSchemaEquality(
+          columnsInTable, other.columnsInTable) && Arrays
           .equals(columnCardinality, other.columnCardinality);
     }
+
+    private boolean checkColumnSchemaEquality(List<ColumnSchema> obj1, List<ColumnSchema> obj2) {
+      List<ColumnSchema> clonedObj1 = new ArrayList<>(obj1);
+      List<ColumnSchema> clonedObj2 = new ArrayList<>(obj2);
+      clonedObj1.addAll(obj1);
+      clonedObj2.addAll(obj2);
+      Collections.sort(clonedObj1, new Comparator<ColumnSchema>() {
+        @Override public int compare(ColumnSchema o1, ColumnSchema o2) {
+          return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId());
+        }
+      });
+      Collections.sort(clonedObj2, new Comparator<ColumnSchema>() {
--- End diff --

done

---
[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2956#discussion_r236768643

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java ---
@@ -332,13 +334,42 @@ public void clear() {
       }
       SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper other =
           (SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper) obj;
-      return tableIdentifier.equals(other.tableIdentifier) && columnsInTable
-          .equals(other.columnsInTable) && Arrays
+      return tableIdentifier.equals(other.tableIdentifier) && checkColumnSchemaEquality(
+          columnsInTable, other.columnsInTable) && Arrays
           .equals(columnCardinality, other.columnCardinality);
     }
+
+    private boolean checkColumnSchemaEquality(List<ColumnSchema> obj1, List<ColumnSchema> obj2) {
+      List<ColumnSchema> clonedObj1 = new ArrayList<>(obj1);
+      List<ColumnSchema> clonedObj2 = new ArrayList<>(obj2);
+      clonedObj1.addAll(obj1);
+      clonedObj2.addAll(obj2);
+      Collections.sort(clonedObj1, new Comparator<ColumnSchema>() {
+        @Override public int compare(ColumnSchema o1, ColumnSchema o2) {
+          return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId());
+        }
+      });
+      Collections.sort(clonedObj2, new Comparator<ColumnSchema>() {
+        @Override public int compare(ColumnSchema o1, ColumnSchema o2) {
+          return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId());
+        }
+      });
+      boolean exists = true;
+      for (int i = 0; i < obj1.size(); i++) {
+        if (!clonedObj1.get(i).equalsWithColumnId(clonedObj2.get(i))) {
+          exists = false;
+          break;
+        }
+      }
+      return exists;
+    }
+
     @Override public int hashCode() {
-      return tableIdentifier.hashCode() + columnsInTable.hashCode() + Arrays
+      int hashCode = 0;
+      for (ColumnSchema columnSchema: columnsInTable) {
+        hashCode += columnSchema.hashCodeWithColumnId();
--- End diff --

done

---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2942 LGTM ---
[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2956

[CARBONDATA-3134] fixed null values when cachelevel is set as blocklet

**Problem:** For each blocklet, an object of SegmentPropertiesAndSchemaHolder is created to store the schema used for the query. This object is created only if no other blocklet has the same schema. To check the schema we compare List<ColumnSchema>; because the equals method in ColumnSchema does not check columnUniqueId, this comparison wrongly matches and the new restructured blocklet uses the schema of the old blocklet. Due to this, the newly added column is ignored, as the old blocklet schema marks that column as deleted (alter drop).

**Solution:** Instead of checking equality through the existing equals and hashCode, write new implementations of both that compare based on columnUniqueId.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata bug/CARBONDATA-3134

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2956.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2956

commit e74b3f22d285f5b5c37b17a11248b5e7977326b2
Author: kunal642
Date: 2018-11-27T08:43:27Z

[CARBONDATA-3134] fixed null values when cachelevel is set as blocklet

Problem: For each blocklet, an object of SegmentPropertiesAndSchemaHolder is created to store the schema used for the query. This object is created only if no other blocklet has the same schema. To check the schema we compare List<ColumnSchema>; because the equals method in ColumnSchema does not check columnUniqueId, this comparison wrongly matches and the new restructured blocklet uses the schema of the old blocklet. Due to this, the newly added column is ignored, as the old blocklet schema marks that column as deleted (alter drop).

Solution: Instead of checking equality through the existing equals and hashCode, write new implementations of both that compare based on columnUniqueId.

---
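For readers skimming the thread, here is a minimal, self-contained sketch of an id-aware comparison in the spirit of this fix. The ColumnSchema stand-in below is trimmed to two fields for illustration; it is an assumption of this sketch, not CarbonData's actual class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stand-in for CarbonData's ColumnSchema, reduced to what the sketch needs.
final class ColumnSchema {
  private final String columnUniqueId;
  private final String columnName;

  ColumnSchema(String columnUniqueId, String columnName) {
    this.columnUniqueId = columnUniqueId;
    this.columnName = columnName;
  }

  String getColumnUniqueId() {
    return columnUniqueId;
  }

  // Id-aware equality, analogous to the equalsWithColumnId the PR introduces.
  boolean equalsWithColumnId(ColumnSchema other) {
    return columnUniqueId.equals(other.columnUniqueId)
        && columnName.equals(other.columnName);
  }
}

final class ColumnSchemaEquality {
  // Sort copies by columnUniqueId so ordering differences cannot mask a
  // mismatch, then compare element by element using the id-aware check.
  static boolean schemasEqual(List<ColumnSchema> a, List<ColumnSchema> b) {
    if (a.size() != b.size()) {
      return false;
    }
    List<ColumnSchema> sortedA = new ArrayList<>(a);
    List<ColumnSchema> sortedB = new ArrayList<>(b);
    Comparator<ColumnSchema> byId = Comparator.comparing(ColumnSchema::getColumnUniqueId);
    sortedA.sort(byId);
    sortedB.sort(byId);
    for (int i = 0; i < sortedA.size(); i++) {
      if (!sortedA.get(i).equalsWithColumnId(sortedB.get(i))) {
        return false;
      }
    }
    return true;
  }
}
```

Because the comparison keys on columnUniqueId rather than on name and position alone, a blocklet written after a drop-and-re-add of a column no longer matches the old blocklet's cached schema.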
[GitHub] carbondata pull request #2945: [CARBONDATA-3123] Fixed JVM crash issue with ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2945#discussion_r236531750

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -66,9 +63,6 @@
     this.readers = readers;
     this.index = 0;
     this.currentReader = readers.get(0);
-    CarbonTaskInfo carbonTaskInfo = new CarbonTaskInfo();
--- End diff --

@xubo245 I have updated the JIRA. Please check.

---
[GitHub] carbondata pull request #2945: [CARBONDATA-3123] Fixed JVM crash issue with ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2945#discussion_r235892400

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -66,9 +63,6 @@
     this.readers = readers;
    this.index = 0;
     this.currentReader = readers.get(0);
-    CarbonTaskInfo carbonTaskInfo = new CarbonTaskInfo();
--- End diff --

ok

---
[GitHub] carbondata pull request #2945: [CARBONDATA-3123] Fixed JVM crash issue with ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2945#discussion_r235891858

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -66,9 +63,6 @@
     this.readers = readers;
     this.index = 0;
     this.currentReader = readers.get(0);
-    CarbonTaskInfo carbonTaskInfo = new CarbonTaskInfo();
--- End diff --

The same issue would exist in CSDK as well, as this would be called internally.

---
[GitHub] carbondata issue #2945: [CARBONDATA-3123] Fixed JVM crash issue with CarbonR...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2945 retest this please ---
[GitHub] carbondata pull request #2945: [CARBONDATA-3123] Fixed JVM crash issue with ...
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2945

[CARBONDATA-3123] Fixed JVM crash issue with CarbonRecordReader (SDK Reader).

**Problem:** Because CarbonReaderBuilder is executed on the main thread, the TaskId is set on the thread local during reader creation. When multiple readers are created using the split API, each newly initialized reader overrides this TaskID, so all readers end up using the TaskID of the last initialized reader. Due to this, when one reader is reading while another reader is freeing memory after its task completion, the same memory block can be cleared and read at the same time, causing a SIGSEGV error.

**Solution:** Do not set the TaskID to the thread local during reader initialization. ThreadLocalTaskInfo.getCarbonTaskInfo will take care of assigning a new TaskID if one is not already present.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata bug/CARBONDATA-3123

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2945.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2945

commit cce1811fa518545b6b31246efd5668048e7d24ea
Author: kunal642
Date: 2018-11-23T05:41:44Z

[CARBONDATA-3123] Fixed JVM crash issue with CarbonRecordReader (SDK Reader).

Problem: Because CarbonReaderBuilder is executed on the main thread, the TaskId is set on the thread local during reader creation. When multiple readers are created using the split API, each newly initialized reader overrides this TaskID, so all readers end up using the TaskID of the last initialized reader. Due to this, when one reader is reading while another reader is freeing memory after its task completion, the same memory block can be cleared and read at the same time, causing a SIGSEGV error.

Solution: Do not set the TaskID to the thread local during reader initialization. ThreadLocalTaskInfo.getCarbonTaskInfo will take care of assigning a new TaskID if one is not already present.

---
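As a sketch of the get-or-create behavior the solution leans on, the following is one plausible shape for ThreadLocalTaskInfo.getCarbonTaskInfo. The CarbonTaskInfo stand-in and the UUID-based id scheme are assumptions of this sketch, not the project's exact code.

```java
import java.util.UUID;

// Stand-in for CarbonData's CarbonTaskInfo, reduced to a single id field.
final class CarbonTaskInfo {
  private final long taskId;

  CarbonTaskInfo(long taskId) {
    this.taskId = taskId;
  }

  long getTaskId() {
    return taskId;
  }
}

final class ThreadLocalTaskInfoSketch {
  private static final ThreadLocal<CarbonTaskInfo> TASK_INFO = new ThreadLocal<>();

  // Lazily assign a TaskID per thread. Since nothing is assigned on the main
  // thread during reader construction, readers created via split() each get
  // their own id on their own thread instead of sharing the last one set.
  static CarbonTaskInfo getCarbonTaskInfo() {
    CarbonTaskInfo info = TASK_INFO.get();
    if (info == null) {
      info = new CarbonTaskInfo(UUID.randomUUID().getMostSignificantBits());
      TASK_INFO.set(info);
    }
    return info;
  }
}
```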
[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2923#discussion_r235681603

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
@@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
     sql("drop datamap if exists preaggTable on table partitionTable")
   }
 
+  test("validate data in partition table after dropping and adding a column") {
+    sql("drop table if exists par")
+    sql("create table par(name string) partitioned by (age double) stored by " +
+        "'carbondata'")
+    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+        s"('header'='false')")
+    sql("alter table par drop columns(name)")
+    sql("alter table par add columns(name string)")
+    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+        s"('header'='false')")
--- End diff --

@ravipesala Spark 2.1 and 2.2 both place the partition column last even if a new column is added.

---
[GitHub] carbondata issue #2929: [CARBONDATA-3108][CARBONDATA-3044] Fix the error of ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2929 LGTM ---
[GitHub] carbondata issue #2929: [CARBONDATA-3108][CARBONDATA-3044] Fix the error of ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2929 @xubo245 Please resolve the conflicts ---
[GitHub] carbondata issue #2935: [HOTFIX] Intializing the CSDK object reference to NU...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2935 LGTM ---
[GitHub] carbondata issue #2921: [CARBONDATA-3104] Removed unnecessary configuration ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2921 LGTM ---
[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2923#discussion_r234966352

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
@@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
     sql("drop datamap if exists preaggTable on table partitionTable")
   }
 
+  test("validate data in partition table after dropping and adding a column") {
+    sql("drop table if exists par")
+    sql("create table par(name string) partitioned by (age double) stored by " +
+        "'carbondata'")
+    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+        s"('header'='false')")
+    sql("alter table par drop columns(name)")
+    sql("alter table par add columns(name string)")
+    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+        s"('header'='false')")
--- End diff --

Actually Spark expects the partition column to be last. So the solution you are proposing will not work in the datasource case, because Spark parses the schema and always appends the partition column at the end.

---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2925#discussion_r234962295

--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/thriftserver/CarbonThriftServer.scala ---
@@ -48,8 +48,13 @@ object CarbonThriftServer {
       System.exit(0)
     }
 
+    val master = Option(System.getProperty("spark.master"))
--- End diff --

ok

---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2925#discussion_r234962085

--- Diff: integration/spark2/pom.xml ---
@@ -134,6 +134,11 @@
+
--- End diff --

But nobody else is facing this issue, and none of our environments hit this exception. Maybe you need to add the httpclient jars to your cluster separately?

---
[GitHub] carbondata pull request #2921: [CARBONDATA-3104] Removed unnecessary configu...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2921#discussion_r234860057

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDropTableCommand.scala ---
@@ -143,7 +143,6 @@ case class CarbonDropTableCommand(
       OperationListenerBus.getInstance.fireEvent(dropTablePostEvent, operationContext)
     } catch {
       case ex: NoSuchTableException =>
-        LOGGER.error(ex.getLocalizedMessage, ex)
--- End diff --

Logging this event is necessary. Instead of printing the whole exception stack, we can just print the message as a warning.

---
[GitHub] carbondata issue #2921: [CARBONDATA-3104] Removed unnecessary configuration ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2921 retest this please ---
[GitHub] carbondata issue #2924: [CARBONDATA-3065]Correct the error message for inver...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2924 LGTM ---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2925#discussion_r234191527

--- Diff: integration/spark2/pom.xml ---
@@ -134,6 +134,11 @@
+
--- End diff --

Why is this dependency needed in spark2?

---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2925#discussion_r234191597

--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/thriftserver/CarbonThriftServer.scala ---
@@ -48,8 +48,13 @@ object CarbonThriftServer {
       System.exit(0)
     }
 
+    val master = Option(System.getProperty("spark.master"))
--- End diff --

Was this the reason for the NoClassDefFoundError?

---
[GitHub] carbondata issue #2925: [CARBONDATA-3102] Fix NoClassDefFoundError when use ...
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2925

@xubo245 In what scenario are you facing this issue? I am not facing it.

---
[GitHub] carbondata issue #2921: [CARBONDATA-3104] Removed unnecessary configuration ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2921 retest this please ---
[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2923 @ravipesala @manishgupta88 Please review ---
[GitHub] carbondata issue #2925: [CARBONDATA-3102] Fix NoClassDefFoundError when use ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2925 Please add description ---
[GitHub] carbondata issue #2922: [HOTFIX]s3 lock file fix
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2922 retest this please ---
[GitHub] carbondata pull request #2924: [CARBONDATA-3065]Correct the error message fo...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2924#discussion_r234093857

--- Diff: integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceTest.scala ---
@@ -1345,6 +1345,15 @@ class SparkCarbonDataSourceTest extends FunSuite with BeforeAndAfterAll {
     spark.sql("drop table if exists fileformat_drop_hive")
   }
 
+  test("validate the columns not present in schema") {
+    spark.sql("drop table if exists validate")
--- End diff --

Need to drop the table after test case completion. Better to add it in afterAll() too.

---
[GitHub] carbondata issue #2922: [WIP]s3 lock file fix
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2922

@jackylk No, this is for the ArrayIndexOutOfBoundsException thrown when taking a lock on S3. 3103 seems to be a different issue.

---
[GitHub] carbondata pull request #2902: [CARBONDATA-3083] Fixed data mismatch issue a...
Github user kunal642 closed the pull request at: https://github.com/apache/carbondata/pull/2902 ---
[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2923 retest this please ---
[GitHub] carbondata pull request #2923: [WIP] added partition columns to the last whe...
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2923

[WIP] added partition columns to the last when collecting columns

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata partition_alter_bugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2923.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2923

commit c399910c682f2573827214b1f76aea0225d7ecf8
Author: kunal642
Date: 2018-11-15T04:40:00Z

added partition columns to the last when collecting columns

---
[GitHub] carbondata issue #2903: [CARBONDATA-3084]dataload failure fix when float val...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2903 LGTM ---
[GitHub] carbondata pull request #2903: [CARBONDATA-3084]dataload failure fix when fl...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2903#discussion_r231038773

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java ---
@@ -325,32 +325,42 @@ static ColumnPageCodec selectCodecByAlgorithmForFloating(SimpleStatsResult stats
     //Here we should use the Max abs as max to getDatatype, let's say -1 and -1000, -1 is max,
     //but we can't use -1 to getDatatype, we should use -1000.
     double absMaxValue = Math.max(Math.abs(maxValue), Math.abs(minValue));
-    if (decimalCount == 0) {
+    if (decimalCount == 0 && srcDataType == DataTypes.FLOAT) {
--- End diff --

No need to check for decimalCount == 0. If the data type is FLOAT, call getColumnPageCodec(..).

---
[GitHub] carbondata issue #2901: [CARBONDATA-3081] Fixed NPE for boolean type column ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2901 retest this please ---
[GitHub] carbondata pull request #2902: [WIP] Fixed data mismatch issue after update
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2902

[WIP] Fixed data mismatch issue after update

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata update_data_mismatch_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2902.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2902

commit bbd3dc87ac84c1d4005379dd445dec30f31f24aa
Author: kunal642
Date: 2018-11-06T05:21:00Z

fixed data mismatch issue after update

---
[GitHub] carbondata pull request #2901: [CARBONDATA-3081] Fixed NPE for boolean type ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2901#discussion_r231001835

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java ---
@@ -171,13 +171,20 @@ public Object getCurrentValue() throws IOException, InterruptedException {
     rowCount += 1;
     Object[] row = new Object[carbonColumnarBatch.columnVectors.length];
     for (int i = 0; i < carbonColumnarBatch.columnVectors.length; i ++) {
+      Object data = carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1);
       if (carbonColumnarBatch.columnVectors[i].getType() == DataTypes.STRING
           || carbonColumnarBatch.columnVectors[i].getType() == DataTypes.VARCHAR) {
-        byte[] data = (byte[]) carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1);
-        row[i] = ByteUtil.toString(data, 0, data.length);
+        if (data == null) {
+          row[i] = null;
+        } else {
+          row[i] = ByteUtil.toString((byte[]) data, 0, (((byte[]) data).length));
+        }
       } else if (carbonColumnarBatch.columnVectors[i].getType() == DataTypes.BOOLEAN) {
-        byte data = (byte) carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1);
-        row[i] = ByteUtil.toBoolean(data);
+        if (data == null) {
+          row[i] = null;
+        } else {
+          row[i] = ByteUtil.toBoolean((byte) data);
+        }
--- End diff --

getData already has a check for null values. Because an explicit conversion is required here, the null check had to be added.

---
[GitHub] carbondata pull request #2901: [CARBONDATA-3081] Fixed NPE for boolean type ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2901#discussion_r231001760

--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1844,4 +1844,53 @@ public void testVectorReader() {
     }
   }
 
+  @Test
+  public void testReadingNullValues() {
+    String path = "./testWriteFiles";
+    try {
+      FileUtils.deleteDirectory(new File(path));
+
+      Field[] fields = new Field[2];
+      fields[0] = new Field("stringField", DataTypes.STRING);
+      fields[1] = new Field("shortField", DataTypes.BOOLEAN);
--- End diff --

done

---
[GitHub] carbondata issue #2901: [CARBONDATA-3081] Fixed NPE for boolean type column ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2901 retest this please ---
[GitHub] carbondata pull request #2901: [CARBONDATA-3081] Fixed NPE for boolean type ...
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2901

[CARBONDATA-3081] Fixed NPE for boolean type column with null value

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata npe_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2901.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2901

commit 906eee74f20fbc1a9983db1ce0073356649a7724
Author: kunal642
Date: 2018-11-05T13:16:44Z

fixed NPE for boolean type column with null value

---
[GitHub] carbondata issue #2850: [CARBONDATA-3056] Added concurrent reading through S...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2850 LGTM ---
[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2877 LGTM ---
[GitHub] carbondata pull request #2850: [CARBONDATA-3056] Added concurrent reading th...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2850#discussion_r230315785

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -114,6 +115,57 @@ public static CarbonReaderBuilder builder(String tablePath) {
     return builder(tablePath, tableName);
   }
 
+  /**
+   * Breaks the list of CarbonRecordReader in CarbonReader into multiple
+   * CarbonReader objects, each iterating through some 'carbondata' files
+   * and return that list of CarbonReader objects
+   *
+   * If the no. of files is greater than maxSplits, then break the
+   * CarbonReader into maxSplits splits, with each split iterating
+   * through >= 1 file.
+   *
+   * If the no. of files is less than maxSplits, then return list of
+   * CarbonReader with size as the no. of files, with each CarbonReader
+   * iterating through exactly one file
+   *
+   * @param maxSplits: Int
+   * @return list of {@link CarbonReader} objects
+   */
+  public List<CarbonReader> split(int maxSplits) throws IOException {
+    validateReader();
+    if (maxSplits < 1) {
+      throw new RuntimeException(
+          this.getClass().getSimpleName() + ".split: maxSplits must be positive");
+    }
+
+    List<CarbonReader> carbonReaders = new ArrayList<>();
+
+    if (maxSplits < this.readers.size()) {
--- End diff --

@ravipesala Let us add test cases in a separate PR. Would it be okay?

---
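A short usage sketch of the split API shown in this diff (the output path and table name are placeholders, and the Object[] row shape assumes the default SDK projection):

```java
import java.util.List;

import org.apache.carbondata.sdk.file.CarbonReader;

public final class SplitReadExample {
  public static void main(String[] args) throws Exception {
    // Placeholder path: point this at a directory containing carbondata files.
    CarbonReader reader = CarbonReader.builder("./testWriteFiles", "_temp").build();

    // Ask for at most 4 readers; each resulting reader iterates >= 1 file,
    // and fewer readers come back if there are fewer files than maxSplits.
    List<CarbonReader> splits = reader.split(4);
    for (CarbonReader split : splits) {
      new Thread(() -> {
        try {
          while (split.hasNext()) {
            Object[] row = (Object[]) split.readNextRow();
            System.out.println(row.length);
          }
          split.close();
        } catch (Exception e) {
          e.printStackTrace();
        }
      }).start();
    }
  }
}
```

Running each split on its own thread is also why the per-thread TaskID fix in #2945 matters for this API.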
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230305381

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java ---
@@ -158,14 +173,31 @@ public CarbonReaderBuilder withHadoopConf(String key, String value) {
     }
 
     try {
-      final List<InputSplit> splits =
-          format.getSplits(new JobContextImpl(job.getConfiguration(), new JobID()));
-
+      List<InputSplit> splits;
+      if (filterExpression == null) {
+        splits = format.getAllFileSplits(job);
+      } else {
+        splits = format.getSplits(new JobContextImpl(job.getConfiguration(), new JobID()));
+      }
       List<RecordReader<Void, T>> readers = new ArrayList<>(splits.size());
       for (InputSplit split : splits) {
         TaskAttemptContextImpl attempt =
             new TaskAttemptContextImpl(job.getConfiguration(), new TaskAttemptID());
-        RecordReader<Void, T> reader = format.createRecordReader(split, attempt);
+        RecordReader<Void, T> reader;
+        QueryModel queryModel = format.createQueryModel(split, attempt);
+        boolean hasComplex = false;
+        for (ProjectionDimension projectionDimension : queryModel.getProjectionDimensions()) {
+          if (projectionDimension.getDimension().isComplex()) {
--- End diff --

sure

---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230277096

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java ---
@@ -88,6 +99,50 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
     }
   }
 
+  /**
+   * This method will list all the carbondata files in the table path and treat one carbondata
+   * file as one split.
+   */
+  public List<InputSplit> getAllFileSplits(JobContext job) throws IOException {
--- End diff --

added a flag "filter_blocks" in jobConf

---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230277015

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java ---
@@ -347,9 +347,7 @@ private void fillVector(ColumnPage columnPage, CarbonColumnVector vector,
           columnPage.getNullBits());
     } else if (vectorDataType == DataTypes.FLOAT) {
       float[] floatPage = columnPage.getFloatPage();
-      for (int i = 0; i < pageSize; i++) {
-        vector.putFloats(0, pageSize, floatPage, 0);
-      }
+      vector.putFloats(0, pageSize, floatPage, 0);
--- End diff --

removed

---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230277069

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java ---
@@ -88,6 +99,50 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
     }
   }
 
+  /**
+   * This method will list all the carbondata files in the table path and treat one carbondata
+   * file as one split.
+   */
+  public List<InputSplit> getAllFileSplits(JobContext job) throws IOException {
+    List<InputSplit> splits = new ArrayList<>();
+    CarbonTable carbonTable = getOrCreateCarbonTable(job.getConfiguration());
+    if (null == carbonTable) {
+      throw new IOException("Missing/Corrupt schema file for table.");
+    }
+    for (CarbonFile carbonFile : getAllCarbonDataFiles(carbonTable.getTablePath())) {
+      CarbonInputSplit split =
+          new CarbonInputSplit("null", new Path(carbonFile.getAbsolutePath()), 0,
+              carbonFile.getLength(), carbonFile.getLocations(), FileFormat.COLUMNAR_V3);
+      split.setVersion(ColumnarFormatVersion.V3);
+      BlockletDetailInfo info = new BlockletDetailInfo();
+      split.setDetailInfo(info);
+      info.setBlockSize(carbonFile.getLength());
+      // Read the footer offset and set.
+      FileReader reader = FileFactory
--- End diff --

moved to CarbonVectorizedRecordReader.initialize()

---