[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...
Github user manishnalla1994 commented on the issue: https://github.com/apache/carbondata/pull/3056 @xuchuanyin Datasource tables use the direct filling flow. In the direct flow there is no intermediate buffer, so we do not use off-heap memory to store the page data (all the records of a page are filled into the vector instead of being filled batch-wise). In this case we can remove the freeing of unsafe memory for the query, as it is not required. For a stored-by table the handling is different: we support both batch-wise filling and direct filling, and batch filling uses unsafe memory, so we have to clear the unsafe memory in that case. The same handling is not required for a datasource table. Please refer to https://github.com/apache/carbondata/pull/2591 for the stored-by handling of this issue. ---
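The distinction described in the comment above can be sketched as follows. This is a minimal illustration with hypothetical names (the actual CarbonData classes differ): unsafe (off-heap) memory is freed only when the batch-wise filling path was used, since the direct-fill path never allocates it.

```scala
// Hypothetical sketch: cleanup depends on which filling flow ran.
sealed trait FillMode
case object DirectFill extends FillMode // datasource table: on-heap vector fill
case object BatchFill  extends FillMode // stored-by table may use this: unsafe buffer

final class QueryTask(mode: FillMode) {
  private var unsafeAllocated = mode == BatchFill

  def close(): Unit = mode match {
    case BatchFill if unsafeAllocated =>
      freeUnsafeMemory() // required: off-heap pages were allocated
    case _ =>
      ()                 // direct fill: nothing off-heap to free
  }

  private def freeUnsafeMemory(): Unit = { unsafeAllocated = false }
  def hasUnsafe: Boolean = unsafeAllocated
}
```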
[GitHub] carbondata pull request #3056: [CARBONDATA-3236] Fix for JVM Crash for inser...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/3056 [CARBONDATA-3236] Fix for JVM Crash for insert into new table from old table Problem: Insert into a new table from an old table fails with a JVM crash. This happened because both the query and the load flow were assigned the same taskId, and once the query finished it freed the unsafe memory while the insert was still in progress. Solution: As the file-format flow is the direct flow and uses on-heap (safe) memory, there is no need to free the unsafe memory. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata JVMCrashForLoadAndQuery Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3056.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3056 commit 150c710218ff3c09ccfccc9b2df970006964ef6d Author: manishnalla1994 Date: 2019-01-08T10:42:55Z Fix for JVM Crash for file format ---
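The crash mechanism described above can be illustrated with a toy model (hypothetical names, not CarbonData's actual memory manager): when two flows register unsafe memory under the same taskId, whichever finishes first frees memory the other is still using.

```scala
import scala.collection.mutable

// Toy illustration of the taskId collision described above.
object UnsafeMemoryManager {
  private val allocations = mutable.Map.empty[Long, mutable.Set[String]]

  def allocate(taskId: Long, block: String): Unit =
    allocations.getOrElseUpdate(taskId, mutable.Set.empty) += block

  // Frees *everything* registered under taskId -- the dangerous part when
  // the id is shared between a finished query and an in-progress insert.
  def freeAll(taskId: Long): Unit = allocations.remove(taskId)

  def live(taskId: Long): Int = allocations.get(taskId).map(_.size).getOrElse(0)
}
```

With a shared id, `freeAll` invoked at query completion also releases the insert's blocks, which is why dereferencing them afterwards can crash the JVM.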
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3047#discussion_r245003004 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala --- @@ -101,14 +102,23 @@ object CarbonStore { val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) { // for streaming segment, we should get the actual size from the index file // since it is continuously inserting data -val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName) +val segmentDir = CarbonTablePath + .getSegmentPath(carbonTable.getTablePath, load.getLoadName) val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir) val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath)) (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize) } else { // for batch segment, we can get the data size from table status file directly -(if (load.getDataSize == null) 0L else load.getDataSize.toLong, - if (load.getIndexSize == null) 0L else load.getIndexSize.toLong) +if (null == load.getDataSize || null == load.getIndexSize) { + // If either of datasize or indexsize comes to be null the we calculate the correct + // size and assign + val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true) --- End diff -- As it is a metadata function, passing TRUE to 'calculateDataIndexSize' computes the size only once and saves it, so the computed value can also be reused afterwards. ---
[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...
Github user manishnalla1994 commented on the issue: https://github.com/apache/carbondata/pull/3047 retest this please ---
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3047#discussion_r244957746 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala --- @@ -101,14 +102,23 @@ object CarbonStore { val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) { // for streaming segment, we should get the actual size from the index file // since it is continuously inserting data -val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName) +val segmentDir = CarbonTablePath + .getSegmentPath(carbonTable.getTablePath, load.getLoadName) val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir) val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath)) (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize) } else { // for batch segment, we can get the data size from table status file directly -(if (load.getDataSize == null) 0L else load.getDataSize.toLong, - if (load.getIndexSize == null) 0L else load.getIndexSize.toLong) +if (null == load.getDataSize || null == load.getIndexSize) { + // If either of datasize or indexsize comes to be null the we calculate the correct + // size and assign + val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false) --- End diff -- Fixed. ---
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3047#discussion_r244957693 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala --- @@ -46,9 +47,9 @@ object CarbonStore { def showSegments( limit: Option[String], - tablePath: String, + carbonTable: CarbonTable, --- End diff -- Done. ---
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3047#discussion_r244911752 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala --- @@ -101,14 +102,21 @@ object CarbonStore { val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) { // for streaming segment, we should get the actual size from the index file // since it is continuously inserting data -val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName) +val segmentDir = CarbonTablePath + .getSegmentPath(carbonTable.getTablePath, load.getLoadName) val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir) val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath)) (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize) } else { // for batch segment, we can get the data size from table status file directly -(if (load.getDataSize == null) 0L else load.getDataSize.toLong, - if (load.getIndexSize == null) 0L else load.getIndexSize.toLong) +if (null == load.getDataSize && null == load.getIndexSize) { + val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false) + (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong, + dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong) +} else { + (load.getDataSize.toLong, --- End diff -- Yes, fixed it now. ---
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/3047 [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize calculation for old store using Show Segments Problem: A table created and loaded on an older version (1.1) showed a data-size and index-size of 0B when refreshed on the new version. This was because when the data-size came back as "null" we did not compute it and directly assigned it the value 0. Solution: Computed the correct data-size and index-size using CarbonTable. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata Datasize0Issue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3047.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3047 commit 6bf65d7a0b42e8d9a822fd234a510550bd8d2f17 Author: manishnalla1994 Date: 2019-01-02T12:30:36Z Fixed Wrong Datasize and Indexsize calculation for old store ---
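The fallback logic this PR introduces can be sketched as follows (hypothetical types; the real code lives in CarbonStore.scala and uses CarbonUtil.calculateDataIndexSize): when the table status file predates the size fields, compute the sizes instead of reporting 0B.

```scala
// Minimal sketch of the null-size fallback described above.
case class LoadMetadata(dataSize: Option[Long], indexSize: Option[Long])

def resolveSizes(load: LoadMetadata,
                 computeFromTable: () => (Long, Long)): (Long, Long) =
  (load.dataSize, load.indexSize) match {
    // Both sizes present in the status file: use them directly.
    case (Some(d), Some(i)) => (d, i)
    // Either size missing (old store): fall back to a full computation,
    // mirroring what CarbonUtil.calculateDataIndexSize does.
    case _ => computeFromTable()
  }
```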
[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3022#discussion_r244118775 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java --- @@ -356,9 +359,20 @@ public static CarbonFactDataHandlerModel getCarbonFactDataHandlerModel(CarbonLoa .getColumnSchemaList(carbonTable.getDimensionByTableName(tableName), carbonTable.getMeasureByTableName(tableName)); carbonFactDataHandlerModel.setWrapperColumnSchema(wrapperColumnSchema); -// get the cardinality for all all the columns including no dictionary columns -int[] formattedCardinality = CarbonUtil - .getFormattedCardinality(segmentProperties.getDimColumnsCardinality(), wrapperColumnSchema); +// get the cardinality for all all the columns including no +// dictionary columns and complex columns +int[] dimAndComplexColumnCardinality = +new int[segmentProperties.getDimColumnsCardinality().length + segmentProperties +.getComplexDimColumnCardinality().length]; +for (int i = 0; i < segmentProperties.getDimColumnsCardinality().length; i++) { + dimAndComplexColumnCardinality[i] = segmentProperties.getDimColumnsCardinality()[i]; --- End diff -- The restructure case is not handled for complex types compaction. I have raised the JIRA issue and I will handle it. Please find the JIRA link : 'https://issues.apache.org/jira/browse/CARBONDATA-3203' . ---
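The cardinality handling shown in the diff above amounts to concatenating two arrays: the plain dimension-column cardinalities followed by the complex-dimension-column cardinalities. A minimal sketch:

```scala
// Sketch of the cardinality merge from the diff above: dimension and
// complex-dimension cardinalities are combined into one array.
def mergeCardinalities(dimCardinality: Array[Int],
                       complexDimCardinality: Array[Int]): Array[Int] = {
  val merged = new Array[Int](dimCardinality.length + complexDimCardinality.length)
  // Plain dimensions first, preserving their ordinals...
  Array.copy(dimCardinality, 0, merged, 0, dimCardinality.length)
  // ...then the complex (e.g. dictionary-include) columns appended after.
  Array.copy(complexDimCardinality, 0, merged, dimCardinality.length,
    complexDimCardinality.length)
  merged
}
```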
[GitHub] carbondata pull request #3022: [CARBONDATA-3196] Fixed Compaction for Comple...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/3022 [CARBONDATA-3196] Fixed Compaction for Complex types with Dictionary Include Problem: Compaction was failing for complex datatypes with Dictionary Include because the KeyGenerator was not being set in the model for dictionary-include complex columns, and those columns were also not handled when finding cardinality. Solution: Handled both issues by setting the KeyGenerator and storing the cardinality of complex dictionary-include columns. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata ComplexCompactionIssue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3022.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3022 commit 874bc3b894c08b76f770722de543615edb45df19 Author: manishnalla1994 Date: 2018-12-24T12:07:36Z Fixed Compaction for Complex types with Dictionary Include ---
[GitHub] carbondata pull request #3016: [CARBONDATA-3192] Fix for compaction compatib...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/3016 [CARBONDATA-3192] Fix for compaction compatibility issue Problem: A table created, loaded and altered (a column added) in version 1.5.1, then refreshed, altered (the added column dropped), loaded and compacted with varchar columns in the new version, gave an error. Solution: Corrected the varchar dimension index calculation by computing it based on the columns that have been deleted (invisible columns), hence giving the correct ordinals after deletion. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata CompactionCompatibilityFix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3016.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3016 commit 08af7194b55f2dbf605284b09dd1c6886d56d7d7 Author: manishnalla1994 Date: 2018-12-21T13:41:46Z Fix for compaction compatibilty ---
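The ordinal fix described above can be sketched like this (hypothetical column model, not CarbonData's schema classes): when computing a varchar column's dimension index, dropped (invisible) columns must be skipped so later ordinals stay correct.

```scala
// Sketch of the varchar ordinal recalculation described above: invisible
// (dropped) columns no longer occupy an ordinal position.
case class Column(name: String, isVarchar: Boolean, invisible: Boolean)

def varcharOrdinals(columns: Seq[Column]): Map[String, Int] =
  columns
    .filterNot(_.invisible)  // dropped columns are excluded from ordinals
    .zipWithIndex
    .collect { case (c, i) if c.isVarchar => c.name -> i }
    .toMap
```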
[GitHub] carbondata pull request #3006: [CARBONDATA-3187] Supported Global Dictionary...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3006#discussion_r243187905 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala --- @@ -182,7 +182,7 @@ object GlobalDictionaryUtil { case None => None case Some(dim) => -if (DataTypes.isArrayType(dim.getDataType)) { +if (DataTypes.isArrayType(dim.getDataType) || DataTypes.isMapType(dim.getDataType)) { --- End diff -- Yes, Map is implemented as Array of Struct internally. ---
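The predicate change shown in the diff above boils down to letting map columns reuse the array handling, since a map is stored internally as an array of key/value structs. A minimal sketch with simplified stand-in types:

```scala
// Sketch of the GlobalDictionaryUtil predicate from the diff above.
sealed trait DataType
case object ArrayType  extends DataType
case object MapType    extends DataType
case object StructType extends DataType
case object StringType extends DataType

def usesArrayStyleDictionary(dt: DataType): Boolean = dt match {
  case ArrayType | MapType => true // map == array of struct internally
  case _                   => false
}
```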
[GitHub] carbondata pull request #3006: [CARBONDATA-3187] Supported Global Dictionary...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/3006 [CARBONDATA-3187] Supported Global Dictionary For Map Added the case for a Global Dictionary to be created when the datatype is a complex Map. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata GlobalDictForMap Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3006.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3006 commit ecac0c6621a7afb219fbc577f0e69f7e65609520 Author: manishnalla1994 Date: 2018-12-20T05:53:46Z Supported Global Dictionary For Map ---
[GitHub] carbondata pull request #3002: [CARBONDATA-3182] Fixed SDV Testcase failures...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/3002 [CARBONDATA-3182] Fixed SDV Testcase failures Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata INClauseIssue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3002.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3002 commit 8ad7224667b60dd8c3d7056221359aa3d1cc80ed Author: manishnalla1994 Date: 2018-12-19T08:45:17Z Fixed SDV Testcases ---
[GitHub] carbondata pull request #2993: [CARBONDATA-3179] Map data load failure
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2993#discussion_r242448934 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala --- @@ -442,4 +441,45 @@ class TestCreateDDLForComplexMapType extends QueryTest with BeforeAndAfterAll { "sort_columns is unsupported for map datatype column: mapfield")) } + test("Data Load Fail Issue") { +sql("DROP TABLE IF EXISTS carbon") +sql( + s""" + | CREATE TABLE carbon( + | mapField map + | ) + | STORED BY 'carbondata' + | """ +.stripMargin) +sql( + s""" + | LOAD DATA LOCAL INPATH '$path' + | INTO TABLE carbon OPTIONS( + | 'header' = 'false') + """.stripMargin) +sql("INSERT INTO carbon SELECT * FROM carbon") +checkAnswer(sql("select * from carbon"), Seq( + Row(Map(1 -> "Nalla", 2 -> "Singh", 4 -> "Kumar")), + Row(Map(1 -> "Nalla", 2 -> "Singh", 4 -> "Kumar")), + Row(Map(10 -> "Nallaa", 20 -> "Sissngh", 100 -> "Gusspta", 40 -> "Kumar")), + Row(Map(10 -> "Nallaa", 20 -> "Sissngh", 100 -> "Gusspta", 40 -> "Kumar")) + )) + } + + test("Struct inside map") { +sql("DROP TABLE IF EXISTS carbon") --- End diff -- Done ---
[GitHub] carbondata pull request #2993: [CARBONDATA-3179] Map data load failure
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2993#discussion_r242448914 --- Diff: streaming/src/main/scala/org/apache/carbondata/streaming/parser/FieldConverter.scala --- @@ -66,30 +65,57 @@ object FieldConverter { case bs: Array[Byte] => new String(bs, Charset.forName(CarbonCommonConstants.DEFAULT_CHARSET)) case s: scala.collection.Seq[Any] => - val delimiter = if (level == 1) { -delimiterLevel1 - } else { -delimiterLevel2 - } + val delimiter = complexDelimiters.get((level)) --- End diff -- Done ---
[GitHub] carbondata pull request #2993: [WIP] Map data load failure
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2993 [WIP] Map data load failure Problem: Data load was failing for an insert-into-select from the same table containing a Map datatype. Solution: The Map type was not handled for this scenario; it is handled now. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata MapDataLoadFailure Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2993.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2993 commit 69010850cccfa4e82f1b70ea954454ff1af1a61a Author: manishnalla1994 Date: 2018-12-07T09:25:58Z Delimiters changed commit de2603041e2a123af1ae403289d2eed7f7c7c24a Author: manishnalla1994 Date: 2018-10-16T09:48:08Z MapDDLSupport commit e35126868e563971c11dcbe82e200adc476f7143 Author: manishnalla1994 Date: 2018-12-14T11:50:15Z Change of Function for all Delimiters ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240900720 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala --- @@ -0,0 +1,452 @@ +/* + +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +* +http://www.apache.org/licenses/LICENSE-2.0 +* +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+*/ +package org.apache.carbondata.spark.testsuite.createTable.TestCreateDDLForComplexMapType + +import java.io.File +import java.util + +import org.apache.hadoop.conf.Configuration +import org.apache.spark.sql.{AnalysisException, Row} +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk + +class TestCreateDDLForComplexMapType extends QueryTest with BeforeAndAfterAll { + private val conf: Configuration = new Configuration(false) + + val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath + + val path = s"$rootPath/examples/spark2/src/main/resources/maptest2.csv" + + private def checkForLocalDictionary(dimensionRawColumnChunks: util + .List[DimensionRawColumnChunk]): Boolean = { +var isLocalDictionaryGenerated = false +import scala.collection.JavaConversions._ +for (dimensionRawColumnChunk <- dimensionRawColumnChunks) { + if (dimensionRawColumnChunk.getDataChunkV3 +.isSetLocal_dictionary) { +isLocalDictionaryGenerated = true + } --- End diff -- done ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240900751 --- Diff: examples/spark2/src/main/resources/maptest2.csv --- @@ -0,0 +1,2 @@ +1\002Nalla\0012\002Singh\0011\002Gupta\0014\002Kumar +10\002Nallaa\00120\002Sissngh\001100\002Gusspta\00140\002Kumar --- End diff -- done ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240900767 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java --- @@ -338,12 +338,15 @@ public static CarbonLoadModel getLoadModel(Configuration conf) throws IOExceptio SKIP_EMPTY_LINE, carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE))); -String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":"); +String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":" + "," + "003"); --- End diff -- done ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240900728 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java --- @@ -338,12 +338,15 @@ public static CarbonLoadModel getLoadModel(Configuration conf) throws IOExceptio SKIP_EMPTY_LINE, carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE))); -String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":"); +String complexDelim = conf.get(COMPLEX_DELIMITERS, "$" + "," + ":" + "," + "003"); String[] split = complexDelim.split(","); model.setComplexDelimiterLevel1(split[0]); if (split.length > 1) { model.setComplexDelimiterLevel2(split[1]); } +if (split.length > 2) { + model.setComplexDelimiterLevel3(split[2]); +} --- End diff -- done ---
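The parsing shown in the diff above splits a single comma-separated config value into up to three complex-type delimiter levels, each optional beyond the first. A minimal sketch (in Scala rather than the Java of the diff):

```scala
// Sketch of the COMPLEX_DELIMITERS split with length guards, mirroring
// the CarbonTableOutputFormat change in the diff above.
def parseComplexDelimiters(value: String): (String, Option[String], Option[String]) = {
  val split = value.split(",")
  (split(0),                                       // level 1 is mandatory
   if (split.length > 1) Some(split(1)) else None, // level 2 optional
   if (split.length > 2) Some(split(2)) else None) // level 3 optional (map)
}
```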
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240900706 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -188,11 +188,13 @@ case class CarbonLoadDataCommand( val carbonLoadModel = new CarbonLoadModel() val tableProperties = table.getTableInfo.getFactTable.getTableProperties val optionsFinal = LoadOption.fillOptionWithDefaultValue(options.asJava) +// These two delimiters are non configurable and hardcoded for map type +optionsFinal.put("complex_delimiter_level_3", "\003") +optionsFinal.put("complex_delimiter_level_4", "\004") --- End diff -- done ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240900568 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java --- @@ -34,8 +37,12 @@ private int numberOfColumns; public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configuration) { -String[] complexDelimiters = +String[] tempComplexDelimiters = (String[]) configuration.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS); +Queue complexDelimiters = new LinkedList<>(); +for (int i = 0; i < 4; i++) { --- End diff -- done ---
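The RowParserImpl change discussed in the diff above loads the configured delimiters into a queue so that nested complex-type parsers can each dequeue the delimiter for their own nesting level. A small sketch of that idea:

```scala
import scala.collection.mutable

// Sketch of the queue-based delimiter handling from the diff above: one
// delimiter per nesting level, consumed in order by the parser factory.
def buildDelimiterQueue(configured: Array[String]): mutable.Queue[String] = {
  val q = mutable.Queue.empty[String]
  configured.take(4).foreach(d => q.enqueue(d)) // up to four nesting levels
  q
}
```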
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240900612 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/CarbonParserFactory.java --- @@ -51,23 +54,37 @@ public static GenericParser createParser(CarbonColumn carbonColumn, String[] com * delimiters * @return GenericParser */ - private static GenericParser createParser(CarbonColumn carbonColumn, String[] complexDelimiters, + private static GenericParser createParser(CarbonColumn carbonColumn, + Queue complexDelimiters, String nullFormat, int depth) { +if (depth > 2) { + return null; --- End diff -- done ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240900628 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java --- @@ -119,6 +119,10 @@ "complex_delimiter_level_2", Maps.getOrDefault(options, "complex_delimiter_level_2", ":")); +optionsFinal.put( +"complex_delimiter_level_3", +Maps.getOrDefault(options, "complex_delimiter_level_3", "003")); + --- End diff -- done ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240900688 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java --- @@ -222,8 +222,8 @@ public static CarbonDataLoadConfiguration createConfiguration(CarbonLoadModel lo configuration.setSegmentId(loadModel.getSegmentId()); configuration.setTaskNo(loadModel.getTaskNo()); configuration.setDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS, -new String[] { loadModel.getComplexDelimiterLevel1(), -loadModel.getComplexDelimiterLevel2() }); +new String[] { loadModel.getComplexDelimiterLevel1(), loadModel.getComplexDelimiterLevel2(), +loadModel.getComplexDelimiterLevel3(), loadModel.getComplexDelimiterLevel4() }); --- End diff -- done ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240900666 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java --- @@ -631,7 +651,7 @@ public void setFactTimeStamp(long factTimeStamp) { } public String[] getDelimiters() { -return new String[] { complexDelimiterLevel1, complexDelimiterLevel2 }; +return new String[] { complexDelimiterLevel1, complexDelimiterLevel2, complexDelimiterLevel3 }; --- End diff -- done ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240900542 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java --- @@ -65,8 +65,7 @@ private String csvHeader; private String[] csvHeaderColumns; private String csvDelimiter; - private String complexDelimiterLevel1; - private String complexDelimiterLevel2; + private ArrayList complexDelimiters = new ArrayList<>(); --- End diff -- done ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r240591596 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java --- @@ -34,8 +37,12 @@ private int numberOfColumns; public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configuration) { -String[] complexDelimiters = +String[] tempComplexDelimiters = (String[]) configuration.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS); +Queue complexDelimiters = new LinkedList<>(); +for (int i = 0; i < 4; i++) { --- End diff -- Done. ---
[GitHub] carbondata pull request #2821: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 closed the pull request at: https://github.com/apache/carbondata/pull/2821 ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2980

[CARBONDATA-3017] Map DDL Support

Support CREATE DDL for the Map type. This PR is dependent on PR#2979 for the change of delimiters.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata MapDDL5Dec

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2980.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2980

commit 322b52a64c840317a6905664e8de16327e3635e0
Author: Manish Nalla
Date: 2018-10-16T09:48:08Z

    MapDDLSupport

commit 3d119888a80e7d8f9cab59e477984b56af1309f6
Author: manishnalla1994
Date: 2018-12-07T08:18:31Z

    Added Testcases and Local Dict Support

commit 5fe06801360fc04bab9c1239ea8d007f37bc69d4
Author: manishnalla1994
Date: 2018-12-07T13:28:54Z

    Test Files for Map

commit 4cc8ba13b234a13b9a3cef541e37f492153e7d1b
Author: manishnalla1994
Date: 2018-12-07T14:44:12Z

    Changed TestCases and Supported 2 new delimiters

---
[GitHub] carbondata pull request #2979: [CARBONDATA-3153] Complex delimiters change
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2979

[CARBONDATA-3153] Complex delimiters change

Changed the two complex delimiters used to '\001' and '\002'.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata ComplexDelimiters

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2979.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2979

commit 7cfa05fbf65b5b176fe94ce6c36e4deb10a2a437
Author: manishnalla1994
Date: 2018-12-07T09:25:58Z

    Delimiters changed

commit bcf265316627f49862a994292bb37169afe40403
Author: manishnalla1994
Date: 2018-12-07T13:46:57Z

    Change of 2 complex delimiters

---
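To illustrate why non-printable delimiters like '\001' and '\002' help: one raw CSV field can then carry a nested value, such as an array of structs, without clashing with characters users commonly put in their data. A hypothetical parse sketch (not the CarbonData parser; names are illustrative):

```java
public class ComplexDelimiterSketch {
  // Split one raw field into an array of structs: '\001' separates the array
  // elements, '\002' separates the fields inside each struct element.
  public static String[][] parseArrayOfStructs(String rawField) {
    String[] elements = rawField.split("\001");
    String[][] structs = new String[elements.length][];
    for (int i = 0; i < elements.length; i++) {
      structs[i] = elements[i].split("\002");
    }
    return structs;
  }
}
```

For example, the field `"a\002b\001c\002d"` decodes to two structs, `(a, b)` and `(c, d)`.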
[GitHub] carbondata issue #2918: [CARBONDATA-3098] Fix for negative exponents value g...
Github user manishnalla1994 commented on the issue: https://github.com/apache/carbondata/pull/2918 retest this please ---
[GitHub] carbondata pull request #2918: [CARBONDATA-3098] Fix for negative exponents ...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2918

[CARBONDATA-3098] Fix for negative exponents value giving wrong results in Float datatype

Problem: When the exponent is negative, the data is incorrect due to the loss of precision in floating-point values and a wrong calculation of the decimal-digit count.

Solution: Handled floating-point precision by converting the value to double and counting the decimal digits the same way as for the double datatype (using BigDecimal).

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata FloatInfiniteFix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2918.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2918

commit 8d4ede90f5c47759485b34f1f20cec3bbdc32c15
Author: Manish Nalla
Date: 2018-11-14T05:27:49Z

    Float negative exponents

---
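The decimal-count idea in the solution above can be sketched like this; the method name is illustrative, and the actual CarbonData handling lives in its data type utilities.

```java
import java.math.BigDecimal;

public class DecimalCountSketch {
  // Count the decimal digits of a floating-point value by going through
  // BigDecimal, which avoids the binary-precision artifacts of float/double
  // arithmetic that break naive string or subtraction-based counting.
  public static int decimalCount(double value) {
    BigDecimal bd = BigDecimal.valueOf(value); // uses the canonical Double.toString form
    return Math.max(0, bd.stripTrailingZeros().scale());
  }
}
```

For a value like 1.23E-4 this yields 6 decimal digits, while a whole number such as 100.0 yields 0; a negative scale (trailing zeros before the decimal point) is clamped to 0.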
[GitHub] carbondata issue #2848: [CARBONDATA-3036] Cache Columns And Refresh Table Is...
Github user manishnalla1994 commented on the issue: https://github.com/apache/carbondata/pull/2848 retest this please ---
[GitHub] carbondata pull request #2848: [CARBONDATA-3036] Cache Columns And Refresh T...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2848

[CARBONDATA-3036] Cache Columns And Refresh Table Issue Fix

Refresh Table issue: the REFRESH TABLE command was acting in a case-sensitive manner.

Cache Columns issue: results were inconsistent when the cache is set but min/max exceeds and the columns are dictionary-excluded.

Fix 1: the path for the carbon file was being built from the table name exactly as given in the query (lowercase or uppercase). Changed it to lowercase.

Fix 2: the MinMaxFlag array was not set according to the columns to be cached, giving inconsistent results. Changed it to follow the min/max values array for only the columns given in CACHE COLUMNS.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata RefreshAndCacheColumnsFix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2848.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2848

commit 7158960c750cf6ed7243e1c7c4bbc44fe158326c
Author: Manish Nalla
Date: 2018-10-24T05:45:15Z

    CacheAndRefreshIsuueFix

---
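Fix 1 above boils down to normalizing the identifier case before building the path. A minimal sketch, assuming a simple `store/db/table` layout (the helper name and layout are illustrative, not CarbonData's actual path API):

```java
import java.util.Locale;

public class TablePathSketch {
  // Build the on-disk table path from the stored (lowercase) names,
  // regardless of the case the user typed in the REFRESH TABLE query.
  public static String tablePath(String storeLocation, String dbName, String tableName) {
    return storeLocation + "/" + dbName.toLowerCase(Locale.ROOT)
        + "/" + tableName.toLowerCase(Locale.ROOT);
  }
}
```

Using `Locale.ROOT` keeps the lowercasing deterministic across JVM default locales, which matters for on-disk identifiers.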
[GitHub] carbondata issue #2821: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on the issue: https://github.com/apache/carbondata/pull/2821 retest this please ---
[GitHub] carbondata issue #2821: [CARBONDATA-3017] Map DDL Support
Github user manishnalla1994 commented on the issue: https://github.com/apache/carbondata/pull/2821 Retest this please ---
[GitHub] carbondata pull request #2821: [CARBONDATA-3017] Map DDL Support
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2821

[CARBONDATA-3017] Map DDL Support

Support CREATE DDL for the Map type.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata MapDDLSupport16Oct

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2821.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2821

commit 5d8b36dfe3465cfd40668d24ffc85f0556dc66b1
Author: Manish Nalla
Date: 2018-10-16T09:48:08Z

    MapDDLSupport

---
[GitHub] carbondata issue #2758: [CARBONDATA-2972] Debug Logs and function added for ...
Github user manishnalla1994 commented on the issue: https://github.com/apache/carbondata/pull/2758 retest this please ---
[GitHub] carbondata pull request #2758: [CARBONDATA-2972] Debug Logs and function add...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2758#discussion_r220553863

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageEncoder.java ---
@@ -78,6 +78,14 @@ public DataType getTargetDataType(ColumnPage inputPage) {
     }
   }

+  public Encoding getEncodingType() {
+    List<Encoding> currEncodingList = getEncodingList();
+    if (CarbonUtil.isEncodedWithMeta(currEncodingList)) {
+      return currEncodingList.get(0);
+    }
+    return null;
--- End diff --

This function returns a value only in the case of adaptive encoding; for any other case it returns null. Also, we only want the first element of the list to check the type of encoding.

---
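The null-returning contract discussed above can be mirrored in a self-contained sketch. The enum values and the `isEncodedWithMeta` stand-in below are assumptions for illustration; the real check lives in CarbonUtil.

```java
import java.util.List;

public class EncodingTypeSketch {
  enum Encoding { ADAPTIVE_INTEGRAL, ADAPTIVE_DELTA_INTEGRAL, DIRECT_COMPRESS }

  // Hypothetical stand-in for CarbonUtil.isEncodedWithMeta: adaptive
  // encodings are identified by the first entry of the encoding list.
  static boolean isEncodedWithMeta(List<Encoding> encodings) {
    return !encodings.isEmpty() && encodings.get(0) != Encoding.DIRECT_COMPRESS;
  }

  // Mirrors the patch: expose the encoding type only for adaptive pages,
  // returning null otherwise so callers must guard before e.g. logging it.
  static Encoding getEncodingType(List<Encoding> encodings) {
    if (isEncodedWithMeta(encodings)) {
      return encodings.get(0);
    }
    return null;
  }
}
```

A caller would null-check before emitting the debug log, since non-adaptive pages intentionally yield null.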
[GitHub] carbondata issue #2758: [CARBONDATA-2972] Debug Logs and function added for ...
Github user manishnalla1994 commented on the issue: https://github.com/apache/carbondata/pull/2758 retest this please ---
[GitHub] carbondata issue #2758: [WIP] Debug Logs and function added for Adaptive Enc...
Github user manishnalla1994 commented on the issue: https://github.com/apache/carbondata/pull/2758 retest this please ---
[GitHub] carbondata pull request #2758: Debug Logs and function added for Adaptive En...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2758

Debug Logs and function added for Adaptive Encoding

Added a function to get the type of encoding used. Added a debug log for checking which type of encoding is used.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata adaptive_encoding_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2758.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2758

commit dc5e93252ec85fde6acc5d00992f41cdf030452a
Author: Manish Nalla
Date: 2018-09-25T12:14:49Z

    Debug Logs and function added for Adaptive Encoding

---
[GitHub] carbondata pull request #2747: [CARBONDATA-2960] SDK Reader fix with project...
Github user manishnalla1994 closed the pull request at: https://github.com/apache/carbondata/pull/2747 ---
[GitHub] carbondata pull request #2747: [CARBONDATA-2960] SDK Reader fix with project...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2747#discussion_r220062362

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -775,9 +775,18 @@ public static boolean getAccessStreamingSegments(Configuration configuration) {
   public String[] projectAllColumns(CarbonTable carbonTable) {
     List<ColumnSchema> colList = carbonTable.getTableInfo().getFactTable().getListOfColumns();
     List<String> projectColumn = new ArrayList<>();
+    int childDimCount = 0;
     for (ColumnSchema cols : colList) {
       if (cols.getSchemaOrdinal() != -1) {
-        projectColumn.add(cols.getColumnName());
+        if (childDimCount == 0) {
--- End diff --

added and updated

---
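The child-skipping counter in the diff above can be sketched with a minimal stand-in schema. This is an illustration of the pattern only; the `Col` type, the single-level child handling, and the semantics of ordinal -1 are assumptions based on the snippet.

```java
import java.util.ArrayList;
import java.util.List;

public class ProjectionSketch {
  // Minimal stand-in for ColumnSchema: ordinal -1 marks an internal column,
  // and a complex parent is followed by its flattened child columns.
  static final class Col {
    final String name; final int schemaOrdinal; final int numberOfChildren;
    Col(String name, int schemaOrdinal, int numberOfChildren) {
      this.name = name; this.schemaOrdinal = schemaOrdinal;
      this.numberOfChildren = numberOfChildren;
    }
  }

  // Keep the complex parent in the default projection but skip the child
  // columns that immediately follow it, tracked with a running counter.
  static List<String> projectAllColumns(List<Col> cols) {
    List<String> projection = new ArrayList<>();
    int childDimCount = 0;
    for (Col col : cols) {
      if (col.schemaOrdinal == -1) continue;   // internal column, never projected
      if (childDimCount == 0) {
        projection.add(col.name);
        childDimCount = col.numberOfChildren;  // children follow the parent
      } else {
        childDimCount--;                       // skip a child column
      }
    }
    return projection;
  }
}
```

So a schema `id, arr (1 child), arr.val, name` projects only `id, arr, name`: the reader expects the complex column as a whole, not its flattened children.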
[GitHub] carbondata issue #2747: [CARBONDATA-2960] SDK Reader fix with projection col...
Github user manishnalla1994 commented on the issue: https://github.com/apache/carbondata/pull/2747 retest this please ---
[GitHub] carbondata pull request #2747: [CARBONDATA-2960] SDK Reader fix with project...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2747

[CARBONDATA-2960] SDK Reader fix with projection columns

The SDK reader was not working when all projection columns were given. Also added an exception for projections of complex children.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata reader_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2747.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2747

commit 1c458df5ca3c274a3028bc8263c9b2758588c4e6
Author: Manish Nalla
Date: 2018-09-21T13:54:01Z

    SDK Reader with Default Projection

commit b684ab22d42d749364de8a46317b4c43a2d67d2c
Author: Manish Nalla
Date: 2018-09-21T14:05:38Z

    SDK Reader with Default Projection(1)

---
[GitHub] carbondata pull request #2708: [CARBONDATA-2886] Select Filter Compatibility...
Github user manishnalla1994 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2708#discussion_r216908025

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java ---
@@ -383,8 +383,9 @@ private void fillBlockletInfoToTableBlock(List tableBlockInfos,
         blockletInfo.getBlockletIndex().getMinMaxIndex().getMaxValues());
     // update min and max values in case of old store for measures as min and max is written
     // opposite for measures in old store ( store <= 1.1 version)
+    byte[][] tempMaxValues = maxValues;
     maxValues = CarbonUtil.updateMinMaxValues(fileFooter, maxValues, minValues, false);
-    minValues = CarbonUtil.updateMinMaxValues(fileFooter, maxValues, minValues, true);
+    minValues = CarbonUtil.updateMinMaxValues(fileFooter, tempMaxValues, minValues, true);
--- End diff --

maxValues is reassigned by the first call, and we need the old maxValues to compute the correct minValues, so we store it in a temporary variable first.

---
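The hazard described above, reading an already-updated array while fixing its counterpart, can be shown with a minimal sketch. The types and the swap rule are illustrative, not the CarbonUtil signature:

```java
public class LegacyMinMaxSketch {
  // For legacy stores (<= 1.1) measure min/max were written swapped. Fixing
  // both arrays needs the ORIGINAL max values: if we updated max first and
  // then derived min from the NEW max, min would be computed from corrupt
  // input. Hence the defensive copy before the first update.
  public static void fixSwapped(long[] max, long[] min) {
    long[] origMax = max.clone();          // keep the pre-update values
    for (int i = 0; i < max.length; i++) {
      if (max[i] < min[i]) {               // swapped entry detected
        max[i] = min[i];
        min[i] = origMax[i];
      }
    }
  }
}
```

Without `origMax`, the second assignment would read `max[i]` after it was overwritten with `min[i]`, leaving min equal to max for every swapped entry.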
[GitHub] carbondata pull request #2708: [CARBONDATA-2886] Select Filter Compatibilit...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2708

[CARBONDATA-2886] Select Filter Compatibility Fixed

Problem: a select filter query on the INT data type showed incorrect results when the table was created and loaded on an old version and queried on a new version. The min/max values of a legacy table were not being updated properly, so the check inside the blocklet was not happening.

Solution: correctly updated the min/max values.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
  Manually tested.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata compatibilty_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2708.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2708

commit 234d159aea1ada889fec0424a8370f4564f90f6e
Author: Manish Nalla
Date: 2018-09-11T08:38:33Z

    Select Filter Comppatibility Fixed

---
[GitHub] carbondata pull request #2669: [Documentation] Added the missing links for o...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2669

[Documentation] Added the missing links for online documentation

Added the missing links for the online documentation:

- S3 Guide
- Bloom Filter DataMap
- Lucene DataMap

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2669.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2669

commit 032da32603e8a1c15cf1d5ff48173e6a201a4d32
Author: manishnalla1994 <30823674+manishnalla1994@...>
Date: 2018-08-29T10:27:23Z

    [Documentation] Added the missing links for online documentation

---