[GitHub] carbondata pull request #3064: [CARBONDATA-3243] Updated DOC for No-Sort Com...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3064#discussion_r247096965 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -1201,6 +1202,17 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } }
+    // Validate SORT_SCOPE
+    if (options.exists(_._1.equalsIgnoreCase("SORT_SCOPE"))) {
+      val optionValue: String = options.get("sort_scope").get.head._2
+      if (!CarbonUtil.isValidSortOption(optionValue)) {
+        throw new InvalidConfigurationException(
+          s"Passing invalid SORT_SCOPE '$optionValue', valid SORT_SCOPE are 'NO_SORT'," +
+          s" 'BATCH_SORT', 'LOCAL_SORT' and 'GLOBAL_SORT'")
+      }
+    }
--- End diff -- Remove empty lines and properly format the code. ---
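The validation in the quoted diff can be sketched in isolation. This is a minimal stand-in for CarbonUtil.isValidSortOption (the real method lives in org.apache.carbondata.core.util.CarbonUtil); the class name and the use of IllegalArgumentException in place of InvalidConfigurationException are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;

public class SortScopeValidator {
    // The four sort scopes named in the quoted error message.
    private static final List<String> VALID_SCOPES =
        Arrays.asList("NO_SORT", "BATCH_SORT", "LOCAL_SORT", "GLOBAL_SORT");

    // Hypothetical stand-in for CarbonUtil.isValidSortOption: case-insensitive
    // membership check against the known scopes.
    public static boolean isValidSortOption(String value) {
        return value != null && VALID_SCOPES.contains(value.toUpperCase());
    }

    // Mirrors the validation block in the diff: reject any unknown scope
    // with an error listing the valid values.
    public static void validate(String optionValue) {
        if (!isValidSortOption(optionValue)) {
            throw new IllegalArgumentException(
                "Passing invalid SORT_SCOPE '" + optionValue + "', valid SORT_SCOPE are "
                + "'NO_SORT', 'BATCH_SORT', 'LOCAL_SORT' and 'GLOBAL_SORT'");
        }
    }
}
```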
[GitHub] carbondata pull request #3070: [CARBONDATA-3246]Fix sdk reader issue if batc...
GitHub user shardul-cr7 opened a pull request: https://github.com/apache/carbondata/pull/3070 [CARBONDATA-3246]Fix sdk reader issue if batch size is given as zero and vectorRead False. This PR fixes an SDK reader issue when the batch size is given as zero and vectorRead is false. **Problem** The SDK reader fails if vectorRead is false and the detail query batch size is given as 0. The reader throws a StackOverflowError after getting stuck in ChunkRowIterator.hasNext recursion. **Solution** Since 0 is an invalid batch size, we should take DETAIL_QUERY_BATCH_SIZE_DEFAULT as the batch size. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? - No - [x] Any backward compatibility impacted? - No - [x] Document update required? - No - [x] Testing done - added a test case - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/shardul-cr7/carbondata batchSize_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3070.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3070 commit 4c002f80903076ebd7707fe7cf1384e45f823bbd Author: shardul-cr7 Date: 2019-01-11T10:40:27Z [CARBONDATA-3246]Fix sdk reader issue if batch size is given as zero ---
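The fix idea in the Solution above can be sketched as follows: treat any non-positive batch size as invalid and fall back to the default. The class name and the concrete default value are assumptions for illustration; the real constant is DETAIL_QUERY_BATCH_SIZE_DEFAULT in CarbonCommonConstants.

```java
public class BatchSizeResolver {
    // Assumed value; stands in for CarbonCommonConstants.DETAIL_QUERY_BATCH_SIZE_DEFAULT.
    static final int DETAIL_QUERY_BATCH_SIZE_DEFAULT = 100;

    // A batch size of 0 (or less) would otherwise make the row iterator
    // recurse without consuming anything; substitute the default instead.
    static int resolveBatchSize(int configured) {
        return configured <= 0 ? DETAIL_QUERY_BATCH_SIZE_DEFAULT : configured;
    }
}
```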
[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3045#discussion_r245585973 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala --- @@ -110,22 +109,42 @@ case class PreAggregateTableHelper( // Datamap table name and columns are automatically added prefix with parent table name // in carbon. For convenient, users can type column names same as the ones in select statement // when config dmproperties, and here we update column names with prefix. -val longStringColumn = tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS) +// If longStringColumn is not present in dmproperties then we take long_string_columns from +// the parent table. +var longStringColumn = tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS) +val longStringColumnInParents = parentTable.getTableInfo.getFactTable.getTableProperties.asScala + .getOrElse(CarbonCommonConstants.LONG_STRING_COLUMNS, "").split(",").map(_.trim) +var varcharDatamapFields = "" +fieldRelationMap foreach (fields => { + val aggFunc = fields._2.aggregateFunction + if (aggFunc == "") { --- End diff -- Done! ---
[GitHub] carbondata issue #3045: [CARBONDATA-3222]Fix dataload failure after creation...
Github user shardul-cr7 commented on the issue: https://github.com/apache/carbondata/pull/3045 retest this please ---
[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3045#discussion_r245516216 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala --- @@ -110,7 +110,29 @@ case class PreAggregateTableHelper( // Datamap table name and columns are automatically added prefix with parent table name // in carbon. For convenient, users can type column names same as the ones in select statement // when config dmproperties, and here we update column names with prefix. -val longStringColumn = tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS) --- End diff -- This PR is for the scenario where the user doesn't configure long_string_columns in the dmproperties; in that case we take long_string_columns from the parent table. ---
[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3045#discussion_r245516158 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala --- @@ -110,7 +110,29 @@ case class PreAggregateTableHelper( // Datamap table name and columns are automatically added prefix with parent table name // in carbon. For convenient, users can type column names same as the ones in select statement // when config dmproperties, and here we update column names with prefix. -val longStringColumn = tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS) +// If longStringColumn is not present in dm properties then we take long_string_columns from +// the parent table. +var longStringColumn = tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS) --- End diff -- Fixed it. If the user doesn't configure long_string in dmproperties, we take long_string_cols from the parent table. ---
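The fallback described in the two comments above (prefer the datamap's own LONG_STRING_COLUMNS property, otherwise inherit the parent table's) can be sketched as a simple property lookup. Class and method names are illustrative, not the actual CarbonData API.

```java
import java.util.Map;
import java.util.Optional;

public class LongStringInheritance {
    // Returns the datamap's own long_string_columns if configured and non-empty,
    // otherwise falls back to the parent table's property (if any).
    static Optional<String> resolveLongStringColumns(
            Map<String, String> dmProperties, Map<String, String> parentProperties) {
        String own = dmProperties.get("long_string_columns");
        if (own != null && !own.trim().isEmpty()) {
            return Optional.of(own);
        }
        return Optional.ofNullable(parentProperties.get("long_string_columns"));
    }
}
```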
[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3045#discussion_r245475409 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala --- @@ -110,22 +109,42 @@ case class PreAggregateTableHelper( // Datamap table name and columns are automatically added prefix with parent table name // in carbon. For convenient, users can type column names same as the ones in select statement // when config dmproperties, and here we update column names with prefix. -val longStringColumn = tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS) +// If longStringColumn is not present in dmproperties then we take long_string_columns from +// the parent table. +var longStringColumn = tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS) +val longStringColumnInParents = parentTable.getTableInfo.getFactTable.getTableProperties.asScala + .getOrElse(CarbonCommonConstants.LONG_STRING_COLUMNS, "").split(",").map(_.trim) +var varcharDatamapFields = "" +fieldRelationMap foreach (fields => { + val aggFunc = fields._2.aggregateFunction + if (aggFunc == "") { +val relationList = (fields._2.columnTableRelationList) +relationList.foreach(rel => { + rel.foreach(col => { +if (longStringColumnInParents.contains(col.parentColumnName)) { + varcharDatamapFields += col.parentColumnName + "," +} + }) +}) + } +}) +if (varcharDatamapFields.size != 0) { --- End diff -- Done! ---
[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3045#discussion_r245475435 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala --- @@ -110,22 +109,42 @@ case class PreAggregateTableHelper( // Datamap table name and columns are automatically added prefix with parent table name // in carbon. For convenient, users can type column names same as the ones in select statement // when config dmproperties, and here we update column names with prefix. -val longStringColumn = tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS) +// If longStringColumn is not present in dmproperties then we take long_string_columns from +// the parent table. +var longStringColumn = tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS) +val longStringColumnInParents = parentTable.getTableInfo.getFactTable.getTableProperties.asScala + .getOrElse(CarbonCommonConstants.LONG_STRING_COLUMNS, "").split(",").map(_.trim) +var varcharDatamapFields = "" +fieldRelationMap foreach (fields => { + val aggFunc = fields._2.aggregateFunction + if (aggFunc == "") { +val relationList = (fields._2.columnTableRelationList) +relationList.foreach(rel => { + rel.foreach(col => { +if (longStringColumnInParents.contains(col.parentColumnName)) { + varcharDatamapFields += col.parentColumnName + "," +} + }) +}) + } +}) +if (varcharDatamapFields.size != 0) { + longStringColumn = Option(varcharDatamapFields.slice(0, varcharDatamapFields.length - 1)) +} if (longStringColumn != None) { val fieldNames = fields.map(_.column) - val newLongStringColumn = longStringColumn.get.split(",").map(_.trim).map{ colName => + val newLongStringColumn = longStringColumn.get.split(",").map(_.trim).map { colName => val newColName = parentTable.getTableName.toLowerCase() + "_" + colName if (!fieldNames.contains(newColName)) { throw new MalformedDataMapCommandException( 
CarbonCommonConstants.LONG_STRING_COLUMNS.toUpperCase() + ":" + colName - + " does not in datamap") ++ " does not in datamap") --- End diff -- Done! ---
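The prefixing step at the end of the quoted diff (datamap columns are the parent's columns prefixed with the parent table name, and an unknown column is rejected) can be sketched as follows. Names and the exception type are illustrative; the real code throws MalformedDataMapCommandException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DatamapColumnPrefixer {
    // Splits the comma-separated LONG_STRING_COLUMNS value, prefixes each column
    // with "<parent table name>_", and fails if the prefixed name is not one of
    // the datamap's fields (mirroring the check in the quoted diff).
    static List<String> prefixLongStringColumns(
            String parentTable, String longStringColumns, List<String> datamapFields) {
        return Arrays.stream(longStringColumns.split(","))
            .map(String::trim)
            .map(col -> {
                String prefixed = parentTable.toLowerCase() + "_" + col;
                if (!datamapFields.contains(prefixed)) {
                    throw new IllegalArgumentException(
                        "LONG_STRING_COLUMNS:" + col + " is not present in the datamap");
                }
                return prefixed;
            })
            .collect(Collectors.toList());
    }
}
```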
[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3045#discussion_r245475427 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala --- @@ -110,22 +109,42 @@ case class PreAggregateTableHelper( // Datamap table name and columns are automatically added prefix with parent table name // in carbon. For convenient, users can type column names same as the ones in select statement // when config dmproperties, and here we update column names with prefix. -val longStringColumn = tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS) +// If longStringColumn is not present in dmproperties then we take long_string_columns from +// the parent table. +var longStringColumn = tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS) +val longStringColumnInParents = parentTable.getTableInfo.getFactTable.getTableProperties.asScala + .getOrElse(CarbonCommonConstants.LONG_STRING_COLUMNS, "").split(",").map(_.trim) +var varcharDatamapFields = "" --- End diff -- Done! ---
[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3045#discussion_r244992105 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala --- @@ -126,6 +126,12 @@ case class PreAggregateTableHelper( newLongStringColumn.mkString(",")) } +//Add long_string_columns properties in child table from the parent. +tableProperties --- End diff -- Done. @kumarvishal09 please review. ---
[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3045#discussion_r244992035 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala --- @@ -333,6 +333,36 @@ class VarcharDataTypesBasicTestCase extends QueryTest with BeforeAndAfterEach wi sql(s"DROP DATAMAP IF EXISTS $datamapName ON TABLE $longStringTable") } + test("creating datamap with long string column selected and loading data should be success") { + +sql(s"drop table if exists $longStringTable") +val datamapName = "pre_agg_dm" +sql( + s""" + | CREATE TABLE if not exists $longStringTable( + | id INT, name STRING, description STRING, address STRING, note STRING + | ) STORED BY 'carbondata' + | TBLPROPERTIES('LONG_STRING_COLUMNS'='description, note', 'SORT_COLUMNS'='name') + |""".stripMargin) + +sql( + s""" + | CREATE DATAMAP $datamapName ON TABLE $longStringTable + | USING 'preaggregate' + | AS SELECT id,description,note,count(*) FROM $longStringTable + | GROUP BY id,description,note + |""". +stripMargin) + +sql( + s""" + | LOAD DATA LOCAL INPATH '$inputFile' INTO TABLE $longStringTable + | OPTIONS('header'='false') + """.stripMargin) + +sql(s"drop table if exists $longStringTable") --- End diff -- Added! ---
[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...
GitHub user shardul-cr7 opened a pull request: https://github.com/apache/carbondata/pull/3045 [CARBONDATA-3222]Fix dataload failure after creation of preaggregate datamap on main table with long_string_columns This PR fixes a data load failure after creation of a preaggregate datamap on a main table with long_string_columns. The data load was failing because the child table properties were not getting modified according to the parent table for long_string_columns. This occurs only when long_string_columns is not specified in the dmproperties for the preaggregate datamap: the datamap was still getting created, but the data load failed. This PR avoids the data load failure in this scenario. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done - added a test case - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/shardul-cr7/carbondata lsc_preagg Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3045.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3045 commit ed588b32ff95a451782c98cae991efd6d148b5c3 Author: shardul-cr7 Date: 2019-01-02T09:17:34Z [CARBONDATA-3222]Fix dataload failure after creation of preaggregate datamap on main table with long_string_columns ---
[GitHub] carbondata issue #3020: [CARBONDATA-3195]Added validation for Inverted Index...
Github user shardul-cr7 commented on the issue: https://github.com/apache/carbondata/pull/3020 > I think you need describe this validation in the ddl-of-carbondata.md of Inverted Index Configuration part Done! ---
[GitHub] carbondata pull request #3020: [CARBONDATA-3195]Added validation for Inverte...
GitHub user shardul-cr7 opened a pull request: https://github.com/apache/carbondata/pull/3020 [CARBONDATA-3195]Added validation for Inverted Index columns and added a test case in case of varchar Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/shardul-cr7/carbondata 21dec Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3020.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3020 commit 6374825a94b055a1a9d196ae151a01e2c9d3805e Author: shardul-cr7 Date: 2018-12-24T07:21:16Z Added validation for Inverted Index columns and added a test case in case of varchar ---
[GitHub] carbondata pull request #2986: [CARBONDATA-3166]Updated Document and added C...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2986#discussion_r241376148 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala --- @@ -92,7 +92,9 @@ private[sql] case class CarbonDescribeFormattedCommand( Strings.formatSize( tblProps.getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB_DEFAULT).toFloat), ""), - + ("Carbon Column Compressor ", tblProps --- End diff -- Changed it to "Data File Compressor". Since this is in the table properties, we display the default value; it is the same for the other properties as well. ---
[GitHub] carbondata pull request #2986: [CARBONDATA-3166]Updated Document and added C...
GitHub user shardul-cr7 opened a pull request: https://github.com/apache/carbondata/pull/2986 [CARBONDATA-3166]Updated Document and added Column Compressor used in Describe Formatted Command Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/shardul-cr7/carbondata DocUpdate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2986.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2986 commit 71cf4f0294adaaf2520d55fbb8b048494853896a Author: shardul-cr7 Date: 2018-12-13T08:42:18Z []Updated Document and added column compressor used in Describe Formatted Command ---
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240247558 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala --- @@ -252,50 +253,94 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with """.stripMargin) } - test("test data loading with snappy compressor and offheap") { + test("test data loading with different compressors and offheap") { +for(comp <- compressors){ + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true") --- End diff -- By default for gzip/zstd, it's false. So UT for this scenario is not required. ---
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240236819 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream bt = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt); + try { +gzos.write(data); + } catch (IOException e) { +e.printStackTrace(); + } finally { +gzos.close(); + } +} catch (IOException e) { + e.printStackTrace(); +} + +return bt.toByteArray(); + } + + /* + * Method called for decompressing the data and + * return a byte array + */ + private byte[] decompressData(byte[] data) { + +ByteArrayInputStream bt = new ByteArrayInputStream(data); +ByteArrayOutputStream bot = new ByteArrayOutputStream(); + +try { + GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt); + byte[] buffer = new byte[1024]; + int len; + + while ((len = gzis.read(buffer)) != -1) { +bot.write(buffer, 0, len); + } + +} catch (IOException e) { + e.printStackTrace(); +} + +return bot.toByteArray(); --- End diff -- Similar to ByteArrayOutputStream.close() reason mentioned above. ---
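The compress/decompress pair quoted in the diff above can be reproduced as a minimal round-trip. This sketch uses the JDK's java.util.zip streams instead of commons-compress so it is self-contained, and rethrows IOExceptions rather than calling printStackTrace() (swallowing the exception, as in the quoted draft, would silently return truncated data on failure); class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Compress a byte array into gzip format in memory.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new RuntimeException("Error during gzip compression", e);
        }
        return out.toByteArray();
    }

    // Decompress gzip data by draining the stream through a fixed-size buffer,
    // as the quoted decompressData does.
    static byte[] decompress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[1024];
            int len;
            while ((len = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        } catch (IOException e) {
            throw new RuntimeException("Error during gzip decompression", e);
        }
        return out.toByteArray();
    }
}
```

Note that ByteArrayOutputStream.close() is a no-op (the review comment above alludes to this), which is why the draft never closes the output stream explicitly.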
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240236269 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream bt = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt); + try { +gzos.write(data); + } catch (IOException e) { +e.printStackTrace(); + } finally { +gzos.close(); + } +} catch (IOException e) { + e.printStackTrace(); +} + +return bt.toByteArray(); + } + + /* + * Method called for decompressing the data and + * return a byte array + */ + private byte[] decompressData(byte[] data) { + +ByteArrayInputStream bt = new ByteArrayInputStream(data); +ByteArrayOutputStream bot = new ByteArrayOutputStream(); + +try { + GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt); + byte[] buffer = new byte[1024]; + int len; + + while ((len = gzis.read(buffer)) != -1) { +bot.write(buffer, 0, len); + } + +} catch (IOException e) { + e.printStackTrace(); +} + +return bot.toByteArray(); + } + + @Override public byte[] compressByte(byte[] unCompInput) { +return compressData(unCompInput); + } + + @Override public byte[] compressByte(byte[] unCompInput, int byteSize) { +return compressData(unCompInput); + } + + @Override public byte[] unCompressByte(byte[] compInput) { 
+return decompressData(compInput); + } + + @Override public byte[] unCompressByte(byte[] compInput, int offset, int length) { +byte[] data = new byte[length]; +System.arraycopy(compInput, offset, data, 0, length); +return decompressData(data); + } + + @Override public byte[] compressShort(short[] unCompInput) { +ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * ByteUtil.SIZEOF_SHORT); +unCompBuffer.asShortBuffer().put(unCompInput); +return compressData(unCompBuffer.array()); + } + + @Override public short[] unCompressShort(byte[] compInput, int offset, int length) { +byte[] unCompArray = unCompressByte(compInput, offset, length); +ShortBuffer unCompBuffer = ByteBuffer.wrap(unCompArray).asShortBuffer(); +short[] shorts = new short[unCompArray.length / ByteUtil.SIZEOF_SHORT]; +unCompBuffer.get(shorts); +return shorts; + } + + @Override public byte[] compressInt(int[] unCompInput) { +ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * ByteUtil.SIZEOF_INT); +unCompBuffer.asIntBuffer().put(unCompInput); +return compressData(unCompBuffer.array()); + } + + @Override p
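The compressShort/unCompressShort pair in the quoted diff relies on packing a short[] into a byte[] via ByteBuffer before compression and unpacking it the same way afterwards (ByteUtil.SIZEOF_SHORT is 2). That packing step can be isolated as a sketch; the class name is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ShortBuffer;

public class ShortPacking {
    // Lay a short[] out into a byte[] (2 bytes per short, default big-endian order),
    // mirroring compressShort in the quoted diff.
    static byte[] pack(short[] input) {
        ByteBuffer buffer = ByteBuffer.allocate(input.length * 2);
        buffer.asShortBuffer().put(input);
        return buffer.array();
    }

    // Reverse of pack, mirroring unCompressShort after decompression.
    static short[] unpack(byte[] bytes) {
        ShortBuffer buffer = ByteBuffer.wrap(bytes).asShortBuffer();
        short[] shorts = new short[bytes.length / 2];
        buffer.get(shorts);
        return shorts;
    }
}
```

The compressInt/compressLong variants in the diff follow the same pattern with SIZEOF_INT and SIZEOF_LONG.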
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240236381 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream bt = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt); + try { +gzos.write(data); + } catch (IOException e) { +e.printStackTrace(); + } finally { +gzos.close(); + } +} catch (IOException e) { + e.printStackTrace(); --- End diff -- Done. ---
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240236462 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream bt = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt); + try { +gzos.write(data); + } catch (IOException e) { +e.printStackTrace(); + } finally { +gzos.close(); + } +} catch (IOException e) { + e.printStackTrace(); +} + +return bt.toByteArray(); + } + + /* + * Method called for decompressing the data and + * return a byte array + */ + private byte[] decompressData(byte[] data) { + +ByteArrayInputStream bt = new ByteArrayInputStream(data); +ByteArrayOutputStream bot = new ByteArrayOutputStream(); + +try { + GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt); + byte[] buffer = new byte[1024]; + int len; + + while ((len = gzis.read(buffer)) != -1) { +bot.write(buffer, 0, len); + } + +} catch (IOException e) { + e.printStackTrace(); --- End diff -- Done. ---
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240227006 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +/** + * Codec Class for performing Gzip Compression + */ +public class GzipCompressor extends AbstractCompressor { + + @Override public String getName() { +return "gzip"; + } + + /** + * This method takes the Byte Array data and Compresses in gzip format + * + * @param data Data Byte Array passed for compression + * @return Compressed Byte Array + */ + private byte[] compressData(byte[] data) { +ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(); --- End diff -- Based on the observations, I have initialized the byteArrayOutputStream with a size of half the byte buffer, which reduces the number of times the stream has to be resized. ---
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240157144 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +/** + * Codec Class for performing Gzip Compression + */ +public class GzipCompressor extends AbstractCompressor { + + @Override public String getName() { +return "gzip"; + } + + /** + * This method takes the Byte Array data and Compresses in gzip format + * + * @param data Data Byte Array passed for compression + * @return Compressed Byte Array + */ + private byte[] compressData(byte[] data) { +ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzipCompressorOutputStream = + new GzipCompressorOutputStream(byteArrayOutputStream); + try { +/** + * Below api will write bytes from specified byte array to the gzipCompressorOutputStream + * The output stream will compress the given byte array. + */ +gzipCompressorOutputStream.write(data); + } catch (IOException e) { +throw new RuntimeException("Error during Compression step " + e.getMessage()); --- End diff -- ok added the actual exception. ---
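The review fix here replaces concatenating e.getMessage() into the new message with passing the actual exception along. A small sketch of why that matters — wrapping with the original exception as the cause preserves the full stack trace, whereas getMessage() can even be null for some IOExceptions (the method name is illustrative):

```java
import java.io.IOException;

public class WrapWithCause {

  // Stand-in for a compression step whose underlying write fails.
  public static void simulateCompressFailure() {
    try {
      throw new IOException("simulated gzip failure"); // e.g. a failed gzip write
    } catch (IOException e) {
      // Pass the original exception as the cause instead of appending
      // e.getMessage(); the caller can then walk the full cause chain.
      throw new RuntimeException("Error during Compression step", e);
    }
  }

  public static void main(String[] args) {
    try {
      simulateCompressFailure();
    } catch (RuntimeException e) {
      System.out.println(e.getCause() instanceof IOException); // cause survives
    }
  }
}
```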
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240135884 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java --- @@ -35,8 +35,8 @@ private final Map allSupportedCompressors = new HashMap<>(); public enum NativeSupportedCompressor { -SNAPPY("snappy", SnappyCompressor.class), -ZSTD("zstd", ZstdCompressor.class); +SNAPPY("snappy", SnappyCompressor.class), ZSTD("zstd", ZstdCompressor.class), GZIP("gzip", --- End diff -- Done. ---
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240130514 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +/** + * Codec Class for performing Gzip Compression + */ +public class GzipCompressor extends AbstractCompressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /** + * This method takes the Byte Array data and Compresses in gzip format + * + * @param data Data Byte Array passed for compression + * @return Compressed Byte Array + */ + private byte[] compressData(byte[] data) { +ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzipCompressorOutputStream = + new GzipCompressorOutputStream(byteArrayOutputStream); + try { +/** + * Below api will write bytes from specified byte array to the gzipCompressorOutputStream + * The output stream will compress the given byte array. 
+ */ +gzipCompressorOutputStream.write(data); + } catch (IOException e) { +throw new RuntimeException("Error during Compression step " + e.getMessage()); + } finally { +gzipCompressorOutputStream.close(); + } +} catch (IOException e) { + throw new RuntimeException("Error during Compression step " + e.getMessage()); +} +return byteArrayOutputStream.toByteArray(); + } + + /** + * This method takes the Byte Array data and Deompresses in gzip format + * + * @param data Data Byte Array for Compression + * @param offset Start value of Data Byte Array + * @param length Size of Byte Array + * @return + */ + private byte[] decompressData(byte[] data, int offset, int length) { +ByteArrayInputStream byteArrayOutputStream = new ByteArrayInputStream(data, offset, length); +ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream(); +try { + GzipCompressorInputStream gzipCompressorInputStream = + new GzipCompressorInputStream(byteArrayOutputStream); + byte[] buffer = new byte[1024]; + int len; + /** + * Reads the next byte of the data from the input stream and stores them into buffer + * Data is then read from the buffer and put into byteOutputStream from a offset. + */ + while ((len = gzipCompressorInputStream.read(buffer)) != -1) { +byteOutputStream.write(buffer, 0, len); + } +} catch (IOException e) { + throw new RuntimeException("Error during Decompression step " + e.getMessage()); +} +return byteOutputStream.toByteArray(); + } + + @Override public byte[] compressByte(byte[] unCompInput) { +return compressData(unCompInput); + } + + @Override public byte[] compressByte(byte[] unCompInput, int byteSize) { +return compressData(unCompInput); + } + + @Override public byte[] unCompressByte(byte[] compInput) { +return decompressData(compInput, 0, compInput.length); + } + + @Override public byte[] unCompressByte(byte[] compInput, int offset, int length) { +return decompressData(compInput, offset, length); + } + + @Override public long rawUncompress(by
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240130469 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +/** + * Codec Class for performing Gzip Compression + */ +public class GzipCompressor extends AbstractCompressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /** + * This method takes the Byte Array data and Compresses in gzip format + * + * @param data Data Byte Array passed for compression + * @return Compressed Byte Array + */ + private byte[] compressData(byte[] data) { +ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzipCompressorOutputStream = + new GzipCompressorOutputStream(byteArrayOutputStream); + try { +/** + * Below api will write bytes from specified byte array to the gzipCompressorOutputStream + * The output stream will compress the given byte array. 
+ */ +gzipCompressorOutputStream.write(data); + } catch (IOException e) { +throw new RuntimeException("Error during Compression step " + e.getMessage()); + } finally { +gzipCompressorOutputStream.close(); + } +} catch (IOException e) { + throw new RuntimeException("Error during Compression step " + e.getMessage()); +} +return byteArrayOutputStream.toByteArray(); + } + + /** + * This method takes the Byte Array data and Deompresses in gzip format + * + * @param data Data Byte Array for Compression + * @param offset Start value of Data Byte Array + * @param length Size of Byte Array + * @return + */ + private byte[] decompressData(byte[] data, int offset, int length) { +ByteArrayInputStream byteArrayOutputStream = new ByteArrayInputStream(data, offset, length); +ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream(); +try { + GzipCompressorInputStream gzipCompressorInputStream = + new GzipCompressorInputStream(byteArrayOutputStream); + byte[] buffer = new byte[1024]; + int len; + /** + * Reads the next byte of the data from the input stream and stores them into buffer + * Data is then read from the buffer and put into byteOutputStream from a offset. + */ + while ((len = gzipCompressorInputStream.read(buffer)) != -1) { +byteOutputStream.write(buffer, 0, len); + } +} catch (IOException e) { + throw new RuntimeException("Error during Decompression step " + e.getMessage()); +} +return byteOutputStream.toByteArray(); + } + + @Override public byte[] compressByte(byte[] unCompInput) { +return compressData(unCompInput); + } + + @Override public byte[] compressByte(byte[] unCompInput, int byteSize) { +return compressData(unCompInput); + } + + @Override public byte[] unCompressByte(byte[] compInput) { +return decompressData(compInput, 0, compInput.length); + } + + @Override public byte[] unCompressByte(byte[] compInput, int offset, int length) { +return decompressData(compInput, offset, length); + } + + @Override public long rawUncompress(by
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240130373 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +/** + * Codec Class for performing Gzip Compression + */ +public class GzipCompressor extends AbstractCompressor { + + public GzipCompressor() { --- End diff -- Removed. ---
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240102212 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala --- @@ -168,6 +168,7 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with private val tableName = "load_test_with_compressor" private var executorService: ExecutorService = _ private val csvDataDir = s"$integrationPath/spark2/target/csv_load_compression" + private val compressors = Array("snappy","zstd","gzip") --- End diff -- No test cases were removed. Only the name of the test case "test with snappy and offheap" was changed, to "test different compressors and offheap". ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r238971669 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and --- End diff -- done! ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r238971644 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzipCompressorOutputStream = + new GzipCompressorOutputStream(byteArrayOutputStream); + try { +gzipCompressorOutputStream.write(data); + } catch (IOException e) { +throw new RuntimeException("Error during Compression step " + e.getMessage()); + } finally { +gzipCompressorOutputStream.close(); + } +} catch (IOException e) { + throw new RuntimeException("Error during Compression step " + e.getMessage()); +} + +return byteArrayOutputStream.toByteArray(); + } + + /* + * Method called for decompressing the data and + * return a byte array + */ + private byte[] decompressData(byte[] data) { + +ByteArrayInputStream byteArrayOutputStream = new ByteArrayInputStream(data); +ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream(); + --- End diff -- done! ---
[GitHub] carbondata pull request #2948: [CARBONDATA-3124] Updated log message in Unsa...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2948#discussion_r237000628 --- Diff: docs/faq.md --- @@ -216,20 +216,18 @@ TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai")) ## How to check LRU cache memory footprint? To observe the LRU cache memory footprint in the logs, configure the below properties in log4j.properties file. ``` -log4j.logger.org.apache.carbondata.core.memory.UnsafeMemoryManager = DEBUG log4j.logger.org.apache.carbondata.core.cache.CarbonLRUCache = DEBUG ``` -These properties will enable the DEBUG log for the CarbonLRUCache and UnsafeMemoryManager which will print the information of memory consumed using which the LRU cache size can be decided. **Note:** Enabling the DEBUG log will degrade the query performance. +This properties will enable the DEBUG log for the CarbonLRUCache and UnsafeMemoryManager which will print the information of memory consumed using which the LRU cache size can be decided. **Note:** Enabling the DEBUG log will degrade the query performance. Ensure carbon.max.driver.lru.cache.size is configured to observe the current cache size. --- End diff -- Done! ---
[GitHub] carbondata pull request #2948: [CARBONDATA-3124] Updated log message in Unsa...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2948#discussion_r236998844 --- Diff: docs/faq.md --- @@ -216,20 +216,18 @@ TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai")) ## How to check LRU cache memory footprint? To observe the LRU cache memory footprint in the logs, configure the below properties in log4j.properties file. ``` -log4j.logger.org.apache.carbondata.core.memory.UnsafeMemoryManager = DEBUG log4j.logger.org.apache.carbondata.core.cache.CarbonLRUCache = DEBUG ``` -These properties will enable the DEBUG log for the CarbonLRUCache and UnsafeMemoryManager which will print the information of memory consumed using which the LRU cache size can be decided. **Note:** Enabling the DEBUG log will degrade the query performance. +This properties will enable the DEBUG log for the CarbonLRUCache and UnsafeMemoryManager which will print the information of memory consumed using which the LRU cache size can be decided. **Note:** Enabling the DEBUG log will degrade the query performance. Ensure carbon.max.driver.lru.cache.size is configured to observe the current cache size. --- End diff -- Actually, we are now showing just one log, for CarbonLRUCache, so *these* is not correct usage. I'll change "properties" to "property". ---
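The faq.md change under discussion boils down to a single logger line plus a cache-size property. A sketch of the resulting configuration, assuming the property names quoted in the diff (note that carbon.max.driver.lru.cache.size belongs in carbon.properties, not log4j.properties, and the size value below is illustrative only):

```properties
# log4j.properties — only the LRU cache logger is needed after this change;
# the UnsafeMemoryManager logger line was removed from the doc
log4j.logger.org.apache.carbondata.core.cache.CarbonLRUCache = DEBUG

# carbon.properties — configure the cache size so current usage can be observed
# (1024 here is an illustrative value, not a recommendation)
carbon.max.driver.lru.cache.size = 1024
```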
[GitHub] carbondata pull request #2948: [CARBONDATA-3124] Updated log message in Unsa...
GitHub user shardul-cr7 opened a pull request: https://github.com/apache/carbondata/pull/2948 [CARBONDATA-3124] Updated log message in UnsafeMemoryManager

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

 - [ ] Any interfaces changed?
 - [ ] Any backward compatibility impacted?
 - [ ] Document update required?
 - [ ] Testing done
       Please provide details on
       - Whether new unit test cases have been added or why no new tests are required?
       - How it is tested? Please attach test report.
       - Is it a performance related change? Please attach the performance test report.
       - Any additional information to help reviewers in testing this change.
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shardul-cr7/carbondata 23-nov

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2948.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2948

commit c19d454f4dcf95b82754fe05c3eb2df06d98fe33
Author: shardul-cr7
Date: 2018-11-23T13:15:44Z

[CARBONDATA-3124] Updated log message in UnsafeMemoryManager ---
[GitHub] carbondata issue #2850: [CARBONDATA-3056] Added concurrent reading through S...
Github user shardul-cr7 commented on the issue: https://github.com/apache/carbondata/pull/2850 retest this please ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r227658900 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream bt = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt); + try { +gzos.write(data); + } catch (IOException e) { +e.printStackTrace(); --- End diff -- ok will do that! ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r227658842 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream bt = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt); + try { +gzos.write(data); + } catch (IOException e) { +e.printStackTrace(); + } finally { +gzos.close(); + } +} catch (IOException e) { + e.printStackTrace(); +} + +return bt.toByteArray(); --- End diff -- ByteArrayOutputStream.close() does nothing. Its implementation in Java is just: public void close() throws IOException { } I can close it, but I'll have to copy the stream to a byte array and return that byte array, which can be a costly operation. ---
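The point made above is documented behavior: the JDK specifies that closing a ByteArrayOutputStream has no effect, and its methods may still be called after close(). A quick demonstration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class CloseIsNoOp {

  // Writes a byte, closes the stream, then writes again. Per the JDK docs,
  // ByteArrayOutputStream.close() has no effect, so both writes are captured.
  public static int writeAroundClose() {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write('a');
    try {
      out.close(); // documented no-op; the declared IOException is never thrown here
    } catch (IOException e) {
      throw new AssertionError("ByteArrayOutputStream.close() never throws", e);
    }
    out.write('b'); // still legal after close()
    return out.size();
  }

  public static void main(String[] args) {
    System.out.println(writeAroundClose()); // both bytes were recorded
  }
}
```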
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
GitHub user shardul-cr7 opened a pull request: https://github.com/apache/carbondata/pull/2847 [WIP]Support Gzip as column compressor

Gzip compressed file size is less than that of snappy but takes more time. Data generated by tpch-dbgen (lineitem).

**Load Performance Comparisons (Compression)**

*Test Case 1 — File Size 3.9G, Records ~30M*

| Codec Used | Load Time | File Size After Load |
| -- | -- | -- |
| Snappy | 156s | 101M |
| Zstd | 153s | 2.2M |
| Gzip | 163s | 12.1M |

*Test Case 2 — File Size 7.8G, Records ~60M*

| Codec Used | Load Time | File Size After Load |
| -- | -- | -- |
| Snappy | 336s | 203.6M |
| Zstd | 352s | 4.3M |
| Gzip | 354s | 12.1M |

**Query Performance (Decompression)**

*Test Case 1*

| Codec Used | Full Scan Time |
| -- | -- |
| Snappy | 16.108s |
| Zstd | 14.595s |
| Gzip | 14.313s |

*Test Case 2*

| Codec Used | Full Scan Time |
| -- | -- |
| Snappy | 23.559s |
| Zstd | 23.913s |
| Gzip | 26.741s |

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

 - [ ] Any interfaces changed?
 - [ ] Any backward compatibility impacted?
 - [ ] Document update required?
 - [x] Testing done — added some testcases
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shardul-cr7/carbondata b010

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2847.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2847

commit 6ad88ccc5663353d16372d91878d7efb223b16d6
Author: shardul-cr7
Date: 2018-10-23T11:57:47Z

[WIP]Support Gzip ---
[GitHub] carbondata issue #2758: [CARBONDATA-2972] Debug Logs and function added for ...
Github user shardul-cr7 commented on the issue: https://github.com/apache/carbondata/pull/2758 retest this please ---
[GitHub] carbondata issue #2760: [CARBONDATA-2968] Single pass load fails 2nd time in...
Github user shardul-cr7 commented on the issue: https://github.com/apache/carbondata/pull/2760 retest this please ---
[GitHub] carbondata pull request #2760: [CARBONDATA-2968] Single pass load fails 2nd ...
GitHub user shardul-cr7 opened a pull request: https://github.com/apache/carbondata/pull/2760 [CARBONDATA-2968] Single pass load fails 2nd time in Spark submit execution due to port binding error.

Problem: In a secure cluster setup, single pass load fails in spark-submit after beeline has been used.

Solution: This happened because the port variable was never updated, so the code did not look for the next free port. Modified that part and added a log to display the chosen port number.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done manually
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/shardul-cr7/carbondata aftermycommit Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2760.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2760 commit d2bd672aea1c2294f65f87a978be27e2d76d8c09 Author: shardul-cr7 Date: 2018-09-25T14:25:19Z [CARBONDATA-2968] Single pass load fails 2nd time in Spark submit execution due to port binding error ---
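The "look for the next empty port" behaviour described above can be sketched as follows. This is a standalone illustration, not CarbonData's actual implementation; the class name `PortFinder` and the base port are hypothetical:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortFinder {
  // Starting from basePort, probe successive ports until one can be bound.
  // Returns the first free port, or -1 if none is free within maxTries.
  public static int findFreePort(int basePort, int maxTries) {
    for (int port = basePort; port < basePort + maxTries; port++) {
      // try-with-resources releases the probe socket immediately,
      // leaving the port free for the real server to bind.
      try (ServerSocket socket = new ServerSocket(port)) {
        return socket.getLocalPort();
      } catch (IOException alreadyBound) {
        // Port in use: advance to the next candidate instead of failing.
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println("Selected port: " + findFreePort(23000, 100));
  }
}
```

The bug pattern the PR fixes is the opposite of this loop: reusing a fixed port variable on retry, so the second bind attempt hits the same occupied port.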
[GitHub] carbondata issue #2747: [CARBONDATA-2960] SDK Reader fix with projection col...
Github user shardul-cr7 commented on the issue: https://github.com/apache/carbondata/pull/2747 retest this please ---
[GitHub] carbondata pull request #2739: [CARBONDATA-2954]Fix error when create extern...
Github user shardul-cr7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2739#discussion_r219739583

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -2226,7 +2226,11 @@ public static String getFilePathExternalFilePath(String path, Configuration conf
     if (dataFile.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT)) {
       return dataFile.getAbsolutePath();
     } else if (dataFile.isDirectory()) {
-      return getFilePathExternalFilePath(dataFile.getAbsolutePath(), configuration);
+      if (getFilePathExternalFilePath(dataFile.getAbsolutePath(), configuration) == null) {
--- End diff --

Handled the review comment and also added test cases for the scenarios. ---
[GitHub] carbondata issue #2739: [CARBONDATA-2954]Fix error when create external tabl...
Github user shardul-cr7 commented on the issue: https://github.com/apache/carbondata/pull/2739 retest this please ---
[GitHub] carbondata issue #2739: [CARBONDATA-2954]Fix error when create external tabl...
Github user shardul-cr7 commented on the issue: https://github.com/apache/carbondata/pull/2739 retest this please ---
[GitHub] carbondata pull request #2739: [CARBONDATA-2954]Fix error when create extern...
GitHub user shardul-cr7 opened a pull request: https://github.com/apache/carbondata/pull/2739 [CARBONDATA-2954]Fix error when create external table command fired if path already exists

Problem: Creating an external table with a valid location containing some empty directories alongside .carbondata files was failing with an "operation not allowed: invalid datapath provided" error.

Solution: This happened because the getFilePathExternalFilePath method in CarbonUtil.java returned null when it recursed into an empty directory. Made a slight modification so that an empty subdirectory no longer aborts the search.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done manually
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/shardul-cr7/carbondata latestb05 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2739.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2739 commit 6e75500e5e4a081b5be69cbbde5e586e13e07d9f Author: shardul-cr7 Date: 2018-09-20T14:12:54Z [CARBONDATA-2954]Fix error when create external table command fired if path already exists ---
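The fix above boils down to not treating a null result from one recursive branch as final. A minimal standalone sketch of that pattern; `CarbonFileFinder` and `findFile` are hypothetical names modelled loosely on (but not identical to) `getFilePathExternalFilePath`:

```java
import java.io.File;

public class CarbonFileFinder {
  // Recursively look for the first file whose name ends with the suffix.
  // A naive version returns the recursive result directly, which yields
  // null as soon as the *first* subdirectory visited is empty. This
  // version keeps scanning the remaining siblings when a branch is empty.
  public static String findFile(File dir, String suffix) {
    File[] entries = dir.listFiles();
    if (entries == null) {
      return null;  // not a directory, or not readable
    }
    for (File entry : entries) {
      if (entry.isFile() && entry.getName().endsWith(suffix)) {
        return entry.getAbsolutePath();
      } else if (entry.isDirectory()) {
        String found = findFile(entry, suffix);
        if (found != null) {
          return found;  // only stop once a real match is found
        }
        // found == null: empty branch — fall through to the next sibling
      }
    }
    return null;
  }
}
```

Usage: calling `findFile(new File("/warehouse/t1"), ".carbondata")` on a directory that contains an empty subdirectory plus a data file still finds the data file, which is exactly the scenario the PR's error came from.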
[GitHub] carbondata pull request #2714: [CARBONDATA-2875]Two different threads overwr...
GitHub user shardul-cr7 opened a pull request: https://github.com/apache/carbondata/pull/2714 [CARBONDATA-2875]Two different threads overwriting the same carbondatafile.

Problem: During concurrent load into an external (non-transactional) table, two different threads were overwriting the same carbondata file.

Solution: This happened because both threads generated the same file name for their carbondata files, so one overwrote the other. The chance of this collision is reduced by changing the timestamp attached to the file name from millisecond to nanosecond precision.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done manually
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/shardul-cr7/carbondata concurrent Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2714.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2714 commit a7fec5bbc8355aaa73f87af7152b3947b8fa9acd Author: shardul-cr7 Date: 2018-09-12T12:41:37Z concurent load ---
[GitHub] carbondata pull request #2710: [2875]two different threads overwriting the s...
Github user shardul-cr7 closed the pull request at: https://github.com/apache/carbondata/pull/2710 ---
[GitHub] carbondata pull request #2710: [2875]two different threads overwriting the s...
GitHub user shardul-cr7 opened a pull request: https://github.com/apache/carbondata/pull/2710 [2875]two different threads overwriting the same carbondatafile

Problem: Two different threads were overwriting the same carbondata file during creation of an external table.

Solution: The chance of two threads concurrently writing the same carbondata file is reduced by changing the timestamp attached to the .carbondata file name from millisecond to nanosecond precision, making a collision between threads far less likely.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done manually
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/shardul-cr7/carbondata master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2710.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2710 commit 61520a3d0bacfbcbed5a2b5ac300f08cf9b36bb4 Author: shardul-cr7 Date: 2018-09-11T12:51:09Z chances of two different threads overwriting the same carbondatafile is reduced ---
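As both PRs above note, a finer timestamp only *reduces* the probability of a name collision; it does not eliminate it. A sketch contrasting the nanosecond-timestamp approach with a timestamp-plus-counter scheme that is collision-free within one JVM; all names here (`UniqueFileName`, the `part-` prefix) are illustrative, not CarbonData's actual naming scheme:

```java
import java.util.concurrent.atomic.AtomicLong;

public class UniqueFileName {
  private static final AtomicLong COUNTER = new AtomicLong();

  // Nanosecond timestamp: two threads hitting the same nanosecond is far
  // rarer than the same millisecond, but still possible in principle.
  public static String nanoName(int taskId) {
    return "part-" + taskId + "-" + System.nanoTime() + ".carbondata";
  }

  // Timestamp plus an atomically incremented counter: the counter makes
  // every name generated in this JVM distinct, regardless of clock
  // resolution or how close together the calls are.
  public static String counterName(int taskId) {
    return "part-" + taskId + "-" + System.currentTimeMillis()
        + "-" + COUNTER.incrementAndGet() + ".carbondata";
  }

  public static void main(String[] args) {
    System.out.println(nanoName(0));
    System.out.println(counterName(0));
    System.out.println(counterName(0));  // distinct even in the same millisecond
  }
}
```

For writers spread across JVMs or hosts, a per-writer identifier (task ID, or a UUID) in the name plays the same role as the counter.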