[GitHub] carbondata pull request #3064: [CARBONDATA-3243] Updated DOC for No-Sort Com...

2019-01-11 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3064#discussion_r247096965
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -1201,6 +1202,17 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 }
   }
 
+// Validate SORT_SCOPE
+if(options.exists(_._1.equalsIgnoreCase("SORT_SCOPE"))) {
+  val optionValue: String = options.get("sort_scope").get.head._2
+  if (!CarbonUtil.isValidSortOption(optionValue)) {
+throw new InvalidConfigurationException(
+  s"Passing invalid SORT_SCOPE '$optionValue', valid SORT_SCOPE 
are 'NO_SORT'," +
+  s" 'BATCH_SORT', 'LOCAL_SORT' and 'GLOBAL_SORT' ")
+  }
+
+}
+
--- End diff --



Remove empty lines and properly format the code.
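For reference, a compacted, self-contained sketch of the validation idea (helper and 
exception names are illustrative stand-ins for the CarbonUtil/InvalidConfigurationException 
calls in the diff; this is not the merged code):

    // Validate SORT_SCOPE against the set of supported values.
    def validateSortScope(options: Map[String, String]): Unit = {
      val validScopes = Set("NO_SORT", "BATCH_SORT", "LOCAL_SORT", "GLOBAL_SORT")
      options.collectFirst { case (key, value) if key.equalsIgnoreCase("SORT_SCOPE") => value }
        .foreach { optionValue =>
          if (!validScopes.contains(optionValue.toUpperCase)) {
            // stands in for: throw new InvalidConfigurationException(...)
            throw new IllegalArgumentException(
              s"Passing invalid SORT_SCOPE '$optionValue', valid SORT_SCOPEs are " +
              validScopes.mkString("'", "', '", "'"))
          }
        }
    }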



---


[GitHub] carbondata pull request #3070: [CARBONDATA-3246]Fix sdk reader issue if batc...

2019-01-11 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/3070

[CARBONDATA-3246]Fix sdk reader issue if batch size is given as zero and 
vectorRead False.

This PR fixes the SDK reader issue when the batch size is given as zero and 
vectorRead is false.

**Problem**  The SDK reader fails if vectorRead is false and the detail query 
batch size is given as 0. The reader throws a stack overflow error after getting 
stuck in ChunkRowIterator.hasNext recursion.

**Solution**  Since 0 is an invalid batch size, we should take 
DETAIL_QUERY_BATCH_SIZE_DEFAULT as the batch size.
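A minimal sketch of that fallback (the default value of 100 and the resolver below 
are illustrative; the real fix uses CarbonCommonConstants.DETAIL_QUERY_BATCH_SIZE_DEFAULT 
inside the reader code path):

    // Treat a non-positive configured batch size as invalid and fall back to the default.
    def resolveDetailQueryBatchSize(configured: Int): Int = {
      val default = 100 // stand-in for DETAIL_QUERY_BATCH_SIZE_DEFAULT
      if (configured <= 0) default else configured
    }

    // e.g. resolveDetailQueryBatchSize(0) == 100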

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [x] Any interfaces changed?- No
 
 - [x] Any backward compatibility impacted? - No
 
 - [x] Document update required? - No


 - [x] Testing done
added test case 
   
 - [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata batchSize_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3070.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3070


commit 4c002f80903076ebd7707fe7cf1384e45f823bbd
Author: shardul-cr7 
Date:   2019-01-11T10:40:27Z

[CARBONDATA-3246]Fix sdk reader issue if batch size is given as zero




---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-07 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r245585973
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -110,22 +109,42 @@ case class PreAggregateTableHelper(
 // Datamap table name and columns are automatically added prefix with 
parent table name
 // in carbon. For convenient, users can type column names same as the 
ones in select statement
 // when config dmproperties, and here we update column names with 
prefix.
-val longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+// If longStringColumn is not present in dmproperties then we take 
long_string_columns from
+// the parent table.
+var longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+val longStringColumnInParents = 
parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.LONG_STRING_COLUMNS, 
"").split(",").map(_.trim)
+var varcharDatamapFields = ""
+fieldRelationMap foreach (fields => {
+  val aggFunc = fields._2.aggregateFunction
+  if (aggFunc == "") {
--- End diff --

Done!


---


[GitHub] carbondata issue #3045: [CARBONDATA-3222]Fix dataload failure after creation...

2019-01-06 Thread shardul-cr7
Github user shardul-cr7 commented on the issue:

https://github.com/apache/carbondata/pull/3045
  
retest this please


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-06 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r245516216
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -110,7 +110,29 @@ case class PreAggregateTableHelper(
 // Datamap table name and columns are automatically added prefix with 
parent table name
 // in carbon. For convenient, users can type column names same as the 
ones in select statement
 // when config dmproperties, and here we update column names with 
prefix.
-val longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
--- End diff --

This PR handles the scenario where the user doesn't configure long_string_columns 
in the dmproperties; in that case we take long_string_columns from the parent 
table.


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-06 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r245516158
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -110,7 +110,29 @@ case class PreAggregateTableHelper(
 // Datamap table name and columns are automatically added prefix with 
parent table name
 // in carbon. For convenient, users can type column names same as the 
ones in select statement
 // when config dmproperties, and here we update column names with 
prefix.
-val longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+// If longStringColumn is not present in dm properties then we take 
long_string_columns from
+// the parent table.
+var longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
--- End diff --

Fixed it. If the user doesn't configure long_string_columns in dmproperties, we 
take them from the parent table.


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-05 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r245475409
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -110,22 +109,42 @@ case class PreAggregateTableHelper(
 // Datamap table name and columns are automatically added prefix with 
parent table name
 // in carbon. For convenient, users can type column names same as the 
ones in select statement
 // when config dmproperties, and here we update column names with 
prefix.
-val longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+// If longStringColumn is not present in dmproperties then we take 
long_string_columns from
+// the parent table.
+var longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+val longStringColumnInParents = 
parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.LONG_STRING_COLUMNS, 
"").split(",").map(_.trim)
+var varcharDatamapFields = ""
+fieldRelationMap foreach (fields => {
+  val aggFunc = fields._2.aggregateFunction
+  if (aggFunc == "") {
+val relationList = (fields._2.columnTableRelationList)
+relationList.foreach(rel => {
+  rel.foreach(col => {
+if (longStringColumnInParents.contains(col.parentColumnName)) {
+  varcharDatamapFields += col.parentColumnName + ","
+}
+  })
+})
+  }
+})
+if (varcharDatamapFields.size != 0) {
--- End diff --

Done!


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-05 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r245475435
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -110,22 +109,42 @@ case class PreAggregateTableHelper(
 // Datamap table name and columns are automatically added prefix with 
parent table name
 // in carbon. For convenient, users can type column names same as the 
ones in select statement
 // when config dmproperties, and here we update column names with 
prefix.
-val longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+// If longStringColumn is not present in dmproperties then we take 
long_string_columns from
+// the parent table.
+var longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+val longStringColumnInParents = 
parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.LONG_STRING_COLUMNS, 
"").split(",").map(_.trim)
+var varcharDatamapFields = ""
+fieldRelationMap foreach (fields => {
+  val aggFunc = fields._2.aggregateFunction
+  if (aggFunc == "") {
+val relationList = (fields._2.columnTableRelationList)
+relationList.foreach(rel => {
+  rel.foreach(col => {
+if (longStringColumnInParents.contains(col.parentColumnName)) {
+  varcharDatamapFields += col.parentColumnName + ","
+}
+  })
+})
+  }
+})
+if (varcharDatamapFields.size != 0) {
+  longStringColumn = Option(varcharDatamapFields.slice(0, 
varcharDatamapFields.length - 1))
+}
 if (longStringColumn != None) {
   val fieldNames = fields.map(_.column)
-  val newLongStringColumn = 
longStringColumn.get.split(",").map(_.trim).map{ colName =>
+  val newLongStringColumn = 
longStringColumn.get.split(",").map(_.trim).map { colName =>
 val newColName = parentTable.getTableName.toLowerCase() + "_" + 
colName
 if (!fieldNames.contains(newColName)) {
   throw new MalformedDataMapCommandException(
 CarbonCommonConstants.LONG_STRING_COLUMNS.toUpperCase() + ":" 
+ colName
-  + " does not in datamap")
++ " does not in datamap")
--- End diff --

Done!


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-05 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r245475427
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -110,22 +109,42 @@ case class PreAggregateTableHelper(
 // Datamap table name and columns are automatically added prefix with 
parent table name
 // in carbon. For convenient, users can type column names same as the 
ones in select statement
 // when config dmproperties, and here we update column names with 
prefix.
-val longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+// If longStringColumn is not present in dmproperties then we take 
long_string_columns from
+// the parent table.
+var longStringColumn = 
tableProperties.get(CarbonCommonConstants.LONG_STRING_COLUMNS)
+val longStringColumnInParents = 
parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.LONG_STRING_COLUMNS, 
"").split(",").map(_.trim)
+var varcharDatamapFields = ""
--- End diff --

Done!


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-03 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r244992105
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -126,6 +126,12 @@ case class PreAggregateTableHelper(
 newLongStringColumn.mkString(","))
 }
 
+//Add long_string_columns properties in child table from the parent.
+tableProperties
--- End diff --

Done. @kumarvishal09 please review.


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-03 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3045#discussion_r244992035
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala
 ---
@@ -333,6 +333,36 @@ class VarcharDataTypesBasicTestCase extends QueryTest 
with BeforeAndAfterEach wi
 sql(s"DROP DATAMAP IF EXISTS $datamapName ON TABLE $longStringTable")
   }
 
+  test("creating datamap with long string column selected and loading data 
should be success") {
+
+sql(s"drop table if exists $longStringTable")
+val datamapName = "pre_agg_dm"
+sql(
+  s"""
+ | CREATE TABLE if not exists $longStringTable(
+ | id INT, name STRING, description STRING, address STRING, note 
STRING
+ | ) STORED BY 'carbondata'
+ | TBLPROPERTIES('LONG_STRING_COLUMNS'='description, note', 
'SORT_COLUMNS'='name')
+ |""".stripMargin)
+
+sql(
+  s"""
+ | CREATE DATAMAP $datamapName ON TABLE $longStringTable
+ | USING 'preaggregate'
+ | AS SELECT id,description,note,count(*) FROM $longStringTable
+ | GROUP BY id,description,note
+ |""".
+stripMargin)
+
+sql(
+  s"""
+ | LOAD DATA LOCAL INPATH '$inputFile' INTO TABLE $longStringTable
+ | OPTIONS('header'='false')
+   """.stripMargin)
+
+sql(s"drop table if exists $longStringTable")
--- End diff --

Added!


---


[GitHub] carbondata pull request #3045: [CARBONDATA-3222]Fix dataload failure after c...

2019-01-02 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/3045

[CARBONDATA-3222]Fix dataload failure after creation of preaggregate 
datamap on main table with long_string_columns

This PR fixes a data load failure after creation of a preaggregate datamap 
on the main table with long_string_columns.

Data load was failing because the child table properties are not modified 
according to the parent table for long_string_columns.
This occurs only when long_string_columns is not specified in the dmproperties 
of the preaggregate datamap: the datamap was still created, but the subsequent 
data load failed. This PR avoids the data load failure in this scenario (see 
the sketch below).
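At a high level, the change inherits the parent table's long_string_columns for the 
datamap table whenever dmproperties do not specify them, keeping only the parent columns 
the datamap actually selects (the inherited names are then prefixed with the parent table 
name by the existing code). A simplified, self-contained sketch of that idea; parameter and 
helper names are illustrative, the real change lives in PreAggregateTableHelper as the 
review diffs in this thread show:

    // When dmproperties carry no long_string_columns, fall back to the parent table's
    // long_string_columns, restricted to the columns selected by the datamap.
    def inheritLongStringColumns(
        dmProps: Map[String, String],
        parentProps: Map[String, String],
        selectedParentColumns: Seq[String]): Option[String] = {
      dmProps.get("long_string_columns").orElse {
        val parentLongStrings = parentProps.getOrElse("long_string_columns", "")
          .split(",").map(_.trim).filter(_.nonEmpty).toSet
        val inherited = selectedParentColumns.filter(parentLongStrings.contains)
        if (inherited.isEmpty) None else Some(inherited.mkString(","))
      }
    }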

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [x] Testing done
added a testcase
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata lsc_preagg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3045


commit ed588b32ff95a451782c98cae991efd6d148b5c3
Author: shardul-cr7 
Date:   2019-01-02T09:17:34Z

[CARBONDATA-3222]Fix dataload failure after creation of preaggregate 
datamap on main table with long_string_columns




---


[GitHub] carbondata issue #3020: [CARBONDATA-3195]Added validation for Inverted Index...

2018-12-26 Thread shardul-cr7
Github user shardul-cr7 commented on the issue:

https://github.com/apache/carbondata/pull/3020
  



> I think you need describe this validation in the ddl-of-carbondata.md of 
Inverted Index Configuration part

Done!


---


[GitHub] carbondata pull request #3020: [CARBONDATA-3195]Added validation for Inverte...

2018-12-23 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/3020

[CARBONDATA-3195]Added validation for Inverted Index columns and added a 
test case in case of varchar



Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata 21dec

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3020.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3020


commit 6374825a94b055a1a9d196ae151a01e2c9d3805e
Author: shardul-cr7 
Date:   2018-12-24T07:21:16Z

Added validation for Inverted Index columns and added a test case in case 
of varchar




---


[GitHub] carbondata pull request #2986: [CARBONDATA-3166]Updated Document and added C...

2018-12-13 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2986#discussion_r241376148
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
 ---
@@ -92,7 +92,9 @@ private[sql] case class CarbonDescribeFormattedCommand(
 Strings.formatSize(
   
tblProps.getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB,
 
CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB_DEFAULT).toFloat), ""),
-
+  ("Carbon Column Compressor ", tblProps
--- End diff --

Changed to "Data File Compressor".
As this is in the table properties, we are displaying the default value. It's 
the same for the other properties as well.
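In effect the new DESCRIBE FORMATTED row just reads the compressor from the table 
properties with a fall-back to the default; a hedged sketch (the property key 
"carbon.column.compressor" and the "snappy" default are assumptions here, not taken 
from the merged code):

    // Illustrative only: show the configured column compressor, or the default when
    // the table property is absent.
    def dataFileCompressorRow(tblProps: Map[String, String]): (String, String, String) =
      ("Data File Compressor", tblProps.getOrElse("carbon.column.compressor", "snappy"), "")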


---


[GitHub] carbondata pull request #2986: [CARBONDATA-3166]Updated Document and added C...

2018-12-13 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/2986

[CARBONDATA-3166]Updated Document and added Column Compressor used in 
Describe Formatted Command

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata DocUpdate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2986.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2986


commit 71cf4f0294adaaf2520d55fbb8b048494853896a
Author: shardul-cr7 
Date:   2018-12-13T08:42:18Z

[]Updated Document and added column compressor used in Describe Formatted 
Command




---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240247558
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala
 ---
@@ -252,50 +253,94 @@ class TestLoadDataWithCompression extends QueryTest 
with BeforeAndAfterEach with
""".stripMargin)
   }
 
-  test("test data loading with snappy compressor and offheap") {
+  test("test data loading with different compressors and offheap") {
+for(comp <- compressors){
+  
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT,
 "true")
--- End diff --

By default for gzip/zstd, it's false, so a UT for this scenario is not 
required.


---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240236819
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
+  } finally {
+gzos.close();
+  }
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bt.toByteArray();
+  }
+
+  /*
+   * Method called for decompressing the data and
+   * return a byte array
+   */
+  private byte[] decompressData(byte[] data) {
+
+ByteArrayInputStream bt = new ByteArrayInputStream(data);
+ByteArrayOutputStream bot = new ByteArrayOutputStream();
+
+try {
+  GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt);
+  byte[] buffer = new byte[1024];
+  int len;
+
+  while ((len = gzis.read(buffer)) != -1) {
+bot.write(buffer, 0, len);
+  }
+
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bot.toByteArray();
--- End diff --

Similar to the ByteArrayOutputStream.close() reason mentioned above.


---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240236269
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
+  } finally {
+gzos.close();
+  }
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bt.toByteArray();
+  }
+
+  /*
+   * Method called for decompressing the data and
+   * return a byte array
+   */
+  private byte[] decompressData(byte[] data) {
+
+ByteArrayInputStream bt = new ByteArrayInputStream(data);
+ByteArrayOutputStream bot = new ByteArrayOutputStream();
+
+try {
+  GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt);
+  byte[] buffer = new byte[1024];
+  int len;
+
+  while ((len = gzis.read(buffer)) != -1) {
+bot.write(buffer, 0, len);
+  }
+
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bot.toByteArray();
+  }
+
+  @Override public byte[] compressByte(byte[] unCompInput) {
+return compressData(unCompInput);
+  }
+
+  @Override public byte[] compressByte(byte[] unCompInput, int byteSize) {
+return compressData(unCompInput);
+  }
+
+  @Override public byte[] unCompressByte(byte[] compInput) {
+return decompressData(compInput);
+  }
+
+  @Override public byte[] unCompressByte(byte[] compInput, int offset, int 
length) {
+byte[] data = new byte[length];
+System.arraycopy(compInput, offset, data, 0, length);
+return decompressData(data);
+  }
+
+  @Override public byte[] compressShort(short[] unCompInput) {
+ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * 
ByteUtil.SIZEOF_SHORT);
+unCompBuffer.asShortBuffer().put(unCompInput);
+return compressData(unCompBuffer.array());
+  }
+
+  @Override public short[] unCompressShort(byte[] compInput, int offset, 
int length) {
+byte[] unCompArray = unCompressByte(compInput, offset, length);
+ShortBuffer unCompBuffer = 
ByteBuffer.wrap(unCompArray).asShortBuffer();
+short[] shorts = new short[unCompArray.length / ByteUtil.SIZEOF_SHORT];
+unCompBuffer.get(shorts);
+return shorts;
+  }
+
+  @Override public byte[] compressInt(int[] unCompInput) {
+ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * 
ByteUtil.SIZEOF_INT);
+unCompBuffer.asIntBuffer().put(unCompInput);
+return compressData(unCompBuffer.array());
+  }
+
+  @Override p

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240236381
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
+  } finally {
+gzos.close();
+  }
+} catch (IOException e) {
+  e.printStackTrace();
--- End diff --

Done.


---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240236462
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
+  } finally {
+gzos.close();
+  }
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bt.toByteArray();
+  }
+
+  /*
+   * Method called for decompressing the data and
+   * return a byte array
+   */
+  private byte[] decompressData(byte[] data) {
+
+ByteArrayInputStream bt = new ByteArrayInputStream(data);
+ByteArrayOutputStream bot = new ByteArrayOutputStream();
+
+try {
+  GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt);
+  byte[] buffer = new byte[1024];
+  int len;
+
+  while ((len = gzis.read(buffer)) != -1) {
+bot.write(buffer, 0, len);
+  }
+
+} catch (IOException e) {
+  e.printStackTrace();
--- End diff --

Done.


---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240227006
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+/**
+ * Codec Class for performing Gzip Compression
+ */
+public class GzipCompressor extends AbstractCompressor {
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /**
+   * This method takes the Byte Array data and Compresses in gzip format
+   *
+   * @param data Data Byte Array passed for compression
+   * @return Compressed Byte Array
+   */
+  private byte[] compressData(byte[] data) {
+ByteArrayOutputStream byteArrayOutputStream = new 
ByteArrayOutputStream();
--- End diff --

Based on the observations, I have initialized the byteArrayOutputStream with a 
size of half the byte buffer, so it reduces the number of times the stream has to 
resize.
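A small sketch of that sizing choice (illustrative; the merged code may compute the 
initial capacity differently):

    import java.io.ByteArrayOutputStream

    // Pre-size the output buffer to roughly half the input length so the stream performs
    // fewer internal array resizes while compressing.
    def newCompressionBuffer(input: Array[Byte]): ByteArrayOutputStream =
      new ByteArrayOutputStream(math.max(input.length / 2, 32))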


---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240157144
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+/**
+ * Codec Class for performing Gzip Compression
+ */
+public class GzipCompressor extends AbstractCompressor {
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /**
+   * This method takes the Byte Array data and Compresses in gzip format
+   *
+   * @param data Data Byte Array passed for compression
+   * @return Compressed Byte Array
+   */
+  private byte[] compressData(byte[] data) {
+ByteArrayOutputStream byteArrayOutputStream = new 
ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzipCompressorOutputStream =
+  new GzipCompressorOutputStream(byteArrayOutputStream);
+  try {
+/**
+ * Below api will write bytes from specified byte array to the 
gzipCompressorOutputStream
+ * The output stream will compress the given byte array.
+ */
+gzipCompressorOutputStream.write(data);
+  } catch (IOException e) {
+throw new RuntimeException("Error during Compression step " + 
e.getMessage());
--- End diff --

OK, added the actual exception.
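That is, the IOException is now propagated as the cause instead of only its message 
being appended; a minimal sketch of the pattern:

    import java.io.IOException

    // Keep the original IOException as the cause so its full stack trace is preserved.
    def rethrowCompressionError(e: IOException): Nothing =
      throw new RuntimeException("Error during Compression step", e)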


---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240135884
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java
 ---
@@ -35,8 +35,8 @@
   private final Map allSupportedCompressors = new 
HashMap<>();
 
   public enum NativeSupportedCompressor {
-SNAPPY("snappy", SnappyCompressor.class),
-ZSTD("zstd", ZstdCompressor.class);
+SNAPPY("snappy", SnappyCompressor.class), ZSTD("zstd", 
ZstdCompressor.class), GZIP("gzip",
--- End diff --

Done.



---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240130514
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+/**
+ * Codec Class for performing Gzip Compression
+ */
+public class GzipCompressor extends AbstractCompressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /**
+   * This method takes the Byte Array data and Compresses in gzip format
+   *
+   * @param data Data Byte Array passed for compression
+   * @return Compressed Byte Array
+   */
+  private byte[] compressData(byte[] data) {
+ByteArrayOutputStream byteArrayOutputStream = new 
ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzipCompressorOutputStream =
+  new GzipCompressorOutputStream(byteArrayOutputStream);
+  try {
+/**
+ * Below api will write bytes from specified byte array to the 
gzipCompressorOutputStream
+ * The output stream will compress the given byte array.
+ */
+gzipCompressorOutputStream.write(data);
+  } catch (IOException e) {
+throw new RuntimeException("Error during Compression step " + 
e.getMessage());
+  } finally {
+gzipCompressorOutputStream.close();
+  }
+} catch (IOException e) {
+  throw new RuntimeException("Error during Compression step " + 
e.getMessage());
+}
+return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * This method takes the Byte Array data and Deompresses in gzip format
+   *
+   * @param data   Data Byte Array for Compression
+   * @param offset Start value of Data Byte Array
+   * @param length Size of Byte Array
+   * @return
+   */
+  private byte[] decompressData(byte[] data, int offset, int length) {
+ByteArrayInputStream byteArrayOutputStream = new 
ByteArrayInputStream(data, offset, length);
+ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
+try {
+  GzipCompressorInputStream gzipCompressorInputStream =
+  new GzipCompressorInputStream(byteArrayOutputStream);
+  byte[] buffer = new byte[1024];
+  int len;
+  /**
+   * Reads the next byte of the data from the input stream and stores 
them into buffer
+   * Data is then read from the buffer and put into byteOutputStream 
from a offset.
+   */
+  while ((len = gzipCompressorInputStream.read(buffer)) != -1) {
+byteOutputStream.write(buffer, 0, len);
+  }
+} catch (IOException e) {
+  throw new RuntimeException("Error during Decompression step " + 
e.getMessage());
+}
+return byteOutputStream.toByteArray();
+  }
+
+  @Override public byte[] compressByte(byte[] unCompInput) {
+return compressData(unCompInput);
+  }
+
+  @Override public byte[] compressByte(byte[] unCompInput, int byteSize) {
+return compressData(unCompInput);
+  }
+
+  @Override public byte[] unCompressByte(byte[] compInput) {
+return decompressData(compInput, 0, compInput.length);
+  }
+
+  @Override public byte[] unCompressByte(byte[] compInput, int offset, int 
length) {
+return decompressData(compInput, offset, length);
+  }
+
+  @Override public long rawUncompress(by

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240130469
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+/**
+ * Codec Class for performing Gzip Compression
+ */
+public class GzipCompressor extends AbstractCompressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /**
+   * This method takes the Byte Array data and Compresses in gzip format
+   *
+   * @param data Data Byte Array passed for compression
+   * @return Compressed Byte Array
+   */
+  private byte[] compressData(byte[] data) {
+ByteArrayOutputStream byteArrayOutputStream = new 
ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzipCompressorOutputStream =
+  new GzipCompressorOutputStream(byteArrayOutputStream);
+  try {
+/**
+ * Below api will write bytes from specified byte array to the 
gzipCompressorOutputStream
+ * The output stream will compress the given byte array.
+ */
+gzipCompressorOutputStream.write(data);
+  } catch (IOException e) {
+throw new RuntimeException("Error during Compression step " + 
e.getMessage());
+  } finally {
+gzipCompressorOutputStream.close();
+  }
+} catch (IOException e) {
+  throw new RuntimeException("Error during Compression step " + 
e.getMessage());
+}
+return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * This method takes the Byte Array data and Deompresses in gzip format
+   *
+   * @param data   Data Byte Array for Compression
+   * @param offset Start value of Data Byte Array
+   * @param length Size of Byte Array
+   * @return
+   */
+  private byte[] decompressData(byte[] data, int offset, int length) {
+ByteArrayInputStream byteArrayOutputStream = new 
ByteArrayInputStream(data, offset, length);
+ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
+try {
+  GzipCompressorInputStream gzipCompressorInputStream =
+  new GzipCompressorInputStream(byteArrayOutputStream);
+  byte[] buffer = new byte[1024];
+  int len;
+  /**
+   * Reads the next byte of the data from the input stream and stores 
them into buffer
+   * Data is then read from the buffer and put into byteOutputStream 
from a offset.
+   */
+  while ((len = gzipCompressorInputStream.read(buffer)) != -1) {
+byteOutputStream.write(buffer, 0, len);
+  }
+} catch (IOException e) {
+  throw new RuntimeException("Error during Decompression step " + 
e.getMessage());
+}
+return byteOutputStream.toByteArray();
+  }
+
+  @Override public byte[] compressByte(byte[] unCompInput) {
+return compressData(unCompInput);
+  }
+
+  @Override public byte[] compressByte(byte[] unCompInput, int byteSize) {
+return compressData(unCompInput);
+  }
+
+  @Override public byte[] unCompressByte(byte[] compInput) {
+return decompressData(compInput, 0, compInput.length);
+  }
+
+  @Override public byte[] unCompressByte(byte[] compInput, int offset, int 
length) {
+return decompressData(compInput, offset, length);
+  }
+
+  @Override public long rawUncompress(by

[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240130373
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+/**
+ * Codec Class for performing Gzip Compression
+ */
+public class GzipCompressor extends AbstractCompressor {
+
+  public GzipCompressor() {
--- End diff --

Removed.


---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-09 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240102212
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala
 ---
@@ -168,6 +168,7 @@ class TestLoadDataWithCompression extends QueryTest 
with BeforeAndAfterEach with
   private val tableName = "load_test_with_compressor"
   private var executorService: ExecutorService = _
   private val csvDataDir = 
s"$integrationPath/spark2/target/csv_load_compression"
+  private val compressors = Array("snappy","zstd","gzip")
--- End diff --

No test cases were removed. The test case name "test with snappy and offheap" 
was just changed to "test different compressors and offheap".
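The renamed test simply runs the same load-and-verify body once per compressor; a 
hedged sketch of that pattern (setting the compressor property per iteration is 
implied, and the helper below is illustrative):

    // Run the same test body once for every supported column compressor.
    def runForAllCompressors(body: String => Unit): Unit = {
      val compressors = Array("snappy", "zstd", "gzip")
      compressors.foreach { comp =>
        // the real test configures the column compressor here before loading data
        body(comp)
      }
    }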


---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-12-05 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r238971669
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
--- End diff --

done!



---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-12-05 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r238971644
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream byteArrayOutputStream = new 
ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzipCompressorOutputStream =
+  new GzipCompressorOutputStream(byteArrayOutputStream);
+  try {
+gzipCompressorOutputStream.write(data);
+  } catch (IOException e) {
+throw new RuntimeException("Error during Compression step " + 
e.getMessage());
+  } finally {
+gzipCompressorOutputStream.close();
+  }
+} catch (IOException e) {
+  throw new RuntimeException("Error during Compression step " + 
e.getMessage());
+}
+
+return byteArrayOutputStream.toByteArray();
+  }
+
+  /*
+   * Method called for decompressing the data and
+   * return a byte array
+   */
+  private byte[] decompressData(byte[] data) {
+
+ByteArrayInputStream byteArrayOutputStream = new 
ByteArrayInputStream(data);
+ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
+
--- End diff --

done!



---


[GitHub] carbondata pull request #2948: [CARBONDATA-3124] Updated log message in Unsa...

2018-11-28 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2948#discussion_r237000628
  
--- Diff: docs/faq.md ---
@@ -216,20 +216,18 @@ 
TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
 ## How to check LRU cache memory footprint?
 To observe the LRU cache memory footprint in the logs, configure the below 
properties in log4j.properties file.
 ```
-log4j.logger.org.apache.carbondata.core.memory.UnsafeMemoryManager = DEBUG
 log4j.logger.org.apache.carbondata.core.cache.CarbonLRUCache = DEBUG
 ```
-These properties will enable the DEBUG log for the CarbonLRUCache and 
UnsafeMemoryManager which will print the information of memory consumed using 
which the LRU cache size can be decided. **Note:** Enabling the DEBUG log will 
degrade the query performance.
+This properties will enable the DEBUG log for the CarbonLRUCache and 
UnsafeMemoryManager which will print the information of memory consumed using 
which the LRU cache size can be decided. **Note:** Enabling the DEBUG log will 
degrade the query performance. Ensure carbon.max.driver.lru.cache.size is 
configured to observe the current cache size.
--- End diff --

Done!



---


[GitHub] carbondata pull request #2948: [CARBONDATA-3124] Updated log message in Unsa...

2018-11-28 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2948#discussion_r236998844
  
--- Diff: docs/faq.md ---
@@ -216,20 +216,18 @@ 
TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
 ## How to check LRU cache memory footprint?
 To observe the LRU cache memory footprint in the logs, configure the below 
properties in log4j.properties file.
 ```
-log4j.logger.org.apache.carbondata.core.memory.UnsafeMemoryManager = DEBUG
 log4j.logger.org.apache.carbondata.core.cache.CarbonLRUCache = DEBUG
 ```
-These properties will enable the DEBUG log for the CarbonLRUCache and 
UnsafeMemoryManager which will print the information of memory consumed using 
which the LRU cache size can be decided. **Note:** Enabling the DEBUG log will 
degrade the query performance.
+This properties will enable the DEBUG log for the CarbonLRUCache and 
UnsafeMemoryManager which will print the information of memory consumed using 
which the LRU cache size can be decided. **Note:** Enabling the DEBUG log will 
degrade the query performance. Ensure carbon.max.driver.lru.cache.size is 
configured to observe the current cache size.
--- End diff --

Actually we are now just showing one log for CarbonLRUCache, so *these* is not 
the correct usage. I'll change "properties" to "property".


---


[GitHub] carbondata pull request #2948: [CARBONDATA-3124] Updated log message in Unsa...

2018-11-23 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/2948

[CARBONDATA-3124] Updated log message in UnsafeMemoryManager

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata 23-nov

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2948.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2948


commit c19d454f4dcf95b82754fe05c3eb2df06d98fe33
Author: shardul-cr7 
Date:   2018-11-23T13:15:44Z

[CARBONDATA-3124] Updated log message in UnsafeMemoryManager




---


[GitHub] carbondata issue #2850: [CARBONDATA-3056] Added concurrent reading through S...

2018-10-31 Thread shardul-cr7
Github user shardul-cr7 commented on the issue:

https://github.com/apache/carbondata/pull/2850
  
retest this please


---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-10-24 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r227658900
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
--- End diff --

ok will do that!


---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-10-24 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r227658842
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
+  } finally {
+gzos.close();
+  }
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bt.toByteArray();
--- End diff --

ByteArrayOutputStream.close() does nothing. Its implementation in Java is 
like this:

public void close() throws IOException {
}

I can close it, but then I would have to copy the stream into a byte array and 
return that array, which can be a costly operation.
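
For reference, a minimal sketch (not necessarily the code that was finally 
merged, and relying on the imports already shown in the quoted diff) of one way 
to make sure the gzip stream itself is closed before the buffer is read, while 
leaving the no-op ByteArrayOutputStream.close() alone:

```java
  // Sketch only: try-with-resources closes the GzipCompressorOutputStream,
  // flushing the gzip trailer, before toByteArray() is called; the underlying
  // ByteArrayOutputStream needs no explicit close() since its close() is a no-op.
  private byte[] compressData(byte[] data) {
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    try (GzipCompressorOutputStream gzipCompressorOutputStream =
        new GzipCompressorOutputStream(byteArrayOutputStream)) {
      gzipCompressorOutputStream.write(data);
    } catch (IOException e) {
      throw new RuntimeException("Error during Compression step " + e.getMessage(), e);
    }
    return byteArrayOutputStream.toByteArray();
  }
```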


---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-10-23 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/2847

[WIP]Support Gzip as column compressor

Gzip produces smaller files than snappy but takes more load time.

Data generated by tpch-dbgen (lineitem).

**Load Performance Comparisons (Compression)**

*Test Case 1*
*File Size 3.9G*
*Records ~30M*

| Codec Used | Load Time | File Size After Load |
| -- | -- | -- |
| Snappy | 156s | 101M |
| Zstd | 153s | 2.2M |
| Gzip | 163s | 12.1M |

*Test Case 2*
*File Size 7.8G*
*Records ~60M*

| Codec Used | Load Time | File Size After Load |
| -- | -- | -- |
| Snappy | 336s | 203.6M |
| Zstd | 352s | 4.3M |
| Gzip | 354s | 12.1M |

**Query Performance (Decompression)**

*Test Case 1*

| Codec Used | Full Scan Time |
| -- | -- |
| Snappy | 16.108s |
| Zstd | 14.595s |
| Gzip | 14.313s |

*Test Case 2*

| Codec Used | Full Scan Time |
| -- | -- |
| Snappy | 23.559s |
| Zstd | 23.913s |
| Gzip | 26.741s |
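
A hedged sketch of how the new codec would be selected for runs like the ones 
above, assuming it plugs into the same carbon.column.compressor property that 
snappy and zstd already use (the class and main method below are only 
illustrative):

```java
import org.apache.carbondata.core.util.CarbonProperties;

// Illustrative only: pick gzip as the column compressor before loading data,
// assuming the new codec is registered under the property used for snappy/zstd.
public class EnableGzipExample {
  public static void main(String[] args) {
    CarbonProperties.getInstance()
        .addProperty("carbon.column.compressor", "gzip");
    System.out.println("Column compressor set to gzip for subsequent loads");
  }
}
```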

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [x] Testing done
  added some testcases
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata b010

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2847.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2847


commit 6ad88ccc5663353d16372d91878d7efb223b16d6
Author: shardul-cr7 
Date:   2018-10-23T11:57:47Z

[WIP]Support Gzip




---


[GitHub] carbondata issue #2758: [CARBONDATA-2972] Debug Logs and function added for ...

2018-09-26 Thread shardul-cr7
Github user shardul-cr7 commented on the issue:

https://github.com/apache/carbondata/pull/2758
  
retest this please


---


[GitHub] carbondata issue #2760: [CARBONDATA-2968] Single pass load fails 2nd time in...

2018-09-25 Thread shardul-cr7
Github user shardul-cr7 commented on the issue:

https://github.com/apache/carbondata/pull/2760
  
retest this please


---


[GitHub] carbondata pull request #2760: [CARBONDATA-2968] Single pass load fails 2nd ...

2018-09-25 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/2760

[CARBONDATA-2968] Single pass load fails 2nd time in Spark submit execution 
due to port binding error.

Problem : In a secure cluster setup, single pass load fails in spark-submit 
after beeline has been used.
Solution: The port variable was never updated, so the server did not look for 
the next free port. Modified that part and added a log to display the port 
number that is finally used.
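
A rough illustration of the intended behaviour, with hypothetical names 
(PortBindSketch, bindToFreePort, startPort) rather than the project's actual 
dictionary-server code:

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical sketch: keep advancing the port until the socket actually binds,
// then report the port that was chosen.
public class PortBindSketch {
  static ServerSocket bindToFreePort(int startPort) throws IOException {
    int port = startPort;
    while (true) {
      try {
        ServerSocket socket = new ServerSocket(port);
        System.out.println("Server bound to port " + port);
        return socket;
      } catch (BindException e) {
        port++; // port already in use, try the next one
      }
    }
  }
}
```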

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [x] Testing done
 manually
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata aftermycommit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2760


commit d2bd672aea1c2294f65f87a978be27e2d76d8c09
Author: shardul-cr7 
Date:   2018-09-25T14:25:19Z

[CARBONDATA-2968] Single pass load fails 2nd time in Spark submit execution 
due to port binding error




---


[GitHub] carbondata issue #2747: [CARBONDATA-2960] SDK Reader fix with projection col...

2018-09-24 Thread shardul-cr7
Github user shardul-cr7 commented on the issue:

https://github.com/apache/carbondata/pull/2747
  
retest this please


---


[GitHub] carbondata pull request #2739: [CARBONDATA-2954]Fix error when create extern...

2018-09-24 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2739#discussion_r219739583
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -2226,7 +2226,11 @@ public static String getFilePathExternalFilePath(String path, Configuration conf
   if (dataFile.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT)) {
 return dataFile.getAbsolutePath();
   } else if (dataFile.isDirectory()) {
-return getFilePathExternalFilePath(dataFile.getAbsolutePath(), configuration);
+if (getFilePathExternalFilePath(dataFile.getAbsolutePath(), configuration) == null) {
--- End diff --

Handled the review comment and also added test cases for these scenarios.


---


[GitHub] carbondata issue #2739: [CARBONDATA-2954]Fix error when create external tabl...

2018-09-20 Thread shardul-cr7
Github user shardul-cr7 commented on the issue:

https://github.com/apache/carbondata/pull/2739
  
retest this please


---


[GitHub] carbondata issue #2739: [CARBONDATA-2954]Fix error when create external tabl...

2018-09-20 Thread shardul-cr7
Github user shardul-cr7 commented on the issue:

https://github.com/apache/carbondata/pull/2739
  
retest this please


---


[GitHub] carbondata pull request #2739: [CARBONDATA-2954]Fix error when create extern...

2018-09-20 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/2739

[CARBONDATA-2954]Fix error when create external table command fired if path 
already exists

Problem : Creating an external table with a valid location that contains some 
empty directories alongside .carbondata files was failing with an "operation 
not allowed: invalid datapath provided" error.

Solution: The getFilePathExternalFilePath method in CarbonUtil.java returned 
null as soon as it recursed into an empty directory, so data files in the 
remaining directories were never found. Made a slight modification so the 
search continues past empty directories.
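
A simplified sketch of the adjusted lookup (using plain java.io.File instead of 
CarbonData's CarbonFile API, so the names here are illustrative): instead of 
returning the null produced by an empty sub-directory, the scan keeps going 
through the remaining entries.

```java
import java.io.File;

// Illustrative only: return the first .carbondata file found anywhere under dir,
// continuing past empty or non-matching sub-directories instead of giving up.
public class ExternalPathLookupSketch {
  static String findFirstCarbonDataFile(File dir) {
    File[] children = dir.listFiles();
    if (children == null) {
      return null;
    }
    for (File child : children) {
      if (child.getName().endsWith(".carbondata")) {
        return child.getAbsolutePath();
      } else if (child.isDirectory()) {
        String found = findFirstCarbonDataFile(child);
        if (found != null) {
          return found; // first data file wins
        }
        // empty or non-matching directory: keep scanning the siblings
      }
    }
    return null;
  }
}
```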



 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [x] Testing done
manually tested.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata latestb05

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2739.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2739


commit 6e75500e5e4a081b5be69cbbde5e586e13e07d9f
Author: shardul-cr7 
Date:   2018-09-20T14:12:54Z

[CARBONDATA-2954]Fix error when create external table command fired if path 
already exists




---


[GitHub] carbondata pull request #2714: [CARBONDATA-2875]Two different threads overwr...

2018-09-12 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/2714

[CARBONDATA-2875]Two different threads overwriting the same carbondatafile.

Problem : During a concurrent load into an external, non-transactional table, 
two different threads were overwriting the same carbondata file.

Solution : This happened because both threads generated the same file name for 
their carbondata files, so one overwrote the other. The chance of this collision 
is reduced by changing the timestamp attached to the file name from milliseconds 
to nanoseconds.
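
A minimal illustration of the idea (the file-name pattern below is simplified, 
not CarbonData's exact format):

```java
// Illustrative only: a nanosecond counter gives a far finer-grained stamp than
// the millisecond wall clock, so two writer threads starting in the same
// millisecond are far less likely to produce identical file names.
public class FileNameStampSketch {
  public static void main(String[] args) {
    long millisStamp = System.currentTimeMillis(); // old: identical for threads started in the same millisecond
    long nanoStamp = System.nanoTime();            // new: nanosecond resolution, collisions become very unlikely
    System.out.println("old name: part-0-0-" + millisStamp + ".carbondata");
    System.out.println("new name: part-0-0-" + nanoStamp + ".carbondata");
  }
}
```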



Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [x] Testing done
Manually.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata concurrent

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2714.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2714


commit a7fec5bbc8355aaa73f87af7152b3947b8fa9acd
Author: shardul-cr7 
Date:   2018-09-12T12:41:37Z

concurent load




---


[GitHub] carbondata pull request #2710: [2875]two different threads overwriting the s...

2018-09-11 Thread shardul-cr7
Github user shardul-cr7 closed the pull request at:

https://github.com/apache/carbondata/pull/2710


---


[GitHub] carbondata pull request #2710: [2875]two different threads overwriting the s...

2018-09-11 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/2710

[2875]two different threads overwriting the same carbondatafile 

Problem : Two different threads are overwriting the same carbondata file during 
creation of an external table.

Solution: The chance of two threads concurrently writing the same carbondata 
file is reduced by changing the timestamp attached to the .carbondata file name 
from milliseconds to nanoseconds, which makes a file-name collision between 
threads far less likely.

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [x] Testing done
 Done Manually
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2710.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2710


commit 61520a3d0bacfbcbed5a2b5ac300f08cf9b36bb4
Author: shardul-cr7 
Date:   2018-09-11T12:51:09Z

chances of two different threads overwriting the same carbondatafile is 
reduced




---