[GitHub] carbondata pull request #3064: [WIP] Updated DOC for No-Sort Compaction and ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3064#discussion_r246777849 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/execution/command/CarbonHiveCommands.scala --- @@ -127,6 +127,9 @@ object CarbonSetCommand { else if (isCarbonProperty) { sessionParams.addProperty(key, value) } +else { --- End diff -- Remove this; if a Spark property is set, it should not be validated by Carbon. ---
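For reference, a minimal sketch of the routing this review asks for. SessionParams and addProperty come from the quoted diff; the wrapper object and its signature are illustrative, not CarbonData code.

```scala
import org.apache.carbondata.core.util.SessionParams

// Illustrative routing: Carbon validates and stores only its own properties;
// anything else is assumed to be a Spark property and falls through untouched.
object SetCommandRouting {
  def route(sessionParams: SessionParams, key: String, value: String,
      isCarbonProperty: Boolean): Unit = {
    if (isCarbonProperty) {
      // addProperty validates the key/value pair internally
      sessionParams.addProperty(key, value)
    }
    // deliberately no else branch: Spark properties are not Carbon's to validate
  }
}
```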
[GitHub] carbondata issue #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/3014 LGTM ---
[GitHub] carbondata issue #2996: [CARBONDATA-3235] Fix Rename-Fail & Datamap-creation...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2996 LGTM ---
[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3014#discussion_r245884441 --- Diff: integration/spark2/src/test/scala/org/apache/spark/carbondata/commands/SetCommandTestCase.scala --- @@ -128,6 +128,34 @@ class SetCommandTestCase extends Spark2QueryTest with BeforeAndAfterAll{ sql(s"set ${CarbonLoadOptionConstants.CARBON_OPTIONS_SINGLE_PASS}")) } } + + test(s"test set ${ CarbonLoadOptionConstants.CARBON_TABLE_LOAD_SORT_SCOPE } for valid options") { +checkAnswer( + sql(s"set ${ CarbonLoadOptionConstants.CARBON_TABLE_LOAD_SORT_SCOPE }=no_sort"), --- End diff -- Because this property is validated in CarbonHiveCommands.scala to carry a dbname and tablename, this test case should have failed. There is a problem in https://github.com/apache/carbondata/pull/3014/files#diff-eeaa616740e33846839055324b58aa4bR87: all the properties with @CarbonProperty annotations are being added blindly. Please fix this. ---
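For context, a hedged sketch of the behaviour the review expects, written against the test DSL visible in the quoted diff (test, sql, ScalaTest's intercept) and assumed to run inside a suite like SetCommandTestCase; the exception type follows the SessionParams validation added in this PR.

```scala
// Hypothetical test: setting the bare sort-scope key, without the
// dbName.tableName suffix the validation requires, should be rejected
// rather than silently accepted.
test("bare CARBON_TABLE_LOAD_SORT_SCOPE key should fail validation") {
  intercept[InvalidConfigurationException] {
    sql(s"set ${CarbonLoadOptionConstants.CARBON_TABLE_LOAD_SORT_SCOPE}=no_sort")
  }
}
```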
[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3014#discussion_r245879143 --- Diff: core/src/main/java/org/apache/carbondata/core/util/SessionParams.java --- @@ -229,6 +229,12 @@ private boolean validateKeyValue(String key, String value) throws InvalidConfigurationException if (!isValid) { throw new InvalidConfigurationException("Invalid value " + value + " for key " + key); } +} else if (key.startsWith(CarbonLoadOptionConstants.CARBON_TABLE_LOAD_SORT_SCOPE)) { + isValid = CarbonUtil.isValidSortOption(value); + if (!isValid) { +throw new InvalidConfigurationException("The sort scope " + key ++ " can have only either BATCH_SORT or LOCAL_SORT or NO_SORT."); --- End diff -- Sort scope can be GLOBAL_SORT as well; add it to the exception message. ---
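A self-contained sketch of the corrected message. The valid set mirrors what CarbonUtil.isValidSortOption accepts per this thread; the helper itself and the exception type (a plain IllegalArgumentException here, InvalidConfigurationException in the real code) are illustrative.

```scala
object SortScopeCheck {
  // Mirrors CarbonUtil.isValidSortOption per this review thread.
  private val validScopes = Set("BATCH_SORT", "LOCAL_SORT", "NO_SORT", "GLOBAL_SORT")

  def validate(key: String, value: String): Unit = {
    if (!validScopes.contains(value.toUpperCase)) {
      // the real code throws InvalidConfigurationException; a standard
      // exception keeps this sketch self-contained
      throw new IllegalArgumentException(
        s"The sort scope $key can have only either " +
          "BATCH_SORT, LOCAL_SORT, NO_SORT or GLOBAL_SORT.")
    }
  }
}
```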
[GitHub] carbondata pull request #3046: [CARBONDATA-3231] Fix OOM exception when dict...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3046#discussion_r245877242 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java --- @@ -1491,6 +1491,27 @@ private void validateSortMemorySpillPercentage() { } } + public int getMaxDictionaryThreshold() { +int localDictionaryMaxThreshold = Integer.parseInt(carbonProperties + .getProperty(CarbonCommonConstants.CARBON_LOCAL_DICTIONARY_MAX_SIZE_THRESHOLD, + CarbonCommonConstants.CARBON_LOCAL_DICTIONARY_MAX_SIZE_THRESHOLD_DEFAULT)); +if (localDictionaryMaxThreshold --- End diff -- ok ---
[GitHub] carbondata pull request #3046: [CARBONDATA-3231] Fix OOM exception when dict...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3046#discussion_r245653091 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/DecoderBasedFallbackEncoder.java --- @@ -57,10 +57,7 @@ public DecoderBasedFallbackEncoder(EncodedColumnPage encodedColumnPage, int page int pageSize = encodedColumnPage.getActualPage().getPageSize(); int offset = 0; -int[] reverseInvertedIndex = new int[pageSize]; --- End diff -- In the NO_SORT case, where there is no inverted index, this variable was being created unnecessarily. ---
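An illustrative sketch of the allocation pattern being fixed: the reverse inverted index is built only when a page actually carries an inverted index, so NO_SORT pages allocate nothing. Names are illustrative, not the encoder's API.

```scala
object InvertedIndexUtil {
  // Build the reverse mapping only when an inverted index exists; for
  // NO_SORT pages (None) no array is allocated at all.
  def reverseInvertedIndexIfPresent(
      invertedIndex: Option[Array[Int]], pageSize: Int): Option[Array[Int]] = {
    invertedIndex.map { index =>
      val reverse = new Array[Int](pageSize)
      for (sortedPos <- index.indices) {
        // the row at sorted position sortedPos came from original row index(sortedPos)
        reverse(index(sortedPos)) = sortedPos
      }
      reverse
    }
  }
}
```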
[GitHub] carbondata issue #3045: [CARBONDATA-3222]Fix dataload failure after creation...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/3045 LGTM ---
[GitHub] carbondata pull request #3046: [WIP] Added check to start fallback based on ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3046#discussion_r24497 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -2076,4 +2076,15 @@ private CarbonCommonConstants() { */ public static final String CARBON_QUERY_DATAMAP_BLOOM_CACHE_SIZE_DEFAULT_VAL = "512"; + public static final String CARBON_LOCAL_DICTIONARY_MAX_THRESHOLD = --- End diff -- changed to CARBON_LOCAL_DICTIONARY_MAX_SIZE_THRESHOLD ---
[GitHub] carbondata issue #3046: [WIP] Added check to start fallback based on size
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/3046 @xuchuanyin The problem was that when using a varchar column with email data, the keys for the dictionary map are very large. When fallback happens, the same data is kept in memory twice, which causes the application to throw an OOM exception. As this is a minor version, we do not want to expose this property to the user; we can take it up in the next major version. Parquet also has a size-based limitation mechanism which ensures the size does not grow beyond what the system can handle. This PR just adds the size-based limitation so that the map size can be controlled. ---
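A hedged sketch of the size-based limitation described here; the class name and the byte accounting are assumptions for illustration, not the CarbonData API.

```scala
// Illustrative guard: approximate the heap footprint of local-dictionary keys
// and signal fallback once the configured threshold is crossed, so the map
// cannot grow unbounded on wide varchar values such as email strings.
class LocalDictionarySizeGuard(maxSizeInBytes: Long) {
  private var usedBytes: Long = 0L

  /** Returns false when adding this key would cross the limit, i.e. fall back. */
  def tryAccount(keyLengthInBytes: Int): Boolean = {
    if (usedBytes + keyLengthInBytes > maxSizeInBytes) {
      false
    } else {
      usedBytes += keyLengthInBytes
      true
    }
  }
}
```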
[GitHub] carbondata pull request #3046: [WIP] Added check to start fallback based on ...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/3046 [WIP] Added check to start fallback based on size Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata oom_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3046.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3046 commit cfb870789eaa93ebe03393323456cdf2ed4daf17 Author: kunal642 Date: 2019-01-02T10:09:20Z fixed ---
[GitHub] carbondata issue #3010: [CARBONDATA-3189] Fix PreAggregate Datamap Issue
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/3010 @Shubh18s Please fix 2.3 build ---
[GitHub] carbondata pull request #3010: [CARBONDATA-3189] Fix PreAggregate Datamap Is...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3010#discussion_r244085456 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonLateDecodeRule.scala --- @@ -106,22 +106,29 @@ class CarbonLateDecodeRule extends Rule[LogicalPlan] with PredicateHelper { def validateQueryDirectlyOnDataMap(relations: Seq[CarbonDecoderRelation]): Unit = { var isPreAggDataMapExists = false // first check if pre aggregate data map exists or not -relations.foreach{relation => +relations.foreach { relation => if (relation.carbonRelation.carbonTable.isChildDataMap) { isPreAggDataMapExists = true } } val validateQuery = CarbonProperties.getInstance - .getProperty(CarbonCommonConstants.VALIDATE_DIRECT_QUERY_ON_DATAMAP, - CarbonCommonConstants.VALIDATE_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE).toBoolean + .getProperty(CarbonCommonConstants.VALIDATE_DIRECT_QUERY_ON_DATAMAP) var isThrowException = false // if validate query is enabled and relation contains pre aggregate data map --- End diff -- Change the comment according to the changes done ---
[GitHub] carbondata pull request #3010: [CARBONDATA-3189] Fix PreAggregate Datamap Is...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3010#discussion_r244084957 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1449,12 +1449,9 @@ private CarbonCommonConstants() { public static final String SUPPORT_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE = "false"; - @CarbonProperty public static final String VALIDATE_DIRECT_QUERY_ON_DATAMAP = "carbon.query.validate.direct.query.on.datamap"; --- End diff -- I think we can remove this property and use carbon.query.directQueryOnDataMap.enabled instead. It also needs to be removed from configuration-parameters.md. ---
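A short sketch of the suggested consolidation, assuming SUPPORT_DIRECT_QUERY_ON_DATAMAP (quoted in the diff above, with its "false" default) is the constant behind carbon.query.directQueryOnDataMap.enabled; the wrapper object is illustrative.

```scala
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties

object DirectQueryFlag {
  // Read the single enabled-flag with its default instead of keeping a
  // separate validate property around.
  def isDirectQueryAllowed: Boolean = {
    CarbonProperties.getInstance
      .getProperty(CarbonCommonConstants.SUPPORT_DIRECT_QUERY_ON_DATAMAP,
        CarbonCommonConstants.SUPPORT_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE)
      .toBoolean
  }
}
```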
[GitHub] carbondata issue #2988: [CARBONDATA-3174] Fix trailing space issue with varc...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2988 LGTM ---
[GitHub] carbondata issue #2951: [SDV] Add datasource testcases for Spark File Format
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2951 @shivamasn Please add test cases for map type too ---
[GitHub] carbondata issue #2983: [CARBONDATA-3119] Fixed SDK Write for Complex Array ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2983 LGTM ---
[GitHub] carbondata issue #2899: [CARBONDATA-3073][CARBONDATA-3044] Support configure...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2899 LGTM ---
[GitHub] carbondata pull request #2983: [CARBONDATA-3119] Fixed SDK Write for Complex...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2983#discussion_r242403727 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/ArrayParserImpl.java --- @@ -56,6 +56,10 @@ public ArrayObject parse(Object data) { } return new ArrayObject(array); } + } else if (value.isEmpty()) { +Object[] array = new Object[1]; +array[0] = child.parse(value); --- End diff -- Why not use value directly instead of child.parse(value)? ---
[GitHub] carbondata pull request #2994: [WIP][CARBONDATA-2670] changed the impl of s3...
Github user kunal642 closed the pull request at: https://github.com/apache/carbondata/pull/2994 ---
[GitHub] carbondata pull request #2994: [WIP][CARBONDATA-2670] changed the impl of s3...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2994 [WIP][CARBONDATA-2670] changed the impl of s3 renameforce to rewrite Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata bug/CARBONDATA-2670 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2994.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2994 commit b96fa42ee1fa0d3ba2d2081574d3a71eadb26ac2 Author: kunal642 Date: 2018-12-17T13:53:14Z [CARBONDATA-2670] changed the impl of s3 renameforce to rewrite ---
[GitHub] carbondata issue #2989: [CARBONDATA-3175]Fix Testcase failures in complex de...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2989 LGTM ---
[GitHub] carbondata issue #2925: [CARBONDATA-3102] Fix NoClassDefFoundError when use ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2925 @xubo245 I don't think we support running the thriftServer through IntelliJ locally. In cluster mode this jar would already be available as part of the Spark dependency. For the SDK, because it depends on all modules, httpclient is required. @manishgupta88 @kumarvishal09 @jackylk @ravipesala What are your opinions? ---
[GitHub] carbondata issue #2968: [CARBONDATA-3141] Removed Carbon Table Detail Comman...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2968 LGTM ---
[GitHub] carbondata pull request #2977: [CARBONDATA-3147] Fixed concurrent load issue
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2977#discussion_r240473257 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala --- @@ -111,6 +113,29 @@ trait CommitHelper { } } + def mergeTableStatusContents(uuidTableStatusPath: String, + tableStatusPath: String): Boolean = { +try { --- End diff -- moved lock acquisition inside this method ---
[GitHub] carbondata pull request #2977: [CARBONDATA-3147] Fixed concurrent load issue
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2977#discussion_r240473230 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala --- @@ -111,6 +113,29 @@ trait CommitHelper { } } + protected def mergeTableStatusContents(uuidTableStatusPath: String, --- End diff -- done ---
[GitHub] carbondata issue #2972: [CARBONDATA-3143] Fixed local dictionary in presto
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2972 LGTM ---
[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2977#discussion_r240180385 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/AggregateDataMapCompactor.scala --- @@ -79,9 +79,20 @@ class AggregateDataMapCompactor(carbonLoadModel: CarbonLoadModel, CarbonSession.threadSet(CarbonCommonConstants.SUPPORT_DIRECT_QUERY_ON_DATAMAP, "true") loadCommand.processData(sqlContext.sparkSession) -val newLoadMetaDataDetails = SegmentStatusManager.readLoadMetadata( +val oldMetadataDetails = SegmentStatusManager.readLoadMetadata( + carbonTable.getMetadataPath, "") +val newMetadataDetails = SegmentStatusManager.readLoadMetadata( carbonTable.getMetadataPath, uuid) -val updatedLoadMetaDataDetails = newLoadMetaDataDetails collect { +val mergedContent = oldMetadataDetails.collect { + case content => +val contentIndex = newMetadataDetails.indexOf(content) --- End diff -- done ---
[GitHub] carbondata issue #2968: [CARBONDATA-3141] Removed Carbon Table Detail Comman...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2968 retest this please ---
[GitHub] carbondata issue #2968: [CARBONDATA-3141] Removed Carbon Table Detail Comman...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2968 LGTM ---
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2981 LGTM ---
[GitHub] carbondata pull request #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2981#discussion_r240026697 --- Diff: integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceTest.scala --- @@ -998,9 +999,19 @@ class SparkCarbonDataSourceTest extends FunSuite with BeforeAndAfterAll { i += 1 } writer.close() - spark.sql("create table complextable (stringfield string, structfield struct) " + -s"using carbon location '$path'") + if (SparkUtil.isSparkVersionEqualTo("2.1")) { +if (!FileFactory.isFileExist(path)) { + FileFactory.createDirectoryAndSetPermission(path, +new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL)) +} +spark.sql("create table complextable (stringfield string, structfield struct) " + + s"using carbon options(path '$path')") + } else if (SparkUtil.isSparkVersionXandAbove("2.2")) { --- End diff -- yeah, right ---
[GitHub] carbondata pull request #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2981#discussion_r240026585 --- Diff: integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceTest.scala --- @@ -998,9 +999,19 @@ class SparkCarbonDataSourceTest extends FunSuite with BeforeAndAfterAll { i += 1 } writer.close() - spark.sql("create table complextable (stringfield string, structfield struct) " + -s"using carbon location '$path'") + if (SparkUtil.isSparkVersionEqualTo("2.1")) { +if (!FileFactory.isFileExist(path)) { + FileFactory.createDirectoryAndSetPermission(path, +new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL)) +} +spark.sql("create table complextable (stringfield string, structfield struct) " + + s"using carbon options(path '$path')") + } else if (SparkUtil.isSparkVersionXandAbove("2.2")) { --- End diff -- Change the else if to a plain else so that the same command runs for 2.3 as well. Make the same change in the other tests. ---
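A sketch of the requested restructuring; SparkUtil and the DDL strings follow the quoted test, while the wrapper function is illustrative and the column list is trimmed for brevity.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.SparkUtil

object ComplexTableDDL {
  def createComplexTable(spark: SparkSession, path: String): Unit = {
    if (SparkUtil.isSparkVersionEqualTo("2.1")) {
      // 2.1 needs the path passed as an option
      spark.sql("create table complextable (stringfield string) " +
        s"using carbon options(path '$path')")
    } else {
      // plain else instead of else-if: 2.2, 2.3 and any later version
      // all take this path without being enumerated
      spark.sql("create table complextable (stringfield string) " +
        s"using carbon location '$path'")
    }
  }
}
```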
[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2977#discussion_r239331535 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala --- @@ -156,20 +177,20 @@ object AlterTableDropPartitionPostStatusListener extends OperationEventListener // Generate table status file name without UUID, forExample: tablestatus val newTableSchemaPath = CarbonTablePath.getTableStatusFilePath( childCarbonTable.getTablePath) -renameDataMapTableStatusFiles(oldTableSchemaPath, newTableSchemaPath, uuid) +mergeTableStatusContents(oldTableSchemaPath, newTableSchemaPath, uuid) } // if true then the commit for one of the child tables has failed val commitFailed = renamedDataMaps.lengthCompare(childCommands.length) != 0 if (commitFailed) { LOGGER.info("Reverting table status file to original state") - renamedDataMaps.foreach { -command => - val carbonTable = command.table - // rename the backup tablestatus i.e tablestatus_backup_UUID to tablestatus - val backupTableSchemaPath = - CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) + "_backup_" + uuid - val tableSchemaPath = CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) - renameDataMapTableStatusFiles(backupTableSchemaPath, tableSchemaPath, "") + LOGGER.warn("Reverting table status file to original state") --- End diff -- removed ---
[GitHub] carbondata pull request #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2977 [WIP] [CARBONDATA-3147] Fixed concurrent load issue Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata bug/CARBONDATA-3147 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2977.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2977 commit d48374164a2b8ff129973e0e15f48c867ccddc1a Author: kunal642 Date: 2018-12-05T10:30:41Z Fixed concurrent load issue ---
[GitHub] carbondata pull request #2940: [CARBONDATA-3116] Support set carbon.query.di...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2940#discussion_r238161863 --- Diff: integration/spark2/pom.xml --- @@ -105,6 +105,11 @@ + + org.apache.httpcomponents --- End diff -- @xubo245 This change is already there in #2925; we can discuss it on that PR. Please remove the changes from this PR. @jackylk Let us discuss whether this is actually required. Please refer to PR #2925 and give your inputs. ---
[GitHub] carbondata pull request #2940: [CARBONDATA-3116] Support set carbon.query.di...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2940#discussion_r238142863 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala --- @@ -463,6 +464,39 @@ class TestPreAggCreateCommand extends QueryTest with BeforeAndAfterAll { executorService.shutdown() } + test("support set carbon.query.directQueryOnDataMap.enabled=true") { +val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath +val testData = s"$rootPath/integration/spark-common-test/src/test/resources/sample.csv" +sql("drop table if exists mainTable") +sql( + s""" + | CREATE TABLE mainTable + | (id Int, + | name String, + | city String, + | age Int) + | STORED BY 'org.apache.carbondata.format' + """.stripMargin); + + +sql( + s""" + | LOAD DATA LOCAL INPATH '$testData' + | into table mainTable + """.stripMargin); + +sql( + s""" + | create datamap preagg_sum on table mainTable + | using 'preaggregate' + | as select id,sum(age) from mainTable group by id + """.stripMargin); + +sql("set carbon.query.directQueryOnDataMap.enabled=true"); --- End diff -- Setting this property here would not be of use as this property is never validated in tests. Check https://github.com/apache/carbondata/blob/master/integration/spark-common/src/main/scala/org/apache/spark/sql/test/util/QueryTest.scala#L45 for reference ---
[GitHub] carbondata issue #2964: [HOTFIX] Fix ArrayOutOfBound exception when duplicat...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2964 LGTM ---
[GitHub] carbondata issue #2965: [Documentation] Editorial review
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2965 LGTM ---
[GitHub] carbondata issue #2951: [SDV] Add datasource testcases for Spark File Format
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2951 @shivamasn Please add a description for the PR, and attach the test report in the description. ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r237781038 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/datasource/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.datasource + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} +import scala.collection.JavaConverters._ +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + var writerPath = new File(this.getClass.getResource("/").getPath ++ +"../." + + "./src/test/resources/SparkCarbonFileFormat/WriterOutput/") +.getCanonicalPath + //getCanonicalPath gives path with \, but the code expects /. 
+ writerPath = writerPath.replace("\\", "/"); + + def buildTestData(): Any = { + +FileUtils.deleteDirectory(new File(writerPath)) + +val schema = new StringBuilder() + .append("[ \n") + .append(" {\"name\":\"string\"},\n") + .append(" {\"age\":\"int\"},\n") + .append(" {\"height\":\"double\"}\n") + .append("]") + .toString() + +try { + val builder = CarbonWriter.builder() + val writer = + builder.outputPath(writerPath).withCsvInput(Schema.parseJson(schema)).writtenBy("CreateTableUsingSparkCarbonFileFormatTestCase").build() + var i = 0 + while (i < 100) { +writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2))) +i += 1 + } + writer.close() +} catch { + case _: Throwable => None +} + } + + def cleanTestData() = { +FileUtils.deleteDirectory(new File(writerPath)) + } + + def deleteIndexFile(path: String, extension: String) : Unit = { +val file: CarbonFile = FileFactory + .getCarbonFile(path, FileFactory.getFileType(path)) + +for (eachDir <- file.listFiles) { + if (!eachDir.isDirectory) { +if (eachDir.getName.endsWith(extension)) { + CarbonUtil.deleteFoldersAndFilesSilent(eachDir) +} + } else { +deleteIndexFile(eachDir.getPath, extension) + } +} + } + + test("Running SQL directly and read carbondata files (sdk Writer Output) using the SparkCarbonFileFormat ")
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r237779541 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/datasource/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.datasource + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} +import scala.collection.JavaConverters._ +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + var writerPath = new File(this.getClass.getResource("/").getPath ++ +"../." + + "./src/test/resources/SparkCarbonFileFormat/WriterOutput/") +.getCanonicalPath + //getCanonicalPath gives path with \, but the code expects /. 
+ writerPath = writerPath.replace("\\", "/"); + + def buildTestData(): Any = { + +FileUtils.deleteDirectory(new File(writerPath)) + +val schema = new StringBuilder() + .append("[ \n") + .append(" {\"name\":\"string\"},\n") + .append(" {\"age\":\"int\"},\n") + .append(" {\"height\":\"double\"}\n") + .append("]") + .toString() + +try { + val builder = CarbonWriter.builder() + val writer = + builder.outputPath(writerPath).withCsvInput(Schema.parseJson(schema)).writtenBy("CreateTableUsingSparkCarbonFileFormatTestCase").build() + var i = 0 + while (i < 100) { +writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2))) +i += 1 + } + writer.close() +} catch { + case _: Throwable => None +} + } + + def cleanTestData() = { +FileUtils.deleteDirectory(new File(writerPath)) + } + + def deleteIndexFile(path: String, extension: String) : Unit = { +val file: CarbonFile = FileFactory + .getCarbonFile(path, FileFactory.getFileType(path)) + +for (eachDir <- file.listFiles) { + if (!eachDir.isDirectory) { +if (eachDir.getName.endsWith(extension)) { + CarbonUtil.deleteFoldersAndFilesSilent(eachDir) +} + } else { +deleteIndexFile(eachDir.getPath, extension) + } +} + } + + test("Running SQL directly and read carbondata files (sdk Writer Output) using the SparkCarbonFileFormat ")
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r237779383 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/datasource/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.datasource + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} +import scala.collection.JavaConverters._ +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + var writerPath = new File(this.getClass.getResource("/").getPath ++ +"../." + + "./src/test/resources/SparkCarbonFileFormat/WriterOutput/") +.getCanonicalPath + //getCanonicalPath gives path with \, but the code expects /. 
+ writerPath = writerPath.replace("\\", "/"); + + def buildTestData(): Any = { + +FileUtils.deleteDirectory(new File(writerPath)) + +val schema = new StringBuilder() + .append("[ \n") + .append(" {\"name\":\"string\"},\n") + .append(" {\"age\":\"int\"},\n") + .append(" {\"height\":\"double\"}\n") + .append("]") + .toString() + +try { + val builder = CarbonWriter.builder() + val writer = + builder.outputPath(writerPath).withCsvInput(Schema.parseJson(schema)).writtenBy("CreateTableUsingSparkCarbonFileFormatTestCase").build() + var i = 0 + while (i < 100) { +writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2))) +i += 1 + } + writer.close() +} catch { + case _: Throwable => None +} + } + + def cleanTestData() = { +FileUtils.deleteDirectory(new File(writerPath)) + } + + def deleteIndexFile(path: String, extension: String) : Unit = { +val file: CarbonFile = FileFactory + .getCarbonFile(path, FileFactory.getFileType(path)) + +for (eachDir <- file.listFiles) { + if (!eachDir.isDirectory) { +if (eachDir.getName.endsWith(extension)) { + CarbonUtil.deleteFoldersAndFilesSilent(eachDir) +} + } else { +deleteIndexFile(eachDir.getPath, extension) + } +} + } + + test("Running SQL directly and read carbondata files (sdk Writer Output) using the SparkCarbonFileFormat ")
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r237780471 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/datasource/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.datasource --- End diff -- Please format all the newly added code ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r237779458 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/datasource/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.datasource + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} +import scala.collection.JavaConverters._ +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + var writerPath = new File(this.getClass.getResource("/").getPath ++ +"../." + + "./src/test/resources/SparkCarbonFileFormat/WriterOutput/") +.getCanonicalPath + //getCanonicalPath gives path with \, but the code expects /. 
+ writerPath = writerPath.replace("\\", "/"); + + def buildTestData(): Any = { + +FileUtils.deleteDirectory(new File(writerPath)) + +val schema = new StringBuilder() + .append("[ \n") + .append(" {\"name\":\"string\"},\n") + .append(" {\"age\":\"int\"},\n") + .append(" {\"height\":\"double\"}\n") + .append("]") + .toString() + +try { + val builder = CarbonWriter.builder() + val writer = + builder.outputPath(writerPath).withCsvInput(Schema.parseJson(schema)).writtenBy("CreateTableUsingSparkCarbonFileFormatTestCase").build() + var i = 0 + while (i < 100) { +writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2))) +i += 1 + } + writer.close() +} catch { + case _: Throwable => None +} + } + + def cleanTestData() = { +FileUtils.deleteDirectory(new File(writerPath)) + } + + def deleteIndexFile(path: String, extension: String) : Unit = { +val file: CarbonFile = FileFactory + .getCarbonFile(path, FileFactory.getFileType(path)) + +for (eachDir <- file.listFiles) { + if (!eachDir.isDirectory) { +if (eachDir.getName.endsWith(extension)) { + CarbonUtil.deleteFoldersAndFilesSilent(eachDir) +} + } else { +deleteIndexFile(eachDir.getPath, extension) + } +} + } + + test("Running SQL directly and read carbondata files (sdk Writer Output) using the SparkCarbonFileFormat ")
[GitHub] carbondata issue #2956: [CARBONDATA-3134] fixed null values when cachelevel ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2956 @manishgupta88 Please review ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r236647791 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/dli/SparkCarbonDataSourceTestCase.scala --- @@ -0,0 +1,1267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.cluster.sdv.generated.dli + + +import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, File, InputStream} + + + +import scala.collection.mutable + +import org.apache.avro +import org.apache.avro.file.DataFileWriter +import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord} +import org.apache.avro.io.{DecoderFactory, Encoder} +import org.apache.spark.sql.{AnalysisException, Row} +import org.apache.spark.sql.common.util.QueryTest +import org.junit.Assert +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.datatype.{DataTypes, StructField} +import org.apache.carbondata.hadoop.testutil.StoreCreator +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} + +class SparkCarbonDataSourceTestCase extends QueryTest with BeforeAndAfterAll { + + val rootPath = new File(this.getClass.getResource("/").getPath ++ "../../../..").getCanonicalPath + + val warehouse1 = FileFactory.getPath(s"$rootPath/integration/spark-datasource/target/warehouse").toString + + test("test write using dataframe") { +import sqlContext.implicits._ +val df = sqlContext.sparkContext.parallelize(1 to 10) + .map(x => ("a" + x % 10, "b", x)) + .toDF("c1", "c2", "number") +sql("drop table if exists testformat") +// Saves dataframe to carbon file +df.write + .format("carbon").saveAsTable("testformat") +assert(sql("select * from testformat").count() == 10) +assert(sql("select * from testformat where c1='a0'").count() == 1) +assert(sql("select * from testformat").count() == 10) +sql("drop table if exists testformat") + } + + test("test write using ddl") { +import sqlContext.implicits._ +val df = sqlContext.sparkContext.parallelize(1 to 10) + .map(x => ("a" + x % 10, "b", x)) + .toDF("c1", "c2", "number") +sql("drop table if exists testparquet") +sql("drop table if exists testformat") +// Saves dataframe to carbon file +df.write + .format("parquet").saveAsTable("testparquet") +sql("create table carbon_table(c1 string, c2 string, number int) using carbon") +sql("insert into carbon_table select * from testparquet") +checkAnswer(sql("select * from carbon_table where c1='a1'"), sql("select * from testparquet where c1='a1'")) +if 
(!sqlContext.sparkContext.version.startsWith("2.1")) { + val mapSize = DataMapStoreManager.getInstance().getAllDataMaps.size() + DataMapStoreManager.getInstance() +.clearDataMaps(AbsoluteTableIdentifier.from(warehouse1 + "/carbon_table")) + assert(mapSize >= DataMapStoreManager.getInstance().getAllDataMaps.size()) +} +sql("drop table if exists testparquet") +sql("drop table if exists testformat") + } + + test("test read with df write") { + FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(warehouse1 + "/
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r236647192 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/dli/SparkCarbonDataSourceTestCase.scala --- @@ -0,0 +1,1267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.cluster.sdv.generated.dli + + +import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, File, InputStream} + --- End diff -- Remove extra lines ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r236646305 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/dli/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,484 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.dli + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} + +import scala.collection.JavaConverters._ + +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil + +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + --- End diff -- Don't leave blank lines ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r236646395 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/dli/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,484 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.dli + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} + +import scala.collection.JavaConverters._ + +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil + +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +CarbonProperties.getInstance() --- End diff -- Remove this line ---
[GitHub] carbondata pull request #2951: [SDV] Add datasource testcases for Spark File...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2951#discussion_r236646951 --- Diff: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/dli/CreateTableUsingSparkCarbonFileFormatTestCase.scala --- @@ -0,0 +1,484 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.cluster.sdv.generated.dli + +import java.io.File +import java.text.SimpleDateFormat +import java.util.{Date, Random} + +import scala.collection.JavaConverters._ + +import org.apache.commons.io.FileUtils +import org.apache.commons.lang.RandomStringUtils +import org.scalatest.BeforeAndAfterAll +import org.apache.spark.util.SparkUtil + +import org.apache.carbondata.core.datastore.filesystem.CarbonFile +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.datatype.DataTypes +import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverter} +import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.spark.sql.Row +import org.apache.spark.sql.common.util.QueryTest + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datamap.DataMapStoreManager +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter + +class CreateTableUsingSparkCarbonFileFormatTestCase extends QueryTest with BeforeAndAfterAll { + + + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + override def afterAll(): Unit = { +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_MINMAX_ALLOWED_BYTE_COUNT, +CarbonCommonConstants.CARBON_MINMAX_ALLOWED_BYTE_COUNT_DEFAULT) +sql("DROP TABLE IF EXISTS sdkOutputTable") + } + + var writerPath = new File(this.getClass.getResource("/").getPath ++ +"../." + + "./src/test/resources/SparkCarbonFileFormat/WriterOutput/") +.getCanonicalPath + //getCanonicalPath gives path with \, but the code expects /. 
+ writerPath = writerPath.replace("\\", "/"); + + def buildTestData(): Any = { + +FileUtils.deleteDirectory(new File(writerPath)) + +val schema = new StringBuilder() + .append("[ \n") + .append(" {\"name\":\"string\"},\n") + .append(" {\"age\":\"int\"},\n") + .append(" {\"height\":\"double\"}\n") + .append("]") + .toString() + +try { + val builder = CarbonWriter.builder() + val writer = + builder.outputPath(writerPath).withCsvInput(Schema.parseJson(schema)).writtenBy("CreateTableUsingSparkCarbonFileFormatTestCase").build() + var i = 0 + while (i < 100) { +writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2))) +i += 1 + } + writer.close() +} catch { + case _: Throwable => None +} + } + + def cleanTestData() = { +FileUtils.deleteDirectory(new File(writerPath)) + } + + def deleteIndexFile(path: String, extension: String) : Unit = { +val file: CarbonFile = FileFactory + .getCarbonFile(path, FileFactory.getFileType(path)) + +for (eachDir <- file.listFiles) { + if (!eachDir.isDirectory) { +if (eachDir.getName.endsWith(extension)) { + CarbonUtil.deleteFoldersAndFilesSilent(eachDir) +} + } else {
[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2956#discussion_r236768667 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java --- @@ -332,13 +334,42 @@ public void clear() { } SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper other = (SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper) obj; - return tableIdentifier.equals(other.tableIdentifier) && columnsInTable - .equals(other.columnsInTable) && Arrays + return tableIdentifier.equals(other.tableIdentifier) && checkColumnSchemaEquality( + columnsInTable, other.columnsInTable) && Arrays .equals(columnCardinality, other.columnCardinality); } +private boolean checkColumnSchemaEquality(List<ColumnSchema> obj1, List<ColumnSchema> obj2) { + List<ColumnSchema> clonedObj1 = new ArrayList<>(obj1); --- End diff -- done ---
[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2956#discussion_r236768636

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java ---
@@ -332,13 +334,42 @@ public void clear() {
       }
       SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper other =
           (SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper) obj;
-      return tableIdentifier.equals(other.tableIdentifier) && columnsInTable
-          .equals(other.columnsInTable) && Arrays
+      return tableIdentifier.equals(other.tableIdentifier) && checkColumnSchemaEquality(
+          columnsInTable, other.columnsInTable) && Arrays
           .equals(columnCardinality, other.columnCardinality);
     }
+
+    private boolean checkColumnSchemaEquality(List<ColumnSchema> obj1, List<ColumnSchema> obj2) {
+      List<ColumnSchema> clonedObj1 = new ArrayList<>(obj1);
+      List<ColumnSchema> clonedObj2 = new ArrayList<>(obj2);
+      clonedObj1.addAll(obj1);
+      clonedObj2.addAll(obj2);
+      Collections.sort(clonedObj1, new Comparator<ColumnSchema>() {
+        @Override public int compare(ColumnSchema o1, ColumnSchema o2) {
+          return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId());
+        }
+      });
+      Collections.sort(clonedObj2, new Comparator<ColumnSchema>() {
--- End diff --

done

---
[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2956#discussion_r236768643

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java ---
@@ -332,13 +334,42 @@ public void clear() {
       }
       SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper other =
           (SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper) obj;
-      return tableIdentifier.equals(other.tableIdentifier) && columnsInTable
-          .equals(other.columnsInTable) && Arrays
+      return tableIdentifier.equals(other.tableIdentifier) && checkColumnSchemaEquality(
+          columnsInTable, other.columnsInTable) && Arrays
           .equals(columnCardinality, other.columnCardinality);
     }
+
+    private boolean checkColumnSchemaEquality(List<ColumnSchema> obj1, List<ColumnSchema> obj2) {
+      List<ColumnSchema> clonedObj1 = new ArrayList<>(obj1);
+      List<ColumnSchema> clonedObj2 = new ArrayList<>(obj2);
+      clonedObj1.addAll(obj1);
+      clonedObj2.addAll(obj2);
+      Collections.sort(clonedObj1, new Comparator<ColumnSchema>() {
+        @Override public int compare(ColumnSchema o1, ColumnSchema o2) {
+          return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId());
+        }
+      });
+      Collections.sort(clonedObj2, new Comparator<ColumnSchema>() {
+        @Override public int compare(ColumnSchema o1, ColumnSchema o2) {
+          return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId());
+        }
+      });
+      boolean exists = true;
+      for (int i = 0; i < obj1.size(); i++) {
+        if (!clonedObj1.get(i).equalsWithColumnId(clonedObj2.get(i))) {
+          exists = false;
+          break;
+        }
+      }
+      return exists;
+    }
+
     @Override public int hashCode() {
-      return tableIdentifier.hashCode() + columnsInTable.hashCode() + Arrays
+      int hashCode = 0;
+      for (ColumnSchema columnSchema: columnsInTable) {
+        hashCode += columnSchema.hashCodeWithColumnId();
--- End diff --

done

---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2942 LGTM ---
[GitHub] carbondata pull request #2956: [CARBONDATA-3134] fixed null values when cach...
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2956

[CARBONDATA-3134] fixed null values when cachelevel is set as blocklet

**Problem:** For each blocklet, an object of SegmentPropertiesAndSchemaHolder is created to store the schema used for the query. This object is created only if no other blocklet has the same schema. To check the schema we compare List<ColumnSchema>; because the equals method in ColumnSchema does not check columnUniqueId, this comparison wrongly matches and the new restructured blocklet uses the schema of the old blocklet. Due to this, the newly added column is ignored, as the old blocklet schema marks that column as deleted (alter drop).

**Solution:** Instead of checking equality through the existing equals and hashCode, write new implementations of both that compare based on columnUniqueId.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata bug/CARBONDATA-3134

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2956.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2956

commit e74b3f22d285f5b5c37b17a11248b5e7977326b2
Author: kunal642
Date: 2018-11-27T08:43:27Z

[CARBONDATA-3134] fixed null values when cachelevel is set as blocklet

Problem: For each blocklet, an object of SegmentPropertiesAndSchemaHolder is created to store the schema used for the query. This object is created only if no other blocklet has the same schema. To check the schema we compare List<ColumnSchema>; because the equals method in ColumnSchema does not check columnUniqueId, this comparison wrongly matches and the new restructured blocklet uses the schema of the old blocklet. Due to this, the newly added column is ignored, as the old blocklet schema marks that column as deleted (alter drop).

Solution: Instead of checking equality through the existing equals and hashCode, write new implementations of both that compare based on columnUniqueId.

---
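For readers skimming the thread, here is a minimal, self-contained sketch of an id-aware comparison in the spirit of this fix. The ColumnSchema stand-in below is trimmed to two fields for illustration; it is an assumption of this sketch, not CarbonData's actual class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stand-in for CarbonData's ColumnSchema, reduced to what the sketch needs.
final class ColumnSchema {
  private final String columnUniqueId;
  private final String columnName;

  ColumnSchema(String columnUniqueId, String columnName) {
    this.columnUniqueId = columnUniqueId;
    this.columnName = columnName;
  }

  String getColumnUniqueId() {
    return columnUniqueId;
  }

  // Id-aware equality, analogous to the equalsWithColumnId the PR introduces.
  boolean equalsWithColumnId(ColumnSchema other) {
    return columnUniqueId.equals(other.columnUniqueId)
        && columnName.equals(other.columnName);
  }
}

final class ColumnSchemaEquality {
  // Sort copies by columnUniqueId so ordering differences cannot mask a
  // mismatch, then compare element by element using the id-aware check.
  static boolean schemasEqual(List<ColumnSchema> a, List<ColumnSchema> b) {
    if (a.size() != b.size()) {
      return false;
    }
    List<ColumnSchema> sortedA = new ArrayList<>(a);
    List<ColumnSchema> sortedB = new ArrayList<>(b);
    Comparator<ColumnSchema> byId = Comparator.comparing(ColumnSchema::getColumnUniqueId);
    sortedA.sort(byId);
    sortedB.sort(byId);
    for (int i = 0; i < sortedA.size(); i++) {
      if (!sortedA.get(i).equalsWithColumnId(sortedB.get(i))) {
        return false;
      }
    }
    return true;
  }
}
```

Because the comparison keys on columnUniqueId rather than on name and position alone, a blocklet written after a drop-and-re-add of a column no longer matches the old blocklet's cached schema.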
[GitHub] carbondata pull request #2945: [CARBONDATA-3123] Fixed JVM crash issue with ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2945#discussion_r236531750

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -66,9 +63,6 @@
     this.readers = readers;
     this.index = 0;
     this.currentReader = readers.get(0);
-    CarbonTaskInfo carbonTaskInfo = new CarbonTaskInfo();
--- End diff --

@xubo245 I have updated the JIRA. Please check.

---
[GitHub] carbondata pull request #2945: [CARBONDATA-3123] Fixed JVM crash issue with ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2945#discussion_r235892400

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -66,9 +63,6 @@
     this.readers = readers;
    this.index = 0;
     this.currentReader = readers.get(0);
-    CarbonTaskInfo carbonTaskInfo = new CarbonTaskInfo();
--- End diff --

ok

---
[GitHub] carbondata pull request #2945: [CARBONDATA-3123] Fixed JVM crash issue with ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2945#discussion_r235891858

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -66,9 +63,6 @@
     this.readers = readers;
     this.index = 0;
     this.currentReader = readers.get(0);
-    CarbonTaskInfo carbonTaskInfo = new CarbonTaskInfo();
--- End diff --

The same issue would exist in CSDK as well, as this would be called internally.

---
[GitHub] carbondata issue #2945: [CARBONDATA-3123] Fixed JVM crash issue with CarbonR...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2945 retest this please ---
[GitHub] carbondata pull request #2945: [CARBONDATA-3123] Fixed JVM crash issue with ...
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2945

[CARBONDATA-3123] Fixed JVM crash issue with CarbonRecordReader (SDK Reader).

**Problem:** Because CarbonReaderBuilder is executed on the main thread, the TaskId is set on the thread local during reader creation. When multiple readers are created using the split API, each newly initialized reader overrides this TaskID, so all readers end up using the TaskID of the last initialized reader. Due to this, when one reader is reading while another reader is freeing memory after its task completion, the same memory block can be cleared and read at the same time, causing a SIGSEGV error.

**Solution:** Do not set the TaskID to the thread local during reader initialization. ThreadLocalTaskInfo.getCarbonTaskInfo will take care of assigning a new TaskID if one is not already present.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata bug/CARBONDATA-3123

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2945.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2945

commit cce1811fa518545b6b31246efd5668048e7d24ea
Author: kunal642
Date: 2018-11-23T05:41:44Z

[CARBONDATA-3123] Fixed JVM crash issue with CarbonRecordReader (SDK Reader).

Problem: Because CarbonReaderBuilder is executed on the main thread, the TaskId is set on the thread local during reader creation. When multiple readers are created using the split API, each newly initialized reader overrides this TaskID, so all readers end up using the TaskID of the last initialized reader. Due to this, when one reader is reading while another reader is freeing memory after its task completion, the same memory block can be cleared and read at the same time, causing a SIGSEGV error.

Solution: Do not set the TaskID to the thread local during reader initialization. ThreadLocalTaskInfo.getCarbonTaskInfo will take care of assigning a new TaskID if one is not already present.

---
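As a sketch of the get-or-create behavior the solution leans on, the following is one plausible shape for ThreadLocalTaskInfo.getCarbonTaskInfo. The CarbonTaskInfo stand-in and the UUID-based id scheme are assumptions of this sketch, not the project's exact code.

```java
import java.util.UUID;

// Stand-in for CarbonData's CarbonTaskInfo, reduced to a single id field.
final class CarbonTaskInfo {
  private final long taskId;

  CarbonTaskInfo(long taskId) {
    this.taskId = taskId;
  }

  long getTaskId() {
    return taskId;
  }
}

final class ThreadLocalTaskInfoSketch {
  private static final ThreadLocal<CarbonTaskInfo> TASK_INFO = new ThreadLocal<>();

  // Lazily assign a TaskID per thread. Since nothing is assigned on the main
  // thread during reader construction, readers created via split() each get
  // their own id on their own thread instead of sharing the last one set.
  static CarbonTaskInfo getCarbonTaskInfo() {
    CarbonTaskInfo info = TASK_INFO.get();
    if (info == null) {
      info = new CarbonTaskInfo(UUID.randomUUID().getMostSignificantBits());
      TASK_INFO.set(info);
    }
    return info;
  }
}
```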
[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2923#discussion_r235681603

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
@@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
     sql("drop datamap if exists preaggTable on table partitionTable")
   }
 
+  test("validate data in partition table after dropping and adding a column") {
+    sql("drop table if exists par")
+    sql("create table par(name string) partitioned by (age double) stored by " +
+        "'carbondata'")
+    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+        s"('header'='false')")
+    sql("alter table par drop columns(name)")
+    sql("alter table par add columns(name string)")
+    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+        s"('header'='false')")
--- End diff --

@ravipesala Spark 2.1 and 2.2 both place the partition column last even if a new column is added.

---
[GitHub] carbondata issue #2929: [CARBONDATA-3108][CARBONDATA-3044] Fix the error of ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2929 LGTM ---
[GitHub] carbondata issue #2929: [CARBONDATA-3108][CARBONDATA-3044] Fix the error of ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2929 @xubo245 Please resolve the conflicts ---
[GitHub] carbondata issue #2935: [HOTFIX] Intializing the CSDK object reference to NU...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2935 LGTM ---
[GitHub] carbondata issue #2921: [CARBONDATA-3104] Removed unnecessary configuration ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2921 LGTM ---
[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2923#discussion_r234966352

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
@@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
     sql("drop datamap if exists preaggTable on table partitionTable")
   }
 
+  test("validate data in partition table after dropping and adding a column") {
+    sql("drop table if exists par")
+    sql("create table par(name string) partitioned by (age double) stored by " +
+        "'carbondata'")
+    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+        s"('header'='false')")
+    sql("alter table par drop columns(name)")
+    sql("alter table par add columns(name string)")
+    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
+        s"('header'='false')")
--- End diff --

Actually Spark expects the partition column to be last. So the solution you are proposing will not work in the datasource case, because Spark parses the schema and always appends the partition column at the end.

---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2925#discussion_r234962295

--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/thriftserver/CarbonThriftServer.scala ---
@@ -48,8 +48,13 @@ object CarbonThriftServer {
       System.exit(0)
     }
 
+    val master = Option(System.getProperty("spark.master"))
--- End diff --

ok

---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2925#discussion_r234962085

--- Diff: integration/spark2/pom.xml ---
@@ -134,6 +134,11 @@
+
--- End diff --

But nobody else is facing this issue, and none of our environments hit this exception. Maybe you need to add the httpclient jars to your cluster separately?

---
[GitHub] carbondata pull request #2921: [CARBONDATA-3104] Removed unnecessary configu...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2921#discussion_r234860057

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDropTableCommand.scala ---
@@ -143,7 +143,6 @@ case class CarbonDropTableCommand(
       OperationListenerBus.getInstance.fireEvent(dropTablePostEvent, operationContext)
     } catch {
       case ex: NoSuchTableException =>
-        LOGGER.error(ex.getLocalizedMessage, ex)
--- End diff --

Logging this event is necessary. Instead of printing the whole exception stack, we can just print the message as a warning.

---
[GitHub] carbondata issue #2921: [CARBONDATA-3104] Removed unnecessary configuration ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2921 retest this please ---
[GitHub] carbondata issue #2924: [CARBONDATA-3065]Correct the error message for inver...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2924 LGTM ---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2925#discussion_r234191527

--- Diff: integration/spark2/pom.xml ---
@@ -134,6 +134,11 @@
+
--- End diff --

Why is this dependency needed in spark2?

---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2925#discussion_r234191597

--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/thriftserver/CarbonThriftServer.scala ---
@@ -48,8 +48,13 @@ object CarbonThriftServer {
       System.exit(0)
     }
 
+    val master = Option(System.getProperty("spark.master"))
--- End diff --

Was this the reason for the NoClassDefFoundError?

---
[GitHub] carbondata issue #2925: [CARBONDATA-3102] Fix NoClassDefFoundError when use ...
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2925

@xubo245 In what scenario are you facing this issue? I am not facing it.

---
[GitHub] carbondata issue #2921: [CARBONDATA-3104] Removed unnecessary configuration ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2921 retest this please ---
[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2923 @ravipesala @manishgupta88 Please review ---
[GitHub] carbondata issue #2925: [CARBONDATA-3102] Fix NoClassDefFoundError when use ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2925 Please add description ---
[GitHub] carbondata issue #2922: [HOTFIX]s3 lock file fix
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2922 retest this please ---
[GitHub] carbondata pull request #2924: [CARBONDATA-3065]Correct the error message fo...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2924#discussion_r234093857

--- Diff: integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceTest.scala ---
@@ -1345,6 +1345,15 @@ class SparkCarbonDataSourceTest extends FunSuite with BeforeAndAfterAll {
     spark.sql("drop table if exists fileformat_drop_hive")
   }
 
+  test("validate the columns not present in schema") {
+    spark.sql("drop table if exists validate")
--- End diff --

Need to drop the table after test case completion. Better to add it in afterAll() too.

---
[GitHub] carbondata issue #2922: [WIP]s3 lock file fix
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2922

@jackylk No, this is for the ArrayIndexOutOfBoundsException thrown when taking a lock on S3. 3103 seems to be a different issue.

---
[GitHub] carbondata pull request #2902: [CARBONDATA-3083] Fixed data mismatch issue a...
Github user kunal642 closed the pull request at: https://github.com/apache/carbondata/pull/2902 ---
[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2923 retest this please ---
[GitHub] carbondata pull request #2923: [WIP] added partition columns to the last whe...
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2923

[WIP] added partition columns to the last when collecting columns

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata partition_alter_bugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2923.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2923

commit c399910c682f2573827214b1f76aea0225d7ecf8
Author: kunal642
Date: 2018-11-15T04:40:00Z

added partition columns to the last when collecting columns

---
[GitHub] carbondata issue #2903: [CARBONDATA-3084]dataload failure fix when float val...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2903 LGTM ---
[GitHub] carbondata pull request #2903: [CARBONDATA-3084]dataload failure fix when fl...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2903#discussion_r231038773

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java ---
@@ -325,32 +325,42 @@ static ColumnPageCodec selectCodecByAlgorithmForFloating(SimpleStatsResult stats
     //Here we should use the Max abs as max to getDatatype, let's say -1 and -1000, -1 is max,
     //but we can't use -1 to getDatatype, we should use -1000.
     double absMaxValue = Math.max(Math.abs(maxValue), Math.abs(minValue));
-    if (decimalCount == 0) {
+    if (decimalCount == 0 && srcDataType == DataTypes.FLOAT) {
--- End diff --

No need to check for decimalCount == 0. If the data type is FLOAT, call getColumnPageCodec(..).

---
[GitHub] carbondata issue #2901: [CARBONDATA-3081] Fixed NPE for boolean type column ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2901 retest this please ---
[GitHub] carbondata pull request #2902: [WIP] Fixed data mismatch issue after update
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2902

[WIP] Fixed data mismatch issue after update

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata update_data_mismatch_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2902.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2902

commit bbd3dc87ac84c1d4005379dd445dec30f31f24aa
Author: kunal642
Date: 2018-11-06T05:21:00Z

fixed data mismatch issue after update

---
[GitHub] carbondata pull request #2901: [CARBONDATA-3081] Fixed NPE for boolean type ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2901#discussion_r231001835

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java ---
@@ -171,13 +171,20 @@ public Object getCurrentValue() throws IOException, InterruptedException {
     rowCount += 1;
     Object[] row = new Object[carbonColumnarBatch.columnVectors.length];
     for (int i = 0; i < carbonColumnarBatch.columnVectors.length; i ++) {
+      Object data = carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1);
       if (carbonColumnarBatch.columnVectors[i].getType() == DataTypes.STRING
           || carbonColumnarBatch.columnVectors[i].getType() == DataTypes.VARCHAR) {
-        byte[] data = (byte[]) carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1);
-        row[i] = ByteUtil.toString(data, 0, data.length);
+        if (data == null) {
+          row[i] = null;
+        } else {
+          row[i] = ByteUtil.toString((byte[]) data, 0, (((byte[]) data).length));
+        }
       } else if (carbonColumnarBatch.columnVectors[i].getType() == DataTypes.BOOLEAN) {
-        byte data = (byte) carbonColumnarBatch.columnVectors[i].getData(batchIdx - 1);
-        row[i] = ByteUtil.toBoolean(data);
+        if (data == null) {
+          row[i] = null;
+        } else {
+          row[i] = ByteUtil.toBoolean((byte) data);
+        }
--- End diff --

getData already has a check for null values. Because an explicit conversion is required here, the null check had to be added.

---
[GitHub] carbondata pull request #2901: [CARBONDATA-3081] Fixed NPE for boolean type ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2901#discussion_r231001760

--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1844,4 +1844,53 @@ public void testVectorReader() {
     }
   }
 
+  @Test
+  public void testReadingNullValues() {
+    String path = "./testWriteFiles";
+    try {
+      FileUtils.deleteDirectory(new File(path));
+
+      Field[] fields = new Field[2];
+      fields[0] = new Field("stringField", DataTypes.STRING);
+      fields[1] = new Field("shortField", DataTypes.BOOLEAN);
--- End diff --

done

---
[GitHub] carbondata issue #2901: [CARBONDATA-3081] Fixed NPE for boolean type column ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2901 retest this please ---
[GitHub] carbondata pull request #2901: [CARBONDATA-3081] Fixed NPE for boolean type ...
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2901

[CARBONDATA-3081] Fixed NPE for boolean type column with null value

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata npe_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2901.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2901

commit 906eee74f20fbc1a9983db1ce0073356649a7724
Author: kunal642
Date: 2018-11-05T13:16:44Z

fixed NPE for boolean type column with null value

---
[GitHub] carbondata issue #2850: [CARBONDATA-3056] Added concurrent reading through S...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2850 LGTM ---
[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2877 LGTM ---
[GitHub] carbondata pull request #2850: [CARBONDATA-3056] Added concurrent reading th...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2850#discussion_r230315785

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -114,6 +115,57 @@ public static CarbonReaderBuilder builder(String tablePath) {
     return builder(tablePath, tableName);
   }
 
+  /**
+   * Breaks the list of CarbonRecordReader in CarbonReader into multiple
+   * CarbonReader objects, each iterating through some 'carbondata' files
+   * and return that list of CarbonReader objects
+   *
+   * If the no. of files is greater than maxSplits, then break the
+   * CarbonReader into maxSplits splits, with each split iterating
+   * through >= 1 file.
+   *
+   * If the no. of files is less than maxSplits, then return list of
+   * CarbonReader with size as the no. of files, with each CarbonReader
+   * iterating through exactly one file
+   *
+   * @param maxSplits: Int
+   * @return list of {@link CarbonReader} objects
+   */
+  public List<CarbonReader> split(int maxSplits) throws IOException {
+    validateReader();
+    if (maxSplits < 1) {
+      throw new RuntimeException(
+          this.getClass().getSimpleName() + ".split: maxSplits must be positive");
+    }
+
+    List<CarbonReader> carbonReaders = new ArrayList<>();
+
+    if (maxSplits < this.readers.size()) {
--- End diff --

@ravipesala Let us add test cases in a separate PR. Would it be okay?

---
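A short usage sketch of the split API shown in this diff (the output path and table name are placeholders, and the Object[] row shape assumes the default SDK projection):

```java
import java.util.List;

import org.apache.carbondata.sdk.file.CarbonReader;

public final class SplitReadExample {
  public static void main(String[] args) throws Exception {
    // Placeholder path: point this at a directory containing carbondata files.
    CarbonReader reader = CarbonReader.builder("./testWriteFiles", "_temp").build();

    // Ask for at most 4 readers; each resulting reader iterates >= 1 file,
    // and fewer readers come back if there are fewer files than maxSplits.
    List<CarbonReader> splits = reader.split(4);
    for (CarbonReader split : splits) {
      new Thread(() -> {
        try {
          while (split.hasNext()) {
            Object[] row = (Object[]) split.readNextRow();
            System.out.println(row.length);
          }
          split.close();
        } catch (Exception e) {
          e.printStackTrace();
        }
      }).start();
    }
  }
}
```

Running each split on its own thread is also why the per-thread TaskID fix in #2945 matters for this API.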
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230305381

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java ---
@@ -158,14 +173,31 @@ public CarbonReaderBuilder withHadoopConf(String key, String value) {
     }
 
     try {
-      final List<InputSplit> splits =
-          format.getSplits(new JobContextImpl(job.getConfiguration(), new JobID()));
-
+      List<InputSplit> splits;
+      if (filterExpression == null) {
+        splits = format.getAllFileSplits(job);
+      } else {
+        splits = format.getSplits(new JobContextImpl(job.getConfiguration(), new JobID()));
+      }
       List<RecordReader<Void, T>> readers = new ArrayList<>(splits.size());
       for (InputSplit split : splits) {
         TaskAttemptContextImpl attempt =
             new TaskAttemptContextImpl(job.getConfiguration(), new TaskAttemptID());
-        RecordReader<Void, T> reader = format.createRecordReader(split, attempt);
+        RecordReader<Void, T> reader;
+        QueryModel queryModel = format.createQueryModel(split, attempt);
+        boolean hasComplex = false;
+        for (ProjectionDimension projectionDimension : queryModel.getProjectionDimensions()) {
+          if (projectionDimension.getDimension().isComplex()) {
--- End diff --

sure

---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230277096

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java ---
@@ -88,6 +99,50 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
     }
   }
 
+  /**
+   * This method will list all the carbondata files in the table path and treat one carbondata
+   * file as one split.
+   */
+  public List<InputSplit> getAllFileSplits(JobContext job) throws IOException {
--- End diff --

added a flag "filter_blocks" in jobConf

---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230277015

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java ---
@@ -347,9 +347,7 @@ private void fillVector(ColumnPage columnPage, CarbonColumnVector vector,
           columnPage.getNullBits());
     } else if (vectorDataType == DataTypes.FLOAT) {
       float[] floatPage = columnPage.getFloatPage();
-      for (int i = 0; i < pageSize; i++) {
-        vector.putFloats(0, pageSize, floatPage, 0);
-      }
+      vector.putFloats(0, pageSize, floatPage, 0);
--- End diff --

removed

---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230277069

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java ---
@@ -88,6 +99,50 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
     }
   }
 
+  /**
+   * This method will list all the carbondata files in the table path and treat one carbondata
+   * file as one split.
+   */
+  public List<InputSplit> getAllFileSplits(JobContext job) throws IOException {
+    List<InputSplit> splits = new ArrayList<>();
+    CarbonTable carbonTable = getOrCreateCarbonTable(job.getConfiguration());
+    if (null == carbonTable) {
+      throw new IOException("Missing/Corrupt schema file for table.");
+    }
+    for (CarbonFile carbonFile : getAllCarbonDataFiles(carbonTable.getTablePath())) {
+      CarbonInputSplit split =
+          new CarbonInputSplit("null", new Path(carbonFile.getAbsolutePath()), 0,
+              carbonFile.getLength(), carbonFile.getLocations(), FileFormat.COLUMNAR_V3);
+      split.setVersion(ColumnarFormatVersion.V3);
+      BlockletDetailInfo info = new BlockletDetailInfo();
+      split.setDetailInfo(info);
+      info.setBlockSize(carbonFile.getLength());
+      // Read the footer offset and set.
+      FileReader reader = FileFactory
--- End diff --

moved to CarbonVectorizedRecordReader.initialize()

---