[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9356/



---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1304/



---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1092/



---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
retest this please


---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9355/



---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1303/



---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1091/



---


[GitHub] carbondata pull request #2837: [CARBONDATA-3000] Provide C++ interface for w...

2018-10-27 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2837#discussion_r228716777
  
--- Diff: store/CSDK/src/CarbonReader.h ---
@@ -40,6 +40,19 @@ class CarbonReader {
  */
 jobject carbonReaderObject;
 
+/**
+ * check whether has called builder
+ *
+ * @return true or throw exception
+ */
+bool checkBuilder();
+
+/**
+ * check reader and whether has called build
--- End diff --

ok, done


---


[GitHub] carbondata pull request #2837: [CARBONDATA-3000] Provide C++ interface for w...

2018-10-27 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2837#discussion_r228716774
  
--- Diff: store/CSDK/src/CarbonReader.h ---
@@ -40,6 +40,19 @@ class CarbonReader {
  */
 jobject carbonReaderObject;
 
+/**
+ * check whether has called builder
--- End diff --

ok, done


---


[GitHub] carbondata pull request #2837: [CARBONDATA-3000] Provide C++ interface for w...

2018-10-27 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2837#discussion_r228715336
  
--- Diff: store/CSDK/src/CarbonReader.h ---
@@ -40,6 +40,19 @@ class CarbonReader {
  */
 jobject carbonReaderObject;
 
+/**
+ * check whether has called builder
--- End diff --

can you improve this comment? I suggest writing it like "Return true if ..."


---


[GitHub] carbondata pull request #2837: [CARBONDATA-3000] Provide C++ interface for w...

2018-10-27 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2837#discussion_r228715343
  
--- Diff: store/CSDK/src/CarbonReader.h ---
@@ -40,6 +40,19 @@ class CarbonReader {
  */
 jobject carbonReaderObject;
 
+/**
+ * check whether has called builder
+ *
+ * @return true or throw exception
+ */
+bool checkBuilder();
+
+/**
+ * check reader and whether has called build
--- End diff --

can you improve this comment? I suggest writing it like "Return true if ...", as sketched below.
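
One possible wording along those lines (a hypothetical suggestion, with names
assumed where the diff is truncated, not necessarily the text that was finally
merged) for the two declarations in `CarbonReader.h`:

```cpp
/**
 * Return true if builder() has been called on this reader;
 * otherwise throw a runtime error asking the caller to call builder() first.
 */
bool checkBuilder();

/**
 * Return true if the reader object exists and build() has been called;
 * otherwise throw a runtime error.
 */
bool checkReader();
```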


---


[GitHub] carbondata issue #2849: [CARBONDATA-2896] Added TestCases for Adaptive encod...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2849
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1302/



---


[GitHub] carbondata issue #2849: [CARBONDATA-2896] Added TestCases for Adaptive encod...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2849
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9354/



---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread ajantha-bhat
Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
LGTM


---


[GitHub] carbondata issue #2849: [CARBONDATA-2896] Added TestCases for Adaptive encod...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2849
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1090/



---


[GitHub] carbondata issue #2864: [CARBONDATA-3041] Optimize load minimum size strateg...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2864
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9353/



---


[GitHub] carbondata issue #2864: [CARBONDATA-3041] Optimize load minimum size strateg...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2864
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1301/



---


[GitHub] carbondata issue #2861: [HOTFIX]handle passing spark appname for partition t...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1299/



---


[GitHub] carbondata issue #2865: [CARBONDATA-3002] Fix some spell error

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2865
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9352/



---


[GitHub] carbondata issue #2850: [WIP] Added concurrent reading through SDK

2018-10-27 Thread NamanRastogi
Github user NamanRastogi commented on the issue:

https://github.com/apache/carbondata/pull/2850
  
Please check the `split` method: it splits the list of `CarbonRecordReader`s
across multiple `CarbonReader`s. It does not jumble the order of the
`CarbonRecordReader`s; they stay sequential.

Suppose there are 10 *carbondata* files, and thus 10 `CarbonRecordReader`s in
the `CarbonReader.readers` object, and the user wants 3 splits. He will get a
list like this:
```java
CarbonReader reader = CarbonReader.builder(dataDir).build();
List<CarbonReader> multipleReaders = reader.split(3);
```
And the indices of the `CarbonRecordReader`s in `multipleReaders` will be:
`multipleReaders.get(0).readers` points to indices {0,1,2,3} of the
*carbondata* files
`multipleReaders.get(1).readers` points to indices {4,5,6} of the *carbondata*
files
`multipleReaders.get(2).readers` points to indices {7,8,9} of the *carbondata*
files

Now, if you read the rows with the following code, the rows will still be in
order.
```java
for (CarbonReader reader_i : multipleReaders) {
  reader_i.readNextRow();
}
```

Earlier, you got data from the 5th `CarbonRecordReader` only after the 4th was
exhausted. Now you may get it earlier, perhaps even before the 0th. So if
order is important, the user has to make sure the 5th reader is consumed only
after the 4th file is used up; if order is not important, it can be consumed
earlier. For example, to count the total number of rows, the user does not
need the original order.
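
A minimal sketch of how such an order-preserving split can be done (class and
method names here are hypothetical, for illustration only): earlier chunks
receive the extra files, which matches the {0,1,2,3}/{4,5,6}/{7,8,9} layout
above.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
  // Partition n record readers into maxSplits contiguous chunks, preserving
  // order; the first (n % maxSplits) chunks get one extra reader.
  static <T> List<List<T>> splitContiguous(List<T> readers, int maxSplits) {
    int n = readers.size();
    int base = n / maxSplits;   // minimum chunk size
    int extra = n % maxSplits;  // first 'extra' chunks get base + 1
    List<List<T>> splits = new ArrayList<>();
    int from = 0;
    for (int i = 0; i < maxSplits; i++) {
      int size = base + (i < extra ? 1 : 0);
      splits.add(new ArrayList<>(readers.subList(from, from + size)));
      from += size;
    }
    return splits;
  }
}
```

With 10 readers and 3 splits this yields chunk sizes 4, 3 and 3, i.e. exactly
the indices shown above.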


---


[GitHub] carbondata issue #2865: [CARBONDATA-3002] Fix some spell error

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2865
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1300/



---


[GitHub] carbondata issue #2861: [HOTFIX]handle passing spark appname for partition t...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9351/



---


[GitHub] carbondata issue #2864: [CARBONDATA-3041] Optimize load minimum size strateg...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2864
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1089/



---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1298/



---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9350/



---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708281
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory {
   .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext)
 val skewedDataOptimization = CarbonProperties.getInstance()
   .isLoadSkewedDataOptimizationEnabled()
-val loadMinSizeOptimization = CarbonProperties.getInstance()
-  .isLoadMinSizeOptimizationEnabled()
 // get user ddl input the node loads the smallest amount of data
-val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
+val carbonTable = 
carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+val loadMinSize = 
carbonTable.getTableInfo.getFactTable.getTableProperties.asScala
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708277
  
--- Diff: docs/ddl-of-carbondata.md ---
@@ -474,7 +475,22 @@ CarbonData DDL statements are documented here,which 
includes:
  be later viewed in table description for reference.
 
  ```
-   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'')
+   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords')
+ ```
+ 
+   - # Load minimum data size
+ This property determines whether to enable node minumun input data 
size allocation strategy 
+ for data loading.It will make sure that the node load the minimum 
amount of data there by 
+ reducing number of carbondata files. This property is useful if the 
size of the input data 
+ files are very small, like 1MB to 256MB. And This property can also 
be specified 
+ in the load option, the property value only int value is supported.
+
+ ```
+   TBLPROPERTIES('LOAD_MIN_SIZE_INMB'='256 MB')
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708250
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
 ---
@@ -123,6 +123,12 @@ private[sql] case class CarbonDescribeFormattedCommand(
 tblProps.get(CarbonCommonConstants.LONG_STRING_COLUMNS), ""))
 }
 
+// load min size info
+if 
(tblProps.containsKey(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB)) {
+  results ++= Seq(("Single node load min data size",
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708257
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
 ---
@@ -833,4 +833,26 @@ object CommonUtil {
   })
 }
   }
+
+  /**
+   * This method will validate single node minimum load data volume of 
table specified by the user
+   *
+   * @param tableProperties table property specified by user
+   * @param propertyName property name
+   */
+  def validateLoadMinSize(tableProperties: Map[String, String], 
propertyName: String): Unit = {
+var size: Integer = 0
+if (tableProperties.get(propertyName).isDefined) {
+  val loadSizeStr: String =
+parsePropertyValueStringInMB(tableProperties(propertyName))
+  try {
+size = Integer.parseInt(loadSizeStr)
+  } catch {
+case e: NumberFormatException =>
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708260
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala ---
@@ -748,4 +752,18 @@ object AlterTableUtil {
   false
 }
   }
+
+  private def validateLoadMinSizeProperties(carbonTable: CarbonTable,
+  propertiesMap: mutable.Map[String, String]): Unit = {
+// validate load min size property
+if 
(propertiesMap.get(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB).isDefined) {
+  // Cache level is not allowed for child tables and dataMaps
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708269
  
--- Diff: docs/ddl-of-carbondata.md ---
@@ -474,7 +475,22 @@ CarbonData DDL statements are documented here,which 
includes:
  be later viewed in table description for reference.
 
  ```
-   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'')
+   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords')
+ ```
+ 
+   - # Load minimum data size
+ This property determines whether to enable node minumun input data 
size allocation strategy 
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708254
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
 ---
@@ -833,4 +833,26 @@ object CommonUtil {
   })
 }
   }
+
+  /**
+   * This method will validate single node minimum load data volume of 
table specified by the user
+   *
+   * @param tableProperties table property specified by user
+   * @param propertyName property name
+   */
+  def validateLoadMinSize(tableProperties: Map[String, String], 
propertyName: String): Unit = {
+var size: Integer = 0
+if (tableProperties.get(propertyName).isDefined) {
+  val loadSizeStr: String =
+parsePropertyValueStringInMB(tableProperties(propertyName))
+  try {
+size = Integer.parseInt(loadSizeStr)
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708258
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory {
   .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext)
 val skewedDataOptimization = CarbonProperties.getInstance()
   .isLoadSkewedDataOptimizationEnabled()
-val loadMinSizeOptimization = CarbonProperties.getInstance()
-  .isLoadMinSizeOptimizationEnabled()
 // get user ddl input the node loads the smallest amount of data
-val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
+val carbonTable = 
carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+val loadMinSize = 
carbonTable.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, "")
+var expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
--- End diff --

Has been modified based on the review


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228708265
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java
 ---
@@ -186,8 +186,7 @@
 optionsFinal.put("sort_scope", "local_sort");
 optionsFinal.put("sort_column_bounds", Maps.getOrDefault(options, 
"sort_column_bounds", ""));
 optionsFinal.put(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB,
-
Maps.getOrDefault(options,CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB,
-CarbonCommonConstants.CARBON_LOAD_MIN_NODE_SIZE_INMB_DEFAULT));
+
Maps.getOrDefault(options,CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, ""));
--- End diff --

ok


---


[GitHub] carbondata issue #2850: [WIP] Added concurrent reading through SDK

2018-10-27 Thread NamanRastogi
Github user NamanRastogi commented on the issue:

https://github.com/apache/carbondata/pull/2850
  
Yes, data coming from one file will always be in order. Please check the
`split` method: it splits the list of `CarbonRecordReader`s across multiple
`CarbonReader`s.

Suppose there are 10 carbondata files, and the user wants 3 splits;
he will get a list like this:


---


[GitHub] carbondata issue #2865: [CARBONDATA-3002] Fix some spell error

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2865
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1088/



---


[jira] [Resolved] (CARBONDATA-3023) Alter add column issue with SORT_COLUMNS

2018-10-27 Thread Ravindra Pesala (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala resolved CARBONDATA-3023.
-
   Resolution: Fixed
Fix Version/s: 1.5.2

> Alter add column issue with SORT_COLUMNS
> 
>
> Key: CARBONDATA-3023
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3023
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: dhatchayani
>Assignee: dhatchayani
>Priority: Minor
> Fix For: 1.5.2
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2826: [CARBONDATA-3023] Alter add column issue with...

2018-10-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2826


---


[GitHub] carbondata issue #2826: [CARBONDATA-3023] Alter add column issue with SORT_C...

2018-10-27 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2826
  
LGTM


---


[GitHub] carbondata pull request #2865: [CARBONDATA-3002] Fix some spell error

2018-10-27 Thread xubo245
GitHub user xubo245 opened a pull request:

https://github.com/apache/carbondata/pull/2865

[CARBONDATA-3002] Fix some spell error

Fix some spell error
Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
  No
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
No

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xubo245/carbondata 
CARBONDATA-3002_FixSpellError2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2865


commit 2d87aa91dab9ea1725ed5b9fce60d03c51cbca9a
Author: xubo245 
Date:   2018-10-27T09:47:11Z

[CARBONDATA-3002] Fix some spell error




---


[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2807
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9349/



---


[GitHub] carbondata issue #2861: [HOTFIX]handle passing spark appname for partition t...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1087/



---


[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2807
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1297/



---


[jira] [Assigned] (CARBONDATA-3038) Refactor dynamic configuration

2018-10-27 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 reassigned CARBONDATA-3038:
---

Assignee: xubo245

> Refactor dynamic configuration
> --
>
> Key: CARBONDATA-3038
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3038
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Assignee: xubo245
>Priority: Major
> Fix For: 1.5.1
>
>
> Refactor dynamic configuration for carbon:
> 1. Decide and collect all dynamic configurations which can be SET in a
> carbondata application, for example in beeline.
> 2. For every dynamic configuration, use an annotation to tag it (re-use
> the CarbonProperty annotation but change its name). This annotation should be
> used for validation when the user invokes the SET command.
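
A minimal Java sketch of what such a tag annotation and SET-time check could
look like (all names here are hypothetical, not the final design):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical tag for properties that may be changed at runtime via SET.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DynamicConfiguration { }

class SetCommandValidator {
  // Reject a SET command unless the constant declaring the property name
  // carries the tag annotation.
  static boolean isDynamic(Class<?> constants, String fieldName)
      throws NoSuchFieldException {
    Field field = constants.getDeclaredField(fieldName);
    return field.isAnnotationPresent(DynamicConfiguration.class);
  }
}
```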



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1086/



---


[GitHub] carbondata issue #2850: [WIP] Added concurrent reading through SDK

2018-10-27 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2850
  
@NamanRastogi Hi, the customer requires that carbon return data in the same
order whether one thread or multiple threads are used to read it.




---


[GitHub] carbondata pull request #2850: [WIP] Added concurrent reading through SDK

2018-10-27 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2850#discussion_r228706363
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/ConcurrentSdkReaderTest.java
 ---
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+
+import junit.framework.TestCase;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.io.IOExceptionWithCause;
+import org.junit.*;
+
+/**
+ * multi-thread Test suite for {@link CarbonReader}
+ */
+public class ConcurrentSdkReaderTest extends TestCase {
+
+  private static final String dataDir = "./testReadFiles";
+
+  public void cleanTestData() {
+try {
+  FileUtils.deleteDirectory(new File(dataDir));
+} catch (Exception e) {
+  e.printStackTrace();
+  Assert.fail(e.getMessage());
+}
+  }
+
+  private void writeTestData(long numRows, int tableBlockSize) {
+cleanTestData();
+
+Field[] fields = new Field[2];
+fields[0] = new Field("stringField", DataTypes.STRING);
+fields[1] = new Field("intField", DataTypes.INT);
+
+Map<String, String> tableProperties = new HashMap<>();
+tableProperties.put("table_blocksize", 
Integer.toString(tableBlockSize));
+
+CarbonWriterBuilder builder =
+
CarbonWriter.builder().outputPath(dataDir).withTableProperties(tableProperties)
+.withCsvInput(new Schema(fields));
+
+try {
+  CarbonWriter writer = builder.build();
+
+  for (long i = 0; i < numRows; ++i) {
+writer.write(new String[] { "robot_" + i, String.valueOf(i) });
+  }
+  writer.close();
+} catch (Exception e) {
+  e.printStackTrace();
+  Assert.fail(e.getMessage());
+}
+  }
+
+  @Test public void testReadParallely() throws IOException, 
InterruptedException {
+long numRows = 1000;
+int tableBlockSize = 10;
+short numThreads = 4;
+writeTestData(numRows, tableBlockSize);
+long count;
+
+CarbonReader reader = CarbonReader.builder(dataDir).build();
+try {
+  count = 0;
+  long start = System.currentTimeMillis();
+  while (reader.hasNext()) {
+reader.readNextRow();
+count += 1;
+  }
+  long end = System.currentTimeMillis();
+  System.out.println("[Sequential read] Time:" + (end - start));
+  Assert.assertEquals(numRows, count);
+} catch (Exception e) {
+  e.printStackTrace();
+  Assert.fail(e.getMessage());
+} finally {
+  reader.close();
+}
+
+ExecutorService executorService = 
Executors.newFixedThreadPool(numThreads);
+CarbonReader reader2 = CarbonReader.builder(dataDir).build();
+try {
+  List<CarbonReader> multipleReaders = reader2.split(numThreads);
+  List<Future> results = new ArrayList<>();
+  count = 0;
+  long start = System.currentTimeMillis();
+  for (CarbonReader reader_i : multipleReaders) {
+results.add(executorService.submit(new ReadLogic(reader_i)));
+  }
+  for (Future result_i : results) {
+count += (long) result_i.get();
+  }
+  long end = System.currentTimeMillis();
+  System.out.println("[Parallel read] Time:" + (end - start));
--- End diff --

Please add a unit to it, such as ms.
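
A trivial sketch of the requested change:

```java
System.out.println("[Parallel read] Time: " + (end - start) + " ms");
```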


---


[GitHub] carbondata issue #2862: [HOTFIX] Enable Local dictionary by default

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2862
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9347/



---


[GitHub] carbondata pull request #2850: [WIP] Added concurrent reading through SDK

2018-10-27 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2850#discussion_r228706276
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/ConcurrentSdkReaderTest.java
 ---
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+
+import junit.framework.TestCase;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.io.IOExceptionWithCause;
+import org.junit.*;
+
+/**
+ * multi-thread Test suite for {@link CarbonReader}
+ */
+public class ConcurrentSdkReaderTest extends TestCase {
+
+  private static final String dataDir = "./testReadFiles";
+
+  public void cleanTestData() {
--- End diff --

You can add @Before or @After for it, especially since there are many test cases.
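
A minimal sketch of that suggestion with JUnit 4 annotations (assuming the
existing writeTestData/cleanTestData helpers of this test class):

```java
import org.junit.After;
import org.junit.Before;

public class ConcurrentSdkReaderTestSketch {
  @Before
  public void setUp() {
    // writeTestData(1000, 10);  // create the test files before each test
  }

  @After
  public void tearDown() {
    // cleanTestData();          // remove the test files after each test
  }
}
```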


---


[GitHub] carbondata issue #2862: [HOTFIX] Enable Local dictionary by default

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2862
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1295/



---


[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2807
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1085/



---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1084/



---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Support read batch row in CSDK

2018-10-27 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
@ajantha-bhat Please review again.


---


[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...

2018-10-27 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2807
  
retest this please


---


[GitHub] carbondata issue #2863: [WIP] Optimise decompressing while filling the vecto...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2863
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1294/



---


[GitHub] carbondata pull request #2837: [CARBONDATA-3000] Provide C++ interface for w...

2018-10-27 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2837#discussion_r228705323
  
--- Diff: store/CSDK/test/main.cpp ---
@@ -210,6 +212,288 @@ bool tryCatchException(JNIEnv *env) {
 }
 printf("\nfinished handle exception\n");
 }
+
+/**
+ * test write data to local disk
+ *
+ * @param env  jni env
+ * @param path file path
+ * @param argc argument counter
+ * @param argv argument vector
+ * @return true or throw exception
+ */
+bool testWriteData(JNIEnv *env, char *path, int argc, char *argv[]) {
+
+char *jsonSchema = "[{stringField:string},{shortField:short},{intField:int},{longField:long},{doubleField:double},{boolField:boolean},{dateField:date},{timeField:timestamp},{floatField:float},{arrayField:array<string>}]";
+try {
+CarbonWriter writer;
+writer.builder(env);
+if (argc > 3) {
+writer.withHadoopConf("fs.s3a.access.key", argv[1]);
+writer.withHadoopConf("fs.s3a.secret.key", argv[2]);
+writer.withHadoopConf("fs.s3a.endpoint", argv[3]);
+}
+writer.outputPath(path);
+writer.withCsvInput(jsonSchema);
+writer.writtenBy("CSDK");
+writer.build();
+
+int rowNum = 10;
+int size = 10;
+long longValue = 0;
+double doubleValue = 0;
+float floatValue = 0;
+jclass objClass = env->FindClass("java/lang/String");
+for (int i = 0; i < rowNum; ++i) {
+jobjectArray arr = env->NewObjectArray(size, objClass, 0);
+char ctrInt[10];
+gcvt(i, 10, ctrInt);
+
+char a[15] = "robot";
+strcat(a, ctrInt);
+jobject stringField = env->NewStringUTF(a);
+env->SetObjectArrayElement(arr, 0, stringField);
+
+char ctrShort[10];
+gcvt(i % 1, 10, ctrShort);
+jobject shortField = env->NewStringUTF(ctrShort);
+env->SetObjectArrayElement(arr, 1, shortField);
+
+jobject intField = env->NewStringUTF(ctrInt);
+env->SetObjectArrayElement(arr, 2, intField);
+
+
+char ctrLong[10];
+gcvt(longValue, 10, ctrLong);
+longValue = longValue + 2;
+jobject longField = env->NewStringUTF(ctrLong);
+env->SetObjectArrayElement(arr, 3, longField);
+
+char ctrDouble[10];
+gcvt(doubleValue, 10, ctrDouble);
+doubleValue = doubleValue + 2;
+jobject doubleField = env->NewStringUTF(ctrDouble);
+env->SetObjectArrayElement(arr, 4, doubleField);
+
+jobject boolField = env->NewStringUTF("true");
+env->SetObjectArrayElement(arr, 5, boolField);
+
+jobject dateField = env->NewStringUTF(" 2019-03-02");
+env->SetObjectArrayElement(arr, 6, dateField);
+
+jobject timeField = env->NewStringUTF("2019-02-12 03:03:34");
+env->SetObjectArrayElement(arr, 7, timeField);
+
+char ctrFloat[10];
+gcvt(floatValue, 10, ctrFloat);
+floatValue = floatValue + 2;
+jobject floatField = env->NewStringUTF(ctrFloat);
+env->SetObjectArrayElement(arr, 8, floatField);
+
+jobject arrayField = 
env->NewStringUTF("Hello#World#From#Carbon");
+env->SetObjectArrayElement(arr, 9, arrayField);
+
+
+writer.write(arr);
+
+env->DeleteLocalRef(stringField);
+env->DeleteLocalRef(shortField);
+env->DeleteLocalRef(intField);
+env->DeleteLocalRef(longField);
+env->DeleteLocalRef(doubleField);
+env->DeleteLocalRef(floatField);
+env->DeleteLocalRef(dateField);
+env->DeleteLocalRef(timeField);
+env->DeleteLocalRef(boolField);
+env->DeleteLocalRef(arrayField);
+env->DeleteLocalRef(arr);
+}
+writer.close();
+
+CarbonReader carbonReader;
+carbonReader.builder(env, path);
+carbonReader.build();
+int i = 0;
+CarbonRow carbonRow(env);
+while (carbonReader.hasNext()) {
+jobject row = carbonReader.readNextRow();
+i++;
+carbonRow.setCarbonRow(row);
+printf("%s\t%d\t%ld\t", carbonRow.getString(0), 
carbonRow.getInt(1), carbonRow.getLong(2));
+jobjectArray array1 = carbonRow.getArray(3);
+jsize length = env->GetArrayLength(array1);
+int j = 0;


---

[GitHub] carbondata pull request #2863: [WIP] Optimise decompressing while filling th...

2018-10-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2863#discussion_r228705225
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPageByteUtil.java
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.page;
+
+/**
+ * Utility methods to converts to primitive types only column page data 
decode.
+ */
+public class ColumnPageByteUtil {
--- End diff --

I remember there is another `ByteUtil` in the core module; can we use that
instead of duplicating code?


---


[GitHub] carbondata pull request #2837: [CARBONDATA-3000] Provide C++ interface for w...

2018-10-27 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2837#discussion_r228705294
  
--- Diff: store/CSDK/src/CarbonWriter.cpp ---
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <stdexcept>
+#include "CarbonWriter.h"
+
+void CarbonWriter::builder(JNIEnv *env) {
+if (env == NULL) {
+throw std::runtime_error("JNIEnv parameter can't be NULL.");
+}
+jniEnv = env;
+carbonWriter = 
env->FindClass("org/apache/carbondata/sdk/file/CarbonWriter");
+if (carbonWriter == NULL) {
+throw std::runtime_error("Can't find the class in java: 
org/apache/carbondata/sdk/file/CarbonWriter");
+}
+jmethodID carbonWriterBuilderID = env->GetStaticMethodID(carbonWriter, 
"builder",
+"()Lorg/apache/carbondata/sdk/file/CarbonWriterBuilder;");
+if (carbonWriterBuilderID == NULL) {
+throw std::runtime_error("Can't find the method in java: 
carbonWriterBuilder");
+}
+carbonWriterBuilderObject = env->CallStaticObjectMethod(carbonWriter, 
carbonWriterBuilderID);
+}
+
+bool CarbonWriter::checkBuilder() {
+if (carbonWriterBuilderObject == NULL) {
+throw std::runtime_error("carbonWriterBuilder Object can't be 
NULL. Please call builder method first.");
+}
+}
+
+void CarbonWriter::outputPath(char *path) {
+if (path == NULL) {
+throw std::runtime_error("path parameter can't be NULL.");
+}
+checkBuilder();
+jclass carbonWriterBuilderClass = 
jniEnv->GetObjectClass(carbonWriterBuilderObject);
+jmethodID methodID = jniEnv->GetMethodID(carbonWriterBuilderClass, 
"outputPath",
+
"(Ljava/lang/String;)Lorg/apache/carbondata/sdk/file/CarbonWriterBuilder;");
+if (methodID == NULL) {
+throw std::runtime_error("Can't find the method in java: 
outputPath");
+}
+jstring jPath = jniEnv->NewStringUTF(path);
+jvalue args[1];
+args[0].l = jPath;
+carbonWriterBuilderObject = 
jniEnv->CallObjectMethodA(carbonWriterBuilderObject, methodID, args);
+}
+
+void CarbonWriter::withCsvInput(char *jsonSchema) {
+if (jsonSchema == NULL) {
+throw std::runtime_error("jsonSchema parameter can't be NULL.");
+}
+checkBuilder();
+jclass carbonWriterBuilderClass = 
jniEnv->GetObjectClass(carbonWriterBuilderObject);
+jmethodID methodID = jniEnv->GetMethodID(carbonWriterBuilderClass, 
"withCsvInput",
+
"(Ljava/lang/String;)Lorg/apache/carbondata/sdk/file/CarbonWriterBuilder;");
+if (methodID == NULL) {
+throw std::runtime_error("Can't find the method in java: 
withCsvInput");
+}
+jstring jPath = jniEnv->NewStringUTF(jsonSchema);
+jvalue args[1];
+args[0].l = jPath;
+carbonWriterBuilderObject = 
jniEnv->CallObjectMethodA(carbonWriterBuilderObject, methodID, args);
+if (jniEnv->ExceptionCheck()) {
+throw jniEnv->ExceptionOccurred();
+}
+};
+
+void CarbonWriter::withHadoopConf(char *key, char *value) {
+if (key == NULL) {
+throw std::runtime_error("key parameter can't be NULL.");
+}
+if (value == NULL) {
+throw std::runtime_error("value parameter can't be NULL.");
+}
+checkBuilder();
+jclass carbonWriterBuilderClass = 
jniEnv->GetObjectClass(carbonWriterBuilderObject);
+jmethodID methodID = jniEnv->GetMethodID(carbonWriterBuilderClass, 
"withHadoopConf",
+
"(Ljava/lang/String;Ljava/lang/String;)Lorg/apache/carbondata/sdk/file/CarbonWriterBuilder;");
+if (methodID == NULL) {
+throw std::runtime_error("Can't find the method in java: 
withHadoopConf");
+}
+jvalue args[2];
+args[0].l = jniEnv->NewStringUTF(key);
+args[1].l = jniEnv->NewStringUTF(value);
+carbonWriterBuilderObject = jniEnv->CallObjectMethodA(carbonWriterBuilderObject, methodID, args);


---


[GitHub] carbondata pull request #2837: [CARBONDATA-3000] Provide C++ interface for w...

2018-10-27 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2837#discussion_r228705283
  
--- Diff: store/CSDK/src/CarbonWriter.cpp ---
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <stdexcept>
+#include "CarbonWriter.h"
+
+void CarbonWriter::builder(JNIEnv *env) {
+if (env == NULL) {
+throw std::runtime_error("JNIEnv parameter can't be NULL.");
+}
+jniEnv = env;
+carbonWriter = 
env->FindClass("org/apache/carbondata/sdk/file/CarbonWriter");
+if (carbonWriter == NULL) {
+throw std::runtime_error("Can't find the class in java: 
org/apache/carbondata/sdk/file/CarbonWriter");
+}
+jmethodID carbonWriterBuilderID = env->GetStaticMethodID(carbonWriter, 
"builder",
+"()Lorg/apache/carbondata/sdk/file/CarbonWriterBuilder;");
+if (carbonWriterBuilderID == NULL) {
+throw std::runtime_error("Can't find the method in java: 
carbonWriterBuilder");
+}
+carbonWriterBuilderObject = env->CallStaticObjectMethod(carbonWriter, 
carbonWriterBuilderID);
+}
+
+bool CarbonWriter::checkBuilder() {
+if (carbonWriterBuilderObject == NULL) {
+throw std::runtime_error("carbonWriterBuilder Object can't be 
NULL. Please call builder method first.");
+}
+}
+
+void CarbonWriter::outputPath(char *path) {
+if (path == NULL) {
+throw std::runtime_error("path parameter can't be NULL.");
+}
+checkBuilder();
--- End diff --

ok, done


---


[GitHub] carbondata issue #2863: [WIP] Optimise decompressing while filling the vecto...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2863
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9346/



---


[GitHub] carbondata issue #2862: [HOTFIX] Enable Local dictionary by default

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2862
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1083/



---


[GitHub] carbondata issue #2863: [WIP] Optimise decompressing while filling the vecto...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2863
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1082/



---


[GitHub] carbondata issue #2805: [Documentation] Local dictionary Data which are not ...

2018-10-27 Thread sgururajshetty
Github user sgururajshetty commented on the issue:

https://github.com/apache/carbondata/pull/2805
  
@sraghunandan kindly review and help me to merge my changes


---


[GitHub] carbondata issue #2862: [HOTFIX] Enable Local dictionary by default

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2862
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1292/



---


[GitHub] carbondata issue #2862: [HOTFIX] Enable Local dictionary by default

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2862
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9344/



---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228703510
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory {
   .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext)
 val skewedDataOptimization = CarbonProperties.getInstance()
   .isLoadSkewedDataOptimizationEnabled()
-val loadMinSizeOptimization = CarbonProperties.getInstance()
-  .isLoadMinSizeOptimizationEnabled()
 // get user ddl input the node loads the smallest amount of data
-val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
+val carbonTable = 
carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+val loadMinSize = 
carbonTable.getTableInfo.getFactTable.getTableProperties.asScala
--- End diff --

It seems that you get the load-min-size only from the table property, but
you claimed that carbon also supports specifying it through the load option.

The expected procedure is:
1. Get the loadMinSize from the LoadOption; if it is zero, go to step 2, otherwise go to step 4.
2. Get it from the TableProperty; if it is zero, go to step 3, otherwise go to step 4.
3. Use the other strategy.
4. Use NODE_MIN_SIZE_FIRST.

Have you handled this?
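
A minimal Java sketch of that fallback order (the actual change is in Scala;
the names here are assumptions for illustration):

```java
class LoadMinSizeResolution {
  // Resolve the minimum-size-per-node setting: load option first, then the
  // table property; 0 means the strategy is disabled (use the other strategy).
  static int resolveLoadMinSizeInMb(String fromLoadOption, String fromTableProperty) {
    int fromOption = parseOrZero(fromLoadOption);
    if (fromOption > 0) {
      return fromOption;        // step 1 -> step 4: NODE_MIN_SIZE_FIRST
    }
    int fromProperty = parseOrZero(fromTableProperty);
    if (fromProperty > 0) {
      return fromProperty;      // step 2 -> step 4: NODE_MIN_SIZE_FIRST
    }
    return 0;                   // step 3: fall back to the other strategy
  }

  static int parseOrZero(String value) {
    try {
      return Integer.parseInt(value.trim());
    } catch (Exception e) {     // null, empty or non-numeric -> disabled
      return 0;
    }
  }
}
```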


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228702976
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
 ---
@@ -833,4 +833,26 @@ object CommonUtil {
   })
 }
   }
+
+  /**
+   * This method will validate single node minimum load data volume of 
table specified by the user
+   *
+   * @param tableProperties table property specified by user
+   * @param propertyName property name
+   */
+  def validateLoadMinSize(tableProperties: Map[String, String], 
propertyName: String): Unit = {
+var size: Integer = 0
+if (tableProperties.get(propertyName).isDefined) {
+  val loadSizeStr: String =
+parsePropertyValueStringInMB(tableProperties(propertyName))
+  try {
+size = Integer.parseInt(loadSizeStr)
+  } catch {
+case e: NumberFormatException =>
--- End diff --

once you update the check, remember to update this error message as well


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228703382
  
--- Diff: docs/ddl-of-carbondata.md ---
@@ -474,7 +475,22 @@ CarbonData DDL statements are documented here,which 
includes:
  be later viewed in table description for reference.
 
  ```
-   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'')
+   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords')
+ ```
+ 
+   - # Load minimum data size
+ This property determines whether to enable node minumun input data 
size allocation strategy 
--- End diff --

You can optimize this description like this:

```
This property indicates the minimum input data size per node for data loading.
By default it is not enabled. Setting a non-zero integer value will enable this feature.
This property is useful if you have a large cluster and only want a small portion of the nodes to process data loading.
For example, if you have a cluster with 10 nodes and the input data is about 1GB, then without this property each node will process about 100MB of input data and produce at least 10 data files. With this property set to 512, only 2 nodes will be chosen to process the input data, each with about 512MB of input, producing about 2 to 4 files depending on the compression ratio.
Moreover, this property can also be specified in the load option.
Notice that once you enable this feature, carbondata will, for load balance, ignore data locality while assigning input data to nodes; this will cause more network traffic.
```


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228702949
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
 ---
@@ -833,4 +833,26 @@ object CommonUtil {
   })
 }
   }
+
+  /**
+   * This method will validate single node minimum load data volume of 
table specified by the user
+   *
+   * @param tableProperties table property specified by user
+   * @param propertyName property name
+   */
+  def validateLoadMinSize(tableProperties: Map[String, String], 
propertyName: String): Unit = {
+var size: Integer = 0
+if (tableProperties.get(propertyName).isDefined) {
+  val loadSizeStr: String =
+parsePropertyValueStringInMB(tableProperties(propertyName))
+  try {
+size = Integer.parseInt(loadSizeStr)
--- End diff --

What about checking the range bounds: can this be negative or zero?
I think in the exception scenario you can set this value to 0, so that later
you can use it as a flag (whether the value is zero) to determine whether to
enable size-based block assignment.


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228703171
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java
 ---
@@ -186,8 +186,7 @@
 optionsFinal.put("sort_scope", "local_sort");
 optionsFinal.put("sort_column_bounds", Maps.getOrDefault(options, 
"sort_column_bounds", ""));
 optionsFinal.put(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB,
-
Maps.getOrDefault(options,CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB,
-CarbonCommonConstants.CARBON_LOAD_MIN_NODE_SIZE_INMB_DEFAULT));
+
Maps.getOrDefault(options,CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, ""));
--- End diff --

need a space after "options,"


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228703158
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala ---
@@ -748,4 +752,18 @@ object AlterTableUtil {
   false
 }
   }
+
+  private def validateLoadMinSizeProperties(carbonTable: CarbonTable,
+  propertiesMap: mutable.Map[String, String]): Unit = {
+// validate load min size property
+if 
(propertiesMap.get(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB).isDefined) {
+  // Cache level is not allowed for child tables and dataMaps
--- End diff --

'Cache level'?


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228703135
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
 ---
@@ -123,6 +123,12 @@ private[sql] case class CarbonDescribeFormattedCommand(
 tblProps.get(CarbonCommonConstants.LONG_STRING_COLUMNS), ""))
 }
 
+// load min size info
+if 
(tblProps.containsKey(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB)) {
+  results ++= Seq(("Single node load min data size",
--- End diff --

You can optimize this info to 'Minimum input data size per node for data 
loading'


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228703110
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory {
   .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext)
 val skewedDataOptimization = CarbonProperties.getInstance()
   .isLoadSkewedDataOptimizationEnabled()
-val loadMinSizeOptimization = CarbonProperties.getInstance()
-  .isLoadMinSizeOptimizationEnabled()
 // get user ddl input the node loads the smallest amount of data
-val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
+val carbonTable = 
carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+val loadMinSize = 
carbonTable.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, "")
+var expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
--- End diff --

there is no need to add another variable `expectedMinSizePerNode`. In line
1190, we can just use `loadMinSize` to determine which branch to take: if it
is zero, use 'BLOCK_SIZE_FIRST'; otherwise, use 'NODE_MIN_SIZE_FIRST'.


---


[GitHub] carbondata pull request #2864: [CARBONDATA-3041] Optimize load minimum size ...

2018-10-27 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2864#discussion_r228703406
  
--- Diff: docs/ddl-of-carbondata.md ---
@@ -474,7 +475,22 @@ CarbonData DDL statements are documented here,which 
includes:
  be later viewed in table description for reference.
 
  ```
-   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'')
+   TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords')
+ ```
+ 
+   - # Load minimum data size
+ This property determines whether to enable node minumun input data 
size allocation strategy 
+ for data loading.It will make sure that the node load the minimum 
amount of data there by 
+ reducing number of carbondata files. This property is useful if the 
size of the input data 
+ files are very small, like 1MB to 256MB. And This property can also 
be specified 
+ in the load option, the property value only int value is supported.
+
+ ```
+   TBLPROPERTIES('LOAD_MIN_SIZE_INMB'='256 MB')
--- End diff --

I think we can remove this and only support '256', since the property name
already contains 'INMB'; this will keep the code simple.
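
Under that suggestion the documented example would simply read:

```
TBLPROPERTIES('LOAD_MIN_SIZE_INMB'='256')
```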


---


[GitHub] carbondata issue #2861: [HOTFIX]handle passing spark appname for partition t...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1291/



---


[GitHub] carbondata issue #2861: [HOTFIX]handle passing spark appname for partition t...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9343/



---


[GitHub] carbondata issue #2863: [WIP] Optimise decompressing while filling the vecto...

2018-10-27 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2863
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1081/



---