[GitHub] carbondata issue #2712: [HOTFIX] Fix streaming CI issue

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2712
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8483/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/411/



---


[GitHub] carbondata issue #2712: [HOTFIX] Fix streaming CI issue

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2712
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/413/



---


[GitHub] carbondata issue #2711: [CARBONDATA-2929][DataMap] Add block skipped info fo...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2711
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8482/



---


[GitHub] carbondata issue #2708: [CARBONDATA-2886] Select Filter Compatibility Fixed

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2708
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/246/



---


[GitHub] carbondata issue #2711: [CARBONDATA-2929][DataMap] Add block skipped info fo...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2711
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/412/



---


[GitHub] carbondata issue #2702: [CARBONDATA-2924] Fix parsing issue for map as a nes...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2702
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/245/



---


[GitHub] carbondata issue #2708: [CARBONDATA-2886] Select Filter Compatibility Fixed

2018-09-11 Thread brijoobopanna
Github user brijoobopanna commented on the issue:

https://github.com/apache/carbondata/pull/2708
  
retest this please



---


[GitHub] carbondata issue #2695: [CARBONDATA-2919] Support ingest from Kafka in Strea...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2695
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/410/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8481/



---


[GitHub] carbondata issue #2712: [HOTFIX] Fix streaming CI issue

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2712
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/244/



---


[GitHub] carbondata issue #2711: [CARBONDATA-2929][DataMap] Add block skipped info fo...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2711
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/243/



---


[GitHub] carbondata pull request #2712: [HOTFIX] Fix streaming CI issue

2018-09-11 Thread QiangCai
GitHub user QiangCai opened a pull request:

https://github.com/apache/carbondata/pull/2712

[HOTFIX] Fix streaming CI issue

fix streaming ci issue

- [x] Any interfaces changed?
 no
 - [x] Any backward compatibility impacted?
 no
 - [x] Document update required?
 no
 - [x] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
small changes


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/QiangCai/carbondata fix_streaming_issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2712.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2712


commit 31ba78cb30279e0ab7e37f28ab078cf86bcee4c9
Author: QiangCai 
Date:   2018-09-12T04:00:27Z

fix streaming ci issue




---


[GitHub] carbondata pull request #2711: [CARBONDATA-2929][DataMap] Add block skipped ...

2018-09-11 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2711

[CARBONDATA-2929][DataMap] Add block skipped info for explain command


This PR adds block skipped info by counting distinct file paths among the hit 
blocklets. Sample output:
```
|== CarbonData Profiler ==
Table Scan on test
 - total: 125 blocks, 250 blocklets
 - filter: (l_partkey <> null and l_partkey = 1006)
 - pruned by Main DataMap
- skipped: 119 blocks, 238 blocklets
 - pruned by CG DataMap
- name: dm
- provider: bloomfilter
- skipped: 6 blocks, 12 blocklets
```

```
|== CarbonData Profiler ==
Table Scan on test
 - total: 125 blocks, 250 blocklets
 - filter: TEXT_MATCH('l_shipmode:AIR')
 - pruned by Main DataMap
- skipped: 0 blocks, 0 blocklets
 - pruned by FG DataMap
- name: dm
- provider: lucene
- skipped: 12 blocks, 80 blocklets
```
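For context, a minimal sketch of the counting idea described above, using a 
hypothetical `Blocklet` stand-in rather than CarbonData's actual classes: the 
number of blocks still hit is the number of distinct carbondata file paths 
among the surviving blocklets, and the skipped count is the segment total 
minus that.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BlockSkipCounter {
  /** Hypothetical stand-in for a pruned blocklet; only the file path matters here. */
  static class Blocklet {
    final String filePath;
    Blocklet(String filePath) { this.filePath = filePath; }
  }

  /** Blocks still hit = distinct carbondata files the hit blocklets belong to. */
  static int hitBlocks(List<Blocklet> hitBlocklets) {
    Set<String> distinctFiles = new HashSet<>();
    for (Blocklet b : hitBlocklets) {
      distinctFiles.add(b.filePath);
    }
    return distinctFiles.size();
  }

  /** Blocks skipped = total blocks in the segment minus the blocks still hit. */
  static int skippedBlocks(int totalBlocks, List<Blocklet> hitBlocklets) {
    return totalBlocks - hitBlocks(hitBlocklets);
  }
}
```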


Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata explain_block_skip

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2711.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2711


commit 0828b4d3f366b02a3f9db89e862fc9bc0b89
Author: Manhua 
Date:   2018-09-12T03:29:46Z

add block skip info




---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
LGTM. spark 2.3 CI has problem currently, we are fixing it


---


[GitHub] carbondata issue #2695: [CARBONDATA-2919] Support ingest from Kafka in Strea...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2695
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8480/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/242/



---


[GitHub] carbondata issue #2695: [CARBONDATA-2919] Support ingest from Kafka in Strea...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2695
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/241/



---


[GitHub] carbondata issue #2709: [HOTFIX] Removed scala dependency from carbon core m...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2709
  
@jackylk I got preaggregate-related failures in Spark 2.3 

http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8479/

it seems all the failed test cases contain sql("reset")



---


[jira] [Updated] (CARBONDATA-2929) Add block skipped info for explain command

2018-09-11 Thread jiangmanhua (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiangmanhua updated CARBONDATA-2929:

Summary: Add block skipped info for explain command  (was: add block 
skipped info for explain command)

> Add block skipped info for explain command
> --
>
> Key: CARBONDATA-2929
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2929
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: jiangmanhua
>Assignee: jiangmanhua
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2929) add block skipped info for explain command

2018-09-11 Thread jiangmanhua (JIRA)
jiangmanhua created CARBONDATA-2929:
---

 Summary: add block skipped info for explain command
 Key: CARBONDATA-2929
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2929
 Project: CarbonData
  Issue Type: Improvement
Reporter: jiangmanhua
Assignee: jiangmanhua






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
retest this please


---


[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2706
  
@ajantha-bhat 
Hi, I think the main problem may be that you made the 'rowbuffer' static, 
which should not be shared among different data loadings.

Besides, the check for increasing the rowBuffer size per row per column may 
decrease data loading performance.

As a result, I'd like to implement this in an easier way.

We can add a table property or load option for the size of the row buffer. Just 
keep the previous row-buffer related code as it is. All you need to change is 
the initial size of the rowbuffer based on the table property or load option.

@kumarvishal09 @ravipesala What do you think?
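A minimal sketch of this suggestion, assuming a hypothetical option name (not 
an existing CarbonData property): size the buffer once per loading from the 
load options and keep it thread-local rather than static.

```java
import java.nio.ByteBuffer;
import java.util.Map;

class RowBufferFactory {
  private static final int DEFAULT_ROW_BUFFER_MB = 2; // current hardcoded default

  /** One buffer per loading instance and per thread; deliberately not static. */
  static ThreadLocal<ByteBuffer> create(Map<String, String> loadOptions) {
    int sizeMb = Integer.parseInt(loadOptions.getOrDefault(
        "carbon.load.sort.rowbuffer.size.mb",   // hypothetical option name
        String.valueOf(DEFAULT_ROW_BUFFER_MB)));
    final int sizeBytes = sizeMb * 1024 * 1024;
    return ThreadLocal.withInitial(() -> ByteBuffer.allocate(sizeBytes));
  }
}
```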


---


[jira] [Updated] (CARBONDATA-2928) query failed when doing merge index during load

2018-09-11 Thread ocean (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ocean updated CARBONDATA-2928:
--
Description: 
In CarbonData version 1.4.1, carbonindex files are merged on every load. But when 
querying through Thrift Server (about 10 QPS) while a merge index operation is in 
progress, an error occurs.

18/09/12 11:18:25 ERROR SparkExecuteStatementOperation: Error executing query, 
currentState RUNNING, 
 org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
 Exchange SinglePartition
 +- *HashAggregate(keys=[], functions=[partial_count(1)], 
output=[count#1692258L|#1692258L])
 +- *Project
 +- *FileScan carbondata default.ae_event_cb_40e_std[] PushedFilters: 
[IsNotNull(eventid), IsNotNull(productid), IsNotNull(starttime_day), 
EqualTo(productid,534), Equa...

at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:115)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
 at 
org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:252)
 at 
org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:386)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
 at 
org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:228)
 at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
 at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2861)
 at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2387)
 at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2387)
 at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2842)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
 at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2841)
 at org.apache.spark.sql.Dataset.collect(Dataset.scala:2387)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:245)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Problem in loading segment blocks.
 at 
org.apache.carbondata.core.indexstore.BlockletDataMapIndexStore.getAll(BlockletDataMapIndexStore.java:184)
 at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getDataMaps(BlockletDataMapFactory.java:144)
 at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:93)
 at 
org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapperImpl.prune(DataMapExprWrapperImpl.java:53)
 at 
org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:442)
 at 
org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:378)
 at 

[jira] [Created] (CARBONDATA-2928) query failed when doing merge index during load

2018-09-11 Thread ocean (JIRA)
ocean created CARBONDATA-2928:
-

 Summary: query failed when doing merge index during load
 Key: CARBONDATA-2928
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2928
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 1.4.1
Reporter: ocean
 Fix For: NONE


In CarbonData version 1.4.1, carbonindex files are merged on every load. But when 
querying through Thrift Server while a merge index operation is in progress, an 
error occurs.

18/09/12 11:18:25 ERROR SparkExecuteStatementOperation: Error executing query, 
currentState RUNNING, 
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *HashAggregate(keys=[], functions=[partial_count(1)], 
output=[count#1692258L])
 +- *Project
 +- *FileScan carbondata default.ae_event_cb_40e_std[] PushedFilters: 
[IsNotNull(eventid), IsNotNull(productid), IsNotNull(starttime_day), 
EqualTo(productid,534), Equa...

at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:115)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
 at 
org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:252)
 at 
org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:386)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
 at 
org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:228)
 at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
 at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2861)
 at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2387)
 at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2387)
 at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2842)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
 at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2841)
 at org.apache.spark.sql.Dataset.collect(Dataset.scala:2387)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:245)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Problem in loading segment blocks.
 at 
org.apache.carbondata.core.indexstore.BlockletDataMapIndexStore.getAll(BlockletDataMapIndexStore.java:184)
 at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getDataMaps(BlockletDataMapFactory.java:144)
 at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:93)
 at 
org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapperImpl.prune(DataMapExprWrapperImpl.java:53)
 at 

[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r216885804
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java
 ---
@@ -240,11 +249,11 @@ public void addRow(Object[] row) throws 
CarbonSortKeyAndGroupByException {
   throw new CarbonSortKeyAndGroupByException(ex);
 }
 rowPage.addRow(row, rowBuffer.get());
-  } catch (Exception e) {
-LOGGER.error(
-"exception occurred while trying to acquire a semaphore lock: 
" + e.getMessage());
-throw new CarbonSortKeyAndGroupByException(e);
   }
+} catch (Exception e) {
+  LOGGER
--- End diff --

bad indent. we can move the msg to next line and keep method call in this 
line


---


[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r216884982
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java
 ---
@@ -570,23 +589,31 @@ public int 
writeRawRowAsIntermediateSortTempRowToUnsafeMemory(Object[] row,
   private void packNoSortFieldsToBytes(Object[] row, ByteBuffer rowBuffer) 
{
 // convert dict & no-sort
 for (int idx = 0; idx < this.dictNoSortDimCnt; idx++) {
+  // cannot exceed default 2MB, hence no need to call ensureArraySize
   rowBuffer.putInt((int) row[this.dictNoSortDimIdx[idx]]);
 }
 // convert no-dict & no-sort
 for (int idx = 0; idx < this.noDictNoSortDimCnt; idx++) {
   byte[] bytes = (byte[]) row[this.noDictNoSortDimIdx[idx]];
+  // cannot exceed default 2MB, hence no need to call ensureArraySize
--- End diff --

for one column it may not exceed 2MB, but what if we have lots of 
no-sort-no-dict columns?


---


[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r216884722
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java
 ---
@@ -559,7 +572,13 @@ public int 
writeRawRowAsIntermediateSortTempRowToUnsafeMemory(Object[] row,
 return size;
   }
 
-
+  private void validateUnsafeMemoryBlockSizeLimit(long 
unsafeRemainingLength, int size)
--- End diff --

please rename the parameter 'size' for better readability; it seems to 
represent the requestedSize


---


[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r216885444
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeCarbonRowPage.java
 ---
@@ -59,12 +60,11 @@ public UnsafeCarbonRowPage(TableFieldStat 
tableFieldStat, MemoryBlock memoryBloc
 this.taskId = taskId;
 buffer = new IntPointerBuffer(this.taskId);
 this.dataBlock = memoryBlock;
-// TODO Only using 98% of space for safe side.May be we can have 
different logic.
-sizeToBeUsed = dataBlock.size() - (dataBlock.size() * 5) / 100;
+sizeToBeUsed = dataBlock.size();
--- End diff --

Is the old comment outdated? Have you ensured the 'safe side' it mentioned?


---


[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r216885323
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java
 ---
@@ -598,26 +625,53 @@ private void packNoSortFieldsToBytes(Object[] row, 
ByteBuffer rowBuffer) {
   tmpValue = row[this.measureIdx[idx]];
   tmpDataType = this.dataTypes[idx];
   if (null == tmpValue) {
+// can exceed default 2MB, hence need to call ensureArraySize
+rowBuffer = UnsafeSortDataRows
+.ensureArraySize(1);
 rowBuffer.put((byte) 0);
 continue;
   }
+  // can exceed default 2MB, hence need to call ensureArraySize
+  rowBuffer = UnsafeSortDataRows
+  .ensureArraySize(1);
--- End diff --

bad indent, can be moved to previous line
The same with line#642, line#647


---


[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r216884374
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/memory/UnsafeMemoryManager.java 
---
@@ -200,7 +200,7 @@ public static MemoryBlock allocateMemoryWithRetry(long 
taskId, long size)
 }
 if (baseBlock == null) {
   INSTANCE.printCurrentMemoryUsage();
-  throw new MemoryException("Not enough memory");
+  throw new MemoryException("Not enough memory, increase 
carbon.unsafe.working.memory.in.mb");
--- End diff --

I think you can optimize the error message to 
`Not enough unsafe working memory (total: , available: , request: )`
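A sketch of what the suggested message could look like, with the totals passed 
in from whatever counters the memory manager keeps (the parameters and import 
path here are assumptions for illustration):

```java
import org.apache.carbondata.core.memory.MemoryException; // assumed import path

class AllocationFailure {
  /** Fail with a message that includes the totals, as suggested above. */
  static void fail(long totalBytes, long usedBytes, long requestedBytes)
      throws MemoryException {
    throw new MemoryException(String.format(
        "Not enough unsafe working memory (total: %d, available: %d, requested: %d). "
            + "Consider increasing carbon.unsafe.working.memory.in.mb",
        totalBytes, totalBytes - usedBytes, requestedBytes));
  }
}
```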


---


[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r216885250
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java
 ---
@@ -598,26 +625,53 @@ private void packNoSortFieldsToBytes(Object[] row, 
ByteBuffer rowBuffer) {
   tmpValue = row[this.measureIdx[idx]];
   tmpDataType = this.dataTypes[idx];
   if (null == tmpValue) {
+// can exceed default 2MB, hence need to call ensureArraySize
+rowBuffer = UnsafeSortDataRows
+.ensureArraySize(1);
--- End diff --

bad indent, can be moved to previous line


---


[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r216886119
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java
 ---
@@ -326,6 +335,19 @@ private void startFileBasedMerge() throws 
InterruptedException {
 dataSorterAndWriterExecutorService.awaitTermination(2, TimeUnit.DAYS);
   }
 
+  public static ByteBuffer ensureArraySize(int requestSize) {
--- End diff --

If we increase the rowbuffer at runtime, is there a way to decrease it? Or if 
there is no need to do so, how long will this rowbuffer last?
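For reference, a grow-only resize sketch illustrating the trade-off behind 
this question: once enlarged, such a buffer is never shrunk, so it lives until 
the buffer (or its thread-local owner) is discarded. Names here are 
illustrative, not the PR's actual code.

```java
import java.nio.ByteBuffer;

final class GrowOnlyBuffer {
  private ByteBuffer buffer = ByteBuffer.allocate(2 * 1024 * 1024); // 2MB default

  /** Grow the buffer if needed; the enlarged buffer is kept, never shrunk. */
  ByteBuffer ensureCapacity(int additionalBytes) {
    if (buffer.remaining() < additionalBytes) {
      int newSize = Math.max(buffer.capacity() * 2,
          buffer.position() + additionalBytes);
      ByteBuffer bigger = ByteBuffer.allocate(newSize);
      buffer.flip();
      bigger.put(buffer); // copy the bytes written so far
      buffer = bigger;    // old buffer becomes garbage; no shrink path
    }
    return buffer;
  }
}
```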


---


[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r216885202
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java
 ---
@@ -570,23 +589,31 @@ public int 
writeRawRowAsIntermediateSortTempRowToUnsafeMemory(Object[] row,
   private void packNoSortFieldsToBytes(Object[] row, ByteBuffer rowBuffer) 
{
 // convert dict & no-sort
 for (int idx = 0; idx < this.dictNoSortDimCnt; idx++) {
+  // cannot exceed default 2MB, hence no need to call ensureArraySize
   rowBuffer.putInt((int) row[this.dictNoSortDimIdx[idx]]);
 }
 // convert no-dict & no-sort
 for (int idx = 0; idx < this.noDictNoSortDimCnt; idx++) {
   byte[] bytes = (byte[]) row[this.noDictNoSortDimIdx[idx]];
+  // cannot exceed default 2MB, hence no need to call ensureArraySize
   rowBuffer.putShort((short) bytes.length);
   rowBuffer.put(bytes);
 }
 // convert varchar dims
 for (int idx = 0; idx < this.varcharDimCnt; idx++) {
   byte[] bytes = (byte[]) row[this.varcharDimIdx[idx]];
+  // can exceed default 2MB, hence need to call ensureArraySize
+  rowBuffer = UnsafeSortDataRows
--- End diff --

Should we call this method per row per column?
Since in most scenarios 2MB per row is enough, will the method call here 
cause a performance decrease?


---


[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r216885637
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java
 ---
@@ -72,7 +72,7 @@
 
   private SortParameters parameters;
   private TableFieldStat tableFieldStat;
-  private ThreadLocal<ByteBuffer> rowBuffer;
+  private static ThreadLocal<ByteBuffer> rowBuffer;
--- End diff --

I think the 'static' here may cause problems for concurrent loading. Each 
loading should have its own rowBuffer.


---


[GitHub] carbondata pull request #2706: [CARBONDATA-2927] multiple issue fixes for va...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2706#discussion_r216885885
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java
 ---
@@ -326,6 +335,19 @@ private void startFileBasedMerge() throws 
InterruptedException {
 dataSorterAndWriterExecutorService.awaitTermination(2, TimeUnit.DAYS);
   }
 
+  public static ByteBuffer ensureArraySize(int requestSize) {
--- End diff --

please add a comment noting that this method is used to increase the rowbuffer 
during loading.


---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8479/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/409/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/240/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
@ravipesala @jackylk 
I added an optional 'compressor_name' alongside the 'compression_codec'. During 
processing, I use the compressor_name and set compression_codec to a deprecated 
value.

I also added an interface to register a custom compressor, along with a test for 
it.

For now, all the review comments are resolved.
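A minimal sketch, with hypothetical names rather than the actual 
thrift-generated API, of the read path described here and later in the thread: 
prefer the optional compressor name, ignore the deprecated enum, and fall back 
to snappy for legacy files (which always used snappy).

```java
class CompressorResolver {
  /** Resolve the compressor for a file footer; names are illustrative. */
  static String resolve(String compressorName, int deprecatedCodecEnum) {
    // deprecatedCodecEnum is intentionally ignored: the required thrift enum
    // cannot be removed, but new readers only consult the name field.
    if (compressorName != null && !compressorName.isEmpty()) {
      return compressorName;  // new files carry the name, e.g. "zstd"
    }
    return "snappy";          // legacy store always used snappy
  }
}
```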


---


[GitHub] carbondata issue #2683: [CARBONDATA-2916] Add CarbonCli tool for data summar...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2683
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/408/



---


[GitHub] carbondata issue #2683: [CARBONDATA-2916] Add CarbonCli tool for data summar...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2683
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8478/



---


[GitHub] carbondata issue #2695: [CARBONDATA-2919] Support ingest from Kafka in Strea...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2695
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/407/



---


[GitHub] carbondata issue #2683: [CARBONDATA-2916] Add CarbonCli tool for data summar...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2683
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/239/



---


[GitHub] carbondata issue #2695: [CARBONDATA-2919] Support ingest from Kafka in Strea...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2695
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/238/



---


[GitHub] carbondata issue #2695: [CARBONDATA-2919] Support ingest from Kafka in Strea...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2695
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8477/



---


[GitHub] carbondata issue #2709: [HOTFIX] Removed scala dependency from carbon core m...

2018-09-11 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2709
  
spark 2.3 has random failure? @ravipesala @QiangCai 


---


[GitHub] carbondata pull request #2695: [CARBONDATA-2919] Support ingest from Kafka i...

2018-09-11 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2695#discussion_r216736902
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/StreamSQLExample.scala
 ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.carbondata.examples.util.ExampleUtils
+
+// scalastyle:off println
+object StreamSQLExample {
--- End diff --

Now I changed this Example to use socket stream source.


---


[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2706
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8476/



---


[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2706
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/406/



---


[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2706
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/237/



---


[GitHub] carbondata issue #2706: [CARBONDATA-2927] multiple issue fixes for varchar c...

2018-09-11 Thread ajantha-bhat
Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2706
  
@kumarvishal09 @ravipesala : please do an in-depth review of this PR; the 
impact is significant. 


---


[jira] [Created] (CARBONDATA-2927) Multiple issue fixes for varchar column and complex columns that grows more than 2MB

2018-09-11 Thread Ajantha Bhat (JIRA)
Ajantha Bhat created CARBONDATA-2927:


 Summary: Multiple issue fixes for varchar column and complex 
columns that grows more than 2MB
 Key: CARBONDATA-2927
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2927
 Project: CarbonData
  Issue Type: Bug
Reporter: Ajantha Bhat
Assignee: Ajantha Bhat


*Fixed:*
 *1. Varchar data whose length is more than 2MB causes a buffer overflow 
exception (thread-local row buffer).*

*root cause: the thread-local buffer was hardcoded at 2MB.*

*solution: grow it dynamically based on the row size.*

 *2. Reading data from a carbon file having one row of varchar data with 150 MB 
length is very slow.*

*root cause: at UnsafeDMStore, ensure-memory grows by only 8KB each time, so 
many malloc/free cycles happen before reaching 150MB; hence very slow 
performance.*

*solution: directly check and allocate the required size.*

 *3. JVM crash when data size is more than 128 MB in the unsafe sort step.*

*root cause: unsafeCarbonRowPage is 128MB, so if one row's data is more than 
128MB we access the block beyond what was allocated, leading to a JVM crash.*

*solution: validate the size before access and prompt the user to increase 
unsafe memory (via a carbon property).*
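A sketch contrasting the two allocation strategies behind fix 2, assuming plain 
byte arrays for illustration (the actual UnsafeDMStore works on unsafe memory, 
and these method names are not its real API): growing by a fixed 8KB step costs 
roughly 19,000 allocate-and-copy rounds for a 150MB value, while allocating the 
required size directly costs one.

```java
class EnsureSize {
  static byte[] growStepwise(byte[] buf, int required) {
    while (buf.length < required) {
      byte[] bigger = new byte[buf.length + 8 * 1024]; // old behavior: 8KB steps
      System.arraycopy(buf, 0, bigger, 0, buf.length);
      buf = bigger;
    }
    return buf;
  }

  static byte[] growDirect(byte[] buf, int required) {
    if (buf.length < required) {
      byte[] bigger = new byte[required];              // fix: one allocation
      System.arraycopy(buf, 0, bigger, 0, buf.length);
      buf = bigger;
    }
    return buf;
  }
}
```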



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2683: [CARBONDATA-2916] Add CarbonCli tool for data summar...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2683
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/236/



---


[GitHub] carbondata issue #2683: [CARBONDATA-2916] Add CarbonCli tool for data summar...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2683
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/405/



---


[GitHub] carbondata issue #2683: [CARBONDATA-2916] Add CarbonCli tool for data summar...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2683
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8475/



---


[GitHub] carbondata issue #2683: [CARBONDATA-2916] Add CarbonCli tool for data summar...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2683
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/235/



---


[GitHub] carbondata pull request #2710: [2875]two different threads overwriting the s...

2018-09-11 Thread shardul-cr7
Github user shardul-cr7 closed the pull request at:

https://github.com/apache/carbondata/pull/2710


---


[GitHub] carbondata issue #2703: [CARBONDATA-2925]Wrong data displayed for spark file...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2703
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/403/



---


[GitHub] carbondata issue #2703: [CARBONDATA-2925]Wrong data displayed for spark file...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2703
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8473/



---


[GitHub] carbondata issue #2703: [CARBONDATA-2925]Wrong data displayed for spark file...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2703
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/234/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8472/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/402/



---


[jira] [Assigned] (CARBONDATA-2877) CarbonDataWriterException when loading data to carbon table with large number of rows/columns from Spark-Submit

2018-09-11 Thread Brijoo Bopanna (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brijoo Bopanna reassigned CARBONDATA-2877:
--

Assignee: Brijoo Bopanna  (was: kumar vishal)

> CarbonDataWriterException when loading data to carbon table with large number 
> of rows/columns from Spark-Submit
> ---
>
> Key: CARBONDATA-2877
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2877
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.4.1
> Environment: Spark 2.1
>Reporter: Chetan Bhat
>Assignee: Brijoo Bopanna
>Priority: Major
>
> Steps:
> From Spark-submit, the user creates a table with a large number of columns 
> (around 100) and tries to load around 3 lakh (300,000) records into the table.
> Spark-submit command - spark-submit --master yarn --num-executors 3 
> --executor-memory 75g --driver-memory 10g --executor-cores 12 --class
> Actual Issue : Data loading fails with CarbonDataWriterException.
> Executor yarn UI log-
> org.apache.spark.util.TaskCompletionListenerException: 
> org.apache.carbondata.core.datastore.exception.CarbonDataWriterException
> Previous exception in task: Error while initializing data handler : 
>  
> org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:141)
>  
> org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
>  
> org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.(NewCarbonDataLoadRDD.scala:221)
>  
> org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.internalCompute(NewCarbonDataLoadRDD.scala:197)
>  org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
>  org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  org.apache.spark.scheduler.Task.run(Task.scala:99)
>  org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>  
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  java.lang.Thread.run(Thread.java:748)
>  at 
> org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
>  at 
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> Expected : The dataloading should be successful from Spark-submit similar to 
> that in Beeline.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2710: [2875]two different threads overwriting the same car...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2710
  
Can one of the admins verify this patch?


---


[GitHub] carbondata pull request #2710: [2875]two different threads overwriting the s...

2018-09-11 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/2710

[2875]two different threads overwriting the same carbondatafile 

Problem: Two different threads are overwriting the same carbondata file 
during creation of an external table.

Solution: The chance of two threads concurrently writing the same carbondata 
file is reduced by changing the timestamp attached to the .carbondata file name 
from millisecond to nanosecond resolution, so collisions between threads on the 
same file name become far less likely.
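A minimal sketch of the collision-avoidance idea, with a hypothetical name 
pattern rather than CarbonData's exact file-name format: embed a 
nanosecond-resolution timestamp so two threads writing in the same millisecond 
rarely pick the same name.

```java
class CarbonFileName {
  static String forTask(int taskNo, int bucketId) {
    long ts = System.nanoTime(); // previously System.currentTimeMillis()
    return "part-0-" + taskNo + "_batchno0-" + bucketId + "-" + ts + ".carbondata";
  }
}
```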

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [x] Testing done
 Done Manually
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2710.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2710


commit 61520a3d0bacfbcbed5a2b5ac300f08cf9b36bb4
Author: shardul-cr7 
Date:   2018-09-11T12:51:09Z

chances of two different threads overwriting the same carbondatafile is 
reduced




---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/233/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
@ravipesala yeah, that's what I'm doing now. please check the commit: 
https://github.com/apache/carbondata/pull/2628/commits/d21fd869d442f535e4704dc06d9edc2f01984cb0



---


[GitHub] carbondata issue #2705: [CARBONDATA-2926] fixed ArrayIndexOutOfBoundExceptio...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2705
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/401/



---


[GitHub] carbondata issue #2705: [CARBONDATA-2926] fixed ArrayIndexOutOfBoundExceptio...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2705
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8471/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8470/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
@xuchuanyin yes, we cannot get rid of the enum. But we can add another optional 
field in `ChunkCompressionMeta` to take the interface name, then ignore the 
enum and read only the interface name.
@jackylk Please give your opinion on this.


---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/400/



---


[GitHub] carbondata issue #2703: [CARBONDATA-2925]Wrong data displayed for spark file...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2703
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8469/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
@ravipesala fine, I'll rework this.
The bad news is that the enum 'CompressionCodec' in thrift is 'required', so 
even if we do not use it, we cannot get rid of it.
The good news is that for the legacy store it is always snappy, which makes 
things easier if we bypass this enum.


---


[jira] [Resolved] (CARBONDATA-2909) Support Multiple User reading and writing through SDK.

2018-09-11 Thread Ravindra Pesala (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala resolved CARBONDATA-2909.
-
   Resolution: Fixed
 Assignee: Kunal Kapoor
Fix Version/s: 1.5.0

> Support Multiple User reading and writing through SDK.
> --
>
> Key: CARBONDATA-2909
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2909
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Kunal Kapoor
>Assignee: Kunal Kapoor
>Priority: Major
> Fix For: 1.5.0
>
>  Time Spent: 16h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2678: [CARBONDATA-2909] Multi user support for SDK ...

2018-09-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2678


---


[GitHub] carbondata issue #2678: [CARBONDATA-2909] Multi user support for SDK on S3

2018-09-11 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2678
  
LGTM


---


[GitHub] carbondata issue #2705: [CARBONDATA-2926] fixed ArrayIndexOutOfBoundExceptio...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2705
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/232/



---


[GitHub] carbondata issue #2703: [CARBONDATA-2925]Wrong data displayed for spark file...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2703
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/399/



---


[GitHub] carbondata issue #2709: [HOTFIX] Removed scala dependency from carbon core m...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2709
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/398/



---


[GitHub] carbondata issue #2709: [HOTFIX] Removed scala dependency from carbon core m...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2709
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8468/



---


[GitHub] carbondata issue #2670: [CARBONDATA-2917] Support binary datatype

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2670
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/397/



---


[GitHub] carbondata pull request #2705: [CARBONDATA-2926] fixed ArrayIndexOutOfBoundE...

2018-09-11 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2705#discussion_r216634097
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/TableSpec.java ---
@@ -36,6 +37,14 @@
   private DimensionSpec[] dimensionSpec;
   private MeasureSpec[] measureSpec;
 
+  // Many places we might have to access no-dictionary column spec.
+  // but no-dictionary column spec are not always in below order like,
+  // dictionary + no dictionary + complex + measure
+  // when sort_columns are empty, no columns are selected for sorting.
+  // so, spec will not be in above order.
+  // Hence NoDictionaryDimensionSpec will be useful and it will be subset 
of dimensionSpec.
+  private List<DimensionSpec> NoDictionaryDimensionSpec;
--- End diff --

done.


---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/231/



---


[GitHub] carbondata issue #2703: [CARBONDATA-2925]Wrong data displayed for spark file...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2703
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/230/



---


[GitHub] carbondata issue #2670: [CARBONDATA-2917] Support binary datatype

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2670
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8467/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
retest this please


---


[GitHub] carbondata issue #2678: [CARBONDATA-2909] Multi user support for SDK on S3

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2678
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8466/



---


[GitHub] carbondata issue #2678: [CARBONDATA-2909] Multi user support for SDK on S3

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2678
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/396/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8465/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
@ravipesala As for the implementation, is **duplicating the info and adding 
another description for it beside the current enum** OK? Or do you have 
another suggestion for implementing this?


---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/395/



---


[GitHub] carbondata issue #2709: [HOTFIX] Removed scala dependency from carbon core m...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2709
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/229/



---


[GitHub] carbondata issue #2628: [CARBONDATA-2851][CARBONDATA-2852] Support zstd as c...

2018-09-11 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2628
  
@xuchuanyin I feel it is necessary to save the compressor name in thrift 
instead of the enum. It is not a good idea to change thrift for every new 
compression codec, and the enum also prevents users from supplying a custom 
compressor interface when creating the table.


---


[GitHub] carbondata issue #2704: [HOTFIX] Old stores cannot read with new table infer...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2704
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8464/



---


[GitHub] carbondata pull request #2705: [CARBONDATA-2926] fixed ArrayIndexOutOfBoundE...

2018-09-11 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2705#discussion_r216620144
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/TableSpec.java ---
@@ -36,6 +37,14 @@
   private DimensionSpec[] dimensionSpec;
   private MeasureSpec[] measureSpec;
 
+  // Many places we might have to access no-dictionary column spec.
+  // but no-dictionary column spec are not always in below order like,
+  // dictionary + no dictionary + complex + measure
+  // when sort_columns are empty, no columns are selected for sorting.
+  // so, spec will not be in above order.
+  // Hence NoDictionaryDimensionSpec will be useful and it will be subset 
of dimensionSpec.
+  private List<DimensionSpec> NoDictionaryDimensionSpec;
--- End diff --

Better change name to `noDictionaryDimensionSpec`


---


[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2607
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/393/



---


[GitHub] carbondata issue #2704: [HOTFIX] Old stores cannot read with new table infer...

2018-09-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2704
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/394/



---

