[GitHub] carbondata issue #2991: [CARBONDATA-3043] Add build script and add test case...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2991
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2466/



---


[GitHub] carbondata issue #2991: [CARBONDATA-3043] Add build script and add test case...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2991
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10504/



---


[jira] [Created] (CARBONDATA-3239) Throwing ArrayIndexOutOfBoundsException in DataSkewRangePartitioner

2019-01-09 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3239:


 Summary: Throwing ArrayIndexOutOfBoundsException in DataSkewRangePartitioner
 Key: CARBONDATA-3239
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3239
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Reporter: QiangCai


2019-01-10 15:31:21 ERROR DataLoadProcessorStepOnSpark$:367 - Data Loading failed for table carbon_range_column4
java.lang.ArrayIndexOutOfBoundsException: 1
 at org.apache.spark.DataSkewRangePartitioner$$anonfun$initialize$1.apply$mcVI$sp(DataSkewRangePartitioner.scala:223)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
 at org.apache.spark.DataSkewRangePartitioner.initialize(DataSkewRangePartitioner.scala:222)
 at org.apache.spark.DataSkewRangePartitioner.getPartition(DataSkewRangePartitioner.scala:234)
 at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
 at org.apache.spark.scheduler.Task.run(Task.scala:108)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2991: [CARBONDATA-3043] Add build script and add test case...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2991
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2247/



---


[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...

2019-01-09 Thread BJangir
Github user BJangir commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2991#discussion_r246647432
  
--- Diff: docs/csdk-guide.md ---
@@ -29,6 +29,32 @@ code and without CarbonSession.
 
 In the carbon jars package, there exist a carbondata-sdk.jar, 
 including SDK reader for C++ SDK.
+
+##Compile/Build CSDK
+CSDK supports cmake based compilation and has dependency list in 
CMakeLists.txt.
+ Prerequisites
+GCC >=4.8.5
+Cmake >3.13
+Make >=4.1
+
+Steps 
+1. Go to CSDK folder(/opt/.../CSDK/) 
+2. Create build folder . (/opt/.../CSDK/build) 
+3. Run Command from build folder `cmake ../`
+4. `make`
+
+Test Cases are written in  
[main.cpp](https://github.com/apache/carbondata/blob/master/store/CSDK/test/main.cpp)
 with GoogleTest C++ Framework.
+if GoogleTest LIBRARY is not added then compilation of example code will 
fail. Please follow below steps to solve the same
+1. Remove test/main.cpp from SOURCE_FILES of CMakeLists.txt and 
compile/build again.
+2. Follow below Steps to configure GoogleTest Framework
+* Download googleTest release (CI is complied with 1.8) 
https://github.com/google/googletest/releases
+* Extract to folder like /opt/googletest/googletest-release-1.8.1/ and 
create build folder inside this  like 
/opt/googletest/googletest-release-1.8.1/googletest/build)
--- End diff --

Updated, please review again.


---


[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...

2019-01-09 Thread BJangir
Github user BJangir commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2991#discussion_r246645671
  
--- Diff: docs/csdk-guide.md ---
@@ -40,6 +66,7 @@ release the memory and destroy JVM.
 
 C++ SDK support read batch row. User can set batch by using withBatch(int 
batch) before build, and read batch by using readNextBatchRow().
 
+
--- End diff --

OK


---


[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...

2019-01-09 Thread BJangir
Github user BJangir commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2991#discussion_r246644993
  
--- Diff: docs/csdk-guide.md ---
@@ -29,6 +29,32 @@ code and without CarbonSession.
 
 In the carbon jars package, there exist a carbondata-sdk.jar, 
 including SDK reader for C++ SDK.
+
+##Compile/Build CSDK
--- End diff --

OK


---


[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...

2019-01-09 Thread BJangir
Github user BJangir commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2991#discussion_r246644367
  
--- Diff: docs/csdk-guide.md ---
@@ -29,6 +29,32 @@ code and without CarbonSession.
 
 In the carbon jars package, there exist a carbondata-sdk.jar, 
 including SDK reader for C++ SDK.
+
+# Compile/Build CSDK
--- End diff --

OK.


---


[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...

2019-01-09 Thread BJangir
Github user BJangir commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2991#discussion_r246643403
  
--- Diff: docs/csdk-guide.md ---
@@ -29,6 +29,32 @@ code and without CarbonSession.
 
 In the carbon jars package, there exist a carbondata-sdk.jar, 
 including SDK reader for C++ SDK.
+
+##Compile/Build CSDK
+CSDK supports cmake based compilation and has dependency list in 
CMakeLists.txt.
+ Prerequisites
+GCC >=4.8.5
+Cmake >3.13
+Make >=4.1
+
+Steps 
+1. Go to CSDK folder(/opt/.../CSDK/) 
+2. Create build folder . (/opt/.../CSDK/build) 
+3. Run Command from build folder `cmake ../`
+4. `make`
--- End diff --

It is the same as before, no change. After the `make` command you will get the 
executable program (named CSDK), so execute it directly: `./CSDK`. If the result 
is to be redirected to XML, then use a command like `./CSDK 
--gtest_output="xml:${REPORT_PATH}/CSDK_Report.xml"`.


---


[GitHub] carbondata issue #3060: [HOTFIX] Exclude filter doesn't work in presto carbo...

2019-01-09 Thread ajantha-bhat
Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/3060
  
@qiuchenjian : Test cases are there, but the problem occurs only in the cluster 
and not in the local environment, due to a jar dependency. 


---


[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3054
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2465/



---


[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3054
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10503/



---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10502/



---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2464/



---


[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3054
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2246/



---


[GitHub] carbondata issue #3060: [HOTFIX] Exclude filter doesn't work in presto carbo...

2019-01-09 Thread qiuchenjian
Github user qiuchenjian commented on the issue:

https://github.com/apache/carbondata/pull/3060
  
Is there a test case covering this scenario (using 'exclude filter' in presto 
carbon)? If not, it would be better to add one, so that others' changes will not 
affect this feature.



---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2245/



---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread zzcclp
Github user zzcclp commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
retest this please


---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2463/



---


[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...

2019-01-09 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3056
  
@manishnalla1994 

> Solution:
Check if any other RDD is sharing the same task context. If so, don't clear the 
resource at that time; the other RDD which shares the context should clear the 
memory after the task is finished.

It seems that, for the data source table scenario, if the query and insert 
procedures also share the same context, they can also benefit from the 
implementation in #2591 without any changes. Right?
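
A minimal sketch of the shared-task-context idea quoted above, with hypothetical names (this is not the actual CarbonData code from #2591): each reader registers against its task, and only the last one to finish frees the shared off-heap memory.

```
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: count how many readers share one task context and run the
// free action when the last of them finishes.
public final class SharedTaskResourceTracker {

  private static final Map<Long, AtomicInteger> USERS = new ConcurrentHashMap<>();

  private SharedTaskResourceTracker() { }

  // called when an RDD iterator starts using the task's shared resources
  static void register(long taskId) {
    USERS.computeIfAbsent(taskId, id -> new AtomicInteger()).incrementAndGet();
  }

  // called when an RDD iterator finishes; only the last sharer runs freeMemory
  static void unregister(long taskId, Runnable freeMemory) {
    AtomicInteger users = USERS.get(taskId);
    if (users != null && users.decrementAndGet() == 0) {
      USERS.remove(taskId);
      freeMemory.run();
    }
  }

  public static void main(String[] args) {
    long taskId = 42L;
    register(taskId); // e.g. the query RDD
    register(taskId); // e.g. the insert RDD sharing the same task context
    unregister(taskId, () -> System.out.println("freeing unsafe memory")); // skipped: one sharer left
    unregister(taskId, () -> System.out.println("freeing unsafe memory")); // runs: last sharer done
  }
}
```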


---


[GitHub] carbondata issue #3046: [CARBONDATA-3231] Fix OOM exception when dictionary ...

2019-01-09 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/3046
  
We do not need to expose this threshold to the user. Instead, we can decide this 
ourselves inside carbondata.

Step 1: get the size of the non-dictionary-encoded page (say M) and the size of 
the dictionary-encoded page (say N).
Step 2: if M/N >= 1 (or M/N >= 0.9), we can fall back automatically.

Parquet (and maybe ORC) behaves like this.
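
As a rough illustration of such a size-based check (a sketch only; the method name, the ratio direction, and the 0.9 threshold are assumptions, read in the Parquet style of "fall back when the dictionary-encoded page is no longer meaningfully smaller than the plain page"):

```
// Sketch only: compare the plain-encoded page size (M) with the
// dictionary-encoded page size (N) and fall back when the saving is too small.
public final class LocalDictionaryFallbackCheck {

  static boolean shouldFallBack(long plainPageBytes, long dictPageBytes) {
    if (plainPageBytes <= 0) {
      return false;
    }
    // dictionary saves less than ~10% of the page size -> not worth keeping
    return (double) dictPageBytes / plainPageBytes >= 0.9;
  }

  public static void main(String[] args) {
    System.out.println(shouldFallBack(1000, 950)); // true: barely any saving, fall back
    System.out.println(shouldFallBack(1000, 400)); // false: keep the local dictionary
  }
}
```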



---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10501/



---


[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...

2019-01-09 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3054#discussion_r246427391
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala
 ---
@@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils
 /**
  * configure alluxio:
  * 1.start alluxio
- * 2.upload the jar :"/alluxio_path/core/client/target/
- * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar"
- * 3.Get more detail 
at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
+ * 2.Get more detail at: 
https://www.alluxio.org/docs/1.8/en/compute/Spark.html
  */
-
 object AlluxioExample {
-  def main(args: Array[String]) {
-val spark = ExampleUtils.createCarbonSession("AlluxioExample")
-exampleBody(spark)
-spark.close()
+  def main (args: Array[String]) {
+val carbon = ExampleUtils.createCarbonSession("AlluxioExample",
+  storePath = "alluxio://localhost:19998/carbondata")
+exampleBody(carbon)
+carbon.close()
   }
 
-  def exampleBody(spark : SparkSession): Unit = {
+  def exampleBody (spark: SparkSession): Unit = {
+val rootPath = new File(this.getClass.getResource("/").getPath
+  + "../../../..").getCanonicalPath
 spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", 
"alluxio.hadoop.FileSystem")
--- End diff --

So you need to mention this in the current document


---


[GitHub] carbondata issue #3060: [HOTFIX] Exclude filter doesn't work in presto carbo...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3060
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2462/



---


[GitHub] carbondata issue #3060: [HOTFIX] Exclude filter doesn't work in presto carbo...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3060
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10500/



---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2244/



---


[GitHub] carbondata issue #3021: [CARBONDATA-3193] Cdh5.14.2 spark2.2.0 support

2019-01-09 Thread chenliang613
Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/3021
  
@chandrasaripaka please let us know whether 3026 solved your issues.


---


[GitHub] carbondata issue #3060: [HOTFIX] Exclude filter doesn't work in presto carbo...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3060
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2243/



---


[GitHub] carbondata issue #3060: [HOTFIX] Exclude filter doesn't work in presto carbo...

2019-01-09 Thread ajantha-bhat
Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/3060
  
@ravipesala : please check. 


---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10499/



---


[GitHub] carbondata pull request #3060: [HOTFIX] Exclude filter doesn't work in prest...

2019-01-09 Thread ajantha-bhat
GitHub user ajantha-bhat opened a pull request:

https://github.com/apache/carbondata/pull/3060

[HOTFIX] Exclude filter doesn't work in presto carbon in cluster

**problem:** The exclude filter fails in the cluster for presto carbon with the 
following exception.
```
java.lang.NoClassDefFoundError: org/roaringbitmap/RoaringBitmap
at 
org.apache.carbondata.core.scan.filter.FilterUtil.prepareExcludeFilterMembers(FilterUtil.java:826)
at 
org.apache.carbondata.core.scan.filter.FilterUtil.getDimColumnFilterInfoAfterApplyingCBO(FilterUtil.java:776)
at 
org.apache.carbondata.core.scan.filter.FilterUtil.getFilterListForAllValues(FilterUtil.java:884)

```
**cause:** The RoaringBitmap jar is not added as a dependency, hence it is not 
present in the presto snapshot folder.
**solution:** Include RoaringBitmap in the dependency list.
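
A quick sanity check for this kind of failure (illustrative, not part of the PR): verify from the connector's classpath whether the class named in the stack trace can actually be loaded.

```
// Sketch only: confirms whether org.roaringbitmap.RoaringBitmap is on the classpath.
public final class RoaringBitmapPresenceCheck {
  public static void main(String[] args) {
    try {
      Class<?> clazz = Class.forName("org.roaringbitmap.RoaringBitmap");
      System.out.println("found " + clazz.getName());
    } catch (ClassNotFoundException e) {
      System.out.println("org.roaringbitmap.RoaringBitmap is missing from the classpath");
    }
  }
}
```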



Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed? NA
 
 - [ ] Any backward compatibility impacted? NA
 
 - [ ] Document update required? NA

 - [ ] Testing done. please find the report 
**Before:**
```
presto:default> select name from nbig where name < 'aj' limit 5;
Query 20190109_131447_4_qhrfk failed: org/roaringbitmap/RoaringBitmap

```
**After:**

```
presto:default> select name from nbig where name < 'aj' limit 5;
  name  

 208
 150209 
 150210 
 150211 
 150212 
(5 rows)
```

 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. NA



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajantha-bhat/carbondata issue_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3060.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3060


commit b3041adb284a0aa30a64e55908e6b9904c29
Author: ajantha-bhat 
Date:   2019-01-09T13:26:10Z

Fix Roaring bit map exception in presto filter query




---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2242/



---


[jira] [Resolved] (CARBONDATA-3237) optimize presto query time for dictionary include string column

2019-01-09 Thread kumar vishal (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kumar vishal resolved CARBONDATA-3237.
--
Resolution: Fixed

> optimize presto query time for dictionary include string column
> ---
>
> Key: CARBONDATA-3237
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3237
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> optimize presto query time for dictionary include string column.
>  
> problem: currently, for each query, presto carbon creates a dictionary block 
> for string columns. This happens for every query, and if the cardinality is 
> high it takes more time to build. This is not required; we can look up values 
> using a normal dictionary lookup.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246376652
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -609,6 +609,14 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_SIZE_FIRST;
 } else {
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_NUM_FIRST;
+  // fall back to BLOCK_NUM_FIRST strategy need to reset
+  // the average expected size for each node
+  if (blockInfos.size() > 0) {
--- End diff --

It could be set to some value if NODE_MIN_SIZE_FIRST is used but the strategy 
then falls back to BLOCK_NUM_FIRST.


---


[GitHub] carbondata issue #3059: [HOTFIX][DataLoad]fix task assignment issue using NO...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3059
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2241/



---


[GitHub] carbondata issue #3059: [HOTFIX][DataLoad]fix task assignment issue using NO...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3059
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10497/



---


[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3054
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10498/



---


[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...

2019-01-09 Thread kumarvishal09
Github user kumarvishal09 commented on the issue:

https://github.com/apache/carbondata/pull/3055
  
LGTM


---


[GitHub] carbondata pull request #3055: [CARBONDATA-3237] Fix presto carbon issues in...

2019-01-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/3055


---


[jira] [Resolved] (CARBONDATA-3200) No-Sort Compaction

2019-01-09 Thread kumar vishal (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kumar vishal resolved CARBONDATA-3200.
--
Resolution: Fixed

> No-Sort Compaction
> --
>
> Key: CARBONDATA-3200
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3200
> Project: CarbonData
>  Issue Type: New Feature
>  Components: core
>Reporter: Naman Rastogi
>Assignee: Naman Rastogi
>Priority: Major
>  Time Spent: 14h
>  Remaining Estimate: 0h
>
> When data is loaded with SORT_SCOPE as NO_SORT and then compacted, the data 
> still remains unsorted. This does not affect queries much, but the major 
> purpose of compaction is to better pack the data and improve query performance.
>  
> Now, the expected behaviour of compaction is to sort the data, so that after 
> compaction, query performance becomes better. The columns to sort upon are 
> provided by SORT_COLUMNS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #3029: [CARBONDATA-3200] No-Sort compaction

2019-01-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/3029


---


[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction

2019-01-09 Thread kumarvishal09
Github user kumarvishal09 commented on the issue:

https://github.com/apache/carbondata/pull/3029
  
LGTM


---


[GitHub] carbondata issue #3059: [HOTFIX][DataLoad]fix task assignment issue using NO...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3059
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2460/



---


[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3029
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10496/



---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2461/



---


[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3029
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2240/



---


[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3055
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10495/



---


[GitHub] carbondata issue #3046: [CARBONDATA-3231] Fix OOM exception when dictionary ...

2019-01-09 Thread kumarvishal09
Github user kumarvishal09 commented on the issue:

https://github.com/apache/carbondata/pull/3046
  
@xuchuanyin In the near future we are planning to change the threshold (currently 
based on a count) to a size-based local dictionary threshold. A size-based 
threshold will give more control.
The current changes in this PR help with doing that.
Later we just have to expose the table property in the create table command so 
that the user can control the size threshold.

Also, I didn't get the meaning of your comment; these changes are minimal now 
as well.


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread qiuchenjian
Github user qiuchenjian commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246352937
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -609,6 +609,14 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_SIZE_FIRST;
 } else {
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_NUM_FIRST;
+  // fall back to BLOCK_NUM_FIRST strategy need to reset
+  // the average expected size for each node
+  if (blockInfos.size() > 0) {
--- End diff --

```suggestion
  if (numOfNodes > 0) {
```
If blockInfos.size() == 0, sizePerNode will be 0, so there is no need to add the 
if ... else. Does numOfNodes need to be considered as possibly 0?


---


[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3054
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2239/



---


[jira] [Resolved] (CARBONDATA-3236) JVM Crash for insert into new table from old table

2019-01-09 Thread kumar vishal (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kumar vishal resolved CARBONDATA-3236.
--
Resolution: Fixed

> JVM Crash for insert into new table from old table
> --
>
> Key: CARBONDATA-3236
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3236
> Project: CarbonData
>  Issue Type: Bug
>Reporter: MANISH NALLA
>Assignee: MANISH NALLA
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #3056: [CARBONDATA-3236] Fix for JVM Crash for inser...

2019-01-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/3056


---


[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3055
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2238/



---


[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...

2019-01-09 Thread kumarvishal09
Github user kumarvishal09 commented on the issue:

https://github.com/apache/carbondata/pull/3056
  
LGTM


---


[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3054
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2459/



---


[GitHub] carbondata issue #3058: [WIP][CARBONDATA-3238] Solve StackOverflowError usin...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3058
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10494/



---


[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...

2019-01-09 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/3056
  
LGTM


---


[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3055
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2458/



---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
LGTM


---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread QiangCai
Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
@ravipesala reverted


---


[GitHub] carbondata issue #3037: [CARBONDATA-3190] Open example module code style che...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3037
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10493/



---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
@QiangCai Please don't add binary files :(. You are supposed to generate the 
files and execute the test.


---


[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3029
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2457/



---


[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...

2019-01-09 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/3055
  
LGTM


---


[GitHub] carbondata issue #3037: [CARBONDATA-3190] Open example module code style che...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3037
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2237/



---


[GitHub] carbondata pull request #3029: [CARBONDATA-3200] No-Sort compaction

2019-01-09 Thread NamanRastogi
Github user NamanRastogi commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3029#discussion_r246329602
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonCompactionUtil.java
 ---
@@ -400,24 +417,53 @@ private static int 
getDimensionDefaultCardinality(CarbonDimension dimension) {
* @param tableLastUpdatedTime
* @return
*/
-  public static boolean checkIfAnyRestructuredBlockExists(Map segmentMapping,
-  Map> dataFileMetadataSegMapping, long 
tableLastUpdatedTime) {
-boolean restructuredBlockExists = false;
-for (Map.Entry taskMap : 
segmentMapping.entrySet()) {
-  String segmentId = taskMap.getKey();
+  public static boolean checkIfAnyRestructuredBlockExists(
+  Map segmentMapping,
+  Map> dataFileMetadataSegMapping,
+  long tableLastUpdatedTime) {
+
+for (Map.Entry segmentEntry : 
segmentMapping.entrySet()) {
+  String segmentId = segmentEntry.getKey();
   List listMetadata = 
dataFileMetadataSegMapping.get(segmentId);
-  for (DataFileFooter dataFileFooter : listMetadata) {
-// if schema modified timestamp is greater than footer stored 
schema timestamp,
-// it indicates it is a restructured block
-if (tableLastUpdatedTime > 
dataFileFooter.getSchemaUpdatedTimeStamp()) {
-  restructuredBlockExists = true;
-  break;
-}
+
+  if (isRestructured(listMetadata, tableLastUpdatedTime)) {
+return true;
   }
-  if (restructuredBlockExists) {
-break;
+}
+
+return false;
+  }
+
+  public static boolean isRestructured(List listMetadata,
+  long tableLastUpdatedTime) {
+/*
+ * TODO: only in case of add and drop this variable should be true
+ */
+for (DataFileFooter dataFileFooter : listMetadata) {
+  // if schema modified timestamp is greater than footer stored schema 
timestamp,
+  // it indicates it is a restructured block
+  if (tableLastUpdatedTime > 
dataFileFooter.getSchemaUpdatedTimeStamp()) {
+return true;
   }
 }
-return restructuredBlockExists;
+return false;
   }
+
+  public static boolean isSorted(TaskBlockInfo taskBlockInfo) throws 
IOException {
+String filePath =
+
taskBlockInfo.getAllTableBlockInfoList().iterator().next().get(0).getFilePath();
+long fileSize =
+FileFactory.getCarbonFile(filePath, 
FileFactory.getFileType(filePath)).getSize();
+
+FileReader fileReader = 
FileFactory.getFileHolder(FileFactory.getFileType(filePath));
+ByteBuffer buffer =
+
fileReader.readByteBuffer(FileFactory.getUpdatedFilePath(filePath), fileSize - 
8, 8);
+fileReader.finish();
+
+CarbonFooterReaderV3 footerReader = new CarbonFooterReaderV3(filePath, 
buffer.getLong());
+FileFooter3 footer = footerReader.readFooterVersion3();
+
+return footer.isIs_sort();
--- End diff --

Done.


---


[GitHub] carbondata pull request #3029: [CARBONDATA-3200] No-Sort compaction

2019-01-09 Thread NamanRastogi
Github user NamanRastogi commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3029#discussion_r246329142
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonCompactionExecutor.java
 ---
@@ -105,10 +105,15 @@ public CarbonCompactionExecutor(Map segmentMapping,
*
* @return List of Carbon iterators
--- End diff --

Done


---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2453/



---


[GitHub] carbondata issue #3058: [WIP][CARBONDATA-3238] Solve StackOverflowError usin...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3058
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2456/



---


[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction

2019-01-09 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/3029
  
LGTM


---


[GitHub] carbondata issue #3058: [WIP][CARBONDATA-3238] Solve StackOverflowError usin...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3058
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2236/



---


[GitHub] carbondata issue #3058: [WIP][CARBONDATA-3238] Solve StackOverflowError usin...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3058
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10490/



---


[GitHub] carbondata issue #3037: [CARBONDATA-3190] Open example module code style che...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3037
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2455/



---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10491/



---


[GitHub] carbondata pull request #3055: [CARBONDATA-3237] Fix presto carbon issues in...

2019-01-09 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3055#discussion_r246319596
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java
 ---
@@ -95,22 +105,14 @@ public SliceStreamReader(int batchSize, DataType 
dataType,
 dictOffsets[dictOffsets.length - 1] = size;
 dictionaryBlock = new 
VariableWidthBlock(dictionary.getDictionarySize(),
 Slices.wrappedBuffer(singleArrayDictValues), dictOffsets, 
Optional.of(nulls));
-values = (int[]) ((CarbonColumnVectorImpl) 
getDictionaryVector()).getDataArray();
+this.isLocalDict = true;
   }
-
   @Override public void setBatchSize(int batchSize) {
+
--- End diff --

done


---


[GitHub] carbondata pull request #3055: [CARBONDATA-3237] Fix presto carbon issues in...

2019-01-09 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3055#discussion_r246318548
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java
 ---
@@ -142,5 +144,17 @@ public SliceStreamReader(int batchSize, DataType 
dataType,
 
   @Override public void reset() {
 builder = type.createBlockBuilder(null, batchSize);
+this.isLocalDict = false;
+  }
+
+  @Override public void putInt(int rowId, int value) {
+Object data = DataTypeUtil
--- End diff --

putInt() will not be called in case of local dictionary, as setDictionary() 
itself fills the whole values array. Hence this change has no impact on local 
dictionary.
Also, local dictionary UTs are present and running fine after the changes.


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246317841
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -609,6 +613,9 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_SIZE_FIRST;
 } else {
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_NUM_FIRST;
+  // fall back to BLOCK_NUM_FIRST strategy need to reset
+  // the average expected size for each node
+  sizePerNode = numberOfBlocksPerNode;
--- End diff --

assignLeftOverBlocks also needs similar if/else self checks. I think it's OK; 
you can take a call on whether to refactor now or later.


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246317595
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -575,19 +575,23 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
 }
 
 // calculate the average expected size for each node
-long sizePerNode = 0;
+long numberOfBlocksPerNode = 0;
+if (blockInfos.size() > 0) {
+  numberOfBlocksPerNode = blockInfos.size() / numOfNodes;
+}
+numberOfBlocksPerNode = numberOfBlocksPerNode <= 0 ? 1 : 
numberOfBlocksPerNode;
+long dataSizePerNode = 0;
 long totalFileSize = 0;
+for (Distributable blockInfo : uniqueBlocks) {
+  totalFileSize += ((TableBlockInfo) blockInfo).getBlockLength();
+}
+dataSizePerNode = totalFileSize / numOfNodes;
+long sizePerNode = 0;
 if (BlockAssignmentStrategy.BLOCK_NUM_FIRST == 
blockAssignmentStrategy) {
-  if (blockInfos.size() > 0) {
-sizePerNode = blockInfos.size() / numOfNodes;
-  }
-  sizePerNode = sizePerNode <= 0 ? 1 : sizePerNode;
+  sizePerNode = numberOfBlocksPerNode;
--- End diff --

I think this modification is OK if the BLOCK_NUM_FIRST block assignment 
strategy is used.


---


[GitHub] carbondata issue #3059: [HOTFIX][DataLoad]fix task assignment issue using NO...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3059
  
Build Failed  with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10492/



---


[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...

2019-01-09 Thread manishnalla1994
Github user manishnalla1994 commented on the issue:

https://github.com/apache/carbondata/pull/3056
  
@xuchuanyin Datasource tables use the direct filling flow. In the direct flow 
there is no intermediate buffer, so we are not using off-heap memory to store the 
page data (all the records of a page are filled into the vector instead of being 
filled batch-wise). So in this case we can remove the freeing of unsafe memory 
for the query, as it is not required.

In case of a stored-by table the handling is different: we support both 
batch-wise filling and direct filling, and for batch filling we are using unsafe 
memory, so we have to clear the unsafe memory in that case.
The same handling is not required for a data source table.
Please refer to https://github.com/apache/carbondata/pull/2591 for the stored-by 
handling of this issue.
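
A minimal sketch of the guard described above, with hypothetical names (not the actual CarbonData classes): the unsafe-memory free is tied to the batch-wise path, because only that path stages page data off-heap.

```
// Sketch only: free off-heap page memory only when the batch-wise filling
// path (which staged data in unsafe memory) was used.
public final class PageMemoryCleanup {

  enum FillMode { DIRECT, BATCH_WISE }

  static void closeScan(FillMode mode, Runnable freeUnsafeMemory) {
    if (mode == FillMode.BATCH_WISE) {
      freeUnsafeMemory.run(); // intermediate unsafe buffers must be released
    }
    // DIRECT filling wrote straight into the vector; nothing off-heap to release
  }

  public static void main(String[] args) {
    closeScan(FillMode.DIRECT, () -> System.out.println("freeing unsafe memory"));     // prints nothing
    closeScan(FillMode.BATCH_WISE, () -> System.out.println("freeing unsafe memory")); // prints once
  }
}
```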




---


[GitHub] carbondata issue #3059: [HOTFIX][DataLoad]fix task assignment issue using NO...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3059
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2454/



---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246311819
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -575,19 +575,23 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
 }
 
 // calculate the average expected size for each node
-long sizePerNode = 0;
+long numberOfBlocksPerNode = 0;
+if (blockInfos.size() > 0) {
+  numberOfBlocksPerNode = blockInfos.size() / numOfNodes;
+}
+numberOfBlocksPerNode = numberOfBlocksPerNode <= 0 ? 1 : 
numberOfBlocksPerNode;
+long dataSizePerNode = 0;
 long totalFileSize = 0;
+for (Distributable blockInfo : uniqueBlocks) {
+  totalFileSize += ((TableBlockInfo) blockInfo).getBlockLength();
+}
+dataSizePerNode = totalFileSize / numOfNodes;
+long sizePerNode = 0;
 if (BlockAssignmentStrategy.BLOCK_NUM_FIRST == 
blockAssignmentStrategy) {
-  if (blockInfos.size() > 0) {
-sizePerNode = blockInfos.size() / numOfNodes;
-  }
-  sizePerNode = sizePerNode <= 0 ? 1 : sizePerNode;
+  sizePerNode = numberOfBlocksPerNode;
--- End diff --

This if/else can be completely avoided by using the correct variable in the 
method call for block allocation.


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246311168
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -609,6 +613,9 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_SIZE_FIRST;
 } else {
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_NUM_FIRST;
+  // fall back to BLOCK_NUM_FIRST strategy need to reset
+  // the average expected size for each node
+  sizePerNode = numberOfBlocksPerNode;
--- End diff --

Instead of reassigning the same variable, can assignBlocksByDataLocality() use 
numberOfBlocksPerNode directly?


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246309331
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -575,19 +575,23 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
 }
 
 // calculate the average expected size for each node
-long sizePerNode = 0;
+long numberOfBlocksPerNode = 0;
+if (blockInfos.size() > 0) {
+  numberOfBlocksPerNode = blockInfos.size() / numOfNodes;
+}
+numberOfBlocksPerNode = numberOfBlocksPerNode <= 0 ? 1 : 
numberOfBlocksPerNode;
+long dataSizePerNode = 0;
 long totalFileSize = 0;
+for (Distributable blockInfo : uniqueBlocks) {
+  totalFileSize += ((TableBlockInfo) blockInfo).getBlockLength();
+}
+dataSizePerNode = totalFileSize / numOfNodes;
+long sizePerNode = 0;
 if (BlockAssignmentStrategy.BLOCK_NUM_FIRST == 
blockAssignmentStrategy) {
-  if (blockInfos.size() > 0) {
-sizePerNode = blockInfos.size() / numOfNodes;
-  }
-  sizePerNode = sizePerNode <= 0 ? 1 : sizePerNode;
+  sizePerNode = numberOfBlocksPerNode;
--- End diff --

Please don't change sizePerNode variable


---


[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...

2019-01-09 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/3053
  
> Does this PR fix two problems?
> If it is yes, better to separate it into two.
> 

The one-line change of rowId to rowId + 1 is coupled with this: when I removed 
the compress method in unSafeFixLengthColumnPage, I got this issue and fixed it 
here, so that change is required in this PR only.


---


[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...

2019-01-09 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/3053
  
@kumarvishal09 ...I agree with you that it is a functional issue and we need to 
merge it. My point was that before merging we can do one load performance test to 
see whether there is any performance degradation, and if there is, then we can 
update the benchmark results.


---


[GitHub] carbondata issue #3059: [HOTFIX][DataLoad]fix task assignment issue using NO...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3059
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2235/



---


[GitHub] carbondata issue #3037: [CARBONDATA-3190] Open example module code style che...

2019-01-09 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/3037
  
retest this please


---


[GitHub] carbondata issue #3058: [WIP][CARBONDATA-3238] Solve StackOverflowError usin...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3058
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2233/



---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2234/



---


[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...

2019-01-09 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/3053
  
@kumarvishal09 I have tested the fallback scenario by changing the code, and it 
fails with that as well. I have also raised a discussion in the snappy community: 
https://groups.google.com/forum/#!topic/snappy-compression/4noNVKCMBqM


---


[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...

2019-01-09 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3054
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10489/



---


[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...

2019-01-09 Thread kumarvishal09
Github user kumarvishal09 commented on the issue:

https://github.com/apache/carbondata/pull/3053
  
@manishgupta88 @xuchuanyin I think if it's really a problem with snappy, then 
whether or not there is any performance impact we have to merge it, as it is a 
functional issue. :)
@akashrn5 Maybe this issue is coming because of the off-heap to on-heap fallback 
in UnsafeMemoryManager; can you please verify that once? Please also try to 
discuss it with the snappy community.



---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246299802
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -1164,4 +1156,35 @@ private static void deleteFiles(List 
filesToBeDeleted) throws IOExceptio
   FileFactory.deleteFile(filePath, FileFactory.getFileType(filePath));
 }
   }
+
+  /**
+   * This method will calculate the average expected size for each node
+   *
+   * @param blockInfos blocks
+   * @param uniqueBlocks unique blocks
+   * @param numOfNodes if number of nodes has to be decided
+   *   based on block location information
+   * @param blockAssignmentStrategy strategy used to assign blocks
+   * @return the average expected size for each node
+   */
+  private static long calcAvgLoadSizePerNode(List 
blockInfos,
--- End diff --

OK, I will modify it.


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246299700
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -609,6 +597,10 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_SIZE_FIRST;
 } else {
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_NUM_FIRST;
+  // fall back to BLOCK_NUM_FIRST strategy need to recalculate
+  // the average expected size for each node
+  sizePerNode = calcAvgLoadSizePerNode(blockInfos,uniqueBlocks,
--- End diff --

OK, I will modify it.


---


[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...

2019-01-09 Thread QiangCai
Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/3001
  
@ravipesala Added a test case for reading the stream table.


---


[jira] [Resolved] (CARBONDATA-3235) AlterTableRename and PreAgg Datamap Fail Issue

2019-01-09 Thread kumar vishal (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kumar vishal resolved CARBONDATA-3235.
--
Resolution: Fixed

> AlterTableRename and PreAgg Datamap Fail Issue
> --
>
> Key: CARBONDATA-3235
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3235
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Naman Rastogi
>Assignee: Naman Rastogi
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> h3. Alter Table Rename Table Fail
>  * When the table rename succeeds in hive but fails in the carbondata store, it 
> throws an exception but does not go back and undo the rename in hive.
> h3. Create-Preaggregate-Datamap Fail
>  * When the (preaggregate) datamap schema is written but the table update fails, 
> CarbonDropDataMapCommand.processMetadata() is called, which calls 
> dropDataMapFromSystemFolder(). This is supposed to delete the folder on disk, 
> but does not, as the datamap is not yet updated in the table, and it throws 
> NoSuchDataMapException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2996: [CARBONDATA-3235] Fix Rename-Fail & Datamap-c...

2019-01-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2996


---


[GitHub] carbondata issue #2996: [CARBONDATA-3235] Fix Rename-Fail & Datamap-creation...

2019-01-09 Thread kumarvishal09
Github user kumarvishal09 commented on the issue:

https://github.com/apache/carbondata/pull/2996
  
LGTM


---


[GitHub] carbondata pull request #3032: [CARBONDATA-3210] Merge common method into Ca...

2019-01-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/3032


---

