[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...

2017-08-16 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1257#discussion_r133377209
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java
 ---
@@ -0,0 +1,103 @@
+package org.apache.carbondata.presto.readers;
--- End diff --

license header is missing


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] carbondata issue #1253: [CARBONDATA-1373] Enhance update performance by incr...

2017-08-16 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/1253
  
A new PR #1261 has been raised to solve this problem in a better way.




[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...

2017-08-16 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1257#discussion_r133376989
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/readers/BooleanStreamReader.java
 ---
@@ -0,0 +1,31 @@
+package org.apache.carbondata.presto.readers;
--- End diff --

The license header is missing.




[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...

2017-08-16 Thread anubhav100
Github user anubhav100 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1257#discussion_r133371582
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/expression/ExpressionResult.java
 ---
@@ -183,7 +183,7 @@ public String getString() throws FilterIllegalMemberException {
         return parser.format((java.sql.Date) value);
       } else if (value instanceof Long) {
         if (isLiteral) {
-          return parser.format(new Timestamp((long) value / 1000));
+          return parser.format(new Timestamp((long) value));
--- End diff --

@chenliang613 Earlier, the timestamp was multiplied by 1000 in carbondata, 
which is why it was written as 
-return parser.format(new Timestamp((long) value / 1000));
Now it is no longer multiplied by 1000 when the expression result is built, 
so there is no need to divide the timestamp by 1000.
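
To illustrate the unit reasoning above: `java.sql.Timestamp(long)` expects milliseconds since the epoch, so dividing by 1000 is only correct when the stored value is 1000 times larger (for example, microseconds). The following is a minimal sketch under that assumption; the class and helper names are hypothetical and are not part of the PR.

```java
import java.sql.Timestamp;

// Hypothetical illustration class; not code from the pull request.
final class TimestampUnitSketch {

  // Assumed older behaviour: the literal is stored as milliseconds * 1000,
  // so it has to be scaled back down before building the Timestamp.
  static Timestamp fromMicros(long micros) {
    return new Timestamp(micros / 1000);
  }

  // Assumed newer behaviour: the literal is already in milliseconds.
  static Timestamp fromMillis(long millis) {
    return new Timestamp(millis);
  }

  public static void main(String[] args) {
    long millis = 1502841600000L;   // a millisecond epoch value from August 2017
    long micros = millis * 1000;    // the same instant, scaled by 1000

    // Both paths agree when each value is treated with its own unit.
    System.out.println(fromMillis(millis).getTime() == fromMicros(micros).getTime());

    // Dividing a value that is already in milliseconds shrinks it by a factor
    // of 1000, which moves the instant back to January 1970.
    System.out.println(new Timestamp(millis / 1000).getTime());
  }
}
```

If the value handed to `getString()` is already in milliseconds, keeping the division would silently produce dates close to the epoch, which is consistent with the explanation given for removing it.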





[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...

2017-08-16 Thread anubhav100
Github user anubhav100 commented on the issue:

https://github.com/apache/carbondata/pull/1236
  
@steven-qin PR #1257 has resolved this issue with more optimizations.




[GitHub] carbondata issue #1259: [Review][CARBONDATA-1381] Add test cases for missing...

2017-08-16 Thread sraghunandan
Github user sraghunandan commented on the issue:

https://github.com/apache/carbondata/pull/1259
  
retest this please




[GitHub] carbondata pull request #1253: [CARBONDATA-1373] Enhance update performance ...

2017-08-16 Thread xuchuanyin
Github user xuchuanyin closed the pull request at:

https://github.com/apache/carbondata/pull/1253




[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...

2017-08-16 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1257#discussion_r133370557
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/expression/ExpressionResult.java
 ---
@@ -183,7 +183,7 @@ public String getString() throws FilterIllegalMemberException {
         return parser.format((java.sql.Date) value);
       } else if (value instanceof Long) {
         if (isLiteral) {
-          return parser.format(new Timestamp((long) value / 1000));
+          return parser.format(new Timestamp((long) value));
--- End diff --

Why is this change (removing /1000) needed for getString()?




[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...

2017-08-16 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1257#discussion_r133372855
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/constants/CarbonCommonConstants.java
 ---
@@ -0,0 +1,1319 @@
+/*
--- End diff --

Please keep only the CommonConstants that are used by the presto module; 
there is no need to copy all of them.




[GitHub] carbondata issue #1261: [CARBONDATA-1373] Enhance update performance by incr...

2017-08-16 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/1261
  
The previous PR #1253 is closed; please use this new PR.




[GitHub] carbondata pull request #1261: [CARBONDATA-1373] Enhance update performance ...

2017-08-16 Thread xuchuanyin
GitHub user xuchuanyin opened a pull request:

https://github.com/apache/carbondata/pull/1261

[CARBONDATA-1373] Enhance update performance by increasing parallelism

# Scenario

Recently I tested the update feature provided in Carbondata and found that 
it performs poorly.

I had a table containing about 14 million records with about 370 columns (no 
dictionary columns), and the data files were about 3.8 GB in total. All the 
data files were in one segment.

I ran an update SQL statement that updates one column for all the records; it 
looked like `UPDATE myTable SET (col1)=(col1+1000) WHERE TRUE`. In my 
environment, the update job failed with 'executor lost' errors, and I found 
'spill data' related messages in the container logs.

# Analysis

I've read about the implementation of update-delete in Carbondata in 
ISSUE#440. An update consists of a delete and an insert operation, and the 
error occurred during the insert operation.

After studying the code, I found that during the insert the updated records 
are grouped by `segmentId`, which means all the records in one segment are 
processed in a single task. This causes task failure when the amount of input 
data is large.

# Solution

We should improve the parallelism when updating a segment.

I append a random key to the `segmentId` to increase the number of partitions 
before the insertion stage, and then remove the suffix when doing the real 
insertion.
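
The key-salting idea described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the separator, the class name, and the helper names are hypothetical, and the real change works on Spark RDDs with a customized partitioner rather than on in-memory maps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration class; not code from the pull request.
final class SaltedSegmentKeySketch {

  // Hypothetical separator between the segment id and the random suffix.
  static final String SEPARATOR = "_";

  // Append a random suffix in [0, parallelism) so that the rows of one
  // segment are spread over several keys instead of a single one.
  static String addSalt(String segmentId, int parallelism, Random random) {
    return segmentId + SEPARATOR + random.nextInt(parallelism);
  }

  // Recover the original segment id before the real insertion.
  static String removeSalt(String saltedKey) {
    return saltedKey.substring(0, saltedKey.lastIndexOf(SEPARATOR));
  }

  // A simple stand-in for a partitioner: map a salted key to a partition index.
  static int partitionFor(String saltedKey, int numPartitions) {
    return Math.floorMod(saltedKey.hashCode(), numPartitions);
  }

  public static void main(String[] args) {
    int parallelism = 6;
    Random random = new Random(42);
    Map<String, Integer> rowsPerKey = new HashMap<>();

    // All rows belong to segment "0", yet they end up under several keys.
    for (int row = 0; row < 600; row++) {
      rowsPerKey.merge(addSalt("0", parallelism, random), 1, Integer::sum);
    }
    System.out.println(rowsPerKey);
    System.out.println(removeSalt("0" + SEPARATOR + "3"));
  }
}
```

The trade-off is that one segment is now written by several tasks instead of one, which is what allows the work to fit into executor memory.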

# Modification

+ Increase parallelism while processing one segment in update. This is 
achieved by distributing records to different partitions using a customized 
partitioner.
+ Add a property to configure the parallelism.
+ Clean up local files after update (previous bugs)
+ Remove useless imports
+ Add tests
+ Add related documents

# Notes

I tested this with my example and the job finished successfully in about 13 
minutes. The records were updated as expected.

Compared to the previous implementation, the update performance has been 
enhanced:

Origin(Parallelism(1) + GroupBy): Update **FAILED**

Adding Parallelism(6) + GroupBy: Update **SUCCEEDED** in **13mins**

Parallelism(1) + PartitionBy: Update **SUCCEEDED** in **21mins**

Adding Parallelism(6) + PartitionBy: Update **SUCCEEDED** in **5mins**


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata enhance_update_perf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1261.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1261


commit ebfe1ca2a125c0b736e917d8a7956f5e39dedc50
Author: xuchuanyin 
Date:   2017-08-11T15:00:20Z

Enhance update performance by increasing parallelism

+ Increase parallelism while processing one segment in update
+ Use partitionBy instead of groupby
+ Add a property to configure the parallelism
+ Clean up local files after update (previous bugs)
+ Remove useless imports






[GitHub] carbondata pull request #1254: [CARBONDATA-1379] Fixed Date range filter wit...

2017-08-16 Thread lionelcao
Github user lionelcao commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1254#discussion_r133364318
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/DateDirectDictionaryGenerator.java
 ---
@@ -154,14 +147,14 @@ private int generateDirectSurrogateKeyForNonTimestampType(String memberStr) {
   }
 
   private int generateKey(long timeValue) {
-    long milli = timeValue + threadLocalLocalTimeZone.get().getOffset(timeValue);
-    return (int) Math.floor((double) milli / MILLIS_PER_DAY) + cutOffDate;
+    return (int) Math.floor((double) timeValue / MILLIS_PER_DAY) + cutOffDate;
   }
 
   public void initialize() {
     if (simpleDateFormatLocal.get() == null) {
       simpleDateFormatLocal.set(new SimpleDateFormat(dateFormat));
       simpleDateFormatLocal.get().setLenient(false);
+      simpleDateFormatLocal.get().setTimeZone(TimeZone.getTimeZone("GMT"));
--- End diff --

Does it mean carbon will process all Date/TimeStamp as GMT?
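
As far as the diff shows, the date string is parsed with a GMT formatter and the resulting milliseconds are divided by the length of a day, without adding a local time-zone offset. A minimal sketch of that calculation follows; the class name, the fixed `yyyy-MM-dd` pattern, and the zero cut-off are assumptions made for illustration, and the sketch does not settle whether all Date/Timestamp handling is treated as GMT.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical illustration class; not code from the pull request.
final class GmtDateKeySketch {

  static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

  // Hypothetical cut-off; the real generator adds its own cutOffDate constant.
  static final int CUT_OFF = 0;

  // Parse a date string as GMT and convert it to a day number.
  static int dayKey(String date) {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
    format.setLenient(false);
    format.setTimeZone(TimeZone.getTimeZone("GMT"));
    try {
      long timeValue = format.parse(date).getTime();
      // Math.floor keeps dates before the epoch on the correct (earlier) day.
      return (int) Math.floor((double) timeValue / MILLIS_PER_DAY) + CUT_OFF;
    } catch (ParseException e) {
      throw new IllegalArgumentException("Not a valid date: " + date, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(dayKey("1970-01-01")); // 0
    System.out.println(dayKey("1970-01-02")); // 1
    System.out.println(dayKey("1969-12-31")); // -1
  }
}
```

Because the formatter is pinned to GMT, the result does not depend on the default time zone of the JVM that runs the load.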




[GitHub] carbondata issue #1257: [CARBONDATA-1347] Implemented Columnar Reading Of Da...

2017-08-16 Thread chenliang613
Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/1257
  
retest this please




[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...

2017-08-16 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1257#discussion_r133375796
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/memory/AggregatedMemoryContext.java
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
--- End diff --

The license header is wrong.




[GitHub] carbondata issue #1235: fixed bug for IsNull and IsNotNull clause in presto

2017-08-16 Thread anubhav100
Github user anubhav100 commented on the issue:

https://github.com/apache/carbondata/pull/1235
  
@steven-qin Can you please provide reproducible steps?




[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...

2017-08-16 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1257#discussion_r133375914
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/memory/LocalMemoryContext.java
 ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
--- End diff --

The license header is wrong.




[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...

2017-08-16 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1257#discussion_r133375740
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/memory/AbstractAggregatedMemoryContext.java
 ---
@@ -0,0 +1,37 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
--- End diff --

The license header is wrong.




[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...

2017-08-16 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1257#discussion_r133377510
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java
 ---
@@ -0,0 +1,103 @@
+package org.apache.carbondata.presto.readers;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import com.facebook.presto.spi.block.Block;
+import com.facebook.presto.spi.block.BlockBuilder;
+import com.facebook.presto.spi.block.BlockBuilderStatus;
+import com.facebook.presto.spi.type.DecimalType;
+import com.facebook.presto.spi.type.Decimals;
+import com.facebook.presto.spi.type.Type;
+import io.airlift.slice.Slice;
+
+import static com.facebook.presto.spi.type.Decimals.encodeUnscaledValue;
+import static com.facebook.presto.spi.type.Decimals.isShortDecimal;
+import static com.facebook.presto.spi.type.Decimals.rescale;
+import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkState;
+import static io.airlift.slice.Slices.utf8Slice;
+import static java.math.RoundingMode.HALF_UP;
+
+public class DecimalSliceStreamReader implements StreamReader {
--- End diff --

This PR optimizes the column reader, so why does it need to add StreamReader? 
Please consider using separate PRs to implement different features.




[GitHub] carbondata pull request #1202: [CARBONDATA-1326] Fixed normal/low priority f...

2017-08-16 Thread mohammadshahidkhan
Github user mohammadshahidkhan closed the pull request at:

https://github.com/apache/carbondata/pull/1202




[GitHub] carbondata issue #1202: [CARBONDATA-1326] Fixed normal/low priority findbug ...

2017-08-16 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on the issue:

https://github.com/apache/carbondata/pull/1202
  
not required




[GitHub] carbondata issue #1204: [CARBONDATA-1326] Fixed normal/low priority findbug ...

2017-08-16 Thread sraghunandan
Github user sraghunandan commented on the issue:

https://github.com/apache/carbondata/pull/1204
  
retest this please




[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...

2017-08-16 Thread anubhav100
Github user anubhav100 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1257#discussion_r133407524
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/constants/CarbonCommonConstants.java
 ---
@@ -0,0 +1,1319 @@
+/*
--- End diff --

done




[GitHub] carbondata pull request #1126: [CARBONDATA-1258] CarbonData should not allow...

2017-08-16 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1126#discussion_r133404888
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/DateDirectDictionaryGenerator.java
 ---
@@ -154,6 +162,12 @@ private int generateDirectSurrogateKeyForNonTimestampType(String memberStr) {
   }
 
   private int generateKey(long timeValue) {
+    if (timeValue < MIN_VALUE || timeValue > MAX_VALUE) {
+      if (LOGGER.isDebugEnabled()) {
+        LOGGER.debug("Value for date type column is not in valid range. Value considered as null.");
+      }
+      return 1;
--- End diff --

If the value is not in the defined range, it will be stored as null when the 
bad records action is FORCE.
The dictionary key for a null value is 1.
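
The behaviour described above can be sketched as a range check in front of the day-number calculation. This is a minimal illustration only: the bounds and the cut-off constant below are hypothetical placeholders rather than the values used in CarbonData, and the bad-records handling itself is outside the sketch.

```java
// Hypothetical illustration class; not code from the pull request.
final class DateRangeKeySketch {

  static final long MILLIS_PER_DAY = 86_400_000L;

  // Per the comment above, dictionary key 1 represents a null value.
  static final int NULL_KEY = 1;

  // Hypothetical bounds for the supported date range, in epoch milliseconds.
  static final long MIN_VALUE = -62_135_769_600_000L;
  static final long MAX_VALUE = 253_402_300_799_999L;

  // Hypothetical cut-off that keeps in-range keys well away from NULL_KEY.
  static final int CUT_OFF_DATE = Integer.MAX_VALUE >> 1;

  static int generateKey(long timeValue) {
    if (timeValue < MIN_VALUE || timeValue > MAX_VALUE) {
      // Out-of-range values are mapped to the null key instead of a day number.
      return NULL_KEY;
    }
    return (int) Math.floor((double) timeValue / MILLIS_PER_DAY) + CUT_OFF_DATE;
  }

  public static void main(String[] args) {
    System.out.println(generateKey(0L));              // epoch day plus the cut-off
    System.out.println(generateKey(MAX_VALUE + 1));   // 1, the null key
    System.out.println(generateKey(Long.MIN_VALUE));  // 1, the null key
  }
}
```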




[GitHub] carbondata issue #1234: fix bug in Spi2CarbondataTypeMapper method, it will ...

2017-08-16 Thread anubhav100
Github user anubhav100 commented on the issue:

https://github.com/apache/carbondata/pull/1234
  
@linqer How can this PR be tested? Can you provide reproducible steps?




[GitHub] carbondata issue #1257: [CARBONDATA-1347] Implemented Columnar Reading Of Da...

2017-08-16 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1257
  
SDV Build Failed with Spark 2.1, Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/197/





[GitHub] carbondata issue #1204: [CARBONDATA-1326] Fixed normal/low priority findbug ...

2017-08-16 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1204
  
SDV Build Success with Spark 2.1, Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/198/





[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...

2017-08-16 Thread chenliang613
Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/1236
  
retest this please




[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...

2017-08-16 Thread chenliang613
Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/1236
  
@anubhav100 I prefer to use steven-qin's pull request, because steven-qin 
fixed it earlier.




[GitHub] carbondata issue #1258: [CARBONDATA-1325] Add partition guidance doc

2017-08-16 Thread chenliang613
Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/1258
  
retest this please




[GitHub] carbondata issue #1260: [CARBONDATA-1382] Add more test cases for bucket fea...

2017-08-16 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1260
  
retest this please




[GitHub] carbondata pull request #1252: [CARBONDATA-1372]Update the documentation

2017-08-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1252




[jira] [Resolved] (CARBONDATA-1372) Fix some errors and update the examples in documentation

2017-08-16 Thread Liang Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Chen resolved CARBONDATA-1372.

   Resolution: Fixed
Fix Version/s: 1.2.0

> Fix some errors and update the examples in documentation
> 
>
> Key: CARBONDATA-1372
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1372
> Project: CarbonData
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 1.2.0
>Reporter: xubo245
>Assignee: xubo245
>Priority: Minor
> Fix For: 1.2.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are some errors in CarbonData docs, these should be fixed.
> Some examples in documentation should be updated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata issue #1260: [CARBONDATA-1382] Add more test cases for bucket fea...

2017-08-16 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1260
  
SDV Build Failed with Spark 2.1, Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/201/





[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...

2017-08-16 Thread anubhav100
Github user anubhav100 commented on the issue:

https://github.com/apache/carbondata/pull/1236
  
@chenliang Sure, he has done a great job.




[GitHub] carbondata issue #1258: [CARBONDATA-1325] Add partition guidance doc

2017-08-16 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1258
  
SDV Build Success with Spark 2.1, Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/200/





[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...

2017-08-16 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1236
  
SDV Build Success with Spark 2.1, Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/202/





[GitHub] carbondata pull request #1240: [CARBONDATA-1365] add RLE codec implementatio...

2017-08-16 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1240#discussion_r133606402
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/RLECodec.java
 ---
@@ -0,0 +1,417 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.page.encoding;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.datastore.page.ColumnPage;
+import org.apache.carbondata.core.datastore.page.ComplexColumnPage;
+import org.apache.carbondata.core.datastore.page.statistics.SimpleStatsResult;
+import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.metadata.CodecMetaFactory;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+
+/**
+ * RLE encoding implementation for integral column page.
+ * This encoding keeps track of repeated-run and non-repeated-run, and make use
+ * of the highest bit of the length field to indicate the type of run.
+ * The length field is encoded as 32bits value.
+ *
+ * For example: input data {5, 5, 1, 2, 3, 3, 3, 3, 3} will be encoded to
+ * {0x00, 0x00, 0x00, 0x02, 0x05, (repeated-run, 2 values of 5)
+ *  0x80, 0x00, 0x00, 0x03, 0x01, 0x02, 0x03, (non-repeated-run, 3 values: 1, 2, 3)
+ *  0x00, 0x00, 0x00, 0x04, 0x03} (repeated-run, 4 values of 3)
+ */
+public class RLECodec implements ColumnPageCodec {
+
+  enum RUN_STATE { INIT, START, REPEATED_RUN, NONREPEATED_RUN }
+
+  private DataType dataType;
+  private int pageSize;
+
+  /**
+   * New RLECodec
+   * @param dataType data type of the raw column page before encode
+   * @param pageSize page size of the raw column page before encode
+   */
+  RLECodec(DataType dataType, int pageSize) {
+    this.dataType = dataType;
+    this.pageSize = pageSize;
+  }
+
+  @Override
+  public String getName() {
+    return "RLECodec";
+  }
+
+  @Override
+  public EncodedColumnPage encode(ColumnPage input) throws MemoryException, IOException {
+    Encoder encoder = new Encoder();
+    return encoder.encode(input);
+  }
+
+  @Override
+  public EncodedColumnPage[] encodeComplexColumn(ComplexColumnPage input) {
+    throw new UnsupportedOperationException("complex column does not support RLE encoding");
+  }
+
+  @Override
+  public ColumnPage decode(byte[] input, int offset, int length) throws MemoryException,
+      IOException {
+    Decoder decoder = new Decoder(dataType, pageSize);
+    return decoder.decode(input, offset, length);
+  }
+
+  // This codec supports integral type only
+  private void validateDataType(DataType dataType) {
+    switch (dataType) {
+      case BYTE:
+      case SHORT:
+      case INT:
+      case LONG:
--- End diff --

I think we had better keep it for integral values only. For double and decimal
columns, there are more distinct values in most cases.
Anyway, we can add support in a future PR if it is required.
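
For reference, the run layout described in the Javadoc above can be checked with a minimal decoder sketch. It assumes single-byte values and is not the `Decoder` from this PR; the class name `RleSketch` and its `decode` method are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the PR's implementation: values are assumed to be one byte each.
public class RleSketch {

  // Each run starts with a 32-bit big-endian length field.
  // Highest bit set   -> non-repeated run: 'count' literal values follow.
  // Highest bit clear -> repeated run: one value follows, repeated 'count' times.
  static List<Byte> decode(byte[] encoded) throws IOException {
    List<Byte> out = new ArrayList<>();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
    while (in.available() > 0) {
      int header = in.readInt();
      boolean nonRepeated = (header & 0x80000000) != 0;
      int count = header & 0x7FFFFFFF;
      if (nonRepeated) {
        for (int i = 0; i < count; i++) {
          out.add(in.readByte());
        }
      } else {
        byte value = in.readByte();
        for (int i = 0; i < count; i++) {
          out.add(value);
        }
      }
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    // The encoded bytes from the Javadoc example.
    byte[] encoded = {
        0x00, 0x00, 0x00, 0x02, 0x05,
        (byte) 0x80, 0x00, 0x00, 0x03, 0x01, 0x02, 0x03,
        0x00, 0x00, 0x00, 0x04, 0x03
    };
    System.out.println(decode(encoded)); // prints [5, 5, 1, 2, 3, 3, 3, 3, 3]
  }
}
```

Decoding the Javadoc bytes gives back the nine input values, so the example in the comment is internally consistent: the first 3 is carried in the non-repeated run and the remaining four form the last repeated run.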




[GitHub] carbondata pull request #1258: [CARBONDATA-1325] Add partition guidance doc

2017-08-16 Thread zzcclp
Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1258#discussion_r133619148
  
--- Diff: docs/partition-guide.md ---
@@ -0,0 +1,124 @@
+
+
+### CarbonData Partition Table Guidance
+This guidance illustrates how to create & use partition table in CarbonData.
+
+* [Create Partition Table](#create-partition-table)
+  - [Create Hash Partition Table](#create-hash-partition-table)
+  - [Create Range Partition Table](#create-range-partition-table)
+  - [Create List Partition Table](#create-list-partition-table)
+* [Show Partitions](#show-partitions)
+* [Maintain the Partitions](#maintain-the-partitions)
+* [Partition Id](#partition-id)
+* [Tips](#tips)
+
+### Create Partition Table
+
+# Create Hash Partition Table
+```
+   CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+[(col_name data_type , ...)]
+   STORED BY 'carbondata'
+   PARTITIONED BY (partition_col_name data_type)
--- End diff --

The 'PARTITIONED BY' clause must come before 'STORED BY'; otherwise it throws 
an error:
`mismatched input 'PARTITIONED' expecting {, '(', 'SELECT', 'FROM', 
'AS', 'WITH', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE', 'TBLPROPERTIES', 
'LOCATION'}(line 11, pos 2)`




[GitHub] carbondata pull request #1249: [WIP] Use ColumnPage in reader for measure co...

2017-08-16 Thread jackylk
Github user jackylk closed the pull request at:

https://github.com/apache/carbondata/pull/1249




[GitHub] carbondata issue #1231: [CARBONDATA-1359] Unable to use carbondata on hive

2017-08-16 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1231
  
SDV Build Success with Spark 2.1, Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/206/





[GitHub] carbondata issue #1231: [CARBONDATA-1359] Unable to use carbondata on hive

2017-08-16 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1231
  
SDV Build Success with Spark 2.1, Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/203/





[GitHub] carbondata issue #1240: [CARBONDATA-1365] add RLE codec implementation

2017-08-16 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1240
  
SDV Build Success with Spark 2.1, Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/204/





[GitHub] carbondata pull request #1262: [BUGFIX] Fix ZERO_BYTE_ARRAY constant not fou...

2017-08-16 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/carbondata/pull/1262

[BUGFIX] Fix ZERO_BYTE_ARRAY constant not found in codegen

CarbonCommonConstant.ZERO_BYTE_ARRAY is used in codegen, so it should not be 
deleted. This PR adds it back.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata zero

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1262.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1262


commit 53c9f24edf3f9cad1c2149d7b48197ed00e5522e
Author: Jacky Li 
Date:   2017-08-17T03:45:24Z

fix code gen






[GitHub] carbondata pull request #1258: [CARBONDATA-1325] Add partition guidance doc

2017-08-16 Thread zzcclp
Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1258#discussion_r133619308
  
--- Diff: docs/partition-guide.md ---
@@ -0,0 +1,124 @@
+
+
+### CarbonData Partition Table Guidance
+This guidance illustrates how to create & use partition table in CarbonData.
+
+* [Create Partition Table](#create-partition-table)
+  - [Create Hash Partition Table](#create-hash-partition-table)
+  - [Create Range Partition Table](#create-range-partition-table)
+  - [Create List Partition Table](#create-list-partition-table)
+* [Show Partitions](#show-partitions)
+* [Maintain the Partitions](#maintain-the-partitions)
+* [Partition Id](#partition-id)
+* [Tips](#tips)
+
+### Create Partition Table
+
+# Create Hash Partition Table
+```
+   CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+[(col_name data_type , ...)]
+   STORED BY 'carbondata'
+   PARTITIONED BY (partition_col_name data_type)
+   [TBLPROPERTIES ('PARTITION_TYPE'='HASH', 
+   'PARTITION_NUM'='N' ...)]  
--- End diff --

Change 'PARTITION_NUM' to 'NUM_PARTITIONS'.




[jira] [Created] (CARBONDATA-1385) Add Test cases for Hive Integration

2017-08-16 Thread anubhav tarar (JIRA)
anubhav tarar created CARBONDATA-1385:
-

 Summary: Add Test cases for Hive Integration
 Key: CARBONDATA-1385
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1385
 Project: CarbonData
  Issue Type: Test
Reporter: anubhav tarar
Assignee: anubhav tarar
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata issue #1231: [CARBONDATA-1359] Unable to use carbondata on hive

2017-08-16 Thread chenliang613
Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/1231
  
retest this please




[GitHub] carbondata issue #1255: [CARBONDATA-1375] clean hive pom

2017-08-16 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/carbondata/pull/1255
  
1. Use hadoop.version instead of 2.6.0
2. Use hive.version instead of 1.2.1
3. Remove thrift
4. Remove zookeeper
5. Remove spark-hive and spark-sql




[GitHub] carbondata issue #1234: fix bug in Spi2CarbondataTypeMapper method, it will ...

2017-08-16 Thread anubhav100
Github user anubhav100 commented on the issue:

https://github.com/apache/carbondata/pull/1234
  
@linqer Please correct your PR title as per:
https://github.com/apache/carbondata/blob/master/docs/How-to-contribute-to-Apache-CarbonData.md




[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...

2017-08-16 Thread anubhav100
Github user anubhav100 commented on the issue:

https://github.com/apache/carbondata/pull/1236
  
@steven-qin
1. Please correct your PR title as per:
https://github.com/apache/carbondata/blob/master/docs/How-to-contribute-to-Apache-CarbonData.md
2. There is still a problem with the decimal type. I ran the query below from 
your branch in both CarbonData and Presto, and Presto gives wrong results.

In CarbonData:

carbon.sql("select l_tax from lineitem where l_tax=0.06").show()
carbon.stop()

+-----+
|l_tax|
+-----+
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
+-----+
only showing top 20 rows

When I execute the same query in Presto:

select l_tax from lineitem where l_tax=0.06;

I get no result.





[GitHub] carbondata issue #1192: [CARBONDATA-940] alter table add/split partition for...

2017-08-16 Thread QiangCai
Github user QiangCai commented on the issue:

https://github.com/apache/carbondata/pull/1192
  
LGTM

