[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11437


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190904837
  
Merging this into master.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190899484
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190899490
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52251/
Test PASSed.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190898706
  
**[Test build #52251 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52251/consoleFull)**
 for PR 11437 at commit 
[`e539d8a`](https://github.com/apache/spark/commit/e539d8a94735668c370459ca8bf5a937ee22321d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190855205
  
**[Test build #52251 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52251/consoleFull)**
 for PR 11437 at commit 
[`e539d8a`](https://github.com/apache/spark/commit/e539d8a94735668c370459ca8bf5a937ee22321d).





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190846489
  
Cool. Lgtm 





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190845510
  
@nongli There is no visible difference on any of the existing benchmarks 
(ColumnarBatch and ParquetRead), since they don't use dictionary encoding.

After changing intStringScan to use dictionary encoding (a small number of 
unique values), here are the results:

Before this patch 

```
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
Int and String Scan:      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------
SQL Parquet Reader              1248 / 1281          8.4         119.0       1.0X
SQL Parquet MR                  1962 / 2093          5.3         187.1       0.6X
SQL Parquet Vectorized           876 / 1018         12.0          83.5       1.4X
ParquetReader                    741 /  755         14.1          70.7       1.7X
```

After the patch 
```
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
Int and String Scan:      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------
SQL Parquet Reader              1247 / 1279          8.4         118.9       1.0X
SQL Parquet MR                  1809 / 1851          5.8         172.5       0.7X
SQL Parquet Vectorized           805 /  909         13.0          76.8       1.5X
ParquetReader                    742 /  756         14.1          70.7       1.7X
```

We can see a 10% improvement on SQL Parquet Vectorized, but no difference on 
ParquetReader; I don't know why. (I didn't include #11274.)
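
For reference, the relative change can be recomputed from the two tables above. This is only a small sketch: the class and method names are made up for illustration, and the inputs are the SQL Parquet Vectorized numbers quoted in this comment. On those numbers the best time and the per-row cost both drop by roughly 8%, and the average time by a little under 11%.

```java
/** Hypothetical helper, not part of Spark: recomputes the speedup from the quoted numbers. */
final class BenchmarkDelta {
    /** Percentage reduction going from before to after; positive means faster. */
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // SQL Parquet Vectorized, before vs. after the patch (values from the tables above).
        System.out.printf("best time:  %.1f%%%n", reductionPercent(876, 805));
        System.out.printf("avg time:   %.1f%%%n", reductionPercent(1018, 909));
        System.out.printf("per row ns: %.1f%%%n", reductionPercent(83.5, 76.8));
    }
}
```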





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190812391
  
Can you run the ColumnarBatch/ParquetRead benchmark? Does this have perf 
problems if there is no dictionary or there is no filter?





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11437#discussion_r54597312
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java ---
@@ -620,13 +624,6 @@ private void readBatch(int total, ColumnVector column) throws IOException {
 }
 int num = Math.min(total, leftInPage);
 if (useDictionary) {
-  // Data is dictionary encoded. We will vector decode the ids and then resolve the values.
-  if (dictionaryIds == null) {
--- End diff --

Remove dictionaryIds from this class.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11437#discussion_r54597392
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java ---
@@ -695,28 +684,28 @@ private void decodeDictionaryIds(int rowId, int num, ColumnVector column) {
 case INT64:
   if (column.dataType() == DataTypes.LongType ||
   DecimalType.is64BitDecimalType(column.dataType())) {
-for (int i = rowId; i < rowId + num; ++i) {
-  column.putLong(i, dictionary.decodeToLong(dictionaryIds.getInt(i)));
-}
+column.setDictionary(dictionary);
   } else {
 throw new NotImplementedException("Unimplemented type: " + column.dataType());
   }
   break;
 
 case FLOAT:
-  for (int i = rowId; i < rowId + num; ++i) {
-column.putFloat(i, dictionary.decodeToFloat(dictionaryIds.getInt(i)));
-  }
+  column.setDictionary(dictionary);
   break;
 
 case DOUBLE:
-  for (int i = rowId; i < rowId + num; ++i) {
-column.putDouble(i, dictionary.decodeToDouble(dictionaryIds.getInt(i)));
-  }
+  column.setDictionary(dictionary);
   break;
 
 case FIXED_LEN_BYTE_ARRAY:
-  if (DecimalType.is64BitDecimalType(column.dataType())) {
+  // DecimalType written in the legacy mode
+  if (DecimalType.is32BitDecimalType(column.dataType())) {
+for (int i = rowId; i < rowId + num; ++i) {
+  Binary v = dictionary.decodeToBinary(dictionaryIds.getInt(i));
+  column.putInt(i,(int) CatalystRowConverter.binaryToUnscaledLong(v));
--- End diff --

missing space after ,





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-03-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190603255
  
**[Test build #2593 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2593/consoleFull)**
 for PR 11437 at commit 
[`6fce801`](https://github.com/apache/spark/commit/6fce80141c76604167914a8cbb39847f1a4f457a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190563231
  
**[Test build #2593 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2593/consoleFull)**
 for PR 11437 at commit 
[`6fce801`](https://github.com/apache/spark/commit/6fce80141c76604167914a8cbb39847f1a4f457a).





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190507434
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52207/
Test FAILed.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190507431
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190507323
  
**[Test build #52207 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52207/consoleFull)**
 for PR 11437 at commit 
[`6fce801`](https://github.com/apache/spark/commit/6fce80141c76604167914a8cbb39847f1a4f457a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190479973
  
**[Test build #52207 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52207/consoleFull)**
 for PR 11437 at commit 
[`6fce801`](https://github.com/apache/spark/commit/6fce80141c76604167914a8cbb39847f1a4f457a).





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190477588
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52206/
Test FAILed.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190477587
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190477582
  
**[Test build #52206 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52206/consoleFull)**
 for PR 11437 at commit 
[`5faa786`](https://github.com/apache/spark/commit/5faa786628f4b3d61774973f4351693015ba017c).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190477384
  
**[Test build #52206 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52206/consoleFull)**
 for PR 11437 at commit 
[`5faa786`](https://github.com/apache/spark/commit/5faa786628f4b3d61774973f4351693015ba017c).





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190473494
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190473496
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52205/
Test FAILed.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190473490
  
**[Test build #52205 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52205/consoleFull)**
 for PR 11437 at commit 
[`081e6fe`](https://github.com/apache/spark/commit/081e6fe81e2280e4b8041bf376066b9b1d82cc57).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190473204
  
**[Test build #52205 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52205/consoleFull)**
 for PR 11437 at commit 
[`081e6fe`](https://github.com/apache/spark/commit/081e6fe81e2280e4b8041bf376066b9b1d82cc57).





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190446124
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190446110
  
**[Test build #52202 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52202/consoleFull)**
 for PR 11437 at commit 
[`6676e74`](https://github.com/apache/spark/commit/6676e746b887730eadf9cca297ede4cff7a0de2f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190446127
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52202/
Test FAILed.





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190445236
  
**[Test build #52202 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52202/consoleFull)**
 for PR 11437 at commit 
[`6676e74`](https://github.com/apache/spark/commit/6676e746b887730eadf9cca297ede4cff7a0de2f).





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11437#issuecomment-190444065
  
cc @nongli 





[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...

2016-02-29 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/11437

[SPARK-13582] [SQL] defer dictionary decoding in parquet reader

## What changes were proposed in this pull request?

This PR defers resolving a dictionary id to its value until the column is 
actually accessed (inside getInt/getLong). This is very useful for columns 
and rows that are filtered out. It's also useful for binary types, since we 
no longer need to copy all the byte arrays.
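
As a rough illustration of the idea (not the actual Spark classes), the sketch below keeps only the dictionary ids in the column and looks up the value on access. `LongDictionary`, `LazyLongColumn` and `DeferredDecodingDemo` are hypothetical, simplified stand-ins; only the names `setDictionary`, `getLong` and `decodeToLong` mirror what this PR touches, and their real signatures may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a Parquet dictionary: maps a dictionary id to a value. */
interface LongDictionary {
    long decodeToLong(int id);
}

/** Simplified, hypothetical column vector that holds dictionary ids instead of decoded values. */
final class LazyLongColumn {
    private final int[] dictionaryIds; // one id per row, as stored in the data page
    private LongDictionary dictionary; // set once per batch; nothing is decoded at load time

    LazyLongColumn(int[] dictionaryIds) {
        this.dictionaryIds = dictionaryIds;
    }

    void setDictionary(LongDictionary dictionary) {
        this.dictionary = dictionary;
    }

    /** Resolves the id to a value only when the row is actually read. */
    long getLong(int rowId) {
        return dictionary.decodeToLong(dictionaryIds[rowId]);
    }
}

final class DeferredDecodingDemo {
    /** Sums the rows that pass the filter and returns {sum, numberOfDecodes}. */
    static long[] sumSelected(int[] ids, long[] dictValues, boolean[] selected) {
        AtomicInteger decodes = new AtomicInteger();
        LazyLongColumn column = new LazyLongColumn(ids);
        column.setDictionary(id -> {
            decodes.incrementAndGet(); // count how often a value is materialized
            return dictValues[id];
        });
        long sum = 0;
        for (int row = 0; row < ids.length; row++) {
            if (selected[row]) { // filtered-out rows never touch the dictionary
                sum += column.getLong(row);
            }
        }
        return new long[] {sum, decodes.get()};
    }

    public static void main(String[] args) {
        int[] ids = {0, 1, 1, 2, 0, 2};
        long[] dictValues = {10L, 20L, 30L};
        boolean[] selected = {true, false, false, true, false, false};
        long[] result = sumSelected(ids, dictValues, selected);
        System.out.println("sum=" + result[0] + ", decodes=" + result[1]);
    }
}
```

With six rows and a filter that keeps two of them, the dictionary is consulted twice rather than six times, which is where the saving for filtered-out rows comes from in this sketch.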

## How was this patch tested?

Manually tested TPCDS Q7 with scale factor 10; saw about a 30% improvement 
(after PR #11274). 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark decode_dict

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11437.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11437


commit 6676e746b887730eadf9cca297ede4cff7a0de2f
Author: Davies Liu 
Date:   2016-02-29T23:08:52Z

defer dictionary decoding



