[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11437 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190904837 Merging this into master.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190899484 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190899490 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52251/
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190898706

**[Test build #52251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52251/consoleFull)** for PR 11437 at commit [`e539d8a`](https://github.com/apache/spark/commit/e539d8a94735668c370459ca8bf5a937ee22321d).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190855205 **[Test build #52251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52251/consoleFull)** for PR 11437 at commit [`e539d8a`](https://github.com/apache/spark/commit/e539d8a94735668c370459ca8bf5a937ee22321d).
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190846489 Cool. Lgtm
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190845510

@nongli There is no visible difference on any of the existing benchmarks (ColumnarBatch and ParquetRead), because they don't use dictionary encoding. After changing intStringScan to use dictionary encoding (a small number of unique values), here are the results:

Before this patch
```
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
Int and String Scan:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------
SQL Parquet Reader               1248 / 1281           8.4         119.0       1.0X
SQL Parquet MR                   1962 / 2093           5.3         187.1       0.6X
SQL Parquet Vectorized            876 / 1018          12.0          83.5       1.4X
ParquetReader                     741 /  755          14.1          70.7       1.7X
```
After the patch
```
Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
Int and String Scan:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------
SQL Parquet Reader               1247 / 1279           8.4         118.9       1.0X
SQL Parquet MR                   1809 / 1851           5.8         172.5       0.7X
SQL Parquet Vectorized            805 /  909          13.0          76.8       1.5X
ParquetReader                     742 /  756          14.1          70.7       1.7X
```
We can see a 10% improvement on SQL Parquet Vectorized, but no difference on ParquetReader; I don't know why. (I didn't include #11274.)
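As a rough check on the "10%" figure, the relative change can be computed directly from the best and average times reported above for SQL Parquet Vectorized. The class and method names below (`BenchmarkDelta`, `improvementPercent`) are made up for this illustration and are not part of Spark's benchmark harness.

```java
public class BenchmarkDelta {
    /** Percentage reduction in elapsed time going from `before` to `after` (both in ms). */
    static double improvementPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Times copied from the "SQL Parquet Vectorized" rows in the comment above.
        double best = improvementPercent(876, 805);
        double avg = improvementPercent(1018, 909);
        System.out.printf("best: %.1f%%, avg: %.1f%%%n", best, avg);
    }
}
```

The best-case time drops by about 8% and the average by a little under 11%, so "10%" is a fair summary of the two.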
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190812391 Can you run the ColumnarBatch/ParquetRead benchmark? Does this have perf problems if there is no dictionary or there is no filter?
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11437#discussion_r54597312

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java ---
@@ -620,13 +624,6 @@ private void readBatch(int total, ColumnVector column) throws IOException {
         }
         int num = Math.min(total, leftInPage);
         if (useDictionary) {
-          // Data is dictionary encoded. We will vector decode the ids and then resolve the values.
-          if (dictionaryIds == null) {
--- End diff --

Remove dictionaryIds from this class.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11437#discussion_r54597392

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java ---
@@ -695,28 +684,28 @@ private void decodeDictionaryIds(int rowId, int num, ColumnVector column) {
       case INT64:
         if (column.dataType() == DataTypes.LongType ||
             DecimalType.is64BitDecimalType(column.dataType())) {
-          for (int i = rowId; i < rowId + num; ++i) {
-            column.putLong(i, dictionary.decodeToLong(dictionaryIds.getInt(i)));
-          }
+          column.setDictionary(dictionary);
         } else {
           throw new NotImplementedException("Unimplemented type: " + column.dataType());
         }
         break;
       case FLOAT:
-        for (int i = rowId; i < rowId + num; ++i) {
-          column.putFloat(i, dictionary.decodeToFloat(dictionaryIds.getInt(i)));
-        }
+        column.setDictionary(dictionary);
         break;
       case DOUBLE:
-        for (int i = rowId; i < rowId + num; ++i) {
-          column.putDouble(i, dictionary.decodeToDouble(dictionaryIds.getInt(i)));
-        }
+        column.setDictionary(dictionary);
         break;
       case FIXED_LEN_BYTE_ARRAY:
-        if (DecimalType.is64BitDecimalType(column.dataType())) {
+        // DecimalType written in the legacy mode
+        if (DecimalType.is32BitDecimalType(column.dataType())) {
+          for (int i = rowId; i < rowId + num; ++i) {
+            Binary v = dictionary.decodeToBinary(dictionaryIds.getInt(i));
+            column.putInt(i,(int) CatalystRowConverter.binaryToUnscaledLong(v));
--- End diff --

missing space after ,
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190603255

**[Test build #2593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2593/consoleFull)** for PR 11437 at commit [`6fce801`](https://github.com/apache/spark/commit/6fce80141c76604167914a8cbb39847f1a4f457a).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190563231 **[Test build #2593 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2593/consoleFull)** for PR 11437 at commit [`6fce801`](https://github.com/apache/spark/commit/6fce80141c76604167914a8cbb39847f1a4f457a).
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190507434 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52207/
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190507431 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190507323

**[Test build #52207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52207/consoleFull)** for PR 11437 at commit [`6fce801`](https://github.com/apache/spark/commit/6fce80141c76604167914a8cbb39847f1a4f457a).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190479973 **[Test build #52207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52207/consoleFull)** for PR 11437 at commit [`6fce801`](https://github.com/apache/spark/commit/6fce80141c76604167914a8cbb39847f1a4f457a).
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190477588 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52206/
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190477587 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190477582

**[Test build #52206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52206/consoleFull)** for PR 11437 at commit [`5faa786`](https://github.com/apache/spark/commit/5faa786628f4b3d61774973f4351693015ba017c).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190477384 **[Test build #52206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52206/consoleFull)** for PR 11437 at commit [`5faa786`](https://github.com/apache/spark/commit/5faa786628f4b3d61774973f4351693015ba017c).
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190473494 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190473496 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52205/
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190473490

**[Test build #52205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52205/consoleFull)** for PR 11437 at commit [`081e6fe`](https://github.com/apache/spark/commit/081e6fe81e2280e4b8041bf376066b9b1d82cc57).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190473204 **[Test build #52205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52205/consoleFull)** for PR 11437 at commit [`081e6fe`](https://github.com/apache/spark/commit/081e6fe81e2280e4b8041bf376066b9b1d82cc57).
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190446124 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190446110

**[Test build #52202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52202/consoleFull)** for PR 11437 at commit [`6676e74`](https://github.com/apache/spark/commit/6676e746b887730eadf9cca297ede4cff7a0de2f).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190446127 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52202/
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190445236 **[Test build #52202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52202/consoleFull)** for PR 11437 at commit [`6676e74`](https://github.com/apache/spark/commit/6676e746b887730eadf9cca297ede4cff7a0de2f).
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11437#issuecomment-190444065 cc @nongli
[GitHub] spark pull request: [SPARK-13582] [SQL] defer dictionary decoding ...
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/11437

[SPARK-13582] [SQL] defer dictionary decoding in parquet reader

## What changes were proposed in this pull request?

This PR defers resolving a dictionary id to its value until the column is actually accessed (inside getInt/getLong). This is very useful for columns and rows that are filtered out. It is also useful for binary types, because we no longer need to copy all the byte arrays.

## How was this patch tested?

Manually tested TPCDS Q7 with scale factor 10 and saw about 30% improvement (after PR #11274).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark decode_dict

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11437.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #11437

commit 6676e746b887730eadf9cca297ede4cff7a0de2f
Author: Davies Liu
Date: 2016-02-29T23:08:52Z

defer dictionary decoding
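The idea in this description can be sketched in isolation. What follows is a minimal illustration under stated assumptions, not the actual Spark implementation: `IntDictionary`, `LazyIntColumn`, `putDictionaryId`, and the `decodeCount` counter are hypothetical names invented for this example, and only `setDictionary` and `getInt` echo names that appear in the PR. The column stores dictionary ids and resolves an id to its value only when a row is read, so rows that a filter discards are never decoded.

```java
public class DeferredDecodingDemo {
    /** Hypothetical stand-in for a Parquet dictionary page: id -> int value. */
    static final class IntDictionary {
        private final int[] values;
        int decodeCount = 0; // counts lookups so the deferral is observable

        IntDictionary(int... values) { this.values = values.clone(); }

        int decodeToInt(int id) {
            decodeCount++;
            return values[id];
        }
    }

    /** Hypothetical column that holds either plain values or dictionary ids. */
    static final class LazyIntColumn {
        private final int[] data;          // plain values (no dictionary set)
        private final int[] dictionaryIds; // ids, resolved on access
        private IntDictionary dictionary;  // null means `data` is authoritative

        LazyIntColumn(int capacity) {
            data = new int[capacity];
            dictionaryIds = new int[capacity];
        }

        void putInt(int rowId, int value) { data[rowId] = value; }
        void putDictionaryId(int rowId, int id) { dictionaryIds[rowId] = id; }
        void setDictionary(IntDictionary dictionary) { this.dictionary = dictionary; }

        /** Resolves the id to a value only now, at read time. */
        int getInt(int rowId) {
            if (dictionary == null) {
                return data[rowId];
            }
            return dictionary.decodeToInt(dictionaryIds[rowId]);
        }
    }

    public static void main(String[] args) {
        IntDictionary dict = new IntDictionary(10, 20, 30);
        LazyIntColumn column = new LazyIntColumn(4);
        int[] ids = {2, 0, 1, 2};
        for (int i = 0; i < ids.length; i++) {
            column.putDictionaryId(i, ids[i]);
        }
        column.setDictionary(dict);

        // Pretend a filter on another column kept only rows 0 and 3.
        int sum = column.getInt(0) + column.getInt(3);
        System.out.println("sum=" + sum + ", decoded " + dict.decodeCount + " of 4 rows");
    }
}
```

In this sketch only the two surviving rows trigger a dictionary lookup. An eager reader would have decoded all four rows up front, which is the work the PR aims to avoid for filtered-out rows.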