Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1408#issuecomment-49138529
@yhuai @concretevitamin @rxin I've created another PR for this follow-up; we
can discuss this more at:
https://github.com/apache/spark/pull/1439
---
If your
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/1390#issuecomment-48930414
Thanks for reviewing this everyone. I'm all for commenting and cleaning
things up here, but if possible I'd like to merge this in today. There are a
couple of people
Github user concretevitamin closed the pull request at:
https://github.com/apache/spark/pull/1390
Github user concretevitamin commented on the pull request:
https://github.com/apache/spark/pull/1390#issuecomment-48935494
@yhuai suggested a much simpler fix -- I benchmarked this and it gave the
same performance boost. I am closing this and opening a new PR.
GitHub user concretevitamin opened a pull request:
https://github.com/apache/spark/pull/1408
[SPARK-2443][SQL] Fix slow read from partitioned tables
This fix obtains a performance boost comparable to [PR
#1390](https://github.com/apache/spark/pull/1390) by moving an array update
Github user concretevitamin commented on the pull request:
https://github.com/apache/spark/pull/1390#issuecomment-48936743
New PR here: #1408
Github user concretevitamin commented on the pull request:
https://github.com/apache/spark/pull/1408#issuecomment-48936856
Jenkins, test this please.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1408#issuecomment-48937213
QA tests have started for PR 1408. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16631/consoleFull
Github user concretevitamin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1390#discussion_r14894946
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -157,21 +161,60 @@ class HadoopTableReader(@transient
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/1390#discussion_r14898638
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -157,21 +161,60 @@ class HadoopTableReader(@transient _tableDesc:
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1408#issuecomment-48951901
QA results for PR 1408:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/1408#issuecomment-48954454
Thanks! I've merged this into both master and 1.0.
Are there other follow-up things we want to fix from the discussion on the
other PR? Or should I consider
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1408
Github user concretevitamin commented on the pull request:
https://github.com/apache/spark/pull/1408#issuecomment-48954674
I think we should ask the users who reported the performance issue if this
fix solves their problems. Otherwise the comments in the previous PR seem to
only
Github user concretevitamin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1390#discussion_r14902569
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -157,21 +161,60 @@ class HadoopTableReader(@transient
Github user concretevitamin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1390#discussion_r14902570
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -157,21 +161,60 @@ class HadoopTableReader(@transient
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1408#issuecomment-48978522
This will work in most cases, I think. But it may raise exceptions if
the Table's Deserializer differs from the partition's Deserializer, since they
may have
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/1408#issuecomment-48979538
@chenghao-intel Can you ping me after you create the PR or the JIRA?
Thanks:)
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1390#issuecomment-48832990
@yhuai can you take a look?
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1390#discussion_r14856885
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -157,21 +161,60 @@ class HadoopTableReader(@transient _tableDesc:
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/1390#issuecomment-48845188
I am reviewing it. Will comment on it later today.
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1390#issuecomment-48859675
The code looks good to me. However, I think we can avoid the workaround
solution (deserializing (with partition serde) and then serializing (with table
serde)
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1390#issuecomment-48859842
And since the Hive SerDe actually provides `lazy` parsing, during the
conversion of the `raw object` to a `Row` we need to support
column pruning
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/1390#issuecomment-48860018
@chenghao-intel I am not sure I understand your comment on column pruning.
I think for a Hive table, we should use `ColumnProjectionUtils` to set needed
columns. So,
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/1390#discussion_r14862289
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -157,21 +161,60 @@ class HadoopTableReader(@transient _tableDesc:
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/1390#discussion_r14862300
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -157,21 +161,60 @@ class HadoopTableReader(@transient _tableDesc:
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/1390#discussion_r14862338
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -157,21 +161,60 @@ class HadoopTableReader(@transient _tableDesc:
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/1390#discussion_r14862941
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -157,21 +161,60 @@ class HadoopTableReader(@transient _tableDesc:
Github user concretevitamin commented on the pull request:
https://github.com/apache/spark/pull/1390#issuecomment-48830080
Jenkins, retest this please.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1390#issuecomment-48830138
QA tests have started for PR 1390. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16599/consoleFull
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1390#issuecomment-48831466
QA results for PR 1390:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test