Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/1439#discussion_r15438771
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala
---
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/1439#discussion_r15435898
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -241,4 +251,37 @@ private[hive] object HadoopTableReader {
val
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-50248847
Also, can you delete `[WIP]` from the PR title?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49883931
@yhuai @concretevitamin @marmbrus @liancheng Can you take a look at this?
I think the test result gives us more confidence in the improvement.
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49574159
Thank you guys, I've updated the code as suggested, and also attached
the micro-benchmark result in the PR description.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49577438
QA results for PR 1439:
- This patch FAILED unit tests.
For more information see test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49578517
QA results for PR 1439:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49573309
QA tests have started for PR 1439. This patch DID NOT merge cleanly!
View progress:
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49573615
QA tests have started for PR 1439. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16897/consoleFull
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49501954
As for benchmarks, the micro-benchmark code that comes with #758 may be helpful.
And I feel that partitioning support for Parquet should be considered together
with the
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49483743
I'll just add that the `HiveTableReader` vs `HiveTableScan` separation is
purely artificial, and the split is based on what code was stolen from Shark vs
what code was
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49265512
QA results for PR 1439:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/1439#discussion_r15067812
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala
---
@@ -67,95 +61,12 @@ case class HiveTableScan(
}
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/1439#discussion_r15068484
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -156,33 +158,43 @@ class HadoopTableReader(@transient _tableDesc:
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49338675
I think we are not clear on the boundary between a `TableReader` and a
physical `TableScan` operator (e.g. `HiveTableScan`). It seems we just want
`TableReader` to create
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49341477
@chenghao-intel explained the root cause in
https://issues.apache.org/jira/browse/SPARK-2523. Basically, we should use
partition-specific `ObjectInspectors` to extract
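The comment above refers to the root cause in SPARK-2523: partitions of a Hive table can each carry their own SerDe and serialization properties, so raw rows must be extracted with that partition's own inspector rather than the table-level one. A minimal plain-Scala sketch of the idea, with all names (`Partition`, `extractor`, `scan`, the `Encoding` cases) purely illustrative rather than Spark or Hive APIs:

```scala
object PartitionDeserSketch {
  // Stand-in for per-partition serialization metadata, like a Hive
  // partition descriptor carrying its own SerDe properties.
  sealed trait Encoding
  case object Decimal extends Encoding
  case object Hex extends Encoding

  case class Partition(encoding: Encoding, raw: Seq[String])

  // Resolve the value extractor from the partition's OWN metadata --
  // the analogue of building a partition-specific ObjectInspector.
  def extractor(e: Encoding): String => Int = e match {
    case Decimal => s => Integer.parseInt(s, 10)
    case Hex     => s => Integer.parseInt(s, 16)
  }

  // Choose the extractor per partition instead of once per table; a
  // single table-level Decimal extractor would fail (or silently
  // misread) the Hex-encoded partition.
  def scan(parts: Seq[Partition]): Seq[Int] =
    parts.flatMap(p => p.raw.map(extractor(p.encoding)))
}
```

The table-level-only approach corresponds to calling `extractor` once with the table's assumed encoding and applying it to every partition, which is exactly what breaks when partitions differ.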
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/1439#discussion_r15092652
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala
---
@@ -67,95 +61,12 @@ case class HiveTableScan(
}
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/1439#discussion_r15092708
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -156,33 +158,43 @@ class HadoopTableReader(@transient _tableDesc:
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49386103
@yhuai I agree with you that we should make a clear boundary between
`HiveTableScan` and `TableReader`, but I am not sure if it's a good idea to
create multiple
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49386783
@chenghao-intel I did not mean to introduce multiple `HiveTableScan`s. I
meant to have an abstract `TableScan` and make existing ones (e.g.
`HiveTableScan` and
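A rough sketch of the shape being proposed here: shared logic (column pruning, in this toy version) lives in an abstract base, while each subclass only supplies source-specific row reading, as a `HiveTableScan` or `ParquetTableScan` would against real storage. All names are hypothetical; real Spark physical operators produce an `RDD[Row]` from `execute()` rather than in-memory sequences:

```scala
// Hypothetical sketch of an abstract TableScan with shared logic.
abstract class TableScanSketch {
  // Source-specific: produce raw rows as column-name -> value maps.
  protected def readRows(): Seq[Map[String, Any]]

  // Shared across all scans: project only the requested attributes.
  final def scan(requested: Seq[String]): Seq[Seq[Any]] =
    readRows().map(row => requested.map(row))
}

// A toy "source" standing in for the Hive or Parquet readers.
class InMemoryScan(data: Seq[Map[String, Any]]) extends TableScanSketch {
  protected def readRows(): Seq[Map[String, Any]] = data
}
```

Under this split, adding a new storage format means writing only the `readRows` analogue; the projection (and any other shared scan logic) is inherited unchanged.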
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49387608
@yhuai sorry if I misunderstood. Do you mean `HiveTableScan` and
`ParquetTableScan` are the new operators created by the SparkPlanner, right?
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49388234
@yhuai I got your meaning eventually. I think you're right: some of the logic
could be shared among `TableScan` operators.
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49138992
`ObjectInspector` is not required by `Row` in Catalyst any more (unlike
in Shark), and it is tightly coupled with the `Deserializer` for the raw data, so I
moved the
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49147325
QA results for PR 1439:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49175346
Could you elaborate on when we will see an exception?
Github user concretevitamin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1439#discussion_r15013611
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -241,4 +252,37 @@ private[hive] object HadoopTableReader {
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49258632
@yhuai @concretevitamin thanks for the comments, I've updated the
description in JIRA. Can you please jump there and take a look?
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49258678
Sorry, forgot to paste the link.
https://issues.apache.org/jira/browse/SPARK-2523
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1439#issuecomment-49259426
QA tests have started for PR 1439. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16766/consoleFull