[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-27 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15438771 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-26 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15435898 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -241,4 +251,37 @@ private[hive] object HadoopTableReader { val

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-26 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-50248847 Also, can you delete `[WIP]` from the PR title?

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-23 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49883931 @yhuai @concretevitamin @marmbrus @liancheng Can you take a look at this? I think the test results give us more confidence in the improvement.

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-21 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49574159 Thank you guys, I've updated the code as suggested and also attached the micro-benchmark result in the PR description.

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49577438 QA results for PR 1439: - This patch FAILED unit tests. For more information see test

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49578517 QA results for PR 1439: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49573309 QA tests have started for PR 1439. This patch DID NOT merge cleanly! View progress:

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49573615 QA tests have started for PR 1439. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16897/consoleFull

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-19 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49501954 As for benchmarks, the micro-benchmark code that comes with #758 may be helpful. And I feel that partitioning support for Parquet should be considered together with the

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-18 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49483743 I'll just add that the `HiveTableReader` vs `HiveTableScan` separation is purely artificial, and the split is based on what code was stolen from Shark vs what code was

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49265512 QA results for PR 1439: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-17 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15067812 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala --- @@ -67,95 +61,12 @@ case class HiveTableScan( }

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-17 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15068484 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -156,33 +158,43 @@ class HadoopTableReader(@transient _tableDesc:

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-17 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49338675 I think we are not clear on the boundary between a `TableReader` and a physical `TableScan` operator (e.g. `HiveTableScan`). Seems we just want `TableReader` to create

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-17 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49341477 @chenghao-intel explained the root cause in https://issues.apache.org/jira/browse/SPARK-2523. Basically, we should use partition-specific `ObjectInspectors` to extract

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-17 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15092652 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala --- @@ -67,95 +61,12 @@ case class HiveTableScan( }

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-17 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15092708 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -156,33 +158,43 @@ class HadoopTableReader(@transient _tableDesc:

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-17 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49386103 @yhuai I agree with you that we should draw a clear boundary between `HiveTableScan` and `TableReader`, but I am not sure if it's a good idea to create multiple

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-17 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49386783 @chenghao-intel I did not mean to introduce multiple `HiveTableScan`. I meant to have an abstract `TableScan` and make existing ones (e.g. `HiveTableScan` and

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-17 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49387608 @yhuai Sorry if I misunderstood. Do you mean that `HiveTableScan` and `ParquetTableScan` are the new operators, which are created by SparkPlanner?

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-17 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49388234 @yhuai I finally got your meaning. I think you're right: some of the logic could be shared among TableScan operators.

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-16 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49138992 `ObjectInspector` is no longer required by `Row` in Catalyst (unlike in Shark), and it is tightly coupled with the Deserializer and the raw data, so I moved the

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49147325 QA results for PR 1439: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-16 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49175346 Could you elaborate on when we will see an exception?

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-16 Thread concretevitamin
Github user concretevitamin commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15013611 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -241,4 +252,37 @@ private[hive] object HadoopTableReader {

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-16 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49258632 @yhuai @concretevitamin Thanks for the comments. I've updated the description in Jira; can you please jump there and take a look?

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-16 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49258678 Sorry, forgot to paste the link. https://issues.apache.org/jira/browse/SPARK-2523

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49259426 QA tests have started for PR 1439. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16766/consoleFull