[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-15 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14214
  
@mariobriggs Thanks for the information!

> 1 can be eliminated because 'executedPlan' is a ' lazy val' on 
QueryExecution ?

Yea indeed. Its being there can provide us debug info but on second thought 
it might not be worth it. So let's also skip it in the case of `ForeachSink`.

> ... However this cannot be the solution since 
SparkListenerSQLExecutionStart is a public API already

Yea we probably do not want to modify this public API; so what we did in 
this patch was, passing [3]'s `incrementalExecution` into the listener so we'll 
only initialize physical planning only once for [2] and [3].

> ... why not keep [1] and the change to [2] be the simple case of changing 
L52 to the following: new Dataset(data.sparkSession, data.queryExecution, 
implicitly[Encoder[T]])

This is great. If reviews decide that 2 rounds of physical planning is 
acceptable, then let's do it your way!

> ... ConsoleSink ... but it is only for Debug purposes

So maybe let us live with it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14214
  
**[Test build #62368 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62368/consoleFull)**
 for PR 14214 at commit 
[`9334105`](https://github.com/apache/spark/commit/933410558429ea82f063f21236b8c5c645650a78).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14214
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62362/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14214
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14214
  
**[Test build #62362 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62362/consoleFull)**
 for PR 14214 at commit 
[`8ec635f`](https://github.com/apache/spark/commit/8ec635fe7403baf5149e3f6714872bf706b37cd7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread mariobriggs
Github user mariobriggs commented on the issue:

https://github.com/apache/spark/pull/14214
  
What i tried to do as a 'side fix' was like this,
  eliminate [1] since it was a lazy val. 

  Move [2] out of the code path of the main thread i.e. let ListenerBus 
thread pay the penalty of producing the physical plan for logging ( i was 
coming from a performance test scenario, so it allowed me to proceed :-) ) . So 
the change was that SparkListenerSQLExecutionStart only take QueryExecution as 
a input parameter and not physicalPlanDescription & SparkPlanInfo . However 
this cannot be the solution since SparkListenerSQLExecutionStart is a public 
API already.

 [3] remains.

As you might have already noticed ConsoleSink also suffers from the same 
problem of [2] and these are inside Dataset.withTypedCallback/withCallback, but 
it is only for Debug purposes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread mariobriggs
Github user mariobriggs commented on the issue:

https://github.com/apache/spark/pull/14214
  
> [1] should not be eliminated in general;

  I dont understand the full internal aspects of IncrementalExecution, but 
my generally thinking was that 1 can be eliminated because 'executedPlan' is a 
' lazy val' on QueryExecution ?

>[2] is eliminated by this patch, by replacing the queryExecution with 
incrementalExecution provided by [3];

If the goal is to get it to just as minimal as possible for now and wait 
for SPARK-16264 (which i was also thinking where it will have to finally wait 
for full resolution), why not keep [1] and the change to [2] be the simple case 
of changing 
[L52](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L52)
 to the following
 
``` new Dataset(data.sparkSession, data.queryExecution, 
implicitly[Encoder[T]]) ```

and no further changes required to your ealier code. Will it be the case 
that the wrong physical plan will logged in SparkListenerSQLExecutionStart ?




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14214
  
**[Test build #62362 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62362/consoleFull)**
 for PR 14214 at commit 
[`8ec635f`](https://github.com/apache/spark/commit/8ec635fe7403baf5149e3f6714872bf706b37cd7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org