[GitHub] spark pull request #19445: Dataset select all columns

2017-10-06 Thread sohum2002
GitHub user sohum2002 closed the pull request at:

https://github.com/apache/spark/pull/19445





[GitHub] spark pull request #19445: Dataset select all columns

2017-10-06 Thread sohum2002
GitHub user sohum2002 opened a pull request:

https://github.com/apache/spark/pull/19445

Dataset select all columns

The two proposed functions help select all the columns in a Dataset
except for the given columns.
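
The idea can be sketched on plain column names. This is a minimal illustration, not the code from the pull request: the helper name `select_all_except` and the sample schema are hypothetical.

```python
from typing import Iterable, List


def select_all_except(columns: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """Return the column names in their original order, minus the excluded ones."""
    excluded_set = set(excluded)
    return [name for name in columns if name not in excluded_set]


# Hypothetical schema, for illustration only.
all_columns = ["id", "name", "email", "created_at"]
kept = select_all_except(all_columns, ["email"])
print(kept)
```

With a Spark DataFrame, the resulting names could be passed to `select`. The existing `drop` method already covers the same use case, which is worth keeping in mind when reading this proposal.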

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sohum2002/spark dataset_selectAllColumns

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19445.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19445


commit d35a1268d784a268e6137eff54eb8f83c981a289
Author: Burak Yavuz 
Date:   2017-02-01T00:52:53Z

[SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics 
even if there is no new data in trigger

In Structured Streaming, if a trigger is skipped because no new data 
arrived, nothing is reported for the `stateOperator` metrics. Reporting the 
metrics from `lastExecution` instead keeps the metrics continuous.
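
The fallback can be sketched as follows. This is a simplified model under stated assumptions, not Spark's implementation: the `Execution` and `ProgressReporter` classes and the metric name used in the example are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Execution:
    """Hypothetical stand-in for one completed trigger execution."""
    state_operator_metrics: Dict[str, int]


class ProgressReporter:
    """Remembers the last execution so skipped triggers still report metrics."""

    def __init__(self) -> None:
        self.last_execution: Optional[Execution] = None

    def finish_trigger(self, new_execution: Optional[Execution]) -> Dict[str, int]:
        # new_execution is None when the trigger was skipped for lack of data.
        if new_execution is not None:
            self.last_execution = new_execution
        if self.last_execution is None:
            # Nothing has ever run, so there is nothing to carry forward.
            return {}
        # Report the most recent metrics, whether new or carried over.
        return dict(self.last_execution.state_operator_metrics)


reporter = ProgressReporter()
first = reporter.finish_trigger(Execution({"rows_total": 5}))
skipped = reporter.finish_trigger(None)  # no new data in this trigger
print(first, skipped)
```

The skipped trigger reports the same values as the previous one instead of an empty result.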

Regression test in `StreamingQueryStatusAndProgressSuite`

Author: Burak Yavuz 

Closes #16716 from brkyvz/state-agg.

(cherry picked from commit 081b7addaf9560563af0ce25912972e91a78cee6)
Signed-off-by: Tathagata Das 

commit 61cdc8c7cc8cfc57646a30da0e0df874a14e3269
Author: Zheng RuiFeng 
Date:   2017-02-01T13:27:20Z

[SPARK-19410][DOC] Fix broken links in ml-pipeline and ml-tuning

## What changes were proposed in this pull request?
Fix broken links in ml-pipeline and ml-tuning
``  ->   ``

## How was this patch tested?
manual tests

Author: Zheng RuiFeng 

Closes #16754 from zhengruifeng/doc_api_fix.

(cherry picked from commit 04ee8cf633e17b6bf95225a8dd77bf2e06980eb3)
Signed-off-by: Sean Owen 

commit f946464155bb907482dc8d8a1b0964a925d04081
Author: Devaraj K 
Date:   2017-02-01T20:55:11Z

[SPARK-19377][WEBUI][CORE] Killed tasks should have the status as KILLED

## What changes were proposed in this pull request?

When the `newTaskInfo` object was built by dropping unnecessary details to 
reduce memory usage, the killed status was not copied over. This patch copies 
the killed status to the `newTaskInfo` object, so the Web UI shows such tasks 
as KILLED instead of an incorrect status.
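
The kind of bug described here can be sketched in a few lines. This is an illustrative model, not Spark's code: the `TaskInfo` fields and the `trimmed_copy` and `status` helpers are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskInfo:
    """Hypothetical, simplified task record."""
    task_id: int
    index: int
    failed: bool = False
    killed: bool = False
    accumulables: List[str] = field(default_factory=list)


def trimmed_copy(info: TaskInfo) -> TaskInfo:
    """Copy a task record without its bulky details.

    The status flags must be carried over explicitly; leaving out
    `killed` here would make a killed task look successful.
    """
    return TaskInfo(
        task_id=info.task_id,
        index=info.index,
        failed=info.failed,
        killed=info.killed,
    )


def status(info: TaskInfo) -> str:
    """Derive the status string shown for a task."""
    if info.killed:
        return "KILLED"
    if info.failed:
        return "FAILED"
    return "SUCCESS"


original = TaskInfo(task_id=10, index=143, killed=True, accumulables=["metric"])
copy = trimmed_copy(original)
print(status(copy))
```

The copy drops the accumulables to save memory while still reporting the task as killed.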

## How was this patch tested?

Current display of tasks on the stage UI page:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | SUCCESS | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | SUCCESS | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |

Web UI display after applying the patch:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | KILLED | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | KILLED | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |

Author: Devaraj K 

Closes #16725 from devaraj-kavali/SPARK-19377.

(cherry picked from commit df4a27cc5cae8e251ba2a883bcc5f5ce9282f649)
Signed-off-by: Shixiong Zhu 

commit 7c23bd49e826fc2b7f132ffac2e55a71905abe96
Author: Shixiong Zhu 
Date:   2017-02-02T05:39:21Z

[SPARK-19432][CORE] Fix an unexpected failure when connecting timeout

## What changes were proposed in this pull request?

When a connection attempt times out, `ask` may fail with a confusing message:

```
17/02/01 23:15:19 INFO Worker: Connecting to master ...
java.lang.IllegalArgumentException: requirement failed: TransportClient has 
not yet been set.
at scala.Predef$.require(Predef.scala:224)
at