GitHub user sohum2002 opened a pull request:
https://github.com/apache/spark/pull/19445
Dataset select all columns
The two new functions proposed here help select all the columns in a Dataset
except the given ones.
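The PR's own code is not included in this message. The core idea can be sketched as an order-preserving filter over the column names. This is a minimal sketch that assumes the schema is available as a plain list of names. `columns_except` is a hypothetical helper, not the proposed Spark API, and it compares names exactly (case-sensitive).

```python
from typing import Iterable, List


def columns_except(all_columns: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """Return every column name except the excluded ones, keeping schema order.

    Hypothetical helper for illustration only. Names are compared exactly
    (case-sensitive), and excluded names that do not exist are ignored.
    """
    excluded_set = set(excluded)
    return [name for name in all_columns if name not in excluded_set]


# A plain list stands in for a Dataset's column names.
cols = ["id", "name", "age", "email"]
print(columns_except(cols, ["age", "email"]))  # ['id', 'name']
```

In PySpark the resulting list could be passed to `df.select(*columns_except(df.columns, ["age"]))`. Spark's existing `Dataset.drop(colNames: String*)` already removes named columns and is a no-op for names not in the schema.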
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sohum2002/spark dataset_selectAllColumns
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19445.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19445
commit d35a1268d784a268e6137eff54eb8f83c981a289
Author: Burak Yavuz
Date: 2017-02-01T00:52:53Z
[SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics even if there is no new data in trigger
In Structured Streaming, if a trigger is skipped because no new data has
arrived, we suddenly report nothing for the `stateOperator` metrics. However,
we can easily report the metrics from `lastExecution` to keep the metrics
continuous.
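The fallback described above can be sketched as follows. This is a simplified Python model, not Spark's `ProgressReporter`. `Execution`, `ProgressReporter.report`, and the `numRowsTotal` key are illustrative assumptions, and passing `None` stands for a trigger that was skipped because no new data arrived.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Execution:
    """Hypothetical stand-in for one query execution and its state-operator metrics."""
    state_operator_metrics: List[Dict[str, int]] = field(default_factory=list)


class ProgressReporter:
    """Sketch: reuse the last execution's metrics when a trigger has no new data."""

    def __init__(self) -> None:
        self.last_execution: Optional[Execution] = None

    def report(self, new_execution: Optional[Execution]) -> List[Dict[str, int]]:
        if new_execution is not None:
            # New data arrived: remember this execution and report its metrics.
            self.last_execution = new_execution
            return new_execution.state_operator_metrics
        if self.last_execution is not None:
            # Trigger skipped (no new data): fall back to the last execution's metrics.
            return self.last_execution.state_operator_metrics
        # Nothing has executed yet, so there is nothing to report.
        return []


reporter = ProgressReporter()
reporter.report(Execution([{"numRowsTotal": 5}]))
print(reporter.report(None))  # [{'numRowsTotal': 5}]
```

Without the fallback branch, the skipped trigger would report an empty list. That empty report is the discontinuity the commit describes.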
Regression test in `StreamingQueryStatusAndProgressSuite`
Author: Burak Yavuz
Closes #16716 from brkyvz/state-agg.
(cherry picked from commit 081b7addaf9560563af0ce25912972e91a78cee6)
Signed-off-by: Tathagata Das
commit 61cdc8c7cc8cfc57646a30da0e0df874a14e3269
Author: Zheng RuiFeng
Date: 2017-02-01T13:27:20Z
[SPARK-19410][DOC] Fix broken links in ml-pipeline and ml-tuning
## What changes were proposed in this pull request?
Fix broken links in ml-pipeline and ml-tuning
`` -> ``
## How was this patch tested?
Manual tests.
Author: Zheng RuiFeng
Closes #16754 from zhengruifeng/doc_api_fix.
(cherry picked from commit 04ee8cf633e17b6bf95225a8dd77bf2e06980eb3)
Signed-off-by: Sean Owen
commit f946464155bb907482dc8d8a1b0964a925d04081
Author: Devaraj K
Date: 2017-02-01T20:55:11Z
[SPARK-19377][WEBUI][CORE] Killed tasks should have the status as KILLED
## What changes were proposed in this pull request?
When the newTaskInfo object was created by dropping unnecessary details to
reduce memory usage, the killed status was not copied. This patch copies the
killed status into the newTaskInfo object, so the Web UI now shows KILLED
instead of the wrong status.
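The bug can be illustrated with a small model: a "slim" copy that drops bulky details must still carry the `killed` flag, or a killed task's derived status falls through to SUCCESS. This is a hypothetical Python sketch, not Spark's Scala `TaskInfo`. The field names, the `slim_copy` helper, and the simplified status ordering are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskInfo:
    """Hypothetical, simplified task record; field names are illustrative."""
    task_id: int
    index: int
    finished: bool = False
    failed: bool = False
    killed: bool = False
    accumulables: List[str] = field(default_factory=list)  # bulky detail to drop

    @property
    def status(self) -> str:
        # Simplified status derivation for illustration.
        if not self.finished:
            return "RUNNING"
        if self.failed:
            return "FAILED"
        if self.killed:
            return "KILLED"
        return "SUCCESS"


def slim_copy(info: TaskInfo) -> TaskInfo:
    """Copy a TaskInfo without the bulky accumulables, preserving its status flags."""
    return TaskInfo(
        task_id=info.task_id,
        index=info.index,
        finished=info.finished,
        failed=info.failed,
        # Omitting this flag would make a killed task fall through to SUCCESS.
        killed=info.killed,
    )


task = TaskInfo(task_id=10, index=143, finished=True, killed=True, accumulables=["acc"])
print(slim_copy(task).status)  # KILLED
```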
## How was this patch tested?
Current display of tasks on the stage UI page:
| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | SUCCESS | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | SUCCESS | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
Web UI display after applying the patch:
| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | KILLED | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | KILLED | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
Author: Devaraj K
Closes #16725 from devaraj-kavali/SPARK-19377.
(cherry picked from commit df4a27cc5cae8e251ba2a883bcc5f5ce9282f649)
Signed-off-by: Shixiong Zhu
commit 7c23bd49e826fc2b7f132ffac2e55a71905abe96
Author: Shixiong Zhu
Date: 2017-02-02T05:39:21Z
[SPARK-19432][CORE] Fix an unexpected failure when a connection times out
## What changes were proposed in this pull request?
When a connection times out, `ask` may fail with a confusing message:
```
17/02/01 23:15:19 INFO Worker: Connecting to master ...
java.lang.IllegalArgumentException: requirement failed: TransportClient has not yet been set.
at scala.Predef$.require(Predef.scala:224)
at