GitHub user dansanduleac opened a pull request:
https://github.com/apache/spark/pull/19245
Add conda support for R
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise,
remove this)
Please review http://spark.apache.org/contributing.html before opening a
pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/palantir/spark ds/r-conda
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19245.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19245
----
commit f9f8e7bbe60ec8b6079e248c4df2c90db7f9d103
Author: Liang-Chi Hsieh <[email protected]>
Date: 2017-05-17T04:57:35Z
[SPARK-20690][SQL] Subqueries in FROM should have alias names
## What changes were proposed in this pull request?
We add missing attributes into Filter in Analyzer, but we shouldn't do it
through subqueries like this:
select 1 from (select 1 from onerow t1 LIMIT 1) where t1.c1=1
This query works in the current codebase. However, the outer where clause
shouldn't be able to refer to the `t1.c1` attribute.
The root cause is that we previously allowed subqueries in FROM to have no
alias names. This is confusing and isn't supported by databases such as
MySQL, Postgres, and Oracle, so we shouldn't support it either.
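The corrected form requires an explicit alias on the derived table, after which references like `t1.c1` in the outer query are legitimate. A minimal sketch using Python's built-in sqlite3 (a stand-in database, not Spark; the table `onerow` mirrors the one named in the report):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE onerow (c1 INTEGER)")
conn.execute("INSERT INTO onerow VALUES (1)")

# With an alias on the FROM-subquery, the outer WHERE can refer to t1.c1.
rows = conn.execute(
    "SELECT 1 FROM (SELECT c1 FROM onerow LIMIT 1) AS t1 WHERE t1.c1 = 1"
).fetchall()
print(rows)  # [(1,)]
```

Without the `AS t1` alias there is no well-defined name for the outer clause to resolve against, which is exactly the ambiguity the patch disallows.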
## How was this patch tested?
Jenkins tests.
Please review http://spark.apache.org/contributing.html before opening a
pull request.
Author: Liang-Chi Hsieh <[email protected]>
Closes #17935 from viirya/SPARK-20690.
commit d73c48569308ea041bc45cc9a76116224283a280
Author: Josh Rosen <[email protected]>
Date: 2017-05-17T05:04:21Z
[SPARK-20776] Fix perf. problems in JobProgressListener caused by
TaskMetrics construction
## What changes were proposed in this pull request?
In
```
./bin/spark-shell --master=local[64]
```
I ran
```
sc.parallelize(1 to 100000, 100000).count()
```
and profiled the time spent in the LiveListenerBus event processing thread.
I discovered that the majority of the time was being spent in
`TaskMetrics.empty` calls in `JobProgressListener.onTaskStart`. It turns out
that we can slightly refactor to remove the need to construct one empty
instance per call, greatly improving the performance of this code.
The performance gains here help to avoid an issue where listener events
would be dropped because the JobProgressListener couldn't keep up with the
throughput.
**Before:** (profiler screenshot not preserved in this archive)
**After:** (profiler screenshot not preserved in this archive)
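The general pattern behind the fix can be sketched as follows. This is a hypothetical toy, not Spark's actual refactor: the idea is simply that a listener handling a high-throughput event stream should not allocate a fresh "empty" object per event when one shared immutable-by-convention instance will do:

```python
class ToyTaskMetrics:
    """Toy stand-in for Spark's TaskMetrics (hypothetical class)."""
    def __init__(self):
        # Imagine many accumulator objects being allocated here per instance.
        self.values = {"runTime": 0, "gcTime": 0}

# Slow pattern: allocate a fresh empty instance for every task-start event.
def on_task_start_slow(live_tasks, task_id):
    live_tasks[task_id] = ToyTaskMetrics()

# Fast pattern: share one canonical empty instance, and only materialize a
# real metrics object once the task actually reports something.
EMPTY_METRICS = ToyTaskMetrics()

def on_task_start_fast(live_tasks, task_id):
    live_tasks[task_id] = EMPTY_METRICS

live = {}
for i in range(3):
    on_task_start_fast(live, i)
# All entries point at the same shared object: zero per-event allocations.
assert live[0] is live[1] is live[2]
```

The caveat with sharing is that the shared instance must never be mutated in place; any task that records real metrics needs its own copy.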
## How was this patch tested?
Benchmarks described above.
Author: Josh Rosen <[email protected]>
Closes #18008 from JoshRosen/nametoaccums-improvements.
commit fe217717083ac0f7487fbf02e86805d8bb77f459
Author: Andrew Ray <[email protected]>
Date: 2017-05-17T09:06:01Z
[SPARK-20769][DOC] Incorrect documentation for using Jupyter notebook
## What changes were proposed in this pull request?
SPARK-13973 incorrectly removed the required
PYSPARK_DRIVER_PYTHON_OPTS=notebook from documentation to use pyspark with
Jupyter notebook. This patch corrects the documentation error.
## How was this patch tested?
Tested invocation locally with
```bash
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook ./bin/pyspark
```
Author: Andrew Ray <[email protected]>
Closes #18001 from aray/patch-1.
commit 9b32eeab20b36f168c107e7118901818bd99921d
Author: Shixiong Zhu <[email protected]>
Date: 2017-05-17T21:13:49Z
[SPARK-20788][CORE] Fix the Executor task reaper's false alarm warning logs
## What changes were proposed in this pull request?
The Executor task reaper may fail to detect that a task has finished when the
task completes and is killed at the same time.
The fix is simple: set the "finished" flag when a task completes
successfully.
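The race and the fix can be sketched roughly like this (a hypothetical toy handshake, not Spark's actual reaper code): the key point is that success is recorded under the same lock the reaper reads, so a concurrent kill observes a consistent flag instead of warning about a task that in fact completed.

```python
import threading

class ToyTask:
    """Hypothetical sketch of the finished-flag handshake."""
    def __init__(self):
        self.finished = False
        self._lock = threading.Lock()

    def run(self):
        # ... task body executes here ...
        with self._lock:
            # Record success inside the lock the reaper also takes, so a
            # concurrent kill never sees a half-updated state and raises a
            # false "task not finished" warning.
            self.finished = True

    def reaper_sees_finished(self):
        with self._lock:
            return self.finished

task = ToyTask()
task.run()
assert task.reaper_sees_finished()
```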
## How was this patch tested?
Jenkins
Author: Shixiong Zhu <[email protected]>
Closes #18021 from zsxwing/SPARK-20788.
commit 40ac8481893a4d3c2d14cd9d952587c2ce597264
Author: Shixiong Zhu <[email protected]>
Date: 2017-05-18T00:21:46Z
[SPARK-13747][CORE] Add ThreadUtils.awaitReady and disallow Await.ready
## What changes were proposed in this pull request?
Add `ThreadUtils.awaitReady` similar to `ThreadUtils.awaitResult` and
disallow `Await.ready`.
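The shape of such a helper can be illustrated in Python with `concurrent.futures` (purely an analogy; Spark's actual `ThreadUtils.awaitReady` is Scala wrapping `Await.ready`): wait for completion without unwrapping the result, and surface timeouts with a clear error rather than leaking the raw blocking call.

```python
import concurrent.futures

def await_ready(future, timeout):
    """Wait until the future completes (success OR failure) without
    unwrapping its result; raise a clear error on timeout."""
    done, _ = concurrent.futures.wait([future], timeout=timeout)
    if future not in done:
        raise TimeoutError(f"Future did not complete within {timeout}s")
    return future

with concurrent.futures.ThreadPoolExecutor() as pool:
    f = pool.submit(lambda: 41 + 1)
    result = await_ready(f, timeout=5).result()
print(result)  # 42
```

Centralizing the wait in one utility also gives a single place to ban direct `Await.ready` calls via a style check.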
## How was this patch tested?
Jenkins
Author: Shixiong Zhu <[email protected]>
Closes #17763 from zsxwing/awaitready.
commit a28df1c27db8b13de6a6bd65115f0e65b4bb546e
Author: Yanbo Liang <[email protected]>
Date: 2017-05-18T03:54:09Z
[SPARK-20505][ML] Add docs and examples for ml.stat.Correlation and
ml.stat.ChiSquareTest.
## What changes were proposed in this pull request?
Add docs and examples for ```ml.stat.Correlation``` and
```ml.stat.ChiSquareTest```.
## How was this patch tested?
Generate docs and run examples manually, successfully.
Author: Yanbo Liang <[email protected]>
Closes #17994 from yanboliang/spark-20505.
commit ef51e866568c83c7dbf88f61052b2a3434ebe7d9
Author: Xingbo Jiang <[email protected]>
Date: 2017-05-18T06:32:31Z
[SPARK-20700][SQL] InferFiltersFromConstraints stackoverflows for query (v2)
## What changes were proposed in this pull request?
In the previous approach we used `aliasMap` to link an `Attribute` to an
expression potentially of the form `f(a, b)`, but we only searched
`expressions` and `children.expressions` for it, which is not enough when an
`Alias` lies deep in the logical plan. In that case we can't generate the
valid equivalent constraint classes and thus fail to prevent the recursive
deductions.
We fix this problem by collecting all `Alias`es from the logical plan.
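The difference is essentially a shallow versus full tree traversal. A toy sketch (hypothetical `Node`/`Alias` classes, not Catalyst's real `TreeNode`/`Alias`):

```python
class Node:
    """Toy logical-plan node (hypothetical, not Catalyst's TreeNode)."""
    def __init__(self, children=(), expressions=()):
        self.children = list(children)
        self.expressions = list(expressions)

class Alias:
    def __init__(self, name):
        self.name = name

def collect_aliases(plan):
    # Walk the ENTIRE plan, not just plan.expressions and the immediate
    # children's expressions, so an Alias nested deep in the tree is found.
    found, stack = [], [plan]
    while stack:
        node = stack.pop()
        found += [e for e in node.expressions if isinstance(e, Alias)]
        stack.extend(node.children)
    return found

# An Alias two levels down is still collected.
deep = Node(expressions=[Alias("x")])
plan = Node(children=[Node(children=[deep])])
assert [a.name for a in collect_aliases(plan)] == ["x"]
```

A shallow search stopping at `children.expressions` would return an empty list here, which is the failure mode the fix addresses.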
## How was this patch tested?
No additional test case is added, but one existing test case is modified to
cover this situation.
Author: Xingbo Jiang <[email protected]>
Closes #18020 from jiangxb1987/inferConstrants.
commit a3056767ff23c4fbc8c7716c07cb06a3761ea792
Author: Dan Sănduleac <[email protected]>
Date: 2017-05-19T16:54:11Z
Remove channels-as-args in installPackages too (#187)
commit cb4020123f4fd80b7d0af4686373a890a338e0a1
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-19T13:33:31Z
Merge branch 'master' into rk/upstream
commit c4582656d3e5b151563db3c15133db4293cd36df
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-19T22:43:04Z
[SPARK-20683] Revert recursive uncaching (#188)
commit 7f39dffc8dfc80fb63338c4284baf1d8fd01b60b
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-20T02:01:26Z
Merge pull request #189 from palantir/rk/upstream
Merge from upstream
commit 8d5ed79940b3abe319568a39adb8597693a4376f
Author: Dan Sănduleac <[email protected]>
Date: 2017-05-22T20:08:27Z
Ensure condaBinaryPath is executable when setting up
CondaEnvironmentManager (+test) (#190)
commit fc6d8f37d91d8fd490f2e5f8a66135cc191a195f
Author: Glen Takahashi <[email protected]>
Date: 2017-05-27T01:17:41Z
Only log formatted code in debug mode (#181)
Don't log the full compiled code unless debug mode is on, to prevent OOMing
and crashing the Spark driver in certain scenarios.
commit 33b942e5e6d08e3558e6866caf6b02b51f4de738
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-31T11:09:44Z
Force objenesis and beanutils versions (#194)
commit a2fc73454da59328e86b9811d3f482022aefa8f5
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-31T16:47:59Z
Upgrade okhttp to 3.8.0 and okio to 1.13.0 (#197)
commit 372010ccec21f6c6a405e9d6f0668e34e9379529
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-31T16:48:16Z
Include influx sink by default (#195)
commit 533c2429cca3682522f6d3098ae7eee2952d6a83
Author: sjrand <[email protected]>
Date: 2017-06-01T12:06:43Z
Hadoop 2.8.0 palantir5 (#196)
commit 951adda767ea88b8abb28264631dce26e5fe86bf
Author: Dan Sănduleac <[email protected]>
Date: 2017-06-01T12:08:09Z
More sane conda logging (to info!) and always include stderr in the
exception (#198)
commit f115662621de6a9bbb0c0584fd290a8361c628d7
Author: Dan Sănduleac <[email protected]>
Date: 2017-06-02T18:06:32Z
Fix conda initialisation (#202)
commit 7b7d22a8c34f978e48b32d96e8fd3c72d2777c93
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-02T20:21:15Z
[SPARK-20952] TaskContext as InheritableThreadLocal (#201)
* fix test exception
* inheritable thread locals
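The motivation for inheritable thread-locals can be illustrated in Python, where plain `threading.local` values are not visible in child threads. A rough analogy to Java's `InheritableThreadLocal` (this is a hypothetical sketch, not the commit's actual code) captures the parent's value at spawn time and re-establishes it in the child:

```python
import threading

_ctx = threading.local()  # plain thread-locals are NOT inherited by children

def spawn_with_context(target):
    """Capture the parent's context value and re-establish it in the child
    thread before running target (InheritableThreadLocal-like behavior)."""
    parent_value = getattr(_ctx, "value", None)
    def wrapper():
        _ctx.value = parent_value  # re-establish parent's context in child
        target()
    t = threading.Thread(target=wrapper)
    t.start()
    t.join()

_ctx.value = "parent-task-context"
seen = []
spawn_with_context(lambda: seen.append(_ctx.value))
print(seen)  # ['parent-task-context']
```

Without the wrapper, the child thread would see no `value` attribute at all; inheriting the context is what lets code running on worker threads find the `TaskContext` of the task that spawned them.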
commit 4fa9588468fdd257d98a91c1ec298fb83b8b47dc
Author: Onur Satici <[email protected]>
Date: 2017-06-07T15:35:04Z
Add publish-local script (#200)
commit 9bac7662c4b47499919ee1d7df6c77af6f36837a
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-07T21:46:43Z
Update pom.xml (#203)
commit 7d0652f175f05a35993b6bfba5c88603349d777f
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-10T19:45:44Z
Merge branch 'master' into rk/merge-upstream
commit 907708fe5cce4ed443813675d0746be784aed769
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-13T19:11:17Z
Merge branch 'master' into rk/merge-upstream
commit 22b56ab0144490c8c32ea7d7533ab589b1f7edc9
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-13T19:21:16Z
checkstyle
commit 5ba77a1c8174b9f03426a4314293c57279e65da2
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-13T21:55:28Z
Merge pull request #204 from palantir/rk/merge-upstream
Upstream merge
commit 04e1f99c43279d97c7fa84571838e43929de5cb1
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-14T02:38:08Z
Parquet bump to 1.9.1-palantir3 (#206)
commit 90fb2153c071c0b66c4e79922e05265fb368e4f4
Author: mccheah <[email protected]>
Date: 2017-04-21T06:15:24Z
Staging server for receiving application dependencies.
* Staging server for receiving application dependencies.
* Add unit test for file writing
* Minor fixes
* Remove getting credentials from the API
We still want to post them because in the future we can use these
credentials to monitor the API server and handle cleaning up the data
accordingly.
* Generalize to resource staging server outside of Spark
* Update code documentation
* Val instead of var
* Fix naming, remove unused import
* Move suites from integration test package to core
* Use TrieMap instead of locks
* Address comments
* Fix imports
* Change paths, use POST instead of PUT
* Use a resource identifier as well as a resource secret
(cherry picked from commit 3f6e5ead760bca82c3af070d4d1535511bc6468a)
commit 8b0bf1386003b1859975c0d23e1ed95b36735379
Author: mccheah <[email protected]>
Date: 2017-04-21T07:34:27Z
Reorganize packages between v1 work and v2 work
* Staging server for receiving application dependencies.
* Move packages around to split between v1 work and v2 work
* Add unit test for file writing
* Remove unnecessary main
* Add back license header
* Minor fixes
* Fix integration test with renamed package for client. Fix scalastyle.
* Force json serialization to consider the different package.
* Revert extraneous log
* Fix scalastyle
* Remove getting credentials from the API
We still want to post them because in the future we can use these
credentials to monitor the API server and handle cleaning up the data
accordingly.
* Generalize to resource staging server outside of Spark
* Update code documentation
* Val instead of var
* Fix build
* Fix naming, remove unused import
* Move suites from integration test package to core
* Use TrieMap instead of locks
* Address comments
* Fix imports
* Change paths, use POST instead of PUT
* Use a resource identifier as well as a resource secret
(cherry picked from commit e24c4af93c2cff29fb91bb2641ea70db3a22ffa0)
Conflicts:
dev/.rat-excludes
resource-managers/kubernetes/core/src/main/resources/META-INF/services/org.apache.spark.deploy.rest.kubernetes.DriverServiceManager
commit a0103982771348a5b1a5bab8aa80df0fa1cf1c23
Author: mccheah <[email protected]>
Date: 2017-04-21T09:20:26Z
Support SSL on the file staging server
* Staging server for receiving application dependencies.
* Move packages around to split between v1 work and v2 work
* Add unit test for file writing
* Remove unnecessary main
* Allow the file staging server to be secured with TLS.
* Add back license header
* Minor fixes
* Fix integration test with renamed package for client. Fix scalastyle.
* Remove unused import
* Force json serialization to consider the different package.
* Revert extraneous log
* Fix scalastyle
* Remove getting credentials from the API
We still want to post them because in the future we can use these
credentials to monitor the API server and handle cleaning up the data
accordingly.
* Fix build
* Randomize name and namespace in test to prevent collisions
* Generalize to resource staging server outside of Spark
* Update code documentation
* Val instead of var
* Fix unit tests.
* Fix build
* Fix naming, remove unused import
* Move suites from integration test package to core
* Fix unit test
* Use TrieMap instead of locks
* Address comments
* Fix imports
* Address comments
* Change main object name
* Change config variable names
* Change paths, use POST instead of PUT
* Use a resource identifier as well as a resource secret
(cherry picked from commit 4940eae3f78c3a7f6eebc55a24e00b066dff22bc)
----
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]