GitHub user dansanduleac opened a pull request:
https://github.com/apache/spark/pull/19245
Add conda support for R
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise,
remove this)
Please review http://spark.apache.org/contributing.html before opening a
pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/palantir/spark ds/r-conda
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19245.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19245
----
commit f9f8e7bbe60ec8b6079e248c4df2c90db7f9d103
Author: Liang-Chi Hsieh <[email protected]>
Date: 2017-05-17T04:57:35Z
[SPARK-20690][SQL] Subqueries in FROM should have alias names
## What changes were proposed in this pull request?
We add missing attributes into Filter in Analyzer, but we shouldn't do it
through subqueries like this:
select 1 from (select 1 from onerow t1 LIMIT 1) where t1.c1=1
This query works in the current codebase. However, the outer where clause
shouldn't be able to refer to the `t1.c1` attribute.
The root cause is that we previously allowed subqueries in FROM to have no
alias names. This is confusing and isn't supported by databases such as
MySQL, Postgres, and Oracle, so we shouldn't support it either.
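The corrected form requires an explicit alias on the derived table, after which references like `t1.c1` in the outer query are legitimate. A minimal sketch using Python's built-in sqlite3 (a stand-in database, not Spark; the table `onerow` mirrors the one named in the report):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE onerow (c1 INTEGER)")
conn.execute("INSERT INTO onerow VALUES (1)")

# With an alias on the FROM-subquery, the outer WHERE can refer to t1.c1.
rows = conn.execute(
    "SELECT 1 FROM (SELECT c1 FROM onerow LIMIT 1) AS t1 WHERE t1.c1 = 1"
).fetchall()
print(rows)  # [(1,)]
```

Without the `AS t1` alias there is no well-defined name for the outer clause to resolve against, which is exactly the ambiguity the patch disallows.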
## How was this patch tested?
Jenkins tests.
Please review http://spark.apache.org/contributing.html before opening a
pull request.
Author: Liang-Chi Hsieh <[email protected]>
Closes #17935 from viirya/SPARK-20690.
commit d73c48569308ea041bc45cc9a76116224283a280
Author: Josh Rosen <[email protected]>
Date: 2017-05-17T05:04:21Z
[SPARK-20776] Fix perf. problems in JobProgressListener caused by
TaskMetrics construction
## What changes were proposed in this pull request?
In
```
./bin/spark-shell --master=local[64]
```
I ran
```
sc.parallelize(1 to 100000, 100000).count()
```
and profiled the time spent in the LiveListenerBus event processing thread.
I discovered that the majority of the time was being spent in
`TaskMetrics.empty` calls in `JobProgressListener.onTaskStart`. It turns out
that we can slightly refactor to remove the need to construct one empty
instance per call, greatly improving the performance of this code.
The performance gains here help to avoid an issue where listener events
would be dropped because the JobProgressListener couldn't keep up with the
throughput.
**Before:** (profiler screenshot not preserved in this archive)
**After:** (profiler screenshot not preserved in this archive)
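The general pattern behind the fix can be sketched as follows. This is a hypothetical toy, not Spark's actual refactor: the idea is simply that a listener handling a high-throughput event stream should not allocate a fresh "empty" object per event when one shared immutable-by-convention instance will do:

```python
class ToyTaskMetrics:
    """Toy stand-in for Spark's TaskMetrics (hypothetical class)."""
    def __init__(self):
        # Imagine many accumulator objects being allocated here per instance.
        self.values = {"runTime": 0, "gcTime": 0}

# Slow pattern: allocate a fresh empty instance for every task-start event.
def on_task_start_slow(live_tasks, task_id):
    live_tasks[task_id] = ToyTaskMetrics()

# Fast pattern: share one canonical empty instance, and only materialize a
# real metrics object once the task actually reports something.
EMPTY_METRICS = ToyTaskMetrics()

def on_task_start_fast(live_tasks, task_id):
    live_tasks[task_id] = EMPTY_METRICS

live = {}
for i in range(3):
    on_task_start_fast(live, i)
# All entries point at the same shared object: zero per-event allocations.
assert live[0] is live[1] is live[2]
```

The caveat with sharing is that the shared instance must never be mutated in place; any task that records real metrics needs its own copy.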
## How was this patch tested?
Benchmarks described above.
Author: Josh Rosen <[email protected]>
Closes #18008 from JoshRosen/nametoaccums-improvements.
commit fe217717083ac0f7487fbf02e86805d8bb77f459
Author: Andrew Ray <[email protected]>
Date: 2017-05-17T09:06:01Z
[SPARK-20769][DOC] Incorrect documentation for using Jupyter notebook
## What changes were proposed in this pull request?
SPARK-13973 incorrectly removed the required
PYSPARK_DRIVER_PYTHON_OPTS=notebook from documentation to use pyspark with
Jupyter notebook. This patch corrects the documentation error.
## How was this patch tested?
Tested invocation locally with
```bash
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook ./bin/pyspark
```
Author: Andrew Ray <[email protected]>
Closes #18001 from aray/patch-1.
commit 9b32eeab20b36f168c107e7118901818bd99921d
Author: Shixiong Zhu <[email protected]>
Date: 2017-05-17T21:13:49Z
[SPARK-20788][CORE] Fix the Executor task reaper's false alarm warning logs
## What changes were proposed in this pull request?
The Executor task reaper may fail to detect that a task has finished when the
task completes and is killed at the same time.
The fix is simple: set the "finished" flag when a task completes
successfully.
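The race and the fix can be sketched roughly like this (a hypothetical toy handshake, not Spark's actual reaper code): the key point is that success is recorded under the same lock the reaper reads, so a concurrent kill observes a consistent flag instead of warning about a task that in fact completed.

```python
import threading

class ToyTask:
    """Hypothetical sketch of the finished-flag handshake."""
    def __init__(self):
        self.finished = False
        self._lock = threading.Lock()

    def run(self):
        # ... task body executes here ...
        with self._lock:
            # Record success inside the lock the reaper also takes, so a
            # concurrent kill never sees a half-updated state and raises a
            # false "task not finished" warning.
            self.finished = True

    def reaper_sees_finished(self):
        with self._lock:
            return self.finished

task = ToyTask()
task.run()
assert task.reaper_sees_finished()
```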
## How was this patch tested?
Jenkins
Author: Shixiong Zhu <[email protected]>
Closes #18021 from zsxwing/SPARK-20788.
commit 40ac8481893a4d3c2d14cd9d952587c2ce597264
Author: Shixiong Zhu <[email protected]>
Date: 2017-05-18T00:21:46Z
[SPARK-13747][CORE] Add ThreadUtils.awaitReady and disallow Await.ready
## What changes were proposed in this pull request?
Add `ThreadUtils.awaitReady` similar to `ThreadUtils.awaitResult` and
disallow `Await.ready`.
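The shape of such a helper can be illustrated in Python with `concurrent.futures` (purely an analogy; Spark's actual `ThreadUtils.awaitReady` is Scala wrapping `Await.ready`): wait for completion without unwrapping the result, and surface timeouts with a clear error rather than leaking the raw blocking call.

```python
import concurrent.futures

def await_ready(future, timeout):
    """Wait until the future completes (success OR failure) without
    unwrapping its result; raise a clear error on timeout."""
    done, _ = concurrent.futures.wait([future], timeout=timeout)
    if future not in done:
        raise TimeoutError(f"Future did not complete within {timeout}s")
    return future

with concurrent.futures.ThreadPoolExecutor() as pool:
    f = pool.submit(lambda: 41 + 1)
    result = await_ready(f, timeout=5).result()
print(result)  # 42
```

Centralizing the wait in one utility also gives a single place to ban direct `Await.ready` calls via a style check.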
## How was this patch tested?
Jenkins
Author: Shixiong Zhu <[email protected]>
Closes #17763 from zsxwing/awaitready.
commit a28df1c27db8b13de6a6bd65115f0e65b4bb546e
Author: Yanbo Liang <[email protected]>
Date: 2017-05-18T03:54:09Z
[SPARK-20505][ML] Add docs and examples for ml.stat.Correlation and
ml.stat.ChiSquareTest.
## What changes were proposed in this pull request?
Add docs and examples for ```ml.stat.Correlation``` and
```ml.stat.ChiSquareTest```.
## How was this patch tested?
Generate docs and run examples manually, successfully.
Author: Yanbo Liang <[email protected]>
Closes #17994 from yanboliang/spark-20505.
commit ef51e866568c83c7dbf88f61052b2a3434ebe7d9
Author: Xingbo Jiang <[email protected]>
Date: 2017-05-18T06:32:31Z
[SPARK-20700][SQL] InferFiltersFromConstraints stackoverflows for query (v2)
## What changes were proposed in this pull request?
In the previous approach we used `aliasMap` to link an `Attribute` to an
expression potentially of the form `f(a, b)`, but we only searched
`expressions` and `children.expressions` for it, which is not enough when an
`Alias` lies deep in the logical plan. In that case we can't generate the
valid equivalent constraint classes and thus fail to prevent the recursive
deductions.
We fix this problem by collecting all `Alias`es from the logical plan.
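The difference is essentially a shallow versus full tree traversal. A toy sketch (hypothetical `Node`/`Alias` classes, not Catalyst's real `TreeNode`/`Alias`):

```python
class Node:
    """Toy logical-plan node (hypothetical, not Catalyst's TreeNode)."""
    def __init__(self, children=(), expressions=()):
        self.children = list(children)
        self.expressions = list(expressions)

class Alias:
    def __init__(self, name):
        self.name = name

def collect_aliases(plan):
    # Walk the ENTIRE plan, not just plan.expressions and the immediate
    # children's expressions, so an Alias nested deep in the tree is found.
    found, stack = [], [plan]
    while stack:
        node = stack.pop()
        found += [e for e in node.expressions if isinstance(e, Alias)]
        stack.extend(node.children)
    return found

# An Alias two levels down is still collected.
deep = Node(expressions=[Alias("x")])
plan = Node(children=[Node(children=[deep])])
assert [a.name for a in collect_aliases(plan)] == ["x"]
```

A shallow search stopping at `children.expressions` would return an empty list here, which is the failure mode the fix addresses.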
## How was this patch tested?
No additional test case is added, but one existing test case is modified to
cover this situation.
Author: Xingbo Jiang <[email protected]>
Closes #18020 from jiangxb1987/inferConstrants.
commit a3056767ff23c4fbc8c7716c07cb06a3761ea792
Author: Dan Sănduleac <[email protected]>
Date: 2017-05-19T16:54:11Z
Remove channels-as-args in installPackages too (#187)
commit cb4020123f4fd80b7d0af4686373a890a338e0a1
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-19T13:33:31Z
Merge branch 'master' into rk/upstream
commit c4582656d3e5b151563db3c15133db4293cd36df
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-19T22:43:04Z
[SPARK-20683] Revert recursive uncaching (#188)
commit 7f39dffc8dfc80fb63338c4284baf1d8fd01b60b
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-20T02:01:26Z
Merge pull request #189 from palantir/rk/upstream
Merge from upstream
commit 8d5ed79940b3abe319568a39adb8597693a4376f
Author: Dan Sănduleac <[email protected]>
Date: 2017-05-22T20:08:27Z
Ensure condaBinaryPath is executable when setting up
CondaEnvironmentManager (+test) (#190)
commit fc6d8f37d91d8fd490f2e5f8a66135cc191a195f
Author: Glen Takahashi <[email protected]>
Date: 2017-05-27T01:17:41Z
Only log formatted code in debug mode (#181)
Don't log the full compiled code unless debug mode is on, to prevent OOMing
and crashing the Spark driver in certain scenarios.
commit 33b942e5e6d08e3558e6866caf6b02b51f4de738
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-31T11:09:44Z
Force objenesis and beanutils versions (#194)
commit a2fc73454da59328e86b9811d3f482022aefa8f5
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-31T16:47:59Z
Upgrade okhttp to 3.8.0 and okio to 1.13.0 (#197)
commit 372010ccec21f6c6a405e9d6f0668e34e9379529
Author: Robert Kruszewski <[email protected]>
Date: 2017-05-31T16:48:16Z
Include influx sink by default (#195)
commit 533c2429cca3682522f6d3098ae7eee2952d6a83
Author: sjrand <[email protected]>
Date: 2017-06-01T12:06:43Z
Hadoop 2.8.0 palantir5 (#196)
commit 951adda767ea88b8abb28264631dce26e5fe86bf
Author: Dan Sănduleac <[email protected]>
Date: 2017-06-01T12:08:09Z
More sane conda logging (to info!) and always include stderr in the
exception (#198)
commit f115662621de6a9bbb0c0584fd290a8361c628d7
Author: Dan Sănduleac <[email protected]>
Date: 2017-06-02T18:06:32Z
Fix conda initialisation (#202)
commit 7b7d22a8c34f978e48b32d96e8fd3c72d2777c93
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-02T20:21:15Z
[SPARK-20952] TaskContext as InheritableThreadLocal (#201)
* fix test exception
* inheritable thread locals
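The motivation for inheritable thread-locals can be illustrated in Python, where plain `threading.local` values are not visible in child threads. A rough analogy to Java's `InheritableThreadLocal` (this is a hypothetical sketch, not the commit's actual code) captures the parent's value at spawn time and re-establishes it in the child:

```python
import threading

_ctx = threading.local()  # plain thread-locals are NOT inherited by children

def spawn_with_context(target):
    """Capture the parent's context value and re-establish it in the child
    thread before running target (InheritableThreadLocal-like behavior)."""
    parent_value = getattr(_ctx, "value", None)
    def wrapper():
        _ctx.value = parent_value  # re-establish parent's context in child
        target()
    t = threading.Thread(target=wrapper)
    t.start()
    t.join()

_ctx.value = "parent-task-context"
seen = []
spawn_with_context(lambda: seen.append(_ctx.value))
print(seen)  # ['parent-task-context']
```

Without the wrapper, the child thread would see no `value` attribute at all; inheriting the context is what lets code running on worker threads find the `TaskContext` of the task that spawned them.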
commit 4fa9588468fdd257d98a91c1ec298fb83b8b47dc
Author: Onur Satici <[email protected]>
Date: 2017-06-07T15:35:04Z
Add publish-local script (#200)
commit 9bac7662c4b47499919ee1d7df6c77af6f36837a
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-07T21:46:43Z
Update pom.xml (#203)
commit 7d0652f175f05a35993b6bfba5c88603349d777f
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-10T19:45:44Z
Merge branch 'master' into rk/merge-upstream
commit 907708fe5cce4ed443813675d0746be784aed769
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-13T19:11:17Z
Merge branch 'master' into rk/merge-upstream
commit 22b56ab0144490c8c32ea7d7533ab589b1f7edc9
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-13T19:21:16Z
checkstyle
commit 5ba77a1c8174b9f03426a4314293c57279e65da2
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-13T21:55:28Z
Merge pull request #204 from palantir/rk/merge-upstream
Upstream merge
commit 04e1f99c43279d97c7fa84571838e43929de5cb1
Author: Robert Kruszewski <[email protected]>
Date: 2017-06-14T02:38:08Z
Parquet bump to 1.9.1-palantir3 (#206)
commit 90fb2153c071c0b66c4e79922e05265fb368e4f4
Author: mccheah <[email protected]>
Date: 2017-04-21T06:15:24Z
Staging server for receiving application dependencies.
* Staging server for receiving application dependencies.
* Add unit test for file writing
* Minor fixes
* Remove getting credentials from the API
We still want to post them because in the future we can use these
credentials to monitor the API server and handle cleaning up the data
accordingly.
* Generalize to resource staging server outside of Spark
* Update code documentation
* Val instead of var
* Fix naming, remove unused import
* Move suites from integration test package to core
* Use TrieMap instead of locks
* Address comments
* Fix imports
* Change paths, use POST instead of PUT
* Use a resource identifier as well as a resource secret
(cherry picked from commit 3f6e5ead760bca82c3af070d4d1535511bc6468a)
commit 8b0bf1386003b1859975c0d23e1ed95b36735379
Author: mccheah <[email protected]>
Date: 2017-04-21T07:34:27Z
Reorganize packages between v1 work and v2 work
* Staging server for receiving application dependencies.
* Move packages around to split between v1 work and v2 work
* Add unit test for file writing
* Remove unnecessary main
* Add back license header
* Minor fixes
* Fix integration test with renamed package for client. Fix scalastyle.
* Force json serialization to consider the different package.
* Revert extraneous log
* Fix scalastyle
* Remove getting credentials from the API
We still want to post them because in the future we can use these
credentials to monitor the API server and handle cleaning up the data
accordingly.
* Generalize to resource staging server outside of Spark
* Update code documentation
* Val instead of var
* Fix build
* Fix naming, remove unused import
* Move suites from integration test package to core
* Use TrieMap instead of locks
* Address comments
* Fix imports
* Change paths, use POST instead of PUT
* Use a resource identifier as well as a resource secret
(cherry picked from commit e24c4af93c2cff29fb91bb2641ea70db3a22ffa0)
Conflicts:
dev/.rat-excludes
resource-managers/kubernetes/core/src/main/resources/META-INF/services/org.apache.spark.deploy.rest.kubernetes.DriverServiceManager
commit a0103982771348a5b1a5bab8aa80df0fa1cf1c23
Author: mccheah <[email protected]>
Date: 2017-04-21T09:20:26Z
Support SSL on the file staging server
* Staging server for receiving application dependencies.
* Move packages around to split between v1 work and v2 work
* Add unit test for file writing
* Remove unnecessary main
* Allow the file staging server to be secured with TLS.
* Add back license header
* Minor fixes
* Fix integration test with renamed package for client. Fix scalastyle.
* Remove unused import
* Force json serialization to consider the different package.
* Revert extraneous log
* Fix scalastyle
* Remove getting credentials from the API
We still want to post them because in the future we can use these
credentials to monitor the API server and handle cleaning up the data
accordingly.
* Fix build
* Randomize name and namespace in test to prevent collisions
* Generalize to resource staging server outside of Spark
* Update code documentation
* Val instead of var
* Fix unit tests.
* Fix build
* Fix naming, remove unused import
* Move suites from integration test package to core
* Fix unit test
* Use TrieMap instead of locks
* Address comments
* Fix imports
* Address comments
* Change main object name
* Change config variable names
* Change paths, use POST instead of PUT
* Use a resource identifier as well as a resource secret
(cherry picked from commit 4940eae3f78c3a7f6eebc55a24e00b066dff22bc)
----
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]