GitHub user igorcosta opened a pull request:
https://github.com/apache/spark/pull/5091
Additional information on building from source
Substantial information is missing from the getting-started docs, so this adds
more options for building from source code.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/igorcosta/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5091.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5091
----
commit 4a17eedb16343413e5b6f8bb58c6da8952ee7ab6
Author: Joseph K. Bradley <[email protected]>
Date: 2015-02-20T10:31:32Z
[SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release
For SPARK-5867:
* The spark.ml programming guide needs to be updated to use the new SQL
DataFrame API instead of the old SchemaRDD API.
* It should also include Python examples now.
For SPARK-5892:
* Fix Python docs
* Various other cleanups
BTW, I accidentally merged this with master. If you want to compile it on
your own, use this branch which is based on spark/branch-1.3 and cherry-picks
the commits from this PR:
[https://github.com/jkbradley/spark/tree/doc-review-1.3-check]
CC: mengxr (ML), davies (Python docs)
Author: Joseph K. Bradley <[email protected]>
Closes #4675 from jkbradley/doc-review-1.3 and squashes the following
commits:
f191bb0 [Joseph K. Bradley] small cleanups
e786efa [Joseph K. Bradley] small doc corrections
6b1ab4a [Joseph K. Bradley] fixed python lint test
946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example.
Changed spark.ml Java examples to use DataFrames API instead of sql()
da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master'
into doc-review-1.3
629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page
for old migration guides * small fixes * moved inherit_doc in python
b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for
string interpolation
34b067f [Joseph K. Bradley] small doc correction
da16aef [Joseph K. Bradley] Fixed python mllib docs
8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class
hierarchies in python docs
a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
b05a80d [Joseph K. Bradley] organize imports. doc cleanups
e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
commit 5b0a42cb17b840c82d3f8a5ad061d99e261ceadf
Author: Davies Liu <[email protected]>
Date: 2015-02-20T23:35:05Z
[SPARK-5898] [SPARK-5896] [SQL] [PySpark] create DataFrame from pandas and
tuple/list
Fix createDataFrame() from pandas DataFrame (not tested by jenkins, depends
on SPARK-5693).
It also supports creating a DataFrame from a plain tuple/list without column
names; `_1`, `_2`, ... will be used as column names.
Author: Davies Liu <[email protected]>
Closes #4679 from davies/pandas and squashes the following commits:
c0cbe0b [Davies Liu] fix tests
8466d1d [Davies Liu] fix create DataFrame from pandas
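The positional naming behavior described above can be sketched in plain Python. This is an illustrative helper, not PySpark's actual implementation:

```python
def default_column_names(rows):
    # When rows are plain tuples/lists with no column names, assign
    # positional names _1, _2, ... as the commit describes.
    # (Illustrative sketch only; not PySpark's actual code.)
    width = len(rows[0])
    return ["_%d" % (i + 1) for i in range(width)]

rows = [(1, "Alice"), (2, "Bob")]
print(default_column_names(rows))  # prints "['_1', '_2']"
```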
commit e155324711740da97698b93526128b0eae2dc0ce
Author: Jacky Li <[email protected]>
Date: 2015-02-21T13:00:16Z
[MLlib] fix typo
fix typo: it should be "default:" instead of "default;"
Author: Jacky Li <[email protected]>
Closes #4713 from jackylk/patch-10 and squashes the following commits:
15daf2e [Jacky Li] [MLlib] fix typo
commit d3cbd38c33e6a2addcf8caa18eeb10036fbfd01b
Author: Nishkam Ravi <[email protected]>
Date: 2015-02-21T17:59:28Z
SPARK-5841 [CORE] [HOTFIX 2] Memory leak in DiskBlockManager
We continue to see an IllegalStateException in YARN cluster mode; this adds a
simple workaround for now.
Author: Nishkam Ravi <[email protected]>
Author: nishkamravi2 <[email protected]>
Author: nravi <[email protected]>
Closes #4690 from nishkamravi2/master_nravi and squashes the following
commits:
d453197 [nishkamravi2] Update NewHadoopRDD.scala
6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
0ce2c32 [nishkamravi2] Update HadoopRDD.scala
f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of
https://github.com/nishkamravi2/spark into master_nravi
ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of
removeShutDownHook. Deletion of semi-redundant occurrences of expensive
operation inShutDown.
71d0e17 [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
494d8c0 [nishkamravi2] Update DiskBlockManager.scala
3c5ddba [nishkamravi2] Update DiskBlockManager.scala
f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by
recent changes to BlockManager.stop
79ea8b4 [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
b446edc [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
535295a [nishkamravi2] Update TaskSetManager.scala
3e1b616 [Nishkam Ravi] Modify test for maxResultSize
9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message
and add condition to check if maxResultSize > 0)
5f8f9ed [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
636a9ff [nishkamravi2] Update YarnAllocator.scala
8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
5ac2ec1 [Nishkam Ravi] Remove out
dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead
issue
42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
c726bd9 [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
1cf2d1e [nishkamravi2] Update YarnAllocator.scala
ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an
additive constant to a multiplier (redone to resolve merge conflicts)
2e69f11 [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
efd688a [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark
2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int
value, to be consistent with rest of Spark
3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097,
Hadoop-10456)
6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
5108700 [nravi] Fix in Spark for the Concurrent thread modification issue
(SPARK-1097, HADOOP-10456)
681b36f [nravi] Fix for SPARK-1758: failing test
org.apache.spark.JavaAPISuite.wholeTextFiles
commit 7138816abe1060a1e967c4c77c72d5752586d557
Author: Hari Shreedharan <[email protected]>
Date: 2015-02-21T18:01:01Z
[SPARK-5937][YARN] Fix ClientSuite to set YARN mode, so that the correct
class is used in tests.
Without this, SparkHadoopUtil is used by the Client instead of
YarnSparkHadoopUtil.
Author: Hari Shreedharan <[email protected]>
Closes #4711 from harishreedharan/SPARK-5937 and squashes the following
commits:
d154de6 [Hari Shreedharan] Use System.clearProperty() instead of setting
the value of SPARK_YARN_MODE to empty string.
f729f70 [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the
correct class is used in tests.
commit 7683982faf920b8ac6cf46b79842450e7d46c5cc
Author: Evan Yu <[email protected]>
Date: 2015-02-21T20:40:21Z
[SPARK-5860][CORE] JdbcRDD: overflow on large range with high number of
partitions
Fix an overflow bug in JdbcRDD when calculating partitions for large BIGINT
ids.
Author: Evan Yu <[email protected]>
Closes #4701 from hotou/SPARK-5860 and squashes the following commits:
9e038d1 [Evan Yu] [SPARK-5860][CORE] Prevent overflowing at the length level
7883ad9 [Evan Yu] [SPARK-5860][CORE] Prevent overflowing at the length level
c88755a [Evan Yu] [SPARK-5860][CORE] switch to BigInt instead of BigDecimal
4e9ff4f [Evan Yu] [SPARK-5860][CORE] JdbcRDD overflow on large range with
high number of partitions
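The overflow being fixed can be illustrated with arbitrary-precision arithmetic. This is a sketch, not JdbcRDD's actual Scala code; `partition_bounds` is a hypothetical helper:

```python
def partition_bounds(lower, upper, num_partitions):
    # Compute per-partition id ranges over [lower, upper].
    # With 64-bit longs, `upper - lower + 1` can overflow for large
    # BIGINT ranges; Python ints (like Scala's BigInt in the fix)
    # are arbitrary precision, so the arithmetic stays exact.
    length = upper - lower + 1
    return [(lower + (i * length) // num_partitions,
             lower + ((i + 1) * length) // num_partitions - 1)
            for i in range(num_partitions)]
```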
commit 46462ff255b0eef8263ed798f3d5aeb8460ecaf1
Author: Patrick Wendell <[email protected]>
Date: 2015-02-22T07:07:30Z
MAINTENANCE: Automated closing of pull requests.
This commit exists to close the following pull requests on Github:
Closes #3490 (close requested by 'andrewor14')
Closes #4646 (close requested by 'srowen')
Closes #3591 (close requested by 'andrewor14')
Closes #3656 (close requested by 'andrewor14')
Closes #4553 (close requested by 'JoshRosen')
Closes #4202 (close requested by 'srowen')
Closes #4497 (close requested by 'marmbrus')
Closes #4150 (close requested by 'andrewor14')
Closes #2409 (close requested by 'andrewor14')
Closes #4221 (close requested by 'srowen')
commit a7f90390251ff62a0e10edf4c2eb876538597791
Author: Alexander <[email protected]>
Date: 2015-02-22T08:53:05Z
[DOCS] Fix typo in API for custom InputFormats based on the "new"
MapReduce API
This looks like a simple typo: ```SparkContext.newHadoopRDD``` instead of
```SparkContext.newAPIHadoopRDD```, as in the actual docs:
http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.SparkContext
Author: Alexander <[email protected]>
Closes #4718 from bzz/hadoop-InputFormats-doc-fix and squashes the
following commits:
680a4c4 [Alexander] Fix typo in docs on custom Hadoop InputFormats
commit 275b1bef897d775f1f7743378ca3e09e36160136
Author: Cheng Hao <[email protected]>
Date: 2015-02-22T08:56:30Z
[DataFrame] [Typo] Fix the typo
Author: Cheng Hao <[email protected]>
Closes #4717 from chenghao-intel/typo1 and squashes the following commits:
858d7b0 [Cheng Hao] update the typo
commit e4f9d03d728bc6fbfb6ebc7d15b4ba328f98f3dc
Author: Aaron Josephs <[email protected]>
Date: 2015-02-23T06:09:06Z
[SPARK-911] allow efficient queries for a range if RDD is partitioned with
RangePartitioner
Author: Aaron Josephs <[email protected]>
Closes #1381 from aaronjosephs/PLAT-911 and squashes the following commits:
e30ade5 [Aaron Josephs] [SPARK-911] allow efficient queries for a range if
RDD is partitioned with RangePartitioner
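The idea behind this optimization can be sketched as a binary search over a RangePartitioner's upper bounds; the names here are illustrative, not the PR's actual Scala code:

```python
import bisect

def partitions_for_range(upper_bounds, lo, hi):
    # With a RangePartitioner, partition i holds keys up to
    # upper_bounds[i] (and above the previous bound), so a query for
    # [lo, hi] only needs the partitions whose key interval overlaps
    # the range, instead of scanning every partition.
    first = bisect.bisect_left(upper_bounds, lo)
    last = bisect.bisect_left(upper_bounds, hi)
    return list(range(first, min(last, len(upper_bounds) - 1) + 1))
```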
commit 95cd643aa954b7e4229e94fa8bdc99bf3b2bb1da
Author: Ilya Ganelin <[email protected]>
Date: 2015-02-23T06:43:04Z
[SPARK-3885] Provide mechanism to remove accumulators once they are no
longer used
Instead of storing a strong reference to accumulators, I've replaced this
with a weak reference and updated any code that uses these accumulators to
check whether the reference resolves before using the accumulator. A weak
reference will be cleared when there is no longer an existing copy of the
variable; with a soft reference, accumulators would only be cleared when the
GC actually ran out of memory.
Author: Ilya Ganelin <[email protected]>
Closes #4021 from ilganeli/SPARK-3885 and squashes the following commits:
4ba9575 [Ilya Ganelin] Fixed error in test suite
8510943 [Ilya Ganelin] Extra code
bb76ef0 [Ilya Ganelin] File deleted somehow
283a333 [Ilya Ganelin] Added cleanup method for accumulators to remove
stale references within Accumulators.original to accumulators that are now out
of scope
345fd4f [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into
SPARK-3885
7485a82 [Ilya Ganelin] Fixed build error
c8e0f2b [Ilya Ganelin] Added working test for accumulator garbage collection
94ce754 [Ilya Ganelin] Still not being properly garbage collected
8722b63 [Ilya Ganelin] Fixing gc test
7414a9c [Ilya Ganelin] Added test for accumulator garbage collection
18d62ec [Ilya Ganelin] Updated to throw Exception when accessing a GCd
accumulator
9a81928 [Ilya Ganelin] Reverting permissions changes
28f705c [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into
SPARK-3885
b820ab4b [Ilya Ganelin] reset
d78f4bf [Ilya Ganelin] Removed obsolete comment
0746e61 [Ilya Ganelin] Updated DAGSchedulerSUite to fix bug
3350852 [Ilya Ganelin] Updated DAGScheduler and Suite to correctly use new
implementation of WeakRef Accumulator storage
c49066a [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into
SPARK-3885
cbb9023 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into
SPARK-3885
a77d11b [Ilya Ganelin] Updated Accumulators class to store weak references
instead of strong references to allow garbage collection of old accumulators
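The weak-reference behavior the description relies on can be demonstrated in plain Python with the standard `weakref` module; the names are illustrative, not Spark's:

```python
import gc
import weakref

class Accumulator:
    def __init__(self, value):
        self.value = value

originals = {}  # accumulator id -> weak reference (was a strong ref)

def register(acc_id, acc):
    originals[acc_id] = weakref.ref(acc)

def lookup(acc_id):
    # Check that the weak reference still resolves before using it;
    # once no strong reference to the accumulator exists, the weak
    # reference is cleared and the entry can be garbage collected.
    ref = originals.get(acc_id)
    return ref() if ref is not None else None

acc = Accumulator(0)
register(1, acc)
assert lookup(1) is acc   # still strongly referenced elsewhere
del acc
gc.collect()
assert lookup(1) is None  # cleared once the last strong ref is gone
```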
commit 934876741683fc254fed18e7ff630614f78944be
Author: Makoto Fukuhara <[email protected]>
Date: 2015-02-23T09:24:33Z
[EXAMPLES] fix typo.
Author: Makoto Fukuhara <[email protected]>
Closes #4724 from fukuo33/fix-typo and squashes the following commits:
8c806b9 [Makoto Fukuhara] fix typo.
commit 757b14b862a1d39c1bad7b321dae1a3ea8338fbb
Author: Saisai Shao <[email protected]>
Date: 2015-02-23T11:27:27Z
[SPARK-5943][Streaming] Update the test to use new API to reduce the warning
Author: Saisai Shao <[email protected]>
Closes #4722 from jerryshao/SPARK-5943 and squashes the following commits:
1b01233 [Saisai Shao] Update the test to use new API to reduce the warning
commit 242d49584c6aa21d928db2552033661950f760a5
Author: CodingCat <[email protected]>
Date: 2015-02-23T11:29:25Z
[SPARK-5724] fix the misconfiguration in AkkaUtils
https://issues.apache.org/jira/browse/SPARK-5724
In AkkaUtils, we set several failure-detector-related parameters as
follows:
```
val akkaConf = ConfigFactory.parseMap(conf.getAkkaConf.toMap[String, String])
.withFallback(akkaSslConfig).withFallback(ConfigFactory.parseString(
s"""
|akka.daemonic = on
|akka.loggers = [""akka.event.slf4j.Slf4jLogger""]
|akka.stdout-loglevel = "ERROR"
|akka.jvm-exit-on-fatal-error = off
|akka.remote.require-cookie = "$requireCookie"
|akka.remote.secure-cookie = "$secureCookie"
|akka.remote.transport-failure-detector.heartbeat-interval =
$akkaHeartBeatInterval s
|akka.remote.transport-failure-detector.acceptable-heartbeat-pause =
$akkaHeartBeatPauses s
|akka.remote.transport-failure-detector.threshold =
$akkaFailureDetector
|akka.actor.provider = "akka.remote.RemoteActorRefProvider"
|akka.remote.netty.tcp.transport-class =
"akka.remote.transport.netty.NettyTransport"
|akka.remote.netty.tcp.hostname = "$host"
|akka.remote.netty.tcp.port = $port
|akka.remote.netty.tcp.tcp-nodelay = on
|akka.remote.netty.tcp.connection-timeout = $akkaTimeout s
|akka.remote.netty.tcp.maximum-frame-size = ${akkaFrameSize}B
|akka.remote.netty.tcp.execution-pool-size = $akkaThreads
|akka.actor.default-dispatcher.throughput = $akkaBatchSize
|akka.log-config-on-start = $logAkkaConfig
|akka.remote.log-remote-lifecycle-events = $lifecycleEvents
|akka.log-dead-letters = $lifecycleEvents
|akka.log-dead-letters-during-shutdown = $lifecycleEvents
""".stripMargin))
```
Actually, there is no parameter named
"akka.remote.transport-failure-detector.threshold"
(see: http://doc.akka.io/docs/akka/2.3.4/general/configuration.html);
what we have is "akka.remote.watch-failure-detector.threshold".
Author: CodingCat <[email protected]>
Closes #4512 from CodingCat/SPARK-5724 and squashes the following commits:
bafe56e [CodingCat] fix the grammar in configuration doc
338296e [CodingCat] remove failure-detector related info
8bfcfd4 [CodingCat] fix the misconfiguration in AkkaUtils
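A defensive check like the following (an illustrative sketch, not anything that exists in Spark or Akka) would have caught the silently ignored key:

```python
# Failure-detector keys Akka 2.3 actually defines (per its config reference).
KNOWN_FAILURE_DETECTOR_KEYS = {
    "akka.remote.watch-failure-detector.threshold",
    "akka.remote.transport-failure-detector.heartbeat-interval",
    "akka.remote.transport-failure-detector.acceptable-heartbeat-pause",
}

def unknown_failure_detector_keys(config_keys):
    # Flag failure-detector keys that Akka does not define, such as the
    # nonexistent akka.remote.transport-failure-detector.threshold.
    return sorted(k for k in config_keys
                  if "failure-detector" in k
                  and k not in KNOWN_FAILURE_DETECTOR_KEYS)
```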
commit 651a1c019eb911005e234a46cc559d63da352377
Author: Jacky Li <[email protected]>
Date: 2015-02-23T16:47:28Z
[SPARK-5939][MLLib] make FPGrowth example app take parameters
Add parameter parsing in FPGrowth example app in Scala and Java
And a sample data file is added in data/mllib folder
Author: Jacky Li <[email protected]>
Closes #4714 from jackylk/parameter and squashes the following commits:
8c478b3 [Jacky Li] fix according to comments
3bb74f6 [Jacky Li] make FPGrowth exampl app take parameters
f0e4d10 [Jacky Li] make FPGrowth exampl app take parameters
commit 28ccf5ee769a1df019e38985112065c01724fbd9
Author: Alexander Ulanov <[email protected]>
Date: 2015-02-23T20:09:40Z
[MLLIB] SPARK-5912 Programming guide for feature selection
Added a description of ChiSqSelector and a few words about feature selection
in general. I could add a code example; however, it would not look reasonable
in the absence of a feature discretizer or a dataset in the `data` folder that
has redundant features.
Author: Alexander Ulanov <[email protected]>
Closes #4709 from avulanov/SPARK-5912 and squashes the following commits:
19a8a4e [Alexander Ulanov] Addressing reviewers comments @jkbradley
58d9e4d [Alexander Ulanov] Addressing reviewers comments @jkbradley
eb6b9fe [Alexander Ulanov] Typo
2921a1d [Alexander Ulanov] ChiSqSelector example of use
c845350 [Alexander Ulanov] ChiSqSelector docs
commit 59536cc87e10e5011560556729dd901280958f43
Author: Joseph K. Bradley <[email protected]>
Date: 2015-02-24T00:15:57Z
[SPARK-5912] [docs] [mllib] Small fixes to ChiSqSelector docs
Fixes:
* typo in Scala example
* Removed comment "usually applied on sparse data" since that is debatable
* small edits to text for clarity
CC: avulanov. I noticed a typo post hoc and ended up making a few small
edits. Do the changes look OK?
Author: Joseph K. Bradley <[email protected]>
Closes #4732 from jkbradley/chisqselector-docs and squashes the following
commits:
9656a3b [Joseph K. Bradley] added Java example for ChiSqSelector to guide
3f3f9f4 [Joseph K. Bradley] small fixes to ChiSqSelector docs
commit 48376bfe9c97bf31279918def6c6615849c88f4d
Author: Yin Huai <[email protected]>
Date: 2015-02-24T01:16:34Z
[SPARK-5935][SQL] Accept MapType in the schema provided to a JSON dataset.
JIRA: https://issues.apache.org/jira/browse/SPARK-5935
Author: Yin Huai <[email protected]>
Author: Yin Huai <[email protected]>
Closes #4710 from yhuai/jsonMapType and squashes the following commits:
3e40390 [Yin Huai] Remove unnecessary changes.
f8e6267 [Yin Huai] Fix test.
baa36e3 [Yin Huai] Accept MapType in the schema provided to
jsonFile/jsonRDD.
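Accepting a map-typed field means each JSON object's value for that field is read as key-to-value pairs checked against the declared value type. A plain-Python sketch of the idea (illustrative only, not Spark SQL's parser; the field name is hypothetical):

```python
import json

def read_map_column(lines, field, value_type=int):
    # Parse one JSON record per line and pull out `field` as a map,
    # checking its values against the declared type (akin to declaring
    # the field as MapType(StringType, IntegerType) in a schema).
    rows = []
    for line in lines:
        record = json.loads(line)
        mapping = record[field]
        if not all(isinstance(v, value_type) for v in mapping.values()):
            raise TypeError("value does not match declared map value type")
        rows.append(mapping)
    return rows
```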
commit 1ed57086d402c38d95cda6c3d9d7aea806609bf9
Author: Michael Armbrust <[email protected]>
Date: 2015-02-24T01:34:54Z
[SPARK-5873][SQL] Allow viewing of partially analyzed plans in
queryExecution
Author: Michael Armbrust <[email protected]>
Closes #4684 from marmbrus/explainAnalysis and squashes the following
commits:
afbaa19 [Michael Armbrust] fix python
d93278c [Michael Armbrust] fix hive
e5fa0a4 [Michael Armbrust] Merge remote-tracking branch 'origin/master'
into explainAnalysis
52119f2 [Michael Armbrust] more tests
82a5431 [Michael Armbrust] fix tests
25753d2 [Michael Armbrust] Merge remote-tracking branch 'origin/master'
into explainAnalysis
aee1e6a [Michael Armbrust] fix hive
b23a844 [Michael Armbrust] newline
de8dc51 [Michael Armbrust] more comments
acf620a [Michael Armbrust] [SPARK-5873][SQL] Show partially analyzed plans
in query execution
commit cf2e41653de778dc8db8b03385a053aae1152e19
Author: Xiangrui Meng <[email protected]>
Date: 2015-02-24T06:08:44Z
[SPARK-5958][MLLIB][DOC] update block matrix user guide
* Removed SVD code from examples.
* Corrected Java API doc link.
* Updated variable names: `AtransposeA` -> `ata`.
* Minor changes.
brkyvz
Author: Xiangrui Meng <[email protected]>
Closes #4737 from mengxr/update-block-matrix-user-guide and squashes the
following commits:
70f53ac [Xiangrui Meng] update block matrix user guide
commit 840333133396d443e747f62fce9967f7681fb276
Author: Cheng Lian <[email protected]>
Date: 2015-02-24T18:45:38Z
[SPARK-5968] [SQL] Suppresses ParquetOutputCommitter WARN logs
Please refer to the [JIRA ticket] [1] for the motivation.
[1]: https://issues.apache.org/jira/browse/SPARK-5968
Author: Cheng Lian <[email protected]>
Closes #4744 from liancheng/spark-5968 and squashes the following commits:
caac6a8 [Cheng Lian] Suppresses ParquetOutputCommitter WARN logs
commit 0a59e45e2f2e6f00ccd5f10c79f629fb796fd8d0
Author: Michael Armbrust <[email protected]>
Date: 2015-02-24T18:49:51Z
[SPARK-5910][SQL] Support for as in selectExpr
Author: Michael Armbrust <[email protected]>
Closes #4736 from marmbrus/asExprs and squashes the following commits:
5ba97e4 [Michael Armbrust] [SPARK-5910][SQL] Support for as in selectExpr
commit 201236628a344194f7c20ba8e9afeeaefbe9318c
Author: Michael Armbrust <[email protected]>
Date: 2015-02-24T18:52:18Z
[SPARK-5532][SQL] Repartition should not use external rdd representation
Author: Michael Armbrust <[email protected]>
Closes #4738 from marmbrus/udtRepart and squashes the following commits:
c06d7b5 [Michael Armbrust] fix compilation
91c8829 [Michael Armbrust] [SQL][SPARK-5532] Repartition should not use
external rdd representation
commit 64d2c01ff1048de83b9b8efce987b55e457298f9
Author: Tathagata Das <[email protected]>
Date: 2015-02-24T19:02:47Z
[Spark-5967] [UI] Correctly clean JobProgressListener.stageIdToActiveJobIds
Patch should be self-explanatory
pwendell JoshRosen
Author: Tathagata Das <[email protected]>
Closes #4741 from tdas/SPARK-5967 and squashes the following commits:
653b5bb [Tathagata Das] Fixed the fix and added test
e2de972 [Tathagata Das] Clear stages which have no corresponding active
jobs.
commit 6d2caa576fcdc5c848d1472b09c685b3871e220e
Author: Andrew Or <[email protected]>
Date: 2015-02-24T19:08:07Z
[SPARK-5965] Standalone Worker UI displays {{USER_JAR}}
For screenshot see: https://issues.apache.org/jira/browse/SPARK-5965
This was caused by 20a6013106b56a1a1cc3e8cda092330ffbe77cc3.
Author: Andrew Or <[email protected]>
Closes #4739 from andrewor14/user-jar-blocker and squashes the following
commits:
23c4a9e [Andrew Or] Use right argument
commit 105791e35cee694f3b2ac1e06758650fe44e2c71
Author: Xiangrui Meng <[email protected]>
Date: 2015-02-24T19:38:59Z
[MLLIB] Change x_i to y_i in Variance's user guide
Variance is calculated on labels/responses.
Author: Xiangrui Meng <[email protected]>
Closes #4740 from mengxr/patch-1 and squashes the following commits:
673317b [Xiangrui Meng] [MLLIB] Change x_i to y_i in Variance's user guide
commit c5ba975ee85521f708ebeec81144347cf1b40fba
Author: Judy <[email protected]>
Date: 2015-02-24T20:50:16Z
[Spark-5708] Add Slf4jSink to Spark Metrics
Add Slf4jSink to Spark Metrics using Coda Hale's Slf4jReporter.
This sends metrics to log4j, allowing Spark users to reuse the log4j
pipeline for metrics collection.
Reviewed existing unit tests and didn't see any sink-related tests. Please
advise on if tests should be added.
Author: Judy <[email protected]>
Author: judynash <[email protected]>
Closes #4644 from judynash/master and squashes the following commits:
57ef214 [judynash] doc clarification and indent fixes
a751a66 [Judy] Spark-5708: Add Slf4jSink to Spark Metrics
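The idea of routing metrics through the logging pipeline rather than a dedicated backend can be sketched with Python's standard `logging` module; the class and registry shape here are illustrative, not the PR's Scala API:

```python
import logging

class LogSink:
    """Report gauge values through a logger, analogous in spirit to
    sending Coda Hale metrics to SLF4J/log4j."""

    def __init__(self, registry, logger=None):
        self.registry = registry  # metric name -> zero-arg callable
        self.log = logger or logging.getLogger("metrics")

    def report(self):
        # Emit each metric as a log line so existing log collection
        # picks it up; return the lines for inspection.
        lines = []
        for name in sorted(self.registry):
            line = "%s=%s" % (name, self.registry[name]())
            self.log.info(line)
            lines.append(line)
        return lines
```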
commit a2b9137923e0ba328da8fff2fbbfcf2abf50b033
Author: Michael Armbrust <[email protected]>
Date: 2015-02-24T21:39:29Z
[SPARK-5952][SQL] Lock when using hive metastore client
Author: Michael Armbrust <[email protected]>
Closes #4746 from marmbrus/hiveLock and squashes the following commits:
8b871cf [Michael Armbrust] [SPARK-5952][SQL] Lock when using hive metastore
client
commit da505e59274d1c838653c1109db65ad374e65304
Author: Davies Liu <[email protected]>
Date: 2015-02-24T22:50:00Z
[SPARK-5973] [PySpark] fix zip with two RDDs with AutoBatchedSerializer
Author: Davies Liu <[email protected]>
Closes #4745 from davies/fix_zip and squashes the following commits:
2124b2c [Davies Liu] Update tests.py
b5c828f [Davies Liu] increase the number of records
c1e40fd [Davies Liu] fix zip with two RDDs with AutoBatchedSerializer
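The underlying issue: with AutoBatchedSerializer, the two RDDs' streams may be batched with different batch sizes, so batches cannot be paired directly. A plain-Python sketch of the safe approach, flattening before zipping (illustrative, not PySpark's actual code):

```python
def zip_batched(batches_a, batches_b):
    # Two streams holding the same elements but batched differently:
    # flatten to individual elements before pairing, instead of zipping
    # batch-by-batch (which would misalign or drop elements).
    flat_a = [x for batch in batches_a for x in batch]
    flat_b = [y for batch in batches_b for y in batch]
    if len(flat_a) != len(flat_b):
        raise ValueError("Can only zip RDDs with the same number of elements")
    return list(zip(flat_a, flat_b))
```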
commit 2a0fe34891882e0fde1b5722d8227aa99acc0f1f
Author: MechCoder <[email protected]>
Date: 2015-02-24T23:13:22Z
[SPARK-5436] [MLlib] Validate GradientBoostedTrees using runWithValidation
One can stop early if the decrease in error rate is less than a certain
tolerance, or if the error increases because the training data is overfit.
This introduces a new method, runWithValidation, which takes in a pair of
RDDs: one for the training data and the other for validation.
Author: MechCoder <[email protected]>
Closes #4677 from MechCoder/spark-5436 and squashes the following commits:
1bb21d4 [MechCoder] Combine regression and classification tests into a
single one
e4d799b [MechCoder] Addresses indentation and doc comments
b48a70f [MechCoder] COSMIT
b928a19 [MechCoder] Move validation while training section under usage tips
fad9b6e [MechCoder] Made the following changes 1. Add section to
documentation 2. Return corresponding to bestValidationError 3. Allow negative
tolerance.
55e5c3b [MechCoder] One liner for prevValidateError
3e74372 [MechCoder] TST: Add test for classification
77549a9 [MechCoder] [SPARK-5436] Validate GradientBoostedTrees using
runWithValidation
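The validation-based early stopping described above can be sketched as follows; this is a hypothetical helper, not the MLlib implementation:

```python
def best_boosting_round(validation_errors, tol=1e-5):
    # Stop once the decrease in validation error falls below `tol`
    # (which also covers the error starting to increase), and return
    # the round with the best validation error seen so far.
    best_round, best_err = 0, validation_errors[0]
    for i in range(1, len(validation_errors)):
        if best_err - validation_errors[i] < tol:
            break
        best_round, best_err = i, validation_errors[i]
    return best_round
```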
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]