GitHub user igorcosta opened a pull request:
https://github.com/apache/spark/pull/5091
Additional information on building from source
Substantial information is missing from the getting-started docs, so this adds
more options for building from source code.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/igorcosta/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5091.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5091
----
commit 4a17eedb16343413e5b6f8bb58c6da8952ee7ab6
Author: Joseph K. Bradley <[email protected]>
Date: 2015-02-20T10:31:32Z
[SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release
For SPARK-5867:
* The spark.ml programming guide needs to be updated to use the new SQL
DataFrame API instead of the old SchemaRDD API.
* It should also include Python examples now.
For SPARK-5892:
* Fix Python docs
* Various other cleanups
BTW, I accidentally merged this with master. If you want to compile it on
your own, use this branch which is based on spark/branch-1.3 and cherry-picks
the commits from this PR:
[https://github.com/jkbradley/spark/tree/doc-review-1.3-check]
CC: mengxr (ML), davies (Python docs)
Author: Joseph K. Bradley <[email protected]>
Closes #4675 from jkbradley/doc-review-1.3 and squashes the following
commits:
f191bb0 [Joseph K. Bradley] small cleanups
e786efa [Joseph K. Bradley] small doc corrections
6b1ab4a [Joseph K. Bradley] fixed python lint test
946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example.
Changed spark.ml Java examples to use DataFrames API instead of sql()
da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master'
into doc-review-1.3
629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page
for old migration guides * small fixes * moved inherit_doc in python
b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for
string interpolation
34b067f [Joseph K. Bradley] small doc correction
da16aef [Joseph K. Bradley] Fixed python mllib docs
8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class
hierarchies in python docs
a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
b05a80d [Joseph K. Bradley] organize imports. doc cleanups
e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
commit 5b0a42cb17b840c82d3f8a5ad061d99e261ceadf
Author: Davies Liu <[email protected]>
Date: 2015-02-20T23:35:05Z
[SPARK-5898] [SPARK-5896] [SQL] [PySpark] create DataFrame from pandas and
tuple/list
Fix createDataFrame() from pandas DataFrame (not tested by jenkins, depends
on SPARK-5693).
It also supports creating a DataFrame from a plain tuple/list without column
names; `_1`, `_2`, ... will be used as column names.
Author: Davies Liu <[email protected]>
Closes #4679 from davies/pandas and squashes the following commits:
c0cbe0b [Davies Liu] fix tests
8466d1d [Davies Liu] fix create DataFrame from pandas
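The positional naming behavior described above can be sketched in plain Python. This is an illustrative helper, not PySpark's actual implementation:

```python
def default_column_names(rows):
    # When rows are plain tuples/lists with no column names, assign
    # positional names _1, _2, ... as the commit describes.
    # (Illustrative sketch only; not PySpark's actual code.)
    width = len(rows[0])
    return ["_%d" % (i + 1) for i in range(width)]

rows = [(1, "Alice"), (2, "Bob")]
print(default_column_names(rows))  # prints "['_1', '_2']"
```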
commit e155324711740da97698b93526128b0eae2dc0ce
Author: Jacky Li <[email protected]>
Date: 2015-02-21T13:00:16Z
[MLlib] fix typo
fix typo: it should be "default:" instead of "default;"
Author: Jacky Li <[email protected]>
Closes #4713 from jackylk/patch-10 and squashes the following commits:
15daf2e [Jacky Li] [MLlib] fix typo
commit d3cbd38c33e6a2addcf8caa18eeb10036fbfd01b
Author: Nishkam Ravi <[email protected]>
Date: 2015-02-21T17:59:28Z
SPARK-5841 [CORE] [HOTFIX 2] Memory leak in DiskBlockManager
We continue to see an IllegalStateException in YARN cluster mode; this adds a
simple workaround for now.
Author: Nishkam Ravi <[email protected]>
Author: nishkamravi2 <[email protected]>
Author: nravi <[email protected]>
Closes #4690 from nishkamravi2/master_nravi and squashes the following
commits:
d453197 [nishkamravi2] Update NewHadoopRDD.scala
6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
0ce2c32 [nishkamravi2] Update HadoopRDD.scala
f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of
https://github.com/nishkamravi2/spark into master_nravi
ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of
removeShutDownHook. Deletion of semi-redundant occurrences of expensive
operation inShutDown.
71d0e17 [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
494d8c0 [nishkamravi2] Update DiskBlockManager.scala
3c5ddba [nishkamravi2] Update DiskBlockManager.scala
f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by
recent changes to BlockManager.stop
79ea8b4 [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
b446edc [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
535295a [nishkamravi2] Update TaskSetManager.scala
3e1b616 [Nishkam Ravi] Modify test for maxResultSize
9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message
and add condition to check if maxResultSize > 0)
5f8f9ed [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
636a9ff [nishkamravi2] Update YarnAllocator.scala
8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
5ac2ec1 [Nishkam Ravi] Remove out
dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead
issue
42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
c726bd9 [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
1cf2d1e [nishkamravi2] Update YarnAllocator.scala
ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an
additive constant to a multiplier (redone to resolve merge conflicts)
2e69f11 [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark into master_nravi
efd688a [Nishkam Ravi] Merge branch 'master' of
https://github.com/apache/spark
2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int
value, to be consistent with rest of Spark
3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097,
Hadoop-10456)
6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
5108700 [nravi] Fix in Spark for the Concurrent thread modification issue
(SPARK-1097, HADOOP-10456)
681b36f [nravi] Fix for SPARK-1758: failing test
org.apache.spark.JavaAPISuite.wholeTextFiles
commit 7138816abe1060a1e967c4c77c72d5752586d557
Author: Hari Shreedharan <[email protected]>
Date: 2015-02-21T18:01:01Z
[SPARK-5937][YARN] Fix ClientSuite to set YARN mode, so that the correct
class is used in tests.
Without this, SparkHadoopUtil is used by the Client instead of
YarnSparkHadoopUtil.
Author: Hari Shreedharan <[email protected]>
Closes #4711 from harishreedharan/SPARK-5937 and squashes the following
commits:
d154de6 [Hari Shreedharan] Use System.clearProperty() instead of setting
the value of SPARK_YARN_MODE to empty string.
f729f70 [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the
correct class is used in tests.
commit 7683982faf920b8ac6cf46b79842450e7d46c5cc
Author: Evan Yu <[email protected]>
Date: 2015-02-21T20:40:21Z
[SPARK-5860][CORE] JdbcRDD: overflow on large range with high number of
partitions
Fix an overflow bug in JdbcRDD when calculating partitions for large BIGINT
ids.
Author: Evan Yu <[email protected]>
Closes #4701 from hotou/SPARK-5860 and squashes the following commits:
9e038d1 [Evan Yu] [SPARK-5860][CORE] Prevent overflowing at the length level
7883ad9 [Evan Yu] [SPARK-5860][CORE] Prevent overflowing at the length level
c88755a [Evan Yu] [SPARK-5860][CORE] switch to BigInt instead of BigDecimal
4e9ff4f [Evan Yu] [SPARK-5860][CORE] JdbcRDD overflow on large range with
high number of partitions
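The overflow being fixed can be illustrated with arbitrary-precision arithmetic. This is a sketch, not JdbcRDD's actual Scala code; `partition_bounds` is a hypothetical helper:

```python
def partition_bounds(lower, upper, num_partitions):
    # Compute per-partition id ranges over [lower, upper].
    # With 64-bit longs, `upper - lower + 1` can overflow for large
    # BIGINT ranges; Python ints (like Scala's BigInt in the fix)
    # are arbitrary precision, so the arithmetic stays exact.
    length = upper - lower + 1
    return [(lower + (i * length) // num_partitions,
             lower + ((i + 1) * length) // num_partitions - 1)
            for i in range(num_partitions)]
```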
commit 46462ff255b0eef8263ed798f3d5aeb8460ecaf1
Author: Patrick Wendell <[email protected]>
Date: 2015-02-22T07:07:30Z
MAINTENANCE: Automated closing of pull requests.
This commit exists to close the following pull requests on Github:
Closes #3490 (close requested by 'andrewor14')
Closes #4646 (close requested by 'srowen')
Closes #3591 (close requested by 'andrewor14')
Closes #3656 (close requested by 'andrewor14')
Closes #4553 (close requested by 'JoshRosen')
Closes #4202 (close requested by 'srowen')
Closes #4497 (close requested by 'marmbrus')
Closes #4150 (close requested by 'andrewor14')
Closes #2409 (close requested by 'andrewor14')
Closes #4221 (close requested by 'srowen')
commit a7f90390251ff62a0e10edf4c2eb876538597791
Author: Alexander <[email protected]>
Date: 2015-02-22T08:53:05Z
[DOCS] Fix typo in API for custom InputFormats based on the "new"
MapReduce API
This looks like a simple typo: ```SparkContext.newHadoopRDD``` instead of
```SparkContext.newAPIHadoopRDD```, as in the actual docs:
http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.SparkContext
Author: Alexander <[email protected]>
Closes #4718 from bzz/hadoop-InputFormats-doc-fix and squashes the
following commits:
680a4c4 [Alexander] Fix typo in docs on custom Hadoop InputFormats
commit 275b1bef897d775f1f7743378ca3e09e36160136
Author: Cheng Hao <[email protected]>
Date: 2015-02-22T08:56:30Z
[DataFrame] [Typo] Fix the typo
Author: Cheng Hao <[email protected]>
Closes #4717 from chenghao-intel/typo1 and squashes the following commits:
858d7b0 [Cheng Hao] update the typo
commit e4f9d03d728bc6fbfb6ebc7d15b4ba328f98f3dc
Author: Aaron Josephs <[email protected]>
Date: 2015-02-23T06:09:06Z
[SPARK-911] allow efficient queries for a range if RDD is partitioned with
RangePartitioner
Author: Aaron Josephs <[email protected]>
Closes #1381 from aaronjosephs/PLAT-911 and squashes the following commits:
e30ade5 [Aaron Josephs] [SPARK-911] allow efficient queries for a range if
RDD is partitioned with RangePartitioner
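The idea behind this optimization can be sketched as a binary search over a RangePartitioner's upper bounds; the names here are illustrative, not the PR's actual Scala code:

```python
import bisect

def partitions_for_range(upper_bounds, lo, hi):
    # With a RangePartitioner, partition i holds keys up to
    # upper_bounds[i] (and above the previous bound), so a query for
    # [lo, hi] only needs the partitions whose key interval overlaps
    # the range, instead of scanning every partition.
    first = bisect.bisect_left(upper_bounds, lo)
    last = bisect.bisect_left(upper_bounds, hi)
    return list(range(first, min(last, len(upper_bounds) - 1) + 1))
```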
commit 95cd643aa954b7e4229e94fa8bdc99bf3b2bb1da
Author: Ilya Ganelin <[email protected]>
Date: 2015-02-23T06:43:04Z
[SPARK-3885] Provide mechanism to remove accumulators once they are no
longer used
Instead of storing a strong reference to accumulators, I've replaced this
with a weak reference and updated any code that uses these accumulators to
check whether the reference resolves before using the accumulator. A weak
reference will be cleared when there is no longer an existing copy of the
variable; with a soft reference, accumulators would only be cleared when the
GC actually ran out of memory.
Author: Ilya Ganelin <[email protected]>
Closes #4021 from ilganeli/SPARK-3885 and squashes the following commits:
4ba9575 [Ilya Ganelin] Fixed error in test suite
8510943 [Ilya Ganelin] Extra code
bb76ef0 [Ilya Ganelin] File deleted somehow
283a333 [Ilya Ganelin] Added cleanup method for accumulators to remove
stale references within Accumulators.original to accumulators that are now out
of scope
345fd4f [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into
SPARK-3885
7485a82 [Ilya Ganelin] Fixed build error
c8e0f2b [Ilya Ganelin] Added working test for accumulator garbage collection
94ce754 [Ilya Ganelin] Still not being properly garbage collected
8722b63 [Ilya Ganelin] Fixing gc test
7414a9c [Ilya Ganelin] Added test for accumulator garbage collection
18d62ec [Ilya Ganelin] Updated to throw Exception when accessing a GCd
accumulator
9a81928 [Ilya Ganelin] Reverting permissions changes
28f705c [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into
SPARK-3885
b820ab4b [Ilya Ganelin] reset
d78f4bf [Ilya Ganelin] Removed obsolete comment
0746e61 [Ilya Ganelin] Updated DAGSchedulerSUite to fix bug
3350852 [Ilya Ganelin] Updated DAGScheduler and Suite to correctly use new
implementation of WeakRef Accumulator storage
c49066a [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into
SPARK-3885
cbb9023 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into
SPARK-3885
a77d11b [Ilya Ganelin] Updated Accumulators class to store weak references
instead of strong references to allow garbage collection of old accumulators
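The weak-reference behavior the description relies on can be demonstrated in plain Python with the standard `weakref` module; the names are illustrative, not Spark's:

```python
import gc
import weakref

class Accumulator:
    def __init__(self, value):
        self.value = value

originals = {}  # accumulator id -> weak reference (was a strong ref)

def register(acc_id, acc):
    originals[acc_id] = weakref.ref(acc)

def lookup(acc_id):
    # Check that the weak reference still resolves before using it;
    # once no strong reference to the accumulator exists, the weak
    # reference is cleared and the entry can be garbage collected.
    ref = originals.get(acc_id)
    return ref() if ref is not None else None

acc = Accumulator(0)
register(1, acc)
assert lookup(1) is acc   # still strongly referenced elsewhere
del acc
gc.collect()
assert lookup(1) is None  # cleared once the last strong ref is gone
```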
commit 934876741683fc254fed18e7ff630614f78944be
Author: Makoto Fukuhara <[email protected]>
Date: 2015-02-23T09:24:33Z
[EXAMPLES] fix typo.
Author: Makoto Fukuhara <[email protected]>
Closes #4724 from fukuo33/fix-typo and squashes the following commits:
8c806b9 [Makoto Fukuhara] fix typo.
commit 757b14b862a1d39c1bad7b321dae1a3ea8338fbb
Author: Saisai Shao <[email protected]>
Date: 2015-02-23T11:27:27Z
[SPARK-5943][Streaming] Update the test to use new API to reduce the warning
Author: Saisai Shao <[email protected]>
Closes #4722 from jerryshao/SPARK-5943 and squashes the following commits:
1b01233 [Saisai Shao] Update the test to use new API to reduce the warning
commit 242d49584c6aa21d928db2552033661950f760a5
Author: CodingCat <[email protected]>
Date: 2015-02-23T11:29:25Z
[SPARK-5724] fix the misconfiguration in AkkaUtils
https://issues.apache.org/jira/browse/SPARK-5724
In AkkaUtils, we set several failure-detector-related parameters as
follows:
```
val akkaConf = ConfigFactory.parseMap(conf.getAkkaConf.toMap[String, String])
.withFallback(akkaSslConfig).withFallback(ConfigFactory.parseString(
s"""
|akka.daemonic = on
|akka.loggers = [""akka.event.slf4j.Slf4jLogger""]
|akka.stdout-loglevel = "ERROR"
|akka.jvm-exit-on-fatal-error = off
|akka.remote.require-cookie = "$requireCookie"
|akka.remote.secure-cookie = "$secureCookie"
|akka.remote.transport-failure-detector.heartbeat-interval =
$akkaHeartBeatInterval s
|akka.remote.transport-failure-detector.acceptable-heartbeat-pause =
$akkaHeartBeatPauses s
|akka.remote.transport-failure-detector.threshold =
$akkaFailureDetector
|akka.actor.provider = "akka.remote.RemoteActorRefProvider"
|akka.remote.netty.tcp.transport-class =
"akka.remote.transport.netty.NettyTransport"
|akka.remote.netty.tcp.hostname = "$host"
|akka.remote.netty.tcp.port = $port
|akka.remote.netty.tcp.tcp-nodelay = on
|akka.remote.netty.tcp.connection-timeout = $akkaTimeout s
|akka.remote.netty.tcp.maximum-frame-size = ${akkaFrameSize}B
|akka.remote.netty.tcp.execution-pool-size = $akkaThreads
|akka.actor.default-dispatcher.throughput = $akkaBatchSize
|akka.log-config-on-start = $logAkkaConfig
|akka.remote.log-remote-lifecycle-events = $lifecycleEvents
|akka.log-dead-letters = $lifecycleEvents
|akka.log-dead-letters-during-shutdown = $lifecycleEvents
""".stripMargin))
```
Actually, there is no parameter named
"akka.remote.transport-failure-detector.threshold"
(see: http://doc.akka.io/docs/akka/2.3.4/general/configuration.html);
what we have is "akka.remote.watch-failure-detector.threshold".
Author: CodingCat <[email protected]>
Closes #4512 from CodingCat/SPARK-5724 and squashes the following commits:
bafe56e [CodingCat] fix the grammar in configuration doc
338296e [CodingCat] remove failure-detector related info
8bfcfd4 [CodingCat] fix the misconfiguration in AkkaUtils
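A defensive check like the following (an illustrative sketch, not anything that exists in Spark or Akka) would have caught the silently ignored key:

```python
# Failure-detector keys Akka 2.3 actually defines (per its config reference).
KNOWN_FAILURE_DETECTOR_KEYS = {
    "akka.remote.watch-failure-detector.threshold",
    "akka.remote.transport-failure-detector.heartbeat-interval",
    "akka.remote.transport-failure-detector.acceptable-heartbeat-pause",
}

def unknown_failure_detector_keys(config_keys):
    # Flag failure-detector keys that Akka does not define, such as the
    # nonexistent akka.remote.transport-failure-detector.threshold.
    return sorted(k for k in config_keys
                  if "failure-detector" in k
                  and k not in KNOWN_FAILURE_DETECTOR_KEYS)
```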
commit 651a1c019eb911005e234a46cc559d63da352377
Author: Jacky Li <[email protected]>
Date: 2015-02-23T16:47:28Z
[SPARK-5939][MLLib] make FPGrowth example app take parameters
Add parameter parsing in FPGrowth example app in Scala and Java
And a sample data file is added in data/mllib folder
Author: Jacky Li <[email protected]>
Closes #4714 from jackylk/parameter and squashes the following commits:
8c478b3 [Jacky Li] fix according to comments
3bb74f6 [Jacky Li] make FPGrowth exampl app take parameters
f0e4d10 [Jacky Li] make FPGrowth exampl app take parameters
commit 28ccf5ee769a1df019e38985112065c01724fbd9
Author: Alexander Ulanov <[email protected]>
Date: 2015-02-23T20:09:40Z
[MLLIB] SPARK-5912 Programming guide for feature selection
Added a description of ChiSqSelector and a few words about feature selection
in general. I could add a code example; however, it would not look reasonable
in the absence of a feature discretizer or a dataset in the `data` folder that
has redundant features.
Author: Alexander Ulanov <[email protected]>
Closes #4709 from avulanov/SPARK-5912 and squashes the following commits:
19a8a4e [Alexander Ulanov] Addressing reviewers comments @jkbradley
58d9e4d [Alexander Ulanov] Addressing reviewers comments @jkbradley
eb6b9fe [Alexander Ulanov] Typo
2921a1d [Alexander Ulanov] ChiSqSelector example of use
c845350 [Alexander Ulanov] ChiSqSelector docs
commit 59536cc87e10e5011560556729dd901280958f43
Author: Joseph K. Bradley <[email protected]>
Date: 2015-02-24T00:15:57Z
[SPARK-5912] [docs] [mllib] Small fixes to ChiSqSelector docs
Fixes:
* typo in Scala example
* Removed comment "usually applied on sparse data" since that is debatable
* small edits to text for clarity
CC: avulanov. I noticed a typo post hoc and ended up making a few small
edits. Do the changes look OK?
Author: Joseph K. Bradley <[email protected]>
Closes #4732 from jkbradley/chisqselector-docs and squashes the following
commits:
9656a3b [Joseph K. Bradley] added Java example for ChiSqSelector to guide
3f3f9f4 [Joseph K. Bradley] small fixes to ChiSqSelector docs
commit 48376bfe9c97bf31279918def6c6615849c88f4d
Author: Yin Huai <[email protected]>
Date: 2015-02-24T01:16:34Z
[SPARK-5935][SQL] Accept MapType in the schema provided to a JSON dataset.
JIRA: https://issues.apache.org/jira/browse/SPARK-5935
Author: Yin Huai <[email protected]>
Author: Yin Huai <[email protected]>
Closes #4710 from yhuai/jsonMapType and squashes the following commits:
3e40390 [Yin Huai] Remove unnecessary changes.
f8e6267 [Yin Huai] Fix test.
baa36e3 [Yin Huai] Accept MapType in the schema provided to
jsonFile/jsonRDD.
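Accepting a map-typed field means each JSON object's value for that field is read as key-to-value pairs checked against the declared value type. A plain-Python sketch of the idea (illustrative only, not Spark SQL's parser; the field name is hypothetical):

```python
import json

def read_map_column(lines, field, value_type=int):
    # Parse one JSON record per line and pull out `field` as a map,
    # checking its values against the declared type (akin to declaring
    # the field as MapType(StringType, IntegerType) in a schema).
    rows = []
    for line in lines:
        record = json.loads(line)
        mapping = record[field]
        if not all(isinstance(v, value_type) for v in mapping.values()):
            raise TypeError("value does not match declared map value type")
        rows.append(mapping)
    return rows
```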
commit 1ed57086d402c38d95cda6c3d9d7aea806609bf9
Author: Michael Armbrust <[email protected]>
Date: 2015-02-24T01:34:54Z
[SPARK-5873][SQL] Allow viewing of partially analyzed plans in
queryExecution
Author: Michael Armbrust <[email protected]>
Closes #4684 from marmbrus/explainAnalysis and squashes the following
commits:
afbaa19 [Michael Armbrust] fix python
d93278c [Michael Armbrust] fix hive
e5fa0a4 [Michael Armbrust] Merge remote-tracking branch 'origin/master'
into explainAnalysis
52119f2 [Michael Armbrust] more tests
82a5431 [Michael Armbrust] fix tests
25753d2 [Michael Armbrust] Merge remote-tracking branch 'origin/master'
into explainAnalysis
aee1e6a [Michael Armbrust] fix hive
b23a844 [Michael Armbrust] newline
de8dc51 [Michael Armbrust] more comments
acf620a [Michael Armbrust] [SPARK-5873][SQL] Show partially analyzed plans
in query execution
commit cf2e41653de778dc8db8b03385a053aae1152e19
Author: Xiangrui Meng <[email protected]>
Date: 2015-02-24T06:08:44Z
[SPARK-5958][MLLIB][DOC] update block matrix user guide
* Removed SVD code from examples.
* Corrected Java API doc link.
* Updated variable names: `AtransposeA` -> `ata`.
* Minor changes.
brkyvz
Author: Xiangrui Meng <[email protected]>
Closes #4737 from mengxr/update-block-matrix-user-guide and squashes the
following commits:
70f53ac [Xiangrui Meng] update block matrix user guide
commit 840333133396d443e747f62fce9967f7681fb276
Author: Cheng Lian <[email protected]>
Date: 2015-02-24T18:45:38Z
[SPARK-5968] [SQL] Suppresses ParquetOutputCommitter WARN logs
Please refer to the [JIRA ticket] [1] for the motivation.
[1]: https://issues.apache.org/jira/browse/SPARK-5968
Author: Cheng Lian <[email protected]>
Closes #4744 from liancheng/spark-5968 and squashes the following commits:
caac6a8 [Cheng Lian] Suppresses ParquetOutputCommitter WARN logs
commit 0a59e45e2f2e6f00ccd5f10c79f629fb796fd8d0
Author: Michael Armbrust <[email protected]>
Date: 2015-02-24T18:49:51Z
[SPARK-5910][SQL] Support for as in selectExpr
Author: Michael Armbrust <[email protected]>
Closes #4736 from marmbrus/asExprs and squashes the following commits:
5ba97e4 [Michael Armbrust] [SPARK-5910][SQL] Support for as in selectExpr
commit 201236628a344194f7c20ba8e9afeeaefbe9318c
Author: Michael Armbrust <[email protected]>
Date: 2015-02-24T18:52:18Z
[SPARK-5532][SQL] Repartition should not use external rdd representation
Author: Michael Armbrust <[email protected]>
Closes #4738 from marmbrus/udtRepart and squashes the following commits:
c06d7b5 [Michael Armbrust] fix compilation
91c8829 [Michael Armbrust] [SQL][SPARK-5532] Repartition should not use
external rdd representation
commit 64d2c01ff1048de83b9b8efce987b55e457298f9
Author: Tathagata Das <[email protected]>
Date: 2015-02-24T19:02:47Z
[Spark-5967] [UI] Correctly clean JobProgressListener.stageIdToActiveJobIds
Patch should be self-explanatory
pwendell JoshRosen
Author: Tathagata Das <[email protected]>
Closes #4741 from tdas/SPARK-5967 and squashes the following commits:
653b5bb [Tathagata Das] Fixed the fix and added test
e2de972 [Tathagata Das] Clear stages which have no corresponding active
jobs.
commit 6d2caa576fcdc5c848d1472b09c685b3871e220e
Author: Andrew Or <[email protected]>
Date: 2015-02-24T19:08:07Z
[SPARK-5965] Standalone Worker UI displays {{USER_JAR}}
For screenshot see: https://issues.apache.org/jira/browse/SPARK-5965
This was caused by 20a6013106b56a1a1cc3e8cda092330ffbe77cc3.
Author: Andrew Or <[email protected]>
Closes #4739 from andrewor14/user-jar-blocker and squashes the following
commits:
23c4a9e [Andrew Or] Use right argument
commit 105791e35cee694f3b2ac1e06758650fe44e2c71
Author: Xiangrui Meng <[email protected]>
Date: 2015-02-24T19:38:59Z
[MLLIB] Change x_i to y_i in Variance's user guide
Variance is calculated on labels/responses.
Author: Xiangrui Meng <[email protected]>
Closes #4740 from mengxr/patch-1 and squashes the following commits:
673317b [Xiangrui Meng] [MLLIB] Change x_i to y_i in Variance's user guide
commit c5ba975ee85521f708ebeec81144347cf1b40fba
Author: Judy <[email protected]>
Date: 2015-02-24T20:50:16Z
[Spark-5708] Add Slf4jSink to Spark Metrics
Add Slf4jSink to Spark Metrics using Coda Hale's Slf4jReporter.
This sends metrics to log4j, allowing Spark users to reuse the log4j
pipeline for metrics collection.
Reviewed existing unit tests and didn't see any sink-related tests. Please
advise on if tests should be added.
Author: Judy <[email protected]>
Author: judynash <[email protected]>
Closes #4644 from judynash/master and squashes the following commits:
57ef214 [judynash] doc clarification and indent fixes
a751a66 [Judy] Spark-5708: Add Slf4jSink to Spark Metrics
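The idea of routing metrics through the logging pipeline rather than a dedicated backend can be sketched with Python's standard `logging` module; the class and registry shape here are illustrative, not the PR's Scala API:

```python
import logging

class LogSink:
    """Report gauge values through a logger, analogous in spirit to
    sending Coda Hale metrics to SLF4J/log4j."""

    def __init__(self, registry, logger=None):
        self.registry = registry  # metric name -> zero-arg callable
        self.log = logger or logging.getLogger("metrics")

    def report(self):
        # Emit each metric as a log line so existing log collection
        # picks it up; return the lines for inspection.
        lines = []
        for name in sorted(self.registry):
            line = "%s=%s" % (name, self.registry[name]())
            self.log.info(line)
            lines.append(line)
        return lines
```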
commit a2b9137923e0ba328da8fff2fbbfcf2abf50b033
Author: Michael Armbrust <[email protected]>
Date: 2015-02-24T21:39:29Z
[SPARK-5952][SQL] Lock when using hive metastore client
Author: Michael Armbrust <[email protected]>
Closes #4746 from marmbrus/hiveLock and squashes the following commits:
8b871cf [Michael Armbrust] [SPARK-5952][SQL] Lock when using hive metastore
client
commit da505e59274d1c838653c1109db65ad374e65304
Author: Davies Liu <[email protected]>
Date: 2015-02-24T22:50:00Z
[SPARK-5973] [PySpark] fix zip with two RDDs with AutoBatchedSerializer
Author: Davies Liu <[email protected]>
Closes #4745 from davies/fix_zip and squashes the following commits:
2124b2c [Davies Liu] Update tests.py
b5c828f [Davies Liu] increase the number of records
c1e40fd [Davies Liu] fix zip with two RDDs with AutoBatchedSerializer
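The underlying issue: with AutoBatchedSerializer, the two RDDs' streams may be batched with different batch sizes, so batches cannot be paired directly. A plain-Python sketch of the safe approach, flattening before zipping (illustrative, not PySpark's actual code):

```python
def zip_batched(batches_a, batches_b):
    # Two streams holding the same elements but batched differently:
    # flatten to individual elements before pairing, instead of zipping
    # batch-by-batch (which would misalign or drop elements).
    flat_a = [x for batch in batches_a for x in batch]
    flat_b = [y for batch in batches_b for y in batch]
    if len(flat_a) != len(flat_b):
        raise ValueError("Can only zip RDDs with the same number of elements")
    return list(zip(flat_a, flat_b))
```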
commit 2a0fe34891882e0fde1b5722d8227aa99acc0f1f
Author: MechCoder <[email protected]>
Date: 2015-02-24T23:13:22Z
[SPARK-5436] [MLlib] Validate GradientBoostedTrees using runWithValidation
One can stop early if the decrease in error rate is less than a certain
tolerance, or if the error increases because the training data is overfit.
This introduces a new method, runWithValidation, which takes in a pair of
RDDs: one for the training data and the other for validation.
Author: MechCoder <[email protected]>
Closes #4677 from MechCoder/spark-5436 and squashes the following commits:
1bb21d4 [MechCoder] Combine regression and classification tests into a
single one
e4d799b [MechCoder] Addresses indentation and doc comments
b48a70f [MechCoder] COSMIT
b928a19 [MechCoder] Move validation while training section under usage tips
fad9b6e [MechCoder] Made the following changes 1. Add section to
documentation 2. Return corresponding to bestValidationError 3. Allow negative
tolerance.
55e5c3b [MechCoder] One liner for prevValidateError
3e74372 [MechCoder] TST: Add test for classification
77549a9 [MechCoder] [SPARK-5436] Validate GradientBoostedTrees using
runWithValidation
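The validation-based early stopping described above can be sketched as follows; this is a hypothetical helper, not the MLlib implementation:

```python
def best_boosting_round(validation_errors, tol=1e-5):
    # Stop once the decrease in validation error falls below `tol`
    # (which also covers the error starting to increase), and return
    # the round with the best validation error seen so far.
    best_round, best_err = 0, validation_errors[0]
    for i in range(1, len(validation_errors)):
        if best_err - validation_errors[i] < tol:
            break
        best_round, best_err = i, validation_errors[i]
    return best_round
```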
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]