GitHub user litao-buptsse opened a pull request:
https://github.com/apache/spark/pull/7041
[YARN] SPARK-8657: Fail to upload conf archive to viewfs in spark-1.4
Fail to upload conf archive to viewfs in spark-1.4
JIRA Link: https://issues.apache.org/jira/browse/SPARK-8657
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/litao-buptsse/spark SPARK-8657
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7041.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7041
----
commit 0a65224aed9d2bb780e0d3e70d2a7ba34f30219b
Author: Mike Dusenberry <[email protected]>
Date: 2015-05-28T21:15:10Z
[DOCS] Fixing broken "IDE setup" link in the Building Spark documentation.
The location of the IDE setup information has changed, so this just updates
the link on the Building Spark page.
Author: Mike Dusenberry <[email protected]>
Closes #6467 from dusenberrymw/Fix_Broken_Link_On_Building_Spark_Doc and
squashes the following commits:
75c533a [Mike Dusenberry] Fixing broken "IDE setup" link in the Building
Spark documentation by pointing to new location.
(cherry picked from commit 3e312a5ed0154527c66eeeee0d2cc3bfce0a820e)
Signed-off-by: Sean Owen <[email protected]>
commit b9bdf12a1c2ea81cfaae7df540670c34d028838d
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-28T23:32:51Z
[SPARK-7198] [MLLIB] VectorAssembler should output ML attributes
`VectorAssembler` should carry over ML attributes. For unknown attributes,
we assume numeric values. This PR handles the following cases:
1. DoubleType with ML attribute: carry over
2. DoubleType without ML attribute: numeric value
3. Scalar type: numeric value
4. VectorType with all ML attributes: carry over and update names
5. VectorType with number of ML attributes: assume all numeric
6. VectorType without ML attributes: check the first row and get the number
of attributes
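As a rough illustration of the six cases listed above, here is a minimal pure-Python sketch. It is not Spark code: the function name `output_attributes`, the dictionary representation of an attribute, the `kind` labels, and the `column_attribute` naming scheme are all assumptions made for this example.

```python
def numeric(name):
    # Default used for unknown attributes: treat the value as numeric.
    return {"type": "numeric", "name": name}


def output_attributes(col, kind, attr=None, vector_attrs=None,
                      num_attrs=None, first_row_size=None):
    """Return the output attributes contributed by one input column.

    kind is "double", "scalar", or "vector" (labels chosen for this sketch).
    """
    if kind == "double":
        # Cases 1 and 2: carry over a known attribute, otherwise assume numeric.
        return [attr] if attr is not None else [numeric(col)]
    if kind == "scalar":
        # Case 3: other scalar types are treated as numeric values.
        return [numeric(col)]
    if kind == "vector":
        if vector_attrs is not None:
            # Case 4: carry over and update names (prefix scheme is assumed).
            return [dict(a, name=f"{col}_{a['name']}") for a in vector_attrs]
        # Case 5: only the count is known.
        # Case 6: nothing is known, so use the size of the first row.
        n = num_attrs if num_attrs is not None else first_row_size
        return [numeric(f"{col}_{i}") for i in range(n)]
    raise ValueError(f"unsupported column kind: {kind}")
```

For example, `output_attributes("v", "vector", first_row_size=3)` yields three numeric attributes, one per vector slot.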
jkbradley
Author: Xiangrui Meng <[email protected]>
Closes #6452 from mengxr/SPARK-7198 and squashes the following commits:
a9d2469 [Xiangrui Meng] add space
facdb1f [Xiangrui Meng] VectorAssembler should output ML attributes
(cherry picked from commit 7859ab659eecbcf2d8b9a274a4e9e4f5186a528c)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 9c2c6b4a676ea1fdfecd9cd450d43d4081c77385
Author: Reynold Xin <[email protected]>
Date: 2015-05-28T23:56:59Z
Remove SizeEstimator from o.a.spark package.
See comments on https://github.com/apache/spark/pull/3913
Author: Reynold Xin <[email protected]>
Closes #6471 from rxin/sizeestimator and squashes the following commits:
c057095 [Reynold Xin] Fixed import.
2da478b [Reynold Xin] Remove SizeEstimator from o.a.spark package.
(cherry picked from commit 0077af22ca5fcb2e50dcf7daa4f6804ae722bfbe)
Signed-off-by: Reynold Xin <[email protected]>
commit 8f4a86eaa1cad9a2a7607fd5446105c93e5e424e
Author: Yin Huai <[email protected]>
Date: 2015-05-29T00:12:30Z
[SPARK-7853] [SQL] Fix HiveContext in Spark Shell
https://issues.apache.org/jira/browse/SPARK-7853
This fixes the problem introduced by my change in
https://github.com/apache/spark/pull/6435, which caused HiveContext creation
to fail in the Spark shell because of a class loader issue.
Author: Yin Huai <[email protected]>
Closes #6459 from yhuai/SPARK-7853 and squashes the following commits:
37ad33e [Yin Huai] Do not use hiveQlTable at all.
47cdb6d [Yin Huai] Move hiveconf.set to the end of setConf.
005649b [Yin Huai] Update comment.
35d86f3 [Yin Huai] Access TTable directly to make sure Hive will not
internally use any metastore utility functions.
3737766 [Yin Huai] Recursively find all jars.
(cherry picked from commit 572b62cafe4bc7b1d464c9dcfb449c9d53456826)
Signed-off-by: Yin Huai <[email protected]>
commit 7bb445a38ca37e72d0b11ad1c4448632b679eda6
Author: Xusen Yin <[email protected]>
Date: 2015-05-29T00:30:12Z
[SPARK-7577] [ML] [DOC] add bucketizer doc
CC jkbradley
Author: Xusen Yin <[email protected]>
Closes #6451 from yinxusen/SPARK-7577 and squashes the following commits:
e2dc32e [Xusen Yin] rename colums
e350e49 [Xusen Yin] add all demos
006ddf1 [Xusen Yin] add java test
3238481 [Xusen Yin] add bucketizer
(cherry picked from commit 1bd63e82fdb6ee57c61051430d63685b801df016)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit f4b135337c5032dcd224ebd14e134aa8de0c1667
Author: Reynold Xin <[email protected]>
Date: 2015-05-29T00:55:22Z
[SPARK-7927] whitespace fixes for streaming.
So we can enable a whitespace enforcement rule in the style checker to save
code review time.
Author: Reynold Xin <[email protected]>
Closes #6475 from rxin/whitespace-streaming and squashes the following
commits:
810dae4 [Reynold Xin] Fixed tests.
89068ad [Reynold Xin] [SPARK-7927] whitespace fixes for streaming.
(cherry picked from commit 3af0b3136e4b7dea52c413d640653ccddc638574)
Signed-off-by: Reynold Xin <[email protected]>
commit 3b38c06f0d19bd0d15df768d6ae0037f6c04b88d
Author: Reynold Xin <[email protected]>
Date: 2015-05-29T01:08:56Z
[SPARK-7927] whitespace fixes for Hive and ThriftServer.
So we can enable a whitespace enforcement rule in the style checker to save
code review time.
Author: Reynold Xin <[email protected]>
Closes #6478 from rxin/whitespace-hive and squashes the following commits:
e01b0e0 [Reynold Xin] Fixed tests.
a3bba22 [Reynold Xin] [SPARK-7927] whitespace fixes for Hive and
ThriftServer.
(cherry picked from commit ee6a0e12fb76e4d5c24175900e5bf6a8cb35e2b0)
Signed-off-by: Reynold Xin <[email protected]>
commit 3479e6a127d0b93ef38533fdad02a49850716583
Author: Kay Ousterhout <[email protected]>
Date: 2015-05-29T02:04:32Z
[SPARK-7933] Remove Patrick's username/pw from merge script
Looks like this was added by accident when pwendell merged a commit back in
September: fe2b1d6a209db9fe96b1c6630677955b94bd48c9
Author: Kay Ousterhout <[email protected]>
Closes #6485 from kayousterhout/SPARK-7933 and squashes the following
commits:
7c6164a [Kay Ousterhout] [SPARK-7933] Remove Patrick's username/pw from
merge script
(cherry picked from commit 66c49ed60dcef48a6b38ae2d2c4c479933f3aa19)
Signed-off-by: Patrick Wendell <[email protected]>
commit 0c05115063df39e6058c9c8ea90dd10724a7366d
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-29T03:09:12Z
[SPARK-7927] [MLLIB] Enforce whitespace for more tokens in style checker
rxin
Author: Xiangrui Meng <[email protected]>
Closes #6481 from mengxr/mllib-scalastyle and squashes the following
commits:
3ca4d61 [Xiangrui Meng] revert scalastyle config
30961ba [Xiangrui Meng] adjust spaces in mllib/test
571b5c5 [Xiangrui Meng] fix spaces in mllib
(cherry picked from commit 04616b1a2f5244710b07ecbb404384ded893292c)
Signed-off-by: Reynold Xin <[email protected]>
commit 9b97e95e86f0d11e8ae3ba55432c726cec79d5bc
Author: Reynold Xin <[email protected]>
Date: 2015-05-29T03:10:21Z
[SPARK-7927] whitespace fixes for SQL core.
So we can enable a whitespace enforcement rule in the style checker to save
code review time.
Author: Reynold Xin <[email protected]>
Closes #6477 from rxin/whitespace-sql-core and squashes the following
commits:
ce6e369 [Reynold Xin] Fixed tests.
6095fed [Reynold Xin] [SPARK-7927] whitespace fixes for SQL core.
(cherry picked from commit ff44c711abc7ca545dfa1e836279c00fe7539c18)
Signed-off-by: Reynold Xin <[email protected]>
commit 142ae52d4800fdb966b14b8f0753ba7567c55204
Author: Reynold Xin <[email protected]>
Date: 2015-05-29T03:11:04Z
[SPARK-7929] Remove Bagel examples & whitespace fix for examples.
Author: Reynold Xin <[email protected]>
Closes #6480 from rxin/whitespace-example and squashes the following
commits:
8a4a3d4 [Reynold Xin] [SPARK-7929] Remove Bagel examples & whitespace fix
for examples.
(cherry picked from commit 2881d14cbedc14f1cd8ae5078446dba1a8d39086)
Signed-off-by: Reynold Xin <[email protected]>
commit 22e42e3fee21fc1adcb4a4fb515197be6e1a36b0
Author: Reynold Xin <[email protected]>
Date: 2015-05-29T03:11:57Z
[SPARK-7927] whitespace fixes for Catalyst module.
So we can enable a whitespace enforcement rule in the style checker to save
code review time.
Author: Reynold Xin <[email protected]>
Closes #6476 from rxin/whitespace-catalyst and squashes the following
commits:
650409d [Reynold Xin] Fixed tests.
51a9e5d [Reynold Xin] [SPARK-7927] whitespace fixes for Catalyst module.
(cherry picked from commit 8da560d7de9b3c9a3e3ff197eeb10a3d7023f10d)
Signed-off-by: Reynold Xin <[email protected]>
Conflicts:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
commit e3dd2802f6dd8b2df9fb73d8e9901c4e6e4d6b84
Author: Reynold Xin <[email protected]>
Date: 2015-05-29T03:15:52Z
[SPARK-7927] whitespace fixes for core.
So we can enable a whitespace enforcement rule in the style checker to save
code review time.
Author: Reynold Xin <[email protected]>
Closes #6473 from rxin/whitespace-core and squashes the following commits:
058195d [Reynold Xin] Fixed tests.
fce11e9 [Reynold Xin] [SPARK-7927] whitespace fixes for core.
(cherry picked from commit 7f7505d8db7759ea46e904f767c23130eff1104a)
Signed-off-by: Reynold Xin <[email protected]>
commit b3a590061da09674cb0ff868c808985ea846145e
Author: Reynold Xin <[email protected]>
Date: 2015-05-29T03:17:16Z
[SPARK-7927] whitespace fixes for GraphX.
So we can enable a whitespace enforcement rule in the style checker to save
code review time.
Author: Reynold Xin <[email protected]>
Closes #6474 from rxin/whitespace-graphx and squashes the following commits:
4d3cd26 [Reynold Xin] Fixed tests.
869dde4 [Reynold Xin] [SPARK-7927] whitespace fixes for GraphX.
(cherry picked from commit b069ad23d9b6cbfb3a8bf245547add4816669075)
Signed-off-by: Reynold Xin <[email protected]>
commit 6e99dd5d042e8a3e49937769a846bef8a66214f8
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-29T04:20:54Z
[SPARK-7926] [PYSPARK] use the official Pyrolite release
Switch to the official Pyrolite release from the one published under
`org.spark-project`. Thanks irmen for making the releases on Maven Central. We
didn't upgrade to 4.6 because we don't have enough time for QA. I excluded
`serpent` from its dependencies because we don't use it in Spark.
~~~
[info] +-net.jpountz.lz4:lz4:1.3.0
[info] +-net.razorvine:pyrolite:4.4
[info] +-net.sf.py4j:py4j:0.8.2.1
~~~
davies
Author: Xiangrui Meng <[email protected]>
Closes #6472 from mengxr/SPARK-7926 and squashes the following commits:
7b3c6bf [Xiangrui Meng] use the official Pyrolite release
(cherry picked from commit c45d58c143d68cb807186acc9d060daa8549dd5c)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 1d49d8c3fd297f7a6269693fbec623ddec96b279
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-29T04:26:43Z
[MINOR] fix RegressionEvaluator doc
`make clean html` under `python/doc` returns
~~~
/Users/meng/src/spark/python/pyspark/ml/evaluation.py:docstring of
pyspark.ml.evaluation.RegressionEvaluator.setParams:3: WARNING: Definition list
ends without a blank line; unexpected unindent.
~~~
harsha2010
Author: Xiangrui Meng <[email protected]>
Closes #6469 from mengxr/fix-regression-evaluator-doc and squashes the
following commits:
91e2dad [Xiangrui Meng] fix RegressionEvaluator doc
(cherry picked from commit 834e699524583a7ebfe9e83b3900ec503150deca)
Signed-off-by: Xiangrui Meng <[email protected]>
commit aee046dfa111b4323edd5f4ccb36075449492952
Author: Kay Ousterhout <[email protected]>
Date: 2015-05-29T05:09:49Z
[SPARK-7932] Fix misleading scheduler delay visualization
The existing code rounds down to the nearest percent when computing the
proportion of a task's time that was spent on each phase of execution, and
then computes the scheduler delay proportion as 100 - sum(all other
proportions). As a result, a few extra percent can end up in the scheduler
delay. This commit eliminates the rounding so that the time visualizations
correspond properly to the real times.
sarutak If you could take a look at this, that would be great! Not sure if
there's a good reason to round here that I missed.
cc shivaram
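A small worked example may help show how the rounding inflates the scheduler delay. The phase names and timings below are made up for illustration and are not taken from the Spark UI code. The sketch compares the floor-based computation with the unrounded one.

```python
import math

# Hypothetical timings in milliseconds for a single task.
phases = {
    "deserialization": 19,
    "compute": 939,
    "serialization": 19,
    "result_fetch": 19,
}
scheduler_delay_ms = 4
total_ms = sum(phases.values()) + scheduler_delay_ms  # 1000 ms


def floored_proportions(phases, total):
    """Round each phase down to a whole percent; the remainder becomes scheduler delay."""
    pct = {name: math.floor(100 * t / total) for name, t in phases.items()}
    pct["scheduler_delay"] = 100 - sum(pct.values())
    return pct


def exact_proportions(phases, delay, total):
    """Compute every proportion directly from the measured times, without rounding."""
    pct = {name: 100 * t / total for name, t in phases.items()}
    pct["scheduler_delay"] = 100 * delay / total
    return pct


floored = floored_proportions(phases, total_ms)
exact = exact_proportions(phases, scheduler_delay_ms, total_ms)
print(floored["scheduler_delay"], exact["scheduler_delay"])
```

With these numbers each of the four phases loses 0.9 percentage points to rounding, so the floored scheduler delay comes out at 4% although the real share is 0.4%.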
Author: Kay Ousterhout <[email protected]>
Closes #6484 from kayousterhout/SPARK-7932 and squashes the following
commits:
1723cc4 [Kay Ousterhout] [SPARK-7932] Fix misleading scheduler delay
visualization
(cherry picked from commit 04ddcd4db7801abefa9c9effe5d88413b29d713b)
Signed-off-by: Kay Ousterhout <[email protected]>
commit f7cb272b7c77de42681287925922d41248efca46
Author: Tathagata Das <[email protected]>
Date: 2015-05-29T05:28:13Z
[SPARK-7930] [CORE] [STREAMING] Fixed shutdown hook priorities
Shutdown hook for temp directories had priority 100 while SparkContext was
50. So the local root directory was deleted before SparkContext was shut down.
This leads to scary errors on running jobs at the time of shutdown. This is
especially a problem when running streaming examples, where Ctrl-C is the only
way to shut down.
The fix in this PR is to make the temp directory shutdown priority lower
than SparkContext, so that the temp dirs are the last thing to get deleted,
after the SparkContext has been shut down. Also, the DiskBlockManager shutdown
priority is changed from the default 100 to temp_dir_prio + 1, so that it gets
invoked just before all temp dirs are cleared.
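The ordering problem and the fix can be sketched with a toy hook manager. This is not Spark's implementation: the class, the hook names, and the value 25 for the new temp directory priority are assumptions for illustration. The only behavior taken from the description is that hooks with a higher priority run first.

```python
class ShutdownHookManager:
    """Toy manager: hooks with a higher priority value run earlier."""

    def __init__(self):
        self._hooks = []

    def add(self, priority, name, fn):
        self._hooks.append((priority, name, fn))

    def run_all(self):
        order = []
        for _, name, fn in sorted(self._hooks, key=lambda h: h[0], reverse=True):
            fn()
            order.append(name)
        return order


SPARK_CONTEXT_PRIO = 50
TEMP_DIR_PRIO = 25  # hypothetical; the text only says "lower than SparkContext"


def shutdown_order(temp_dir_prio, disk_block_manager_prio):
    manager = ShutdownHookManager()
    manager.add(temp_dir_prio, "delete temp dirs", lambda: None)
    manager.add(SPARK_CONTEXT_PRIO, "stop SparkContext", lambda: None)
    manager.add(disk_block_manager_prio, "stop DiskBlockManager", lambda: None)
    return manager.run_all()


before = shutdown_order(100, 100)                        # old priorities
after = shutdown_order(TEMP_DIR_PRIO, TEMP_DIR_PRIO + 1)  # priorities after the fix
print(before)
print(after)
```

With the old priorities the temp directories are removed while the SparkContext is still running. With the new ones the SparkContext stops first, then the DiskBlockManager, and the temp directories are deleted last.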
Author: Tathagata Das <[email protected]>
Closes #6482 from tdas/SPARK-7930 and squashes the following commits:
d7cbeb5 [Tathagata Das] Removed unnecessary line
1514d0b [Tathagata Das] Fixed shutdown hook priorities
(cherry picked from commit cd3d9a5c0c3e77098a72c85dffe4a27737009ae7)
Signed-off-by: Patrick Wendell <[email protected]>
commit 68559423ac2ffc2c9dfcbe95a8efa4868757c4bf
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-29T05:38:38Z
[SPARK-7922] [MLLIB] use DataFrames for user/item factors in ALSModel
Expose user/item factors in DataFrames. This is to be more consistent with
the pipeline API. It also helps maintain consistent APIs across languages. This
PR also removed fitting params from `ALSModel`.
coderxiang
Author: Xiangrui Meng <[email protected]>
Closes #6468 from mengxr/SPARK-7922 and squashes the following commits:
7bfb1d5 [Xiangrui Meng] update ALSModel in PySpark
1ba5607 [Xiangrui Meng] use DataFrames for user/item factors in ALS
(cherry picked from commit db9513789756da4f16bb1fe8cf1d19500f231f54)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 7a52fdf25f8d635ba05796abb0c491454d7869cf
Author: Tathagata Das <[email protected]>
Date: 2015-05-29T05:39:21Z
[SPARK-7931] [STREAMING] Do not restart receiver when stopped
Attempts to restart the socket receiver when it is supposed to be stopped
cause undesirable error messages.
Author: Tathagata Das <[email protected]>
Closes #6483 from tdas/SPARK-7931 and squashes the following commits:
09aeee1 [Tathagata Das] Do not restart receiver when stopped
commit e419821c3b10e59e9765c6d41d80694772e5c772
Author: Patrick Wendell <[email protected]>
Date: 2015-05-29T05:48:02Z
[HOTFIX] Minor style fix from last commit
commit 2d97d7a0aa5740aacdb90ef646175770b7610c58
Author: Patrick Wendell <[email protected]>
Date: 2015-05-29T05:57:26Z
Preparing Spark release v1.4.0-rc3
commit 119c93af9c8c2888465eb2fa5977a074d33594ae
Author: Patrick Wendell <[email protected]>
Date: 2015-05-29T05:57:31Z
Preparing development version 1.4.0-SNAPSHOT
commit 55dc7a693368ddbd850459034709e3dd751dbcf3
Author: Reynold Xin <[email protected]>
Date: 2015-05-29T06:00:02Z
[SPARK-7929] Turn whitespace checker on for more token types.
This is the last batch of changes to complete SPARK-7929.
Previous related PRs:
https://github.com/apache/spark/pull/6480
https://github.com/apache/spark/pull/6478
https://github.com/apache/spark/pull/6477
https://github.com/apache/spark/pull/6476
https://github.com/apache/spark/pull/6475
https://github.com/apache/spark/pull/6474
https://github.com/apache/spark/pull/6473
Author: Reynold Xin <[email protected]>
Closes #6487 from rxin/whitespace-lint and squashes the following commits:
b33d43d [Reynold Xin] [SPARK-7929] Turn whitespace checker on for more
token types.
(cherry picked from commit 97a60cf75d1fed654953eccedd04f3442389c5ca)
Signed-off-by: Reynold Xin <[email protected]>
commit f2796816bea12a7894519c6882b73f0ef5b99b14
Author: Patrick Wendell <[email protected]>
Date: 2015-05-29T06:40:22Z
Preparing Spark release v1.4.0-rc3
commit 6bf5a42084d5f5c601d3c41358a12bddeed6666b
Author: Patrick Wendell <[email protected]>
Date: 2015-05-29T06:40:27Z
Preparing development version 1.4.0-SNAPSHOT
commit 509a7cafccc7ce6a64a159a2647ed56e52ed5df9
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-29T07:51:12Z
[SPARK-7912] [SPARK-7921] [MLLIB] Update OneHotEncoder to handle ML
attributes and change includeFirst to dropLast
This PR contains two major changes to `OneHotEncoder`:
1. more robust handling of ML attributes. If the input attribute is
unknown, we look at the values to get the max category index
2. change `includeFirst` to `dropLast` and leave the default to `true`.
There are a couple of benefits:
a. consistent with other tutorials of one-hot encoding (or dummy
coding) (e.g., http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm)
b. keep the indices unmodified in the output vector. If we drop the
first, all indices will be shifted by 1.
c. If users use `StringIndexer`, the last element is the least frequent
one.
Sorry for including two changes in one PR! I'll update the user guide in
another PR.
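The effect of `dropLast` can be sketched in plain Python. This is only an illustration of the encoding described above, not the Spark implementation: the function names and the list-based vector are assumptions.

```python
def infer_num_categories(values):
    """When the ML attribute is unknown, derive the size from the max category index."""
    return int(max(values)) + 1


def one_hot(index, num_categories, drop_last=True):
    """Encode a category index as a 0/1 vector.

    With drop_last=True the last category maps to the all-zeros vector,
    and the indices of the remaining categories stay unchanged.
    """
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


indices = [0, 2, 1, 0]
n = infer_num_categories(indices)
encoded = [one_hot(i, n) for i in indices]
print(encoded)
```

Dropping the last category keeps category `i` at position `i` in the output vector, whereas dropping the first would shift every index by 1, which is benefit (b) above.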
jkbradley sryza
Author: Xiangrui Meng <[email protected]>
Closes #6466 from mengxr/SPARK-7912 and squashes the following commits:
a280dca [Xiangrui Meng] fix tests
d8f234d [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into
SPARK-7912
171b276 [Xiangrui Meng] mention the difference between our impl vs sklearn's
00dfd96 [Xiangrui Meng] update OneHotEncoder in Python
208ddad [Xiangrui Meng] update OneHotEncoder to handle ML attributes and
change includeFirst to dropLast
(cherry picked from commit 23452be944463dae72a35b58551040556dd3aeb5)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 459c3d22e0b520f0db21d471e29bdc6c4ec0029a
Author: Tim Ellison <[email protected]>
Date: 2015-05-29T09:14:43Z
[SPARK-7756] [CORE] Use testing cipher suites common to Oracle and IBM
security providers
Add alias names for supported cipher suites to the sample SSL configuration.
The IBM JSSE provider reports its cipher suite with an SSL_ prefix, but
accepts TLS_ prefixed suite names as an alias. However, Jetty filters the
requested ciphers based on the provider's reported supported suites, so the
TLS_ versions are never passed through to JSSE, causing an SSL handshake failure.
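The failure mode can be sketched as a simple name filter. The function below is a stand-in for the filtering step described above, not Jetty's actual code, and the suite lists are illustrative examples rather than the exact contents of the sample configuration.

```python
def filter_by_reported(requested, reported_supported):
    """Keep only requested suites whose exact name appears in the provider's reported list."""
    return [suite for suite in requested if suite in reported_supported]


# Illustrative: a provider that reports its suites with the SSL_ prefix.
ibm_reported = {
    "SSL_RSA_WITH_AES_128_CBC_SHA",
    "SSL_RSA_WITH_AES_256_CBC_SHA",
}

# Configuration that lists only the TLS_ names.
tls_only = [
    "TLS_RSA_WITH_AES_128_CBC_SHA",
    "TLS_RSA_WITH_AES_256_CBC_SHA",
]

# Configuration that also lists the SSL_ alias names.
with_aliases = tls_only + [
    "SSL_RSA_WITH_AES_128_CBC_SHA",
    "SSL_RSA_WITH_AES_256_CBC_SHA",
]

print(filter_by_reported(tls_only, ibm_reported))      # nothing survives the filter
print(filter_by_reported(with_aliases, ibm_reported))  # the SSL_ names survive
```

Because the filter compares names literally, the provider never gets a chance to resolve the TLS_ aliases, which is why adding the SSL_ names to the configuration avoids the handshake failure.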
Author: Tim Ellison <[email protected]>
Closes #6282 from tellison/SSLFailure and squashes the following commits:
8de8a3e [Tim Ellison] Update SecurityManagerSuite with new expected suite
names
96158b2 [Tim Ellison] Update the sample configs to use ciphers that are
common to both the Oracle and IBM security providers.
705421b [Tim Ellison] Merge branch 'master' of github.com:tellison/spark
into SSLFailure
68b9425 [Tim Ellison] Merge branch 'master' of
https://github.com/apache/spark into SSLFailure
b0c35f6 [Tim Ellison] [CORE] Add aliases used for cipher suites in IBM
provider
(cherry picked from commit bf46580708e41a1d48ac091adbca8d82a4008699)
Signed-off-by: Sean Owen <[email protected]>
commit 23bd05fff78ae4adbd7dd4f3edf4eea6ac63139d
Author: Reynold Xin <[email protected]>
Date: 2015-05-29T16:37:46Z
HOTFIX: Scala style checker failure due to a missing space in
TachyonBlockManager.scala.
commit caea7a618db7989a37ee59fcf928678efadba3e0
Author: Cheng Lian <[email protected]>
Date: 2015-05-29T17:43:34Z
[SPARK-7950] [SQL] Sets spark.sql.hive.version in
HiveThriftServer2.startWithContext()
When starting `HiveThriftServer2` via `startWithContext`, property
`spark.sql.hive.version` isn't set. This causes Simba ODBC driver 1.0.8.1006
to behave differently and fail simple queries.
The Hive2 JDBC driver works fine in this case. Also, when starting the server
with `start-thriftserver.sh`, both the Hive2 JDBC driver and the Simba ODBC
driver work fine.
Please refer to [SPARK-7950] [1] for details.
[1]: https://issues.apache.org/jira/browse/SPARK-7950
Author: Cheng Lian <[email protected]>
Closes #6500 from liancheng/odbc-bugfix and squashes the following commits:
051e3a3 [Cheng Lian] Fixes import order
3a97376 [Cheng Lian] Sets spark.sql.hive.version in
HiveThriftServer2.startWithContext()
(cherry picked from commit e7b61775571ce7a06d044bc3a6055ff94c7477d6)
Signed-off-by: Yin Huai <[email protected]>
----