GitHub user witgo reopened a pull request:
https://github.com/apache/spark/pull/332
[SPARK-1470] remove scalalogging-slf4j dependency
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark remove_scalalogging
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/332.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #332
----
commit dd95abada78b4d0aec97dacda50fdfd74464b073
Author: Reynold Xin <[email protected]>
Date: 2014-07-15T08:46:57Z
[SPARK-2399] Add support for LZ4 compression.
Based on Greg Bowyer's patch from JIRA
https://issues.apache.org/jira/browse/SPARK-2399
Author: Reynold Xin <[email protected]>
Closes #1416 from rxin/lz4 and squashes the following commits:
6c8fefe [Reynold Xin] Fixed typo.
8a14d38 [Reynold Xin] [SPARK-2399] Add support for LZ4 compression.
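As a rough illustration of how a pluggable codec registry can work, here is a Python sketch; the short names and fully-qualified class names follow Spark naming conventions but are assumptions, not taken from this patch:

```python
# Illustrative sketch of a compression codec registry keyed by short name.
# The class names are assumed from Spark conventions, not from this commit.
CODECS = {
    "lzf": "org.apache.spark.io.LZFCompressionCodec",
    "lz4": "org.apache.spark.io.LZ4CompressionCodec",
    "snappy": "org.apache.spark.io.SnappyCompressionCodec",
}

def resolve_codec(short_name, default="snappy"):
    """Map a short codec name (e.g. the value of a config property such as
    spark.io.compression.codec) to a concrete codec class name."""
    return CODECS.get(short_name, CODECS[default])
```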
commit 52beb20f7904e0333198b9b14619366ddf53ab85
Author: DB Tsai <[email protected]>
Date: 2014-07-15T09:14:58Z
[SPARK-2477][MLlib] Using appendBias for adding intercept in
GeneralizedLinearAlgorithm
Instead of using prependOne, as GeneralizedLinearAlgorithm currently does, we
would like to use appendBias in order to 1) keep the indices of the original
training set unchanged, by adding the intercept as the last element of the
vector, and 2) use the same public API for consistently adding the intercept.
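The index-preservation point can be seen in a minimal Python sketch (the function names mirror the commit; plain lists stand in for MLlib vectors):

```python
def prepend_one(features):
    # Old approach: the intercept term goes first, shifting every
    # original feature index up by one.
    return [1.0] + list(features)

def append_bias(features):
    # New approach: the intercept term goes last, so the indices of the
    # original features are unchanged.
    return list(features) + [1.0]
```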
Author: DB Tsai <[email protected]>
Closes #1410 from dbtsai/SPARK-2477_intercept_with_appendBias and squashes
the following commits:
011432c [DB Tsai] From Alpine Data Labs
commit 8f1d4226c285e33d2fb839d3163bb374eb6db0e7
Author: Reynold Xin <[email protected]>
Date: 2014-07-15T09:15:29Z
Update README.md to include a slightly more informative project description.
(cherry picked from commit 401083be9f010f95110a819a49837ecae7d9c4ec)
Signed-off-by: Reynold Xin <[email protected]>
commit 6555618c8f39b4e7da9402c3fd9da7a75bf7794e
Author: Reynold Xin <[email protected]>
Date: 2014-07-15T09:20:01Z
README update: added "for Big Data".
commit 04b01bb101eeaf76c2e7c94c291669f0b2372c9a
Author: Alexander Ulanov <[email protected]>
Date: 2014-07-15T15:40:22Z
[MLLIB] [SPARK-2222] Add multiclass evaluation metrics
Adding two classes:
1) MulticlassMetrics implements various multiclass evaluation metrics
2) MulticlassMetricsSuite implements unit tests for MulticlassMetrics
Author: Alexander Ulanov <[email protected]>
Author: unknown <[email protected]>
Author: Xiangrui Meng <[email protected]>
Closes #1155 from avulanov/master and squashes the following commits:
2eae80f [Alexander Ulanov] Merge pull request #1 from mengxr/avulanov-master
5ebeb08 [Xiangrui Meng] minor updates
79c3555 [Alexander Ulanov] Addressing reviewers comments mengxr
0fa9511 [Alexander Ulanov] Addressing reviewers comments mengxr
f0dadc9 [Alexander Ulanov] Addressing reviewers comments mengxr
4811378 [Alexander Ulanov] Removing println
87fb11f [Alexander Ulanov] Addressing reviewers comments mengxr. Added
confusion matrix
e3db569 [Alexander Ulanov] Addressing reviewers comments mengxr. Added true
positive rate and false positive rate. Test suite code style.
a7e8bf0 [Alexander Ulanov] Addressing reviewers comments mengxr
c3a77ad [Alexander Ulanov] Addressing reviewers comments mengxr
e2c91c3 [Alexander Ulanov] Fixes to multiclass metrics
d5ce981 [unknown] Comments about Double
a5c8ba4 [unknown] Unit tests. Class rename
fcee82d [unknown] Unit tests. Class rename
d535d62 [unknown] Multiclass evaluation
commit cb09e93c1d7ef9c8f0a1abe4e659783c74993a4e
Author: William Benton <[email protected]>
Date: 2014-07-15T16:13:39Z
Reformat multi-line closure argument.
Author: William Benton <[email protected]>
Closes #1419 from willb/reformat-2486 and squashes the following commits:
2676231 [William Benton] Reformat multi-line closure argument.
commit 9dd635eb5df52835b3b7f4f2b9c789da9e813c71
Author: witgo <[email protected]>
Date: 2014-07-15T17:46:17Z
SPARK-2480: Resolve sbt warnings "NOTE: SPARK_YARN is deprecated, please
use -Pyarn flag"
Author: witgo <[email protected]>
Closes #1404 from witgo/run-tests and squashes the following commits:
f703aee [witgo] fix Note: implicit method fromPairDStream is not applicable
here because it comes after the application point and it lacks an explicit
result type
2944f51 [witgo] Remove "NOTE: SPARK_YARN is deprecated, please use -Pyarn
flag"
ef59c70 [witgo] fix Note: implicit method fromPairDStream is not applicable
here because it comes after the application point and it lacks an explicit
result type
6cefee5 [witgo] Remove "NOTE: SPARK_YARN is deprecated, please use -Pyarn
flag"
commit 72ea56da8e383c61c6f18eeefef03b9af00f5158
Author: witgo <[email protected]>
Date: 2014-07-15T18:52:56Z
SPARK-1291: Link the spark UI to RM ui in yarn-client mode
Author: witgo <[email protected]>
Closes #1112 from witgo/SPARK-1291 and squashes the following commits:
6022bcd [witgo] review commit
1fbb925 [witgo] add addAmIpFilter to yarn alpha
210299c [witgo] review commit
1b92a07 [witgo] review commit
6896586 [witgo] Add comments to addWebUIFilter
3e9630b [witgo] review commit
142ee29 [witgo] review commit
1fe7710 [witgo] Link the spark UI to RM ui in yarn-client mode
commit e7ec815d9a2b0f89a56dc7dd3106c31a09492028
Author: Reynold Xin <[email protected]>
Date: 2014-07-15T20:13:33Z
Added LZ4 to compression codec in configuration page.
Author: Reynold Xin <[email protected]>
Closes #1417 from rxin/lz4 and squashes the following commits:
472f6a1 [Reynold Xin] Set the proper default.
9cf0b2f [Reynold Xin] Added LZ4 to compression codec in configuration page.
commit a21f9a7543309320bb2791468243c8f10bc6e81b
Author: Xiangrui Meng <[email protected]>
Date: 2014-07-15T21:00:54Z
[SPARK-2471] remove runtime scope for jets3t
The assembly jar (built by sbt) doesn't include jets3t if we set it to
runtime only, but I don't know whether it was set this way for a particular
reason.
CC: srowen ScrapCodes
Author: Xiangrui Meng <[email protected]>
Closes #1402 from mengxr/jets3t and squashes the following commits:
bfa2d17 [Xiangrui Meng] remove runtime scope for jets3t
commit 0f98ef1a2c9ecf328f6c5918808fa5ca486e8afd
Author: Michael Armbrust <[email protected]>
Date: 2014-07-15T21:01:48Z
[SPARK-2483][SQL] Fix parsing of repeated, nested data access.
Author: Michael Armbrust <[email protected]>
Closes #1411 from marmbrus/nestedRepeated and squashes the following
commits:
044fa09 [Michael Armbrust] Fix parsing of repeated, nested data access.
commit bcd0c30c7eea4c50301cb732c733fdf4d4142060
Author: Michael Armbrust <[email protected]>
Date: 2014-07-15T21:04:01Z
[SQL] Whitelist more Hive tests.
Author: Michael Armbrust <[email protected]>
Closes #1396 from marmbrus/moreTests and squashes the following commits:
6660b60 [Michael Armbrust] Blacklist a test that requires DFS command.
8b6001c [Michael Armbrust] Add golden files.
ccd8f97 [Michael Armbrust] Whitelist more tests.
commit 8af46d58464b96471825ce376c3e11c8b1108c0e
Author: Yin Huai <[email protected]>
Date: 2014-07-15T21:06:45Z
[SPARK-2474][SQL] For a registered table in OverrideCatalog, the Analyzer
failed to resolve references in the format of "tableName.fieldName"
Please refer to JIRA (https://issues.apache.org/jira/browse/SPARK-2474) for
how to reproduce the problem and my understanding of the root cause.
Author: Yin Huai <[email protected]>
Closes #1406 from yhuai/SPARK-2474 and squashes the following commits:
96b1627 [Yin Huai] Merge remote-tracking branch 'upstream/master' into
SPARK-2474
af36d65 [Yin Huai] Fix comment.
be86ba9 [Yin Huai] Correct SQL console settings.
c43ad00 [Yin Huai] Wrap the relation in a Subquery named by the table name
in OverrideCatalog.lookupRelation.
a5c2145 [Yin Huai] Support sql/console.
commit 61de65bc69f9a5fc396b76713193c6415436d452
Author: William Benton <[email protected]>
Date: 2014-07-15T21:11:57Z
SPARK-2407: Added internal implementation of SQL SUBSTR()
This replaces the Hive UDF for SUBSTR(ING) with an implementation in
Catalyst
and adds tests to verify correct operation.
Author: William Benton <[email protected]>
Closes #1359 from willb/internalSqlSubstring and squashes the following
commits:
ccedc47 [William Benton] Fixed too-long line.
a30a037 [William Benton] replace view bounds with implicit parameters
ec35c80 [William Benton] Adds fixes from review:
4f3bfdb [William Benton] Added internal implementation of SQL SUBSTR()
commit 502f90782ad474e2630ed5be4d3c4be7dab09c34
Author: Michael Armbrust <[email protected]>
Date: 2014-07-16T00:56:17Z
[SQL] Attribute equality comparisons should be done by exprId.
Author: Michael Armbrust <[email protected]>
Closes #1414 from marmbrus/exprIdResolution and squashes the following
commits:
97b47bc [Michael Armbrust] Attribute equality comparisons should be done by
exprId.
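A minimal sketch of the idea, assuming a simplified Attribute class (the names are illustrative, not Catalyst's actual API):

```python
import itertools

_next_id = itertools.count()

class Attribute:
    """Simplified sketch: attributes compare by a unique exprId rather than
    by name, so two columns that happen to share a name stay distinct."""
    def __init__(self, name, expr_id=None):
        self.name = name
        self.expr_id = next(_next_id) if expr_id is None else expr_id

    def __eq__(self, other):
        return isinstance(other, Attribute) and self.expr_id == other.expr_id

    def __hash__(self):
        return hash(self.expr_id)
```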
commit c2048a5165b270f5baf2003fdfef7bc6c5875715
Author: Zongheng Yang <[email protected]>
Date: 2014-07-16T00:58:28Z
[SPARK-2498] [SQL] Synchronize on a lock when using scala reflection inside
data type objects.
JIRA ticket: https://issues.apache.org/jira/browse/SPARK-2498
Author: Zongheng Yang <[email protected]>
Closes #1423 from concretevitamin/scala-ref-catalyst and squashes the
following commits:
325a149 [Zongheng Yang] Synchronize on a lock when initializing data type
objects in Catalyst.
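The fix pattern, sketched in Python with threading.Lock standing in for the Scala lock (the cache is an illustrative detail, not part of the patch):

```python
import threading

_reflection_lock = threading.Lock()
_type_cache = {}

def type_info(name, compute):
    """Sketch: serialize a non-thread-safe initialization step (a stand-in
    for Scala runtime reflection) behind one shared lock."""
    with _reflection_lock:
        if name not in _type_cache:
            _type_cache[name] = compute()
        return _type_cache[name]
```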
commit 4576d80a5155c9fbfebe9c36cca06c208bca5bd3
Author: Reynold Xin <[email protected]>
Date: 2014-07-16T01:47:39Z
[SPARK-2469] Use Snappy (instead of LZF) for default shuffle compression
codec
This reduces shuffle compression memory usage by 3x.
Author: Reynold Xin <[email protected]>
Closes #1415 from rxin/snappy and squashes the following commits:
06c1a01 [Reynold Xin] SPARK-2469: Use Snappy (instead of LZF) for default
shuffle compression codec.
commit 9c12de5092312319aa22f24df47a6de0e41a0102
Author: Henry Saputra <[email protected]>
Date: 2014-07-16T04:21:52Z
[SPARK-2500] Move the logInfo for registering BlockManager to
BlockManagerMasterActor.register method
PR for SPARK-2500
Move the logInfo call for BlockManager to BlockManagerMasterActor.register
instead of BlockManagerInfo constructor.
Previously the logInfo call for registering a BlockManager happened in the
BlockManagerInfo constructor. This is confusing because the code could call
"new BlockManagerInfo" without actually registering a BlockManager, which
could mislead anyone reading the log files.
Author: Henry Saputra <[email protected]>
Closes #1424 from
hsaputra/move_registerblockmanager_log_to_registration_method and squashes the
following commits:
3370b4a [Henry Saputra] Move the loginfo for BlockManager to
BlockManagerMasterActor.register instead of BlockManagerInfo constructor.
commit 563acf5edfbfb2fa756a1f0accde0940592663e9
Author: Ken Takagiwa <[email protected]>
Date: 2014-07-16T04:34:05Z
follow pep8 None should be compared using is or is not
http://legacy.python.org/dev/peps/pep-0008/
## Programming Recommendations
- Comparisons to singletons like None should always be done with is or is
not, never the equality operators.
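The rule matters because `==` can be overridden by a class while `is` always checks identity; a quick demonstration:

```python
class AlwaysEqual:
    # A class whose __eq__ always answers True, which makes `== None`
    # give a misleading result.
    def __eq__(self, other):
        return True

obj = AlwaysEqual()
print(obj == None)  # True, even though obj is clearly not None
print(obj is None)  # False: identity comparison cannot be fooled
```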
Author: Ken Takagiwa <[email protected]>
Closes #1422 from giwa/apache_master and squashes the following commits:
7b361f3 [Ken Takagiwa] follow pep8 None should be checked using is or is not
commit 90ca532a0fd95dc85cff8c5722d371e8368b2687
Author: Aaron Staple <[email protected]>
Date: 2014-07-16T04:35:36Z
[SPARK-2314][SQL] Override collect and take in JavaSchemaRDD, forwarding to
SchemaRDD implementations.
Author: Aaron Staple <[email protected]>
Closes #1421 from staple/SPARK-2314 and squashes the following commits:
73e04dc [Aaron Staple] [SPARK-2314] Override collect and take in
JavaSchemaRDD, forwarding to SchemaRDD implementations.
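The forwarding pattern, sketched with illustrative Python classes (not the actual Spark Java API):

```python
class SchemaRDD:
    """Stand-in for the Scala-side implementation holding the real logic."""
    def __init__(self, rows):
        self._rows = rows
    def collect(self):
        return list(self._rows)
    def take(self, n):
        return list(self._rows)[:n]

class JavaSchemaRDD:
    """Stand-in for the Java wrapper: its overrides forward to the
    underlying implementation instead of re-implementing the logic."""
    def __init__(self, srdd):
        self._srdd = srdd
    def collect(self):
        return self._srdd.collect()
    def take(self, n):
        return self._srdd.take(n)
```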
commit 9b38b7c71352bb5e6d359515111ad9ca33299127
Author: Takuya UESHIN <[email protected]>
Date: 2014-07-16T05:35:34Z
[SPARK-2509][SQL] Add optimization for Substring.
Cases where `Substring` has a `null` literal operand could be handled by
`NullPropagation`.
Author: Takuya UESHIN <[email protected]>
Closes #1428 from ueshin/issues/SPARK-2509 and squashes the following
commits:
d9eb85f [Takuya UESHIN] Add Substring cases to NullPropagation.
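The optimization in sketch form: when any operand of a substring expression is a null literal, the whole expression can be folded to null at planning time, without evaluating anything (Python stand-ins, not Catalyst's API; 0-based slicing is a simplification):

```python
NULL = object()  # stand-in for a SQL null literal

def fold_substring(s, pos, length):
    """Sketch of a NullPropagation-style rewrite: null in, null out,
    skipping evaluation of the substring entirely."""
    if s is NULL or pos is NULL or length is NULL:
        return NULL
    return s[pos:pos + length]  # simplified 0-based semantics
```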
commit 632fb3d9a9ebb3d2218385403145d5b89c41c025
Author: Takuya UESHIN <[email protected]>
Date: 2014-07-16T05:43:48Z
[SPARK-2504][SQL] Fix nullability of Substring expression.
This is a follow-up of #1359 with nullability narrowing.
Author: Takuya UESHIN <[email protected]>
Closes #1426 from ueshin/issues/SPARK-2504 and squashes the following
commits:
5157832 [Takuya UESHIN] Remove unnecessary white spaces.
80958ac [Takuya UESHIN] Fix nullability of Substring expression.
commit efc452a16322e8b20b3c4fe1d6847315f928cd2d
Author: Cheng Lian <[email protected]>
Date: 2014-07-16T16:44:51Z
[SPARK-2119][SQL] Improved Parquet performance when reading off S3
JIRA issue: [SPARK-2119](https://issues.apache.org/jira/browse/SPARK-2119)
Essentially this PR fixed three issues to gain much better performance when
reading large Parquet files off S3.
1. When reading the schema, fetching Parquet metadata from a part-file
rather than the `_metadata` file
The `_metadata` file contains metadata of all row groups, and can be
very large if there are many row groups. Since schema information and row group
metadata are coupled within a single Thrift object, we have to read the whole
`_metadata` to fetch the schema. On the other hand, schema is replicated among
footers of all part-files, which are fairly small.
1. Only add the root directory of the Parquet file, rather than all the
part-files, to the input paths
The HDFS API automatically filters out hidden files and underscore
files (`_SUCCESS` & `_metadata`), so there's no need to filter out all the
part-files and add them individually to the input paths. What makes it much
worse is that `FileInputFormat.listStatus()` calls `FileSystem.globStatus()`
on each individual input path sequentially, each call resulting in a blocking
remote S3 HTTP request.
1. Worked around
[PARQUET-16](https://issues.apache.org/jira/browse/PARQUET-16)
Essentially PARQUET-16 is similar to the above issue, and results in lots
of sequential `FileSystem.getFileStatus()` calls, which are further translated
into a bunch of remote S3 HTTP requests.
`FilteringParquetRowInputFormat` should be cleaned up once PARQUET-16 is
fixed.
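The first fix boils down to reading the schema from any visible part-file's footer instead of the large `_metadata` file, since the schema is replicated in every footer. A sketch of that selection (path handling simplified, names illustrative):

```python
def pick_schema_file(paths):
    """Sketch: any non-hidden part-file's footer carries the schema, so
    skip _metadata/_SUCCESS and dotfiles and take the first part-file."""
    for p in paths:
        name = p.rsplit("/", 1)[-1]
        if not name.startswith(("_", ".")):
            return p
    return None
```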
Below is the micro-benchmark result. The dataset used is an S3 Parquet file
consisting of 3,793 partitions, about 110MB per partition on average. The
benchmark was done with a 9-node AWS cluster.
- Creating a Parquet `SchemaRDD` (Parquet schema is fetched)
```scala
val tweets = parquetFile(uri)
```
- Before: 17.80s
- After: 8.61s
- Fetching partition information
```scala
tweets.getPartitions
```
- Before: 700.87s
- After: 21.47s
- Counting the whole file (both steps above are executed altogether)
```scala
parquetFile(uri).count()
```
- Before: ??? (not tested yet)
- After: 53.26s
Author: Cheng Lian <[email protected]>
Closes #1370 from liancheng/faster-parquet and squashes the following
commits:
94a2821 [Cheng Lian] Added comments about schema consistency
d2c4417 [Cheng Lian] Worked around PARQUET-16 to improve Parquet performance
1c0d1b9 [Cheng Lian] Accelerated Parquet schema retrieving
5bd3d29 [Cheng Lian] Fixed Parquet log level
commit 33e64ecacbc44567f9cba2644a30a118653ea5fa
Author: Rui Li <[email protected]>
Date: 2014-07-16T17:23:37Z
SPARK-2277: make TaskScheduler track hosts on rack
Hi mateiz, I've created
[SPARK-2277](https://issues.apache.org/jira/browse/SPARK-2277) to make
TaskScheduler track hosts on each rack. Please help to review, thanks.
Author: Rui Li <[email protected]>
Closes #1212 from lirui-intel/trackHostOnRack and squashes the following
commits:
2b4bd0f [Rui Li] SPARK-2277: refine UT
fbde838 [Rui Li] SPARK-2277: add UT
7bbe658 [Rui Li] SPARK-2277: rename the method
5e4ef62 [Rui Li] SPARK-2277: remove unnecessary import
79ac750 [Rui Li] SPARK-2277: make TaskScheduler track hosts on rack
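One plausible shape for the tracking structure (illustrative Python, not the actual TaskScheduler code):

```python
from collections import defaultdict

class RackTracker:
    """Sketch: maintain a rack -> live hosts index so a scheduler can tell
    whether a rack still has any alive hosts."""
    def __init__(self):
        self.hosts_by_rack = defaultdict(set)

    def add_host(self, host, rack):
        self.hosts_by_rack[rack].add(host)

    def remove_host(self, host, rack):
        self.hosts_by_rack[rack].discard(host)
        if not self.hosts_by_rack[rack]:
            del self.hosts_by_rack[rack]  # drop racks with no live hosts

    def has_hosts_on_rack(self, rack):
        return rack in self.hosts_by_rack
```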
commit efe2a8b1262a371471f52ca7d47dc34789e80558
Author: Reynold Xin <[email protected]>
Date: 2014-07-16T17:44:54Z
Tightening visibility for various Broadcast related classes.
In preparation for SPARK-2521.
Author: Reynold Xin <[email protected]>
Closes #1438 from rxin/broadcast and squashes the following commits:
432f1cc [Reynold Xin] Tightening visibility for various Broadcast related
classes.
commit df95d82da7c76c074fd4064f7c870d55d99e0d8e
Author: Yin Huai <[email protected]>
Date: 2014-07-16T17:53:59Z
[SPARK-2525][SQL] Remove as many compilation warning messages as possible
in Spark SQL
JIRA: https://issues.apache.org/jira/browse/SPARK-2525.
Author: Yin Huai <[email protected]>
Closes #1444 from yhuai/SPARK-2517 and squashes the following commits:
edbac3f [Yin Huai] Removed some compiler type erasure warnings.
commit 1c5739f68510c2336bf6cb3e18aea03d85988bfb
Author: Reynold Xin <[email protected]>
Date: 2014-07-16T17:55:47Z
[SQL] Cleaned up ConstantFolding slightly.
Moved a couple of rules out of NullPropagation and added more comments.
Author: Reynold Xin <[email protected]>
Closes #1430 from rxin/sql-folding-rule and squashes the following commits:
7f9a197 [Reynold Xin] Updated documentation for ConstantFolding.
7f8cf61 [Reynold Xin] [SQL] Cleaned up ConstantFolding slightly.
commit fc7edc9e76f97b25e456ae7b72ef8636656f4f1a
Author: Sandy Ryza <[email protected]>
Date: 2014-07-16T18:07:16Z
SPARK-2519. Eliminate pattern-matching on Tuple2 in performance-critical...
... aggregation code
Author: Sandy Ryza <[email protected]>
Closes #1435 from sryza/sandy-spark-2519 and squashes the following commits:
640706a [Sandy Ryza] SPARK-2519. Eliminate pattern-matching on Tuple2 in
performance-critical aggregation code
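In Scala, writing `{ case (k, v) => ... }` over every element of a hot loop runs a pattern match per element, whereas reading `_1`/`_2` directly does not. A loose Python analogy of the two styles (in Python both are cheap; this only illustrates the structural difference):

```python
def sum_values_unpacked(pairs):
    # Analogous to `pairs.map { case (k, v) => ... }`: destructure each pair.
    total = 0
    for _, v in pairs:
        total += v
    return total

def sum_values_indexed(pairs):
    # Analogous to accessing kv._1 / kv._2 directly, skipping destructuring.
    total = 0
    for kv in pairs:
        total += kv[1]
    return total
```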
commit cc965eea510397642830acb21f61127b68c098d6
Author: Takuya UESHIN <[email protected]>
Date: 2014-07-16T18:13:38Z
[SPARK-2518][SQL] Fix foldability of Substring expression.
This is a follow-up of #1428.
Author: Takuya UESHIN <[email protected]>
Closes #1432 from ueshin/issues/SPARK-2518 and squashes the following
commits:
37d1ace [Takuya UESHIN] Fix foldability of Substring expression.
commit ef48222c10be3d29a83dfc2329f455eba203cd38
Author: Reynold Xin <[email protected]>
Date: 2014-07-16T18:15:07Z
[SPARK-2517] Remove some compiler warnings.
Author: Reynold Xin <[email protected]>
Closes #1433 from rxin/compile-warning and squashes the following commits:
8d0b890 [Reynold Xin] Remove some compiler warnings.
----