GitHub user oza opened a pull request:
https://github.com/apache/spark/pull/3512
[SPARK-4651] Adding -Phadoop-2.5 and -Phadoop-2.6 to compile Spark with
newer versions of Hadoop
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/oza/spark SPARK-4651
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3512.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3512
----
commit 44856654c81ceb92ef6380691027744d4bf76589
Author: Hari Shreedharan <[email protected]>
Date: 2014-08-18T02:50:31Z
[HOTFIX][STREAMING] Allow the JVM/Netty to decide which port to bind to in
Flume Polling Tests.
Author: Hari Shreedharan <[email protected]>
Closes #1820 from harishreedharan/use-free-ports and squashes the following
commits:
b939067 [Hari Shreedharan] Remove unused import.
67856a8 [Hari Shreedharan] Remove findFreePort.
0ea51d1 [Hari Shreedharan] Make some changes to getPort to use map on the
serverOpt.
1fb0283 [Hari Shreedharan] Merge branch 'master' of
https://github.com/apache/spark into use-free-ports
b351651 [Hari Shreedharan] Allow Netty to choose port, and query it to
decide the port to bind to. Leaving findFreePort as is, if other tests want to
use it at some point.
e6c9620 [Hari Shreedharan] Making sure the second sink uses the correct
port.
11c340d [Hari Shreedharan] Add info about race condition to scaladoc.
e89d135 [Hari Shreedharan] Adding Scaladoc.
6013bb0 [Hari Shreedharan] [STREAMING] Find free ports to use before
attempting to create Flume Sink in Flume Polling Suite
commit 1d5e84a99076d3e0168dd2f4626c7911e7ba49e7
Author: Patrick Wendell <[email protected]>
Date: 2014-08-21T05:24:22Z
HOTFIX:Temporarily removing flume sink test in 1.1 branch
commit e1535ad3c6f7400f2b7915ea91da9c60510557ba
Author: Patrick Wendell <[email protected]>
Date: 2014-08-21T05:54:41Z
[maven-release-plugin] prepare release v1.1.0-snapshot2
commit 9af3fb7385d1f9f221962f1d2d725ff79bd82033
Author: Patrick Wendell <[email protected]>
Date: 2014-08-21T05:54:48Z
[maven-release-plugin] prepare for next development iteration
commit da0a701204ae057581ed2d41eba5bb610e36c864
Author: Patrick Wendell <[email protected]>
Date: 2014-08-20T19:18:41Z
BUILD: Bump Hadoop versions in the release build.
Also, minor modifications to the MapR profile.
commit 1e5d9cbb499199304aa8820114fa77dc7a3f0224
Author: Joseph K. Bradley <[email protected]>
Date: 2014-08-21T07:17:29Z
[SPARK-2840] [mllib] DecisionTree doc update (Java, Python examples)
Updated DecisionTree documentation with examples for Java and Python.
Added the same Java example to the code as well.
CC: @mengxr @manishamde @atalwalkar
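As an aside, a minimal PySpark sketch of the kind of example this doc update
describes (the toy data, parameter values, and the existing SparkContext `sc`
are assumptions for illustration, not the doc's actual code):
```
# Illustrative only: train a tiny decision tree classifier in PySpark.
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree

# Toy data: label is 1.0 when the single feature is >= 0.5.
data = sc.parallelize([LabeledPoint(0.0, [0.1]),
                       LabeledPoint(0.0, [0.4]),
                       LabeledPoint(1.0, [0.6]),
                       LabeledPoint(1.0, [0.9])])

model = DecisionTree.trainClassifier(data, numClasses=2,
                                     categoricalFeaturesInfo={},
                                     impurity='gini', maxDepth=5)
print(model.predict([0.7]))  # expected: 1.0
```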
Author: Joseph K. Bradley <[email protected]>
Closes #2063 from jkbradley/dt-docs and squashes the following commits:
2dd2c19 [Joseph K. Bradley] Last updates based on github review.
9dd1b6b [Joseph K. Bradley] Updated decision tree doc.
d802369 [Joseph K. Bradley] Updates based on comments: cache data,
corrected doc text.
b9bee04 [Joseph K. Bradley] Updated DT examples
57eee9f [Joseph K. Bradley] Created JavaDecisionTree example from example
in docs, and corrected doc example as needed.
d939a92 [Joseph K. Bradley] Updated DecisionTree documentation. Added
Java, Python examples.
(cherry picked from commit 050f8d01e47b9b67b02ce50d83fb7b4e528b7204)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 385c4f2af5996844b9761942643f71a6544e1dd8
Author: Patrick Wendell <[email protected]>
Date: 2014-08-23T04:31:52Z
Revert "HOTFIX:Temporarily removing flume sink test in 1.1 branch"
This reverts commit 1d5e84a99076d3e0168dd2f4626c7911e7ba49e7.
commit cd73631b15f080405e04203bf15fbd31c65eb64a
Author: Tathagata Das <[email protected]>
Date: 2014-08-23T04:34:48Z
[SPARK-3169] Removed dependency on spark streaming test from spark flume
sink
Due to maven bug https://jira.codehaus.org/browse/MNG-1378, maven could not
resolve spark streaming classes required by the spark-streaming test-jar
dependency of external/flume-sink. There is no particular reason that the
external/flume-sink has to depend on Spark Streaming at all, so I am
eliminating this dependency. Also I have removed the exclusions present in the
Flume dependencies, as there is no reason to exclude them (they were excluded
in the external/flume module to prevent dependency collisions with Spark).
Since Jenkins will test the sbt build and the unit test, I only tested
maven compilation locally.
Author: Tathagata Das <[email protected]>
Closes #2101 from tdas/spark-sink-pom-fix and squashes the following
commits:
8f42621 [Tathagata Das] Added Flume sink exclusions back, and added netty
to test dependencies
93b559f [Tathagata Das] Removed dependency on spark streaming test from
spark flume sink
(cherry picked from commit 3004074152b7261c2a968bb8e94ec7c41a7b43c1)
Signed-off-by: Patrick Wendell <[email protected]>
commit 568966018bff437f1d73cd59eb4681b2d3e87b48
Author: Kousuke Saruta <[email protected]>
Date: 2014-08-23T05:28:05Z
[SPARK-2963] REGRESSION - The description about how to build for using CLI
and Thrift JDBC server is absent in proper document -
The most important points I mentioned in #1885 are as follows.
* People who build Spark are not always programmers.
* If the person building Spark is not a programmer, he/she won't read the
programmer's guide before building.
So, instructions on how to build for the CLI and the Thrift JDBC server should
not live only in the programmer's guide.
Author: Kousuke Saruta <[email protected]>
Closes #2080 from sarutak/SPARK-2963 and squashes the following commits:
ee07c76 [Kousuke Saruta] Modified regression of the description about
building for using Thrift JDBC server and CLI
ed53329 [Kousuke Saruta] Modified description and notation of proper nouns
07c59fc [Kousuke Saruta] Added a description about how to build to use
HiveServer and CLI for SparkSQL to building-with-maven.md
6e6645a [Kousuke Saruta] Merge branch 'master' of
git://git.apache.org/spark into SPARK-2963
c88fa93 [Kousuke Saruta] Added a description about building to use
HiveServer and CLI for SparkSQL
commit 9309786416c83b2f3401724fdeb19c2be07c0431
Author: Yin Huai <[email protected]>
Date: 2014-08-23T19:46:41Z
[SQL] Make functionRegistry in HiveContext transient.
Seems we missed `transient` for the `functionRegistry` in `HiveContext`.
cc: marmbrus
Author: Yin Huai <[email protected]>
Closes #2074 from yhuai/makeFunctionRegistryTransient and squashes the
following commits:
6534e7d [Yin Huai] Make functionRegistry transient.
(cherry picked from commit 2fb1c72ea21e137c8b60a72e5aecd554c71b16e1)
Signed-off-by: Michael Armbrust <[email protected]>
commit 7112da8fe8d382a1180118f206db78f8e610d83f
Author: Michael Armbrust <[email protected]>
Date: 2014-08-23T23:19:10Z
[SPARK-2554][SQL] CountDistinct partial aggregation and object allocation
improvements
Author: Michael Armbrust <[email protected]>
Author: Gregory Owen <[email protected]>
Closes #1935 from marmbrus/countDistinctPartial and squashes the following
commits:
5c7848d [Michael Armbrust] turn off caching in the constructor
8074a80 [Michael Armbrust] fix tests
32d216f [Michael Armbrust] reynolds comments
c122cca [Michael Armbrust] Address comments, add tests
b2e8ef3 [Michael Armbrust] Merge remote-tracking branch 'origin/master'
into countDistinctPartial
fae38f4 [Michael Armbrust] Fix style
fdca896 [Michael Armbrust] cleanup
93d0f64 [Michael Armbrust] metastore concurrency fix.
db44a30 [Michael Armbrust] JIT hax.
3868f6c [Michael Armbrust] Merge pull request #9 from
GregOwen/countDistinctPartial
c9e67de [Gregory Owen] Made SpecificRow and types serializable by Kryo
2b46c4b [Michael Armbrust] Merge remote-tracking branch 'origin/master'
into countDistinctPartial
8ff6402 [Michael Armbrust] Add specific row.
58d15f1 [Michael Armbrust] disable codegen logging
87d101d [Michael Armbrust] Fix isNullAt bug
abee26d [Michael Armbrust] WIP
27984d0 [Michael Armbrust] Merge remote-tracking branch 'origin/master'
into countDistinctPartial
57ae3b1 [Michael Armbrust] Fix order dependent test
b3d0f64 [Michael Armbrust] Add golden files.
c1f7114 [Michael Armbrust] Improve tests / fix serialization.
f31b8ad [Michael Armbrust] more fixes
38c7449 [Michael Armbrust] comments and style
9153652 [Michael Armbrust] better toString
d494598 [Michael Armbrust] Fix tests now that the planner is better
41fbd1d [Michael Armbrust] Never try and create an empty hash set.
050bb97 [Michael Armbrust] Skip no-arg constructors for kryo,
bd08239 [Michael Armbrust] WIP
213ada8 [Michael Armbrust] First draft of partially aggregated and code
generated count distinct / max
(cherry picked from commit 7e191fe29bb09a8560cd75d453c4f7f662dff406)
Signed-off-by: Michael Armbrust <[email protected]>
commit e23f0bc0177a83dfee3f5579ae6eb12033ae5f90
Author: Michael Armbrust <[email protected]>
Date: 2014-08-23T23:21:08Z
[SPARK-2967][SQL] Follow-up: Also copy hash expressions in sort based
shuffle fix.
Follow-up to #2066
Author: Michael Armbrust <[email protected]>
Closes #2072 from marmbrus/sortShuffle and squashes the following commits:
2ff8114 [Michael Armbrust] Fix bug
(cherry picked from commit 3519b5e8e55b4530d7f7c0bcab254f863dbfa814)
Signed-off-by: Michael Armbrust <[email protected]>
commit ce14cd11f099e46532074bc23a7ffb1bad0969e6
Author: Kousuke Saruta <[email protected]>
Date: 2014-08-24T16:43:44Z
[SPARK-3192] Some scripts have 2 space indentation but other scripts have 4
space indentation.
Author: Kousuke Saruta <[email protected]>
Closes #2104 from sarutak/SPARK-3192 and squashes the following commits:
db78419 [Kousuke Saruta] Modified indentation of spark-shell
(cherry picked from commit ded6796bf54f5c005b27135d7dec19634038a1c6)
Signed-off-by: Patrick Wendell <[email protected]>
commit a4db81a55f266f904052525aa290b7ffcf9a613c
Author: DB Tsai <[email protected]>
Date: 2014-08-25T00:33:33Z
[SPARK-2841][MLlib] Documentation for feature transformations
Documentation for newly added feature transformations:
1. TF-IDF
2. StandardScaler
3. Normalizer
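The transformers above are Scala MLlib APIs; purely as a rough illustration of
what Normalizer and StandardScaler compute, here is a small NumPy sketch (an
assumption for illustration, not the documented API):
```
import numpy as np

x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Normalizer: rescale each row (feature vector) to unit L2 norm.
row_norms = np.sqrt((x ** 2).sum(axis=1))[:, None]
normalized = x / row_norms

# StandardScaler (roughly): center each column and divide by its std deviation.
scaled = (x - x.mean(axis=0)) / x.std(axis=0)

print(normalized)
print(scaled)
```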
Author: DB Tsai <[email protected]>
Closes #2068 from dbtsai/transformer-documentation and squashes the
following commits:
109f324 [DB Tsai] address feedback
(cherry picked from commit 572952ae615895efaaabcd509d582262000c0852)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 749bddc85e76e0d1ded8d79058819335bd580741
Author: Reza Zadeh <[email protected]>
Date: 2014-08-25T00:35:54Z
[MLlib][SPARK-2997] Update SVD documentation to reflect roughly square
Update the documentation to reflect the fact that we can handle roughly
square matrices.
Author: Reza Zadeh <[email protected]>
Closes #2070 from rezazadeh/svddocs and squashes the following commits:
826b8fe [Reza Zadeh] left singular vectors
3f34fc6 [Reza Zadeh] PCA is still TS
7ffa2aa [Reza Zadeh] better title
aeaf39d [Reza Zadeh] More docs
788ed13 [Reza Zadeh] add computational cost explanation
6429c59 [Reza Zadeh] Add link to rowmatrix docs
1eeab8b [Reza Zadeh] Update SVD documentation to reflect roughly square
(cherry picked from commit b1b20301b3a1b35564d61e58eb5964d5ad5e4d7d)
Signed-off-by: Xiangrui Meng <[email protected]>
commit b82da3d6924a5bd2139434ab05c2fd44914fda45
Author: Davies Liu <[email protected]>
Date: 2014-08-25T04:16:05Z
[SPARK-2871] [PySpark] add zipWithIndex() and zipWithUniqueId()
RDD.zipWithIndex()
Zips this RDD with its element indices.
The ordering is first based on the partition index and then the
ordering of items within each partition. So the first item in
the first partition gets index 0, and the last item in the last
partition receives the largest index.
This method needs to trigger a Spark job when this RDD contains
more than one partition.
>>> sc.parallelize(range(4), 2).zipWithIndex().collect()
[(0, 0), (1, 1), (2, 2), (3, 3)]
RDD.zipWithUniqueId()
Zips this RDD with generated unique Long ids.
Items in the kth partition will get ids k, n+k, 2*n+k, ..., where
n is the number of partitions. So there may be gaps, but this
method won't trigger a Spark job, which is different from
zipWithIndex().
>>> sc.parallelize(range(4), 2).zipWithUniqueId().collect()
[(0, 0), (2, 1), (1, 2), (3, 3)]
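A further hypothetical usage sketch (assuming an existing SparkContext `sc`;
not part of the patch): zipWithIndex() can attach stable row numbers that you
then filter on, e.g. to drop a leading header record.
```
# Illustrative: number the records, then drop the first one.
rdd = sc.parallelize(["header", "row1", "row2"], 2)
numbered = rdd.zipWithIndex()          # [('header', 0), ('row1', 1), ('row2', 2)]
body = numbered.filter(lambda kv: kv[1] > 0).keys()
print(body.collect())                  # ['row1', 'row2']
```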
Author: Davies Liu <[email protected]>
Closes #2092 from davies/zipWith and squashes the following commits:
cebe5bf [Davies Liu] improve test cases, reverse the order of index
0d2a128 [Davies Liu] add zipWithIndex() and zipWithUniqueId()
(cherry picked from commit fb0db772421b6902b80137bf769db3b418ab2ccf)
Signed-off-by: Josh Rosen <[email protected]>
commit 69a17f119758e786ef080cfbf52d484334c8d9d9
Author: Xiangrui Meng <[email protected]>
Date: 2014-08-25T19:30:02Z
[SPARK-2495][MLLIB] make KMeans constructor public
to re-construct k-means models freeman-lab
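The change itself is to the Scala constructor; as a loose PySpark analogue of
re-constructing a model from known cluster centers (the centers and calls below
are illustrative assumptions):
```
# Illustrative: rebuild a k-means model from previously computed centers.
from numpy import array
from pyspark.mllib.clustering import KMeansModel

centers = [array([0.0, 0.0]), array([10.0, 10.0])]
model = KMeansModel(centers)
print(model.predict(array([9.0, 9.5])))  # nearest to the second center -> 1
```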
Author: Xiangrui Meng <[email protected]>
Closes #2112 from mengxr/public-constructors and squashes the following
commits:
18d53a9 [Xiangrui Meng] make KMeans constructor public
(cherry picked from commit 220f413686ae922bd11776576bf37610cce92c23)
Signed-off-by: Xiangrui Meng <[email protected]>
commit ff616fd7b4b56c34bd473f85fab3524b842da404
Author: Sean Owen <[email protected]>
Date: 2014-08-25T20:29:07Z
SPARK-2798 [BUILD] Correct several small errors in Flume module pom.xml
files
(EDIT) The scalatest issue has since been resolved, so this is now about a
few small problems in the Flume Sink `pom.xml`:
- `scalatest` is not declared as a test-scope dependency
- Its Avro version doesn't match the rest of the build
- Its Flume version is not synced with the other Flume module
- The other Flume module declares its dependency on Flume Sink slightly
incorrectly, hard-coding the Scala 2.10 version
- It depends on Scala Lang directly, which it shouldn't
Author: Sean Owen <[email protected]>
Closes #1726 from srowen/SPARK-2798 and squashes the following commits:
a46e2c6 [Sean Owen] scalatest to test scope, harmonize Avro and Flume
versions, remove direct Scala dependency, fix '2.10' in Flume dependency
(cherry picked from commit cd30db566a327ddf63cd242c758e46ce2d9479df)
Signed-off-by: Tathagata Das <[email protected]>
commit d892062cca16bd9d977e1cf51723135a481edf57
Author: Xiangrui Meng <[email protected]>
Date: 2014-08-25T21:55:20Z
[FIX] fix error message in sendMessageReliably
rxin
Author: Xiangrui Meng <[email protected]>
Closes #2120 from mengxr/sendMessageReliably and squashes the following
commits:
b14400c [Xiangrui Meng] fix error message in sendMessageReliably
(cherry picked from commit fd8ace2d9a796f69ce34ad202907008cd6e4d274)
Signed-off-by: Josh Rosen <[email protected]>
commit 8d33a6d3de9184ee33ebe5f30fef6a1fda281e9d
Author: Cheng Lian <[email protected]>
Date: 2014-08-25T21:56:51Z
Fixed a typo in docs/running-on-mesos.md
It should be `spark-env.sh` rather than `spark.env.sh`.
Author: Cheng Lian <[email protected]>
Closes #2119 from liancheng/fix-mesos-doc and squashes the following
commits:
f360548 [Cheng Lian] Fixed a typo in docs/running-on-mesos.md
(cherry picked from commit 805fec845b7aa8b4763e3e0e34bec6c3872469f4)
Signed-off-by: Josh Rosen <[email protected]>
commit 19b01d6f79f2919257fcd14524bc8267c57eb3d9
Author: Takuya UESHIN <[email protected]>
Date: 2014-08-25T23:27:00Z
[SPARK-3204][SQL] MaxOf would be foldable if both left and right are
foldable.
Author: Takuya UESHIN <[email protected]>
Closes #2116 from ueshin/issues/SPARK-3204 and squashes the following
commits:
7d9b107 [Takuya UESHIN] Make MaxOf foldable if both left and right are
foldable.
(cherry picked from commit d299e2bf2f6733a6267b7ce85e2b288608b17db3)
Signed-off-by: Michael Armbrust <[email protected]>
commit 292f28d4f7cbfdb8b90809926a6d69df7ed817e7
Author: Cheng Lian <[email protected]>
Date: 2014-08-25T23:29:59Z
[SPARK-2929][SQL] Refactored Thrift server and CLI suites
Removed most hard-coded timeouts, timing assumptions, and all `Thread.sleep`
calls. Simplified IPC and synchronization with `scala.sys.process` and
future/promise so that the test suites run more robustly and faster.
Author: Cheng Lian <[email protected]>
Closes #1856 from liancheng/thriftserver-tests and squashes the following
commits:
2d914ca [Cheng Lian] Minor refactoring
0e12e71 [Cheng Lian] Cleaned up test output
0ee921d [Cheng Lian] Refactored Thrift server and CLI suites
(cherry picked from commit cae9414d3805c6cf00eab6a6144d8f90cd0212f8)
Signed-off-by: Michael Armbrust <[email protected]>
commit f8ac8ed7f88d2ee976b38d4a156f64efb3740650
Author: Cheng Hao <[email protected]>
Date: 2014-08-26T00:43:56Z
[SPARK-3058] [SQL] Support EXTENDED for EXPLAIN
Provide `extended` keyword support for the `explain` command in SQL, e.g.:
```
explain extended select key as a1, value as a2 from src where key=1;
== Parsed Logical Plan ==
Project ['key AS a1#3,'value AS a2#4]
Filter ('key = 1)
UnresolvedRelation None, src, None
== Analyzed Logical Plan ==
Project [key#8 AS a1#3,value#9 AS a2#4]
Filter (CAST(key#8, DoubleType) = CAST(1, DoubleType))
MetastoreRelation default, src, None
== Optimized Logical Plan ==
Project [key#8 AS a1#3,value#9 AS a2#4]
Filter (CAST(key#8, DoubleType) = 1.0)
MetastoreRelation default, src, None
== Physical Plan ==
Project [key#8 AS a1#3,value#9 AS a2#4]
Filter (CAST(key#8, DoubleType) = 1.0)
HiveTableScan [key#8,value#9], (MetastoreRelation default, src, None),
None
Code Generation: false
== RDD ==
(2) MappedRDD[14] at map at HiveContext.scala:350
MapPartitionsRDD[13] at mapPartitions at basicOperators.scala:42
MapPartitionsRDD[12] at mapPartitions at basicOperators.scala:57
MapPartitionsRDD[11] at mapPartitions at TableReader.scala:112
MappedRDD[10] at map at TableReader.scala:240
HadoopRDD[9] at HadoopRDD at TableReader.scala:230
```
It's a sub-task of #1847, but it can go in without any dependency.
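A hypothetical way to issue the new command from PySpark (the HiveContext
setup, table, and query are assumptions; an existing SparkContext `sc` is
assumed):
```
# Illustrative: print the extended plan for a simple Hive query.
from pyspark.sql import HiveContext

hc = HiveContext(sc)
plan = hc.sql("EXPLAIN EXTENDED SELECT key AS a1, value AS a2 FROM src WHERE key = 1")
for row in plan.collect():
    print(row)
```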
Author: Cheng Hao <[email protected]>
Closes #1962 from chenghao-intel/explain_extended and squashes the
following commits:
295db74 [Cheng Hao] Fix bug in printing the simple execution plan
48bc989 [Cheng Hao] Support EXTENDED for EXPLAIN
(cherry picked from commit 156eb3966176de02ec3ec90ae10e50a7ebfbbf4f)
Signed-off-by: Michael Armbrust <[email protected]>
commit 957b356576caa2ab38d1e758c2d3190421894557
Author: wangfei <[email protected]>
Date: 2014-08-26T00:46:43Z
[SQL] logWarning should be logInfo in getResultSetSchema
Author: wangfei <[email protected]>
Closes #1939 from scwf/patch-5 and squashes the following commits:
f952d10 [wangfei] [SQL] logWarning should be logInfo in getResultSetSchema
(cherry picked from commit 507a1b520063ad3e10b909767d9e3fd72d24415b)
Signed-off-by: Michael Armbrust <[email protected]>
commit b5dc9b43bcdcbdb5ffddbda6235443f3d7411b7a
Author: Chia-Yung Su <[email protected]>
Date: 2014-08-26T01:20:19Z
[SPARK-3011][SQL] _temporary directory should be filtered out by
sqlContext.parquetFile
Fix compile error on Hadoop 0.23 for pull request #1924.
Author: Chia-Yung Su <[email protected]>
Closes #1959 from joesu/bugfix-spark3011 and squashes the following commits:
be30793 [Chia-Yung Su] remove .* and _* except _metadata
8fe2398 [Chia-Yung Su] add note to explain
40ea9bd [Chia-Yung Su] fix hadoop-0.23 compile error
c7e44f2 [Chia-Yung Su] match syntax
f8fc32a [Chia-Yung Su] filter out tmp dir
(cherry picked from commit 4243bb6634aca5b9ddf6d42778aa7b4866ce6256)
Signed-off-by: Michael Armbrust <[email protected]>
commit 4d6a0e920974a5d50348ba9f7377b48e43c2da16
Author: witgo <[email protected]>
Date: 2014-08-26T02:22:27Z
SPARK-2481: The environment variable SPARK_HISTORY_OPTS is covered in
spark-env.sh
Author: witgo <[email protected]>
Author: GuoQiang Li <[email protected]>
Closes #1341 from witgo/history_env and squashes the following commits:
b4fd9f8 [GuoQiang Li] review commit
0ebe401 [witgo] *-history-server.sh load spark-config.sh
(cherry picked from commit 9f04db17e50568d5580091add9100693177d7c4f)
Signed-off-by: Andrew Or <[email protected]>
commit 48a07490fdd0e79a34e66e5c1baad0b1558bbda5
Author: Daoyuan Wang <[email protected]>
Date: 2014-08-26T05:56:35Z
[Spark-3222] [SQL] Cross join support in HiveQL
We can simply treat a cross join as an inner join without join conditions.
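A minimal hypothetical example of the new syntax via a HiveContext (the table
name and an existing SparkContext `sc` are assumptions):
```
from pyspark.sql import HiveContext

hc = HiveContext(sc)
pairs = hc.sql("SELECT a.key, b.key FROM src a CROSS JOIN src b")
print(pairs.count())  # |src| * |src| rows: a cross join has no join condition
```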
Author: Daoyuan Wang <[email protected]>
Author: adrian-wang <[email protected]>
Closes #2124 from adrian-wang/crossjoin and squashes the following commits:
8c9b7c5 [Daoyuan Wang] add a test
7d47bbb [adrian-wang] add cross join support for hql
(cherry picked from commit 52fbdc2deddcdba02bf5945a36e15870021ec890)
Signed-off-by: Michael Armbrust <[email protected]>
commit 0f947f1239831a6ed3b47af65816715999bbe57b
Author: Andrew Or <[email protected]>
Date: 2014-08-26T06:36:09Z
[SPARK-2886] Use more specific actor system name than "spark"
As of #1777 we log the name of the actor system when it binds to a port.
The current name "spark" is super general and does not convey any meaning. For
instance, the following line is taken from my driver log after setting
`spark.driver.port` to 5001.
```
14/08/13 19:33:29 INFO Remoting: Remoting started; listening on addresses:
[akka.tcp://sparkandrews-mbp:5001]
14/08/13 19:33:29 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sparkandrews-mbp:5001]
14/08/06 13:40:05 INFO Utils: Successfully started service 'spark' on port
5001.
```
This commit renames this to "sparkDriver" and "sparkExecutor". The goal of
this unambitious PR is simply to make the logged information more explicit
without introducing any change in functionality.
Author: Andrew Or <[email protected]>
Closes #1810 from andrewor14/service-name and squashes the following
commits:
8c459ed [Andrew Or] Use a common variable for driver/executor actor system
names
3a92843 [Andrew Or] Change actor name to sparkDriver and sparkExecutor
921363e [Andrew Or] Merge branch 'master' of github.com:apache/spark into
service-name
c8c6a62 [Andrew Or] Do not include hyphens in actor name
1c1b42e [Andrew Or] Avoid spaces in akka system name
f644b55 [Andrew Or] Use more specific service name
(cherry picked from commit b21ae5bbb9baa966f69303a30659aa8bbb2098da)
Signed-off-by: Andrew Or <[email protected]>
commit 3a9d874d7a46ab8b015631d91ba479d9a0ba827f
Author: chutium <[email protected]>
Date: 2014-08-26T18:51:26Z
[SPARK-3131][SQL] Allow user to set parquet compression codec for writing
ParquetFile in SQLContext
There are 4 different compression codecs available for
```ParquetOutputFormat```
in Spark SQL; the codec was set as a hard-coded value in
```ParquetRelation.defaultCompression```.
original discussion:
https://github.com/apache/spark/pull/195#discussion-diff-11002083
I added a new config property in SQLConf to allow users to change this
compression codec, and I used a similar short-name syntax as described in
SPARK-2953 #1873 (https://github.com/apache/spark/pull/1873/files#diff-0).
By the way, which codec should we use as the default? It was set to GZIP
(https://github.com/apache/spark/pull/195/files#diff-4), but I think maybe we
should change this to SNAPPY, since SNAPPY is already the default codec for
shuffling in spark-core (SPARK-2469, #1415), and parquet-mr supports the Snappy
codec natively
(https://github.com/Parquet/parquet-mr/commit/e440108de57199c12d66801ca93804086e7f7632).
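A hedged PySpark sketch of how the new property might be used when writing a
Parquet file (the property name, the SET-command route, and the paths are
assumptions based on the description above):
```
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # assumes an existing SparkContext `sc`

# Hypothetical: pick the parquet compression codec by its short name.
sqlContext.sql("SET spark.sql.parquet.compression.codec=snappy")

people = sqlContext.inferSchema(sc.parallelize([{"name": "alice", "age": 1}]))
people.saveAsParquetFile("/tmp/people_snappy.parquet")
```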
Author: chutium <[email protected]>
Closes #2039 from chutium/parquet-compression and squashes the following
commits:
2f44964 [chutium] [SPARK-3131][SQL] parquet compression default codec set
to snappy, also in test suite
e578e21 [chutium] [SPARK-3131][SQL] compression codec config property name
and default codec set to snappy
21235dc [chutium] [SPARK-3131][SQL] Allow user to set parquet compression
codec for writing ParquetFile in SQLContext
(cherry picked from commit 8856c3d86009295be871989a5dc7270f31b420cd)
Signed-off-by: Michael Armbrust <[email protected]>
commit 83d273023b03faa0ceacd69956a132f40d247bc1
Author: Davies Liu <[email protected]>
Date: 2014-08-26T20:04:30Z
[SPARK-2871] [PySpark] add histogram() API
RDD.histogram(buckets)
Compute a histogram using the provided buckets. The buckets
are all open to the right except for the last, which is closed.
E.g. [1,10,20,50] means the buckets are [1,10) [10,20) [20,50],
i.e. 1<=x<10, 10<=x<20, 20<=x<=50. On the input of 1
and 50 we would have a histogram of 1,0,1.
If your buckets are evenly spaced (e.g. [0, 10, 20, 30]),
this can be switched from an O(log n) insertion to O(1) per
element (where n = # buckets).
Buckets must be sorted, must not contain any duplicates, and
must have at least two elements.
If `buckets` is a number, it will generate buckets that are
evenly spaced between the minimum and maximum of the RDD. For
example, if the min value is 0 and the max is 100, given buckets
as 2, the resulting buckets will be [0,50) [50,100]. buckets must
be at least 1. If the RDD contains infinity or NaN, an exception
is thrown. If the elements in the RDD do not vary (max == min),
a single bucket is always returned.
It returns a tuple of buckets and histogram.
>>> rdd = sc.parallelize(range(51))
>>> rdd.histogram(2)
([0, 25, 50], [25, 26])
>>> rdd.histogram([0, 5, 25, 50])
([0, 5, 25, 50], [5, 20, 26])
>>> rdd.histogram([0, 15, 30, 45, 60], True)
([0, 15, 30, 45, 60], [15, 15, 15, 6])
>>> rdd = sc.parallelize(["ab", "ac", "b", "bd", "ef"])
>>> rdd.histogram(("a", "b", "c"))
(('a', 'b', 'c'), [2, 2])
Closes #122, which is a duplicate.
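One more hedged usage sketch (assuming an existing SparkContext `sc`), showing
the evenly spaced fast path and the integer bucket-count form on floats:
```
# Illustrative: evenly spaced buckets enable the O(1)-per-element fast path.
values = sc.parallelize([0.5, 3.0, 7.5, 12.0, 19.9, 25.0])
buckets, counts = values.histogram([0.0, 10.0, 20.0, 30.0])
print(buckets, counts)      # [0.0, 10.0, 20.0, 30.0] [3, 2, 1]

# Passing an int asks histogram() to build that many even buckets itself.
print(values.histogram(3))  # buckets span min (0.5) to max (25.0)
```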
Author: Davies Liu <[email protected]>
Closes #2091 from davies/histgram and squashes the following commits:
a322f8a [Davies Liu] fix deprecation of e.message
84e85fa [Davies Liu] remove evenBuckets, add more tests (including str)
d9a0722 [Davies Liu] address comments
0e18a2d [Davies Liu] add histgram() API
(cherry picked from commit 3cedc4f4d78e093fd362085e0a077bb9e4f28ca5)
Signed-off-by: Josh Rosen <[email protected]>
----