GitHub user changkaibo opened a pull request:
https://github.com/apache/spark/pull/8221
Update to the latest code
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-1.5
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8221.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8221
----
commit 4c4f638c7333b44049c75ae34486148ab74db333
Author: Patrick Wendell <[email protected]>
Date: 2015-08-03T23:54:50Z
Preparing Spark release v1.5.0-snapshot-20150803
commit bc49ca468d3abe4949382a32de92f963f454d36a
Author: Patrick Wendell <[email protected]>
Date: 2015-08-03T23:54:56Z
Preparing development version 1.5.0-SNAPSHOT
commit 7e7147f3b8fee3ac4f2f1d14c3e6776a4d76bb3a
Author: Patrick Wendell <[email protected]>
Date: 2015-08-03T23:59:13Z
Preparing Spark release v1.5.0-snapshot-20150803
commit 74792e71cb0584637041cb81660ec3ac4ea10c0b
Author: Patrick Wendell <[email protected]>
Date: 2015-08-03T23:59:19Z
Preparing development version 1.5.0-SNAPSHOT
commit 73c863ac8e8f6cf664f51c64da1da695f341b273
Author: Matthew Brandyberry <[email protected]>
Date: 2015-08-04T00:36:56Z
[SPARK-9483] Fix UTF8String.getPrefix for big-endian.
Previous code assumed little-endian.
Author: Matthew Brandyberry <[email protected]>
Closes #7902 from mtbrandy/SPARK-9483 and squashes the following commits:
ec31df8 [Matthew Brandyberry] [SPARK-9483] Changes from review comments.
17d54c6 [Matthew Brandyberry] [SPARK-9483] Fix UTF8String.getPrefix for
big-endian.
(cherry picked from commit b79b4f5f2251ed7efeec1f4b26e45a8ea6b85a6a)
Signed-off-by: Davies Liu <[email protected]>
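The byte-order issue fixed in this commit can be illustrated with a small Python sketch (not Spark's actual code): the prefix trick packs the first up-to-8 bytes of a string into a 64-bit integer so that integer comparison agrees with lexicographic byte order, which only holds when the bytes are interpreted big-endian.

```python
def get_prefix(data: bytes) -> int:
    """Pack the first 8 bytes into an unsigned 64-bit int, zero-padded.

    Interpreting the bytes big-endian preserves lexicographic order;
    code that reads raw memory as a native long must byte-swap on
    little-endian hosts -- the assumption the pre-fix code baked in.
    """
    return int.from_bytes(data[:8].ljust(8, b"\x00"), byteorder="big")

# Integer comparison of prefixes matches byte-wise string comparison.
assert (get_prefix(b"abc") < get_prefix(b"abd")) == (b"abc" < b"abd")
assert get_prefix(b"apple") < get_prefix(b"banana")
```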
commit 34335719a372c1951fdb4dd25b75b086faf1076f
Author: Burak Yavuz <[email protected]>
Date: 2015-08-04T00:42:03Z
[SPARK-9263] Added flags to exclude dependencies when using --packages
While the functionality is there to exclude packages, there are no flags
that allow users to exclude dependencies, in case of dependency conflicts. We
should provide users with a flag to add dependency exclusions in case the
packages are not resolved properly (or not available due to licensing).
The flag I added was --packages-exclude, but I'm open to renaming it. I
also added property flags in case people would like to use a conf file to
provide dependencies, which is possible if there is a long list of dependencies
or exclusions.
cc andrewor14 vanzin pwendell
Author: Burak Yavuz <[email protected]>
Closes #7599 from brkyvz/packages-exclusions and squashes the following
commits:
636f410 [Burak Yavuz] addressed nits
6e54ede [Burak Yavuz] is this the culprit
b5e508e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into
packages-exclusions
154f5db [Burak Yavuz] addressed initial comments
1536d7a [Burak Yavuz] Added flags to exclude packages using
--packages-exclude
(cherry picked from commit 1633d0a2612d94151f620c919425026150e69ae1)
Signed-off-by: Marcelo Vanzin <[email protected]>
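A hypothetical invocation using the flag as it is named in this PR (the option may have been renamed before merging, and the Maven coordinates below are purely illustrative; check `spark-submit --help` on your version):

```shell
# Pull a package but exclude a conflicting transitive dependency.
# Flag name as proposed in this PR; coordinates are examples only.
spark-submit \
  --packages com.example:some-connector_2.10:1.0.0 \
  --packages-exclude org.example:conflicting-dep \
  my_app.py
```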
commit 93076ae39b58ba8c4a459f2b3a8590c492dc5c4e
Author: CodingCat <[email protected]>
Date: 2015-08-04T01:20:40Z
[SPARK-8416] highlight and topping the executor threads in thread dumping
page
https://issues.apache.org/jira/browse/SPARK-8416
To facilitate debugging, I made this patch with three changes:
* render the executor-thread and non executor-thread entries with different
background colors
* put the executor threads on the top of the list
* sort the threads alphabetically
Author: CodingCat <[email protected]>
Closes #7808 from CodingCat/SPARK-8416 and squashes the following commits:
34fc708 [CodingCat] fix className
d7b79dd [CodingCat] lowercase threadName
d032882 [CodingCat] sort alphabetically and change the css class name
f0513b1 [CodingCat] change the color & group threads by name
2da6e06 [CodingCat] small fix
3fc9f36 [CodingCat] define classes in webui.css
8ee125e [CodingCat] highlight and put on top the executor threads in thread
dumping page
(cherry picked from commit 3b0e44490aebfba30afc147e4a34a63439d985c6)
Signed-off-by: Josh Rosen <[email protected]>
commit ebe42b98c8fa0cac6ec267e895402cebe8a670a9
Author: Reynold Xin <[email protected]>
Date: 2015-08-04T01:47:02Z
[SPARK-9577][SQL] Surface concrete iterator types in various sort classes.
We often return abstract iterator types in various sort-related classes
(e.g. UnsafeKVExternalSorter). It is actually better to return a more concrete
type, so the callsite uses that type and JIT can inline the iterator calls.
Author: Reynold Xin <[email protected]>
Closes #7911 from rxin/surface-concrete-type and squashes the following
commits:
0422add [Reynold Xin] [SPARK-9577][SQL] Surface concrete iterator types in
various sort classes.
(cherry picked from commit 5eb89f67e323dcf9fa3d5b30f9b5cb8f10ca1e8c)
Signed-off-by: Reynold Xin <[email protected]>
commit 1f7dbcd6fdeee22c7b670ea98dcb4e794f84a8cd
Author: Sean Owen <[email protected]>
Date: 2015-08-04T04:48:22Z
[SPARK-9521] [DOCS] Addendum. Require Maven 3.3.3+ in the build
Follow on for #7852: Building Spark doc needs to refer to new Maven
requirement too
Author: Sean Owen <[email protected]>
Closes #7905 from srowen/SPARK-9521.2 and squashes the following commits:
73285df [Sean Owen] Follow on for #7852: Building Spark doc needs to refer
to new Maven requirement too
(cherry picked from commit 0afa6fbf525723e97c6dacfdba3ad1762637ffa9)
Signed-off-by: Sean Owen <[email protected]>
commit 29f2d5a065254e7ed44fb204a1deecf9d44d338c
Author: Ankur Dave <[email protected]>
Date: 2015-08-04T06:07:32Z
[SPARK-3190] [GRAPHX] Fix VertexRDD.count() overflow regression
SPARK-3190 was originally fixed by
96df92906978c5f58e0cc8ff5eebe5b35a08be3b, but
a5ef58113667ff73562ce6db381cff96a0b354b0 introduced a regression during
refactoring. This commit fixes the regression.
Author: Ankur Dave <[email protected]>
Closes #7923 from ankurdave/SPARK-3190-reopening and squashes the following
commits:
a3e1b23 [Ankur Dave] Fix VertexRDD.count() overflow regression
(cherry picked from commit 9e952ecbce670e9b532a1c664a4d03b66e404112)
Signed-off-by: Reynold Xin <[email protected]>
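The overflow class of bug fixed here is easy to see in a sketch: summing per-partition counts in a 32-bit Int wraps past 2**31 - 1, while a 64-bit Long accumulator does not. Python ints do not overflow, so we emulate 32-bit wraparound (this illustrates the failure mode, not the actual GraphX code):

```python
def to_int32(x: int) -> int:
    """Emulate Java's 32-bit signed Int wraparound."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

partition_counts = [1_500_000_000, 1_500_000_000]  # 3 billion total

broken = 0
for c in partition_counts:
    broken = to_int32(broken + c)   # Int accumulator: wraps around

fixed = sum(partition_counts)       # Long-style accumulator: exact

assert broken < 0                   # overflowed into negative territory
assert fixed == 3_000_000_000
```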
commit 5ae675360d883483e509788b8867c1c98b4820fd
Author: Sean Owen <[email protected]>
Date: 2015-08-04T11:02:26Z
[SPARK-9534] [BUILD] Enable javac lint for scalac parity; fix a lot of
build warnings, 1.5.0 edition
Enable most javac lint warnings; fix a lot of build warnings. In a few
cases, touch up surrounding code in the process.
I'll explain several of the changes inline in comments.
Author: Sean Owen <[email protected]>
Closes #7862 from srowen/SPARK-9534 and squashes the following commits:
ea51618 [Sean Owen] Enable most javac lint warnings; fix a lot of build
warnings. In a few cases, touch up surrounding code in the process.
(cherry picked from commit 76d74090d60f74412bd45487e8db6aff2e8343a2)
Signed-off-by: Sean Owen <[email protected]>
commit bd9b7521343c34c42be40ee05a01c8a976ed2307
Author: tedyu <[email protected]>
Date: 2015-08-04T11:22:53Z
[SPARK-8064] [BUILD] Follow-up. Undo change from SPARK-9507 that was
accidentally reverted
This PR removes the dependency reduced POM hack brought back by #7191
Author: tedyu <[email protected]>
Closes #7919 from tedyu/master and squashes the following commits:
1bfbd7b [tedyu] [BUILD] Remove dependency reduced POM hack
(cherry picked from commit b211cbc7369af5eb2cb65d93c4c57c4db7143f47)
Signed-off-by: Sean Owen <[email protected]>
commit 45c8d2bb872bb905a402cf3aa78b1c4efaac07cf
Author: Carson Wang <[email protected]>
Date: 2015-08-04T13:12:30Z
[SPARK-2016] [WEBUI] RDD partition table pagination for the RDD Page
Add pagination for the RDD page to avoid unresponsive UI when the number of
the RDD partitions is large.
Before: [screenshot omitted]
After: [screenshot omitted]
Author: Carson Wang <[email protected]>
Closes #7692 from carsonwang/SPARK-2016 and squashes the following commits:
03c7168 [Carson Wang] Fix style issues
612c18c [Carson Wang] RDD partition table pagination for the RDD Page
(cherry picked from commit cb7fa0aa93dae5a25a8e7e387dbd6b55a5a23fb0)
Signed-off-by: Kousuke Saruta <[email protected]>
commit f44b27a2b92da2325ed9389cd27b6e2cfd9ec486
Author: Marcelo Vanzin <[email protected]>
Date: 2015-08-04T13:19:11Z
[SPARK-9583] [BUILD] Do not print mvn debug messages to stdout.
This allows build/mvn to be used by make-distribution.sh.
Author: Marcelo Vanzin <[email protected]>
Closes #7915 from vanzin/SPARK-9583 and squashes the following commits:
6469e60 [Marcelo Vanzin] [SPARK-9583] [build] Do not print mvn debug
messages to stdout.
(cherry picked from commit d702d53732b44e8242448ce5302738bd130717d8)
Signed-off-by: Kousuke Saruta <[email protected]>
commit 945da3534762a73fe7ffc52c868ff07a0783502b
Author: Tarek Auel <[email protected]>
Date: 2015-08-04T15:59:42Z
[SPARK-8244] [SQL] string function: find in set
This PR is based on #7186 (just fix the conflict), thanks to tarekauel .
find_in_set(string str, string strList): int
Returns the first occurrence of str in strList where strList is a
comma-delimited string. Returns null if either argument is null. Returns 0 if
the first argument contains any commas. For example, find_in_set('ab',
'abc,b,ab,c,def') returns 3.
Only add this to SQL, not DataFrame.
Closes #7186
Author: Tarek Auel <[email protected]>
Author: Davies Liu <[email protected]>
Closes #7900 from davies/find_in_set and squashes the following commits:
4334209 [Davies Liu] Merge branch 'master' of github.com:apache/spark into
find_in_set
8f00572 [Davies Liu] Merge branch 'master' of github.com:apache/spark into
find_in_set
243ede4 [Tarek Auel] [SPARK-8244][SQL] hive compatibility
1aaf64e [Tarek Auel] [SPARK-8244][SQL] unit test fix
e4093a4 [Tarek Auel] [SPARK-8244][SQL] final modifier for COMMA_UTF8
0d05df5 [Tarek Auel] Merge branch 'master' into SPARK-8244
208d710 [Tarek Auel] [SPARK-8244] address comments & bug fix
71b2e69 [Tarek Auel] [SPARK-8244] find_in_set
66c7fda [Tarek Auel] Merge branch 'master' into SPARK-8244
61b8ca2 [Tarek Auel] [SPARK-8224] removed loop and split; use unsafe String
comparison
4f75a65 [Tarek Auel] Merge branch 'master' into SPARK-8244
e3b20c8 [Tarek Auel] [SPARK-8244] added type check
1c2bbb7 [Tarek Auel] [SPARK-8244] findInSet
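The find_in_set contract described above can be sketched in plain Python (an illustration of the SQL function's semantics, not Spark's implementation; the not-found case returning 0 follows the usual Hive contract, which the PR text does not spell out):

```python
def find_in_set(s, str_list):
    """1-based position of s in the comma-delimited str_list.

    None if either argument is None; 0 if s contains a comma
    or (by the assumed Hive contract) is not present at all.
    """
    if s is None or str_list is None:
        return None
    if "," in s:
        return 0
    parts = str_list.split(",")
    return parts.index(s) + 1 if s in parts else 0

assert find_in_set("ab", "abc,b,ab,c,def") == 3   # example from the PR
assert find_in_set("a,b", "abc,b,ab,c,def") == 0  # comma in first arg
assert find_in_set(None, "abc") is None
```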
commit b42e13dca38c6e9ff9cf879bcb52efa681437120
Author: Davies Liu <[email protected]>
Date: 2015-08-04T16:07:09Z
[SPARK-8246] [SQL] Implement get_json_object
This is based on #7485 , thanks to NathanHowell
Tests were copied from Hive, but do not seem to be super comprehensive.
I've generally replicated Hive's unusual behavior rather than following a
JSONPath reference, except for one case (as noted in the comments). I don't
know if there is a way of fully replicating Hive's behavior without a slower
TreeNode implementation, so I've erred on the side of performance instead.
Author: Davies Liu <[email protected]>
Author: Yin Huai <[email protected]>
Author: Nathan Howell <[email protected]>
Closes #7901 from davies/get_json_object and squashes the following commits:
3ace9b9 [Davies Liu] Merge branch 'get_json_object' of
github.com:davies/spark into get_json_object
98766fc [Davies Liu] Merge branch 'master' of github.com:apache/spark into
get_json_object
a7dc6d0 [Davies Liu] Update JsonExpressionsSuite.scala
c818519 [Yin Huai] new results.
18ce26b [Davies Liu] fix tests
6ac29fb [Yin Huai] Golden files.
25eebef [Davies Liu] use HiveQuerySuite
e0ac6ec [Yin Huai] Golden answer files.
940c060 [Davies Liu] tweat code style
44084c5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into
get_json_object
9192d09 [Nathan Howell] Match Hive's behavior for unwrapping arrays of
one element
8dab647 [Nathan Howell] [SPARK-8246] [SQL] Implement get_json_object
(cherry picked from commit 73dedb589d06f7c7a525cc4f07721a77f480c434)
Signed-off-by: Davies Liu <[email protected]>
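For simple "$.field" paths, what get_json_object computes can be sketched with Python's json module; Spark's implementation streams JSON tokens and replicates Hive's quirks, which this toy version does not attempt:

```python
import json

def get_json_object(json_str, path):
    """Extract a value by a dotted path like '$.a.b'; None if missing.

    Toy subset: object field access only, no array indexing or
    wildcards, and no attempt to mimic Hive's edge-case behavior.
    """
    if not path.startswith("$."):
        return None
    obj = json.loads(json_str)
    for key in path[2:].split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj

assert get_json_object('{"a": {"b": 7}}', "$.a.b") == 7
assert get_json_object('{"a": 1}', "$.missing") is None
```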
commit d875368edd7265cedf808c921c0af0deb4895a67
Author: Yijie Shen <[email protected]>
Date: 2015-08-04T16:09:52Z
[SPARK-9541] [SQL] DataTimeUtils cleanup
JIRA: https://issues.apache.org/jira/browse/SPARK-9541
Author: Yijie Shen <[email protected]>
Closes #7870 from yjshen/datetime_cleanup and squashes the following
commits:
9203e33 [Yijie Shen] revert getMonth & getDayOfMonth
5cad119 [Yijie Shen] rebase code
7d62a74 [Yijie Shen] remove tmp tuple inside split date
e98aaac [Yijie Shen] DataTimeUtils cleanup
(cherry picked from commit b5034c9c59947f20423faa46bc6606aad56836b0)
Signed-off-by: Davies Liu <[email protected]>
commit aa8390dfcbb45eeff3d5894cf9b2edbd245b7320
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-08-04T16:40:07Z
[SPARK-9562] Change reference to amplab/spark-ec2 from mesos/
cc srowen pwendell nchammas
Author: Shivaram Venkataraman <[email protected]>
Closes #7899 from shivaram/spark-ec2-move and squashes the following
commits:
7cc22c9 [Shivaram Venkataraman] Change reference to amplab/spark-ec2 from
mesos/
(cherry picked from commit 6a0f8b994de36b7a7bdfb9958d39dbd011776107)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit a9277cd5aedd570f550e2a807768c8ffada9576f
Author: Michael Armbrust <[email protected]>
Date: 2015-08-04T17:07:53Z
[SPARK-9512][SQL] Revert SPARK-9251, Allow evaluation while sorting
The analysis rule has a bug and we ended up making the sorter still capable
of doing evaluation, so let's revert this for now.
Author: Michael Armbrust <[email protected]>
Closes #7906 from marmbrus/revertSortProjection and squashes the following
commits:
2da6972 [Michael Armbrust] unrevert unrelated changes
4f2b00c [Michael Armbrust] Revert "[SPARK-9251][SQL] do not order by
expressions which still need evaluation"
(cherry picked from commit 34a0eb2e89d59b0823efc035ddf2dc93f19540c1)
Signed-off-by: Michael Armbrust <[email protected]>
commit c5250ddc5242a071549e980f69fa8bd785168979
Author: Holden Karau <[email protected]>
Date: 2015-08-04T17:12:22Z
[SPARK-8069] [ML] Add multiclass thresholds for ProbabilisticClassifier
This PR replaces the old "threshold" with a generalized "thresholds" Param.
We keep getThreshold,setThreshold for backwards compatibility for binary
classification.
Note that the primary author of this PR is holdenk
Author: Holden Karau <[email protected]>
Author: Joseph K. Bradley <[email protected]>
Closes #7909 from
jkbradley/holdenk-SPARK-8069-add-cutoff-aka-threshold-to-random-forest and
squashes the following commits:
3952977 [Joseph K. Bradley] fixed pyspark doc test
85febc8 [Joseph K. Bradley] made python unit tests a little more robust
7eb1d86 [Joseph K. Bradley] small cleanups
6cc2ed8 [Joseph K. Bradley] Fixed remaining merge issues.
0255e44 [Joseph K. Bradley] Many cleanups for thresholds, some more tests
7565a60 [Holden Karau] fix pep8 style checks, add a getThreshold method
similar to our LogisticRegression.scala one for API compat
be87f26 [Holden Karau] Convert threshold to thresholds in the python code,
add specialized support for Array[Double] to shared parems codegen, etc.
6747dad [Holden Karau] Override raw2prediction for ProbabilisticClassifier,
fix some tests
25df168 [Holden Karau] Fix handling of thresholds in LogisticRegression
c02d6c0 [Holden Karau] No default for thresholds
5e43628 [Holden Karau] CR feedback and fixed the renamed test
f3fbbd1 [Holden Karau] revert the changes to random forest :(
51f581c [Holden Karau] Add explicit types to public methods, fix long line
f7032eb [Holden Karau] Fix a java test bug, remove some unecessary changes
adf15b4 [Holden Karau] rename the classifier suite test to
ProbabilisticClassifierSuite now that we only have it in Probabilistic
398078a [Holden Karau] move the thresholding around a bunch based on the
design doc
4893bdc [Holden Karau] Use numtrees of 3 since previous result was tied
(one tree for each) and the switch from different max methods picked a
different element (since they were equal I think this is ok)
638854c [Holden Karau] Add a scala RandomForestClassifierSuite test based
on corresponding python test
e09919c [Holden Karau] Fix return type, I need more coffee....
8d92cac [Holden Karau] Use ClassifierParams as the head
3456ed3 [Holden Karau] Add explicit return types even though just test
a0f3b0c [Holden Karau] scala style fixes
6f14314 [Holden Karau] Since hasthreshold/hasthresholds is in root
classifier now
ffc8dab [Holden Karau] Update the sharedParams
0420290 [Holden Karau] Allow us to override the get methods selectively
978e77a [Holden Karau] Move HasThreshold into classifier params and start
defining the overloaded getThreshold/getThresholds functions
1433e52 [Holden Karau] Revert "try and hide threshold but chainges the API
so no dice there"
1f09a2e [Holden Karau] try and hide threshold but chainges the API so no
dice there
efb9084 [Holden Karau] move setThresholds only to where its used
6b34809 [Holden Karau] Add a test with thresholding for the RFCS
74f54c3 [Holden Karau] Fix creation of vote array
1986fa8 [Holden Karau] Setting the thresholds only makes sense if the
underlying class hasn't overridden predict, so lets push it down.
2f44b18 [Holden Karau] Add a global default of null for thresholds param
f338cfc [Holden Karau] Wait that wasn't a good idea, Revert "Some progress
towards unifying threshold and thresholds"
634b06f [Holden Karau] Some progress towards unifying threshold and
thresholds
85c9e01 [Holden Karau] Test passes again... little fnur
099c0f3 [Holden Karau] Move thresholds around some more (set on model not
trainer)
0f46836 [Holden Karau] Start adding a classifiersuite
f70eb5e [Holden Karau] Fix test compile issues
a7d59c8 [Holden Karau] Move thresholding into Classifier trait
5d999d2 [Holden Karau] Some more progress, start adding a test (maybe try
and see if we can find a better thing to use for the base of the test)
1fed644 [Holden Karau] Use thresholds to scale scores in random forest
classifcation
31d6bf2 [Holden Karau] Start threading the threshold info through
0ef228c [Holden Karau] Add hasthresholds
(cherry picked from commit 5a23213c148bfe362514f9c71f5273ebda0a848a)
Signed-off-by: Joseph K. Bradley <[email protected]>
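The generalized thresholds semantics this PR introduces can be sketched as follows: for a multiclass probabilistic classifier, the predicted class is the one maximizing p[i] / thresholds[i] (assumed semantics based on how Spark ML documents the param; this toy function is not Spark's code):

```python
def predict(probabilities, thresholds=None):
    """Pick the class index maximizing p[i] / thresholds[i].

    With thresholds=None (the PR's global default of no thresholds),
    this reduces to a plain argmax over the probabilities.
    """
    if thresholds is None:
        return max(range(len(probabilities)),
                   key=lambda i: probabilities[i])
    return max(range(len(probabilities)),
               key=lambda i: probabilities[i] / thresholds[i])

assert predict([0.2, 0.5, 0.3]) == 1
# A high threshold on class 1 shifts the decision to class 2.
assert predict([0.2, 0.5, 0.3], thresholds=[1.0, 10.0, 1.0]) == 2
```

With two classes, setting thresholds=[1 - t, t] recovers the old binary getThreshold/setThreshold behavior the PR keeps for backwards compatibility.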
commit be37b1bd3edd8583180dc1a41ecf4d80990216c7
Author: Michael Armbrust <[email protected]>
Date: 2015-08-04T19:19:52Z
[SPARK-9606] [SQL] Ignore flaky thrift server tests
Author: Michael Armbrust <[email protected]>
Closes #7939 from marmbrus/turnOffThriftTests and squashes the following
commits:
80d618e [Michael Armbrust] [SPARK-9606][SQL] Ignore flaky thrift server
tests
(cherry picked from commit a0cc01759b0c2cecf340c885d391976eb4e3fad6)
Signed-off-by: Michael Armbrust <[email protected]>
commit 43f6b021e5f14b9126e4291f989a076085367c2c
Author: Wenchen Fan <[email protected]>
Date: 2015-08-04T21:40:46Z
[SPARK-9553][SQL] remove the no-longer-necessary createCode and
createStructCode, and replace the usage
Author: Wenchen Fan <[email protected]>
Closes #7890 from cloud-fan/minor and squashes the following commits:
c3b1be3 [Wenchen Fan] fix style
b0cbe2e [Wenchen Fan] remove the createCode and createStructCode, and
replace the usage of them by createStructCode
(cherry picked from commit f4b1ac08a1327e6d0ddc317cdf3997a0f68dec72)
Signed-off-by: Reynold Xin <[email protected]>
commit f771a83f4090e979f72d01989e6693d7fbc05c05
Author: Josh Rosen <[email protected]>
Date: 2015-08-04T21:42:11Z
[SPARK-9452] [SQL] Support records larger than page size in
UnsafeExternalSorter
This patch extends UnsafeExternalSorter to support records larger than the
page size. The basic strategy is the same as in #7762: store large records in
their own overflow pages.
Author: Josh Rosen <[email protected]>
Closes #7891 from JoshRosen/large-records-in-sql-sorter and squashes the
following commits:
967580b [Josh Rosen] Merge remote-tracking branch 'origin/master' into
large-records-in-sql-sorter
948c344 [Josh Rosen] Add large records tests for KV sorter.
3c17288 [Josh Rosen] Combine memory and disk cleanup into general
cleanupResources() method
380f217 [Josh Rosen] Merge remote-tracking branch 'origin/master' into
large-records-in-sql-sorter
27eafa0 [Josh Rosen] Fix page size in PackedRecordPointerSuite
a49baef [Josh Rosen] Address initial round of review comments
3edb931 [Josh Rosen] Remove accidentally-committed debug statements.
2b164e2 [Josh Rosen] Support large records in UnsafeExternalSorter.
(cherry picked from commit ab8ee1a3b93286a62949569615086ef5030e9fae)
Signed-off-by: Reynold Xin <[email protected]>
commit 560b2da783bc25bd8767f6888665dadecac916d8
Author: CodingCat <[email protected]>
Date: 2015-08-04T21:54:11Z
[SPARK-9602] remove "Akka/Actor" words from comments
https://issues.apache.org/jira/browse/SPARK-9602
Although we have hidden Akka behind RPC interface, I found that the
Akka/Actor-related comments are still spreading everywhere. To make it
consistent, we shall remove "actor"/"akka" words from the comments...
Author: CodingCat <[email protected]>
Closes #7936 from CodingCat/SPARK-9602 and squashes the following commits:
e8296a3 [CodingCat] remove actor words from comments
(cherry picked from commit 9d668b73687e697cad2ef7fd3c3ba405e9795593)
Signed-off-by: Reynold Xin <[email protected]>
commit e682ee25477374737f3b1dfc08c98829564b26d4
Author: Joseph K. Bradley <[email protected]>
Date: 2015-08-04T21:54:26Z
[SPARK-9447] [ML] [PYTHON] Added HasRawPredictionCol, HasProbabilityCol to
RandomForestClassifier
Added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier,
plus doc tests for those columns.
CC: holdenk yanboliang
Author: Joseph K. Bradley <[email protected]>
Closes #7903 from jkbradley/rf-prob-python and squashes the following
commits:
c62a83f [Joseph K. Bradley] made unit test more robust
14eeba2 [Joseph K. Bradley] added HasRawPredictionCol, HasProbabilityCol to
RandomForestClassifier in PySpark
(cherry picked from commit e375456063617cd7000d796024f41e5927f21edd)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit fe4a4f41ad8b686455d58fc2fda9494e8dba5636
Author: Joseph K. Bradley <[email protected]>
Date: 2015-08-04T22:43:13Z
[SPARK-9582] [ML] LDA cleanups
Small cleanups to recent LDA additions and docs.
CC: feynmanliang
Author: Joseph K. Bradley <[email protected]>
Closes #7916 from jkbradley/lda-cleanups and squashes the following commits:
f7021d9 [Joseph K. Bradley] broadcasting large matrices for LDA in local
model and online learning
97947aa [Joseph K. Bradley] a few more cleanups
5b03f88 [Joseph K. Bradley] reverted split of lda log likelihood
c566915 [Joseph K. Bradley] small edit to make review easier
63f6c7d [Joseph K. Bradley] clarified log likelihood for lda models
(cherry picked from commit 1833d9c08f021d991334424d0a6d5ec21d1fccb2)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit f4e125acf36023425722abb0fb74be63a425aa7b
Author: Mike Dusenberry <[email protected]>
Date: 2015-08-04T23:30:03Z
[SPARK-6485] [MLLIB] [PYTHON] Add
CoordinateMatrix/RowMatrix/IndexedRowMatrix to PySpark.
This PR adds the RowMatrix, IndexedRowMatrix, and CoordinateMatrix
distributed matrices to PySpark. Each distributed matrix class acts as a
wrapper around the Scala/Java counterpart by maintaining a reference to the
Java object. New distributed matrices can be created using factory methods
added to DistributedMatrices, which creates the Java distributed matrix and
then wraps it with the corresponding PySpark class. This design allows for
simple conversion between the various distributed matrices, and lets us re-use
the Scala code. Serialization between Python and Java is implemented using
DataFrames as needed for IndexedRowMatrix and CoordinateMatrix for simplicity.
Associated documentation and unit-tests have also been added. To facilitate
code review, this PR implements access to the rows/entries as RDDs, the number
of rows & columns, and conversions between the various distributed matrices
(not including BlockMatrix), and does not implement the other linear algebra
functions of the matrices, although this will be very simple to add now.
Author: Mike Dusenberry <[email protected]>
Closes #7554 from
dusenberrymw/SPARK-6485_Add_CoordinateMatrix_RowMatrix_IndexedMatrix_to_PySpark
and squashes the following commits:
bb039cb [Mike Dusenberry] Minor documentation update.
b887c18 [Mike Dusenberry] Updating the matrix conversion logic again to
make it even cleaner. Now, we allow the 'rows' parameter in the constructors
to be either an RDD or the Java matrix object. If 'rows' is an RDD, we create a
Java matrix object, wrap it, and then store that. If 'rows' is a Java matrix
object of the correct type, we just wrap and store that directly. This is only
for internal usage, and publicly, we still require 'rows' to be an RDD. We no
longer store the 'rows' RDD, and instead just compute it from the Java object
when needed. The point of this is that when we do matrix conversions, we do
the conversion on the Scala/Java side, which returns a Java object, so we
should use that directly, but exposing 'java_matrix' parameter in the public
API is not ideal. This non-public feature of allowing 'rows' to be a Java
matrix object is documented in the '__init__' constructor docstrings, which are
not part of the generated public API, and doctests are also included.
7f0dcb6 [Mike Dusenberry] Updating module docstring.
cfc1be5 [Mike Dusenberry] Use 'new SQLContext(matrix.rows.sparkContext)'
rather than 'SQLContext.getOrCreate', as the latter doesn't guarantee that the
SparkContext will be the same as for the matrix.rows data.
687e345 [Mike Dusenberry] Improving conversion performance. This adds an
optional 'java_matrix' parameter to the constructors, and pulls the conversion
logic out into a '_create_from_java' function. Now, if the constructors are
given a valid Java distributed matrix object as 'java_matrix', they will store
those internally, rather than create a new one on the Scala/Java side.
3e50b6e [Mike Dusenberry] Moving the distributed matrices to
pyspark.mllib.linalg.distributed.
308f197 [Mike Dusenberry] Using properties for better documentation.
1633f86 [Mike Dusenberry] Minor documentation cleanup.
f0c13a7 [Mike Dusenberry] CoordinateMatrix should inherit from
DistributedMatrix.
ffdd724 [Mike Dusenberry] Updating doctests to make documentation cleaner.
3fd4016 [Mike Dusenberry] Updating docstrings.
27cd5f6 [Mike Dusenberry] Simplifying input conversions in the constructors
for each distributed matrix.
a409cf5 [Mike Dusenberry] Updating doctests to be less verbose by using
lists instead of DenseVectors explicitly.
d19b0ba [Mike Dusenberry] Updating code and documentation to note that a
vector-like object (numpy array, list, etc.) can be used in place of explicit
Vector object, and adding conversions when necessary to RowMatrix construction.
4bd756d [Mike Dusenberry] Adding param documentation to IndexedRow and
MatrixEntry.
c6bded5 [Mike Dusenberry] Move conversion logic from tuples to IndexedRow
or MatrixEntry types from within the IndexedRowMatrix and CoordinateMatrix
constructors to separate _convert_to_indexed_row and _convert_to_matrix_entry
functions.
329638b [Mike Dusenberry] Moving the Experimental tag to the top of each
docstring.
0be6826 [Mike Dusenberry] Simplifying doctests by removing duplicated
rows/entries RDDs within the various tests.
c0900df [Mike Dusenberry] Adding the colons that were accidentally not
inserted.
4ad6819 [Mike Dusenberry] Documenting the and parameters.
3b854b9 [Mike Dusenberry] Minor updates to documentation.
10046e8 [Mike Dusenberry] Updating documentation to use class constructors
instead of the removed DistributedMatrices factory methods.
119018d [Mike Dusenberry] Adding static methods to each of the distributed
matrix classes to consolidate conversion logic.
4d7af86 [Mike Dusenberry] Adding type checks to the constructors. Although
it is slightly verbose, it is better for the user to have a good error message
than a cryptic stacktrace.
93b6a3d [Mike Dusenberry] Pulling the DistributedMatrices Python class out
of this pull request.
f6f3c68 [Mike Dusenberry] Pulling the DistributedMatrices Scala class out
of this pull request.
6a3ecb7 [Mike Dusenberry] Updating pattern matching.
08f287b [Mike Dusenberry] Slight reformatting of the documentation.
a245dc0 [Mike Dusenberry] Updating Python doctests for compatability
between Python 2 & 3. Since Python 3 removed the idea of a separate 'long'
type, all values that would have been outputted as a 'long' (ex: '4L') will now
be treated as an 'int' and outputed as one (ex: '4'). The doctests now
explicitly convert to ints so that both Python 2 and 3 will have the same
output. This is fine since the values are all small, and thus can be easily
represented as ints.
4d3a37e [Mike Dusenberry] Reformatting a few long Python doctest lines.
7e3ca16 [Mike Dusenberry] Fixing long lines.
f721ead [Mike Dusenberry] Updating documentation for each of the
distributed matrices.
ab0e8b6 [Mike Dusenberry] Updating unit test to be more useful.
dda2f89 [Mike Dusenberry] Added wrappers for the conversions between the
various distributed matrices. Added logic to be able to access the
rows/entries of the distributed matrices, which requires serialization through
DataFrames for IndexedRowMatrix and CoordinateMatrix types. Added unit tests.
0cd7166 [Mike Dusenberry] Implemented the CoordinateMatrix API in PySpark,
following the idea of the IndexedRowMatrix API, including using DataFrames for
serialization.
3c369cb [Mike Dusenberry] Updating the architecture a bit to make
conversions between the various distributed matrix types easier. The different
distributed matrix classes are now only wrappers around the Java objects, and
take the Java object as an argument during construction. This way, we can call
for example on an , which returns a reference to a Java RowMatrix object, and
then construct a PySpark RowMatrix object wrapped around the Java object. This
is analogous to the behavior of PySpark RDDs and DataFrames. We now delegate
creation of the various distributed matrices from scratch in PySpark to the
factory methods on .
4bdd09b [Mike Dusenberry] Implemented the IndexedRowMatrix API in PySpark,
following the idea of the RowMatrix API. Note that for the IndexedRowMatrix,
we use DataFrames to serialize the data between Python and Scala/Java, so we
accept PySpark RDDs, then convert to a DataFrame, then convert back to RDDs on
the Scala/Java side before constructing the IndexedRowMatrix.
23bf1ec [Mike Dusenberry] Updating documentation to add PySpark RowMatrix.
Inserting newline above doctest so that it renders properly in API docs.
b194623 [Mike Dusenberry] Updating design to have a PySpark RowMatrix
simply create and keep a reference to a wrapper over a Java RowMatrix.
Updating DistributedMatrices factory methods to accept numRows and numCols with
default values. Updating PySpark DistributedMatrices factory method to simply
create a PySpark RowMatrix. Adding additional doctests for numRows and numCols
parameters.
bc2d220 [Mike Dusenberry] Adding unit tests for RowMatrix methods.
d7e316f [Mike Dusenberry] Implemented the RowMatrix API in PySpark by doing
the following: Added a DistributedMatrices class to contain factory methods for
creating the various distributed matrices. Added a factory method for creating
a RowMatrix from an RDD of Vectors. Added a createRowMatrix function to the
PythonMLlibAPI to interface with the factory method. Added DistributedMatrix,
DistributedMatrices, and RowMatrix classes to the pyspark.mllib.linalg api.
(cherry picked from commit 571d5b5363ff4dbbce1f7019ab8e86cbc3cba4d5)
Signed-off-by: Xiangrui Meng <[email protected]>
commit cff0fe291aa470ef5cf4e5087c7114fb6360572f
Author: Joseph K. Bradley <[email protected]>
Date: 2015-08-04T23:52:43Z
[SPARK-9586] [ML] Update BinaryClassificationEvaluator to use
setRawPredictionCol
Update BinaryClassificationEvaluator to use setRawPredictionCol, rather
than setScoreCol. Deprecated setScoreCol.
I don't think setScoreCol was actually used anywhere (based on search).
CC: mengxr
Author: Joseph K. Bradley <[email protected]>
Closes #7921 from jkbradley/binary-eval-rawpred and squashes the following
commits:
e5d7dfa [Joseph K. Bradley] Update BinaryClassificationEvaluator to use
setRawPredictionCol
(cherry picked from commit b77d3b9688d56d33737909375d1d0db07da5827b)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 1954a7bb175122b776870530217159cad366ca6c
Author: Wenchen Fan <[email protected]>
Date: 2015-08-05T00:05:19Z
[SPARK-9598][SQL] do not expose generic getter in internal row
Author: Wenchen Fan <[email protected]>
Closes #7932 from cloud-fan/generic-getter and squashes the following
commits:
c60de4c [Wenchen Fan] do not expose generic getter in internal row
(cherry picked from commit 7c8fc1f7cb837ff5c32811fdeb3ee2b84de2dea4)
Signed-off-by: Reynold Xin <[email protected]>
commit 33509754843fe8eba303c720e6c0f6853b861e7e
Author: Feynman Liang <[email protected]>
Date: 2015-08-05T01:13:18Z
[SPARK-9609] [MLLIB] Fix spelling of Strategy.defaultStrategy
jkbradley
Author: Feynman Liang <[email protected]>
Closes #7941 from feynmanliang/SPARK-9609-stategy-spelling and squashes the
following commits:
d2aafb1 [Feynman Liang] Add deprecated backwards compatibility
aa090a8 [Feynman Liang] Fix spelling
(cherry picked from commit 629e26f7ee916e70f59b017cb6083aa441b26b2c)
Signed-off-by: Joseph K. Bradley <[email protected]>
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]