GitHub user wesleydias opened a pull request:

    https://github.com/apache/spark/pull/5308

    Branch 1.3

    [SPARK-6639] Create a new script to start multiple masters

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5308
    
----
commit bd49e8b962b397b8fb8b22f980739021cf1a195e
Author: Sean Owen <[email protected]>
Date:   2015-02-19T23:35:23Z

    SPARK-4682 [CORE] Consolidate various 'Clock' classes
    
    Another one from JoshRosen's wish list. The first commit is much smaller and removes 2 of the 4 Clock classes. The second is much larger, and is necessary for consolidating the streaming one. I put together the implementations in the way that seemed simplest. Almost all of the change is standardizing class and method names.
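
    For illustration, a minimal sketch of the consolidated shape (simplified and hypothetical, not the exact Spark source; the real classes live in `org.apache.spark.util`):

    ```scala
    // Simplified sketch; Spark's real ManualClock is private[spark] and richer.
    trait Clock {
      def getTimeMillis(): Long
    }

    class SystemClock extends Clock {
      override def getTimeMillis(): Long = System.currentTimeMillis()
    }

    // Test-only clock: time moves only when a test advances it explicitly.
    class ManualClock(private var time: Long = 0L) extends Clock {
      override def getTimeMillis(): Long = synchronized { time }
      def advance(ms: Long): Unit = synchronized { time += ms }
    }
    ```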
    
    Author: Sean Owen <[email protected]>
    
    Closes #4514 from srowen/SPARK-4682 and squashes the following commits:
    
    5ed3a03 [Sean Owen] Javadoc Clock classes; make ManualClock private[spark]
    169dd13 [Sean Owen] Add support for legacy org.apache.spark.streaming clock 
class names
    277785a [Sean Owen] Reduce the net change in this patch by reversing some 
unnecessary syntax changes along the way
    b5e53df [Sean Owen] FakeClock -> ManualClock; getTime() -> getTimeMillis()
    160863a [Sean Owen] Consolidate Streaming Clock class into common util Clock
    7c956b2 [Sean Owen] Consolidate Clocks except for Streaming Clock
    
    (cherry picked from commit 34b7c35380c88569a1396fb4ed991a0bed4288e7)
    Signed-off-by: Andrew Or <[email protected]>

commit c5f3b9e02f8b1d1b09d4309df9a2c8633da82910
Author: Ilya Ganelin <[email protected]>
Date:   2015-02-19T23:50:58Z

    SPARK-5570: No docs stating that `new SparkConf().set("spark.driver.memory", ...)` will not work
    
    I've updated the documentation to reflect the true behavior of this setting in client vs. cluster mode.
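
    In short: in client mode the driver JVM has already started by the time user code runs, so setting the key in code is too late. A hedged sketch of the pitfall:

    ```scala
    import org.apache.spark.SparkConf

    // Too late in client mode: the driver JVM is already running with its
    // original heap size when this line executes, so the setting has no effect.
    val conf = new SparkConf().set("spark.driver.memory", "4g")

    // Instead, supply the value before the driver starts, e.g.
    //   spark-submit --driver-memory 4g ...
    // or set spark.driver.memory in conf/spark-defaults.conf.
    ```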
    
    Author: Ilya Ganelin <[email protected]>
    
    Closes #4665 from ilganeli/SPARK-5570 and squashes the following commits:
    
    5d1c8dd [Ilya Ganelin] Added example configuration code
    a51700a [Ilya Ganelin] Getting rid of extra spaces
    85f7a08 [Ilya Ganelin] Reworded note
    5889d43 [Ilya Ganelin] Formatting adjustment
    f149ba1 [Ilya Ganelin] Minor updates
    1fec7a5 [Ilya Ganelin] Updated to add clarification for other driver 
properties
    db47595 [Ilya Ganelin] Slight formatting update
    c899564 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into 
SPARK-5570
    17b751d [Ilya Ganelin] Updated documentation for driver-memory to reflect 
its true behavior in client vs cluster mode
    
    (cherry picked from commit 6bddc40353057a562c78e75c5549c79a0d7d5f8b)
    Signed-off-by: Andrew Or <[email protected]>

commit ba941ceb1f78b28ca5cfb18c770f4171b9c74b0a
Author: Xiangrui Meng <[email protected]>
Date:   2015-02-20T02:06:16Z

    [SPARK-5900][MLLIB] make PIC and FPGrowth Java-friendly
    
    In the previous version, PIC stores clustering assignments as an 
`RDD[(Long, Int)]`. This is mapped to `RDD<Tuple2<Object, Object>>` in Java and 
hence Java users have to cast types manually. We should either create a new 
method called `javaAssignments` that returns `JavaRDD[(java.lang.Long, 
java.lang.Int)]` or wrap the result pair in a class. I chose the latter 
approach in this PR. Now assignments are stored as an `RDD[Assignment]`, where 
`Assignment` is a class with `id` and `cluster`.
    
    Similarly, in FPGrowth the frequent itemsets are stored as an `RDD[(Array[Item], Long)]`, which is mapped to `RDD<Tuple2<Object, Object>>`. We do provide a "Java-friendly" method `javaFreqItemsets` that returns `JavaRDD[(Array[Item], java.lang.Long)]`, but it doesn't really work because `Array[Item]` is mapped to `Object` in Java. So in this PR I created a class `FreqItemset` to wrap the results. It has `items` and `freq`, as well as a `javaItems` method that returns `List<Item>` in Java.
    
    I'm not certain that the names I chose are proper: 
`Assignment`/`id`/`cluster` and `FreqItemset`/`items`/`freq`. Please let me 
know if there are better suggestions.
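
    For example, on the Scala side the wrapped results read naturally (a hedged usage sketch; assumes an existing `SparkContext` named `sc` and toy data):

    ```scala
    import org.apache.spark.mllib.clustering.PowerIterationClustering
    import org.apache.spark.mllib.fpm.FPGrowth

    // (srcId, dstId, similarity) triplets for PIC
    val similarities = sc.parallelize(Seq((0L, 1L, 0.9), (1L, 2L, 0.8), (0L, 2L, 0.7)))
    val picModel = new PowerIterationClustering().setK(2).run(similarities)
    picModel.assignments.collect().foreach(a => println(a.id + " -> " + a.cluster))

    val transactions = sc.parallelize(Seq(Array("a", "b"), Array("a", "c"), Array("a")))
    val fpModel = new FPGrowth().setMinSupport(0.5).run(transactions)
    fpModel.freqItemsets.collect().foreach(fi =>
      println(fi.items.mkString("[", ",", "]") + ": " + fi.freq))
    ```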
    
    CC: jkbradley
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #4695 from mengxr/SPARK-5900 and squashes the following commits:
    
    865b5ca [Xiangrui Meng] make Assignment serializable
    cffa96e [Xiangrui Meng] fix test
    9c0e590 [Xiangrui Meng] remove unused Tuple2
    1b9db3d [Xiangrui Meng] make PIC and FPGrowth Java-friendly
    
    (cherry picked from commit 0cfd2cebde0b7fac3779eda80d6e42223f8a3d9f)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 0382dcc0a94f8e619fd11ec2cc0b18459a690c2b
Author: mcheah <[email protected]>
Date:   2015-02-20T02:09:22Z

    [SPARK-4808] Removing minimum number of elements read before spill check
    
    In the general case, Spillable's heuristic of checking for memory stress
    on every 32nd item after 1000 items are read is good enough. In general,
    we do not want to be enacting the spilling checks until later on in the
    job; checking for disk-spilling too early can produce unacceptable
    performance impact in trivial cases.
    
    However, there are non-trivial cases, particularly if each serialized
    object is large, where checking for the necessity to spill too late
    would allow the memory to overflow. Consider if every item is 1.5 MB in
    size, and the heap size is 1000 MB. Then clearly if we only try to spill
    the in-memory contents to disk after 1000 items are read, we would have
    already accumulated 1500 MB of RAM and overflowed the heap.
    
    Patch #3656 attempted to circumvent this by checking the need to spill
    on every single item read, but that would cause unacceptable performance
    in the general case. However, users hitting the corner cases above should not
    be forced to refactor their jobs to shrink the data items. Therefore, it makes
    sense to make the memory-spilling thresholds configurable.
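
    A minimal sketch of the heuristic under discussion (hypothetical names, not Spark's actual `Spillable` code):

    ```scala
    // Hypothetical sketch of the check cadence described above.
    class SpillCheck(memoryThreshold: Long) {
      private var elementsRead = 0L

      def shouldSpill(currentMemory: Long): Boolean = {
        elementsRead += 1
        // The minimum removed by this patch also required elementsRead > 1000,
        // which breaks down when individual items are very large.
        elementsRead % 32 == 0 && currentMemory >= memoryThreshold
      }
    }
    ```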
    
    Author: mcheah <[email protected]>
    
    Closes #4420 from mingyukim/memory-spill-configurable and squashes the 
following commits:
    
    6e2509f [mcheah] [SPARK-4808] Removing minimum number of elements read 
before spill check
    
    (cherry picked from commit 3be92cdac30cf488e09dbdaaa70e5c4cdaa9a099)
    Signed-off-by: Andrew Or <[email protected]>

commit 8c12f311444008fedc610d866f2535233027bced
Author: Joseph K. Bradley <[email protected]>
Date:   2015-02-20T10:31:32Z

    [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release
    
    For SPARK-5867:
    * The spark.ml programming guide needs to be updated to use the new SQL 
DataFrame API instead of the old SchemaRDD API.
    * It should also include Python examples now.
    
    For SPARK-5892:
    * Fix Python docs
    * Various other cleanups
    
    BTW, I accidentally merged this with master.  If you want to compile it on 
your own, use this branch which is based on spark/branch-1.3 and cherry-picks 
the commits from this PR: 
[https://github.com/jkbradley/spark/tree/doc-review-1.3-check]
    
    CC: mengxr  (ML),  davies  (Python docs)
    
    Author: Joseph K. Bradley <[email protected]>
    
    Closes #4675 from jkbradley/doc-review-1.3 and squashes the following 
commits:
    
    f191bb0 [Joseph K. Bradley] small cleanups
    e786efa [Joseph K. Bradley] small doc corrections
    6b1ab4a [Joseph K. Bradley] fixed python lint test
    946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example.  
Changed spark.ml Java examples to use DataFrames API instead of sql()
    da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' 
into doc-review-1.3
    629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page 
for old migration guides * small fixes * moved inherit_doc in python
    b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for 
string interpolation
    34b067f [Joseph K. Bradley] small doc correction
    da16aef [Joseph K. Bradley] Fixed python mllib docs
    8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
    695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class 
hierarchies in python docs
    a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
    b05a80d [Joseph K. Bradley] organize imports. doc cleanups
    e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
    
    (cherry picked from commit 4a17eedb16343413e5b6f8bb58c6da8952ee7ab6)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 913562ae7c3141b2d02419828b9e364867e85d85
Author: Davies Liu <[email protected]>
Date:   2015-02-20T23:35:05Z

    [SPARK-5898] [SPARK-5896] [SQL]  [PySpark] create DataFrame from pandas and 
tuple/list
    
    Fix createDataFrame() from pandas DataFrame (not tested by jenkins, depends 
on SPARK-5693).
    
    It also supports creating a DataFrame from a plain tuple/list without column names; `_1`, `_2` will be used as the column names.
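
    For comparison, the Scala API uses the same default `_1`, `_2` naming for tuple rows (hedged sketch; assumes an existing `SparkContext` named `sc`):

    ```scala
    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    // Rows built from plain tuples get default column names _1, _2, ...
    val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b")))
    df.printSchema()  // _1: integer, _2: string
    ```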
    
    Author: Davies Liu <[email protected]>
    
    Closes #4679 from davies/pandas and squashes the following commits:
    
    c0cbe0b [Davies Liu] fix tests
    8466d1d [Davies Liu] fix create DataFrame from pandas
    
    (cherry picked from commit 5b0a42cb17b840c82d3f8a5ad061d99e261ceadf)
    Signed-off-by: Michael Armbrust <[email protected]>

commit b9a6c5c840be1cb4ec4c256920424afbe09c9b37
Author: Yin Huai <[email protected]>
Date:   2015-02-20T08:20:02Z

    [SPARK-5909][SQL] Add a clearCache command to Spark SQL's cache manager
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-5909
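
    A hedged usage sketch (assumes an existing `SQLContext` named `sqlContext` with some cached tables):

    ```scala
    sqlContext.cacheTable("logs")
    sqlContext.cacheTable("users")

    // Drop every cached table at once via the new SQL command...
    sqlContext.sql("CLEAR CACHE")
    // ...or programmatically:
    sqlContext.clearCache()
    ```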
    
    Author: Yin Huai <[email protected]>
    
    Closes #4694 from yhuai/clearCache and squashes the following commits:
    
    397ecc4 [Yin Huai] Address comments.
    a2702fc [Yin Huai] Update parser.
    3a54506 [Yin Huai] add isEmpty to CacheManager.
    6d14460 [Yin Huai] Python clearCache.
    f7b8dbd [Yin Huai] Add clear cache command.

commit 932338edaf4bf4f92c9f488a4f03f255e580ca74
Author: Nishkam Ravi <[email protected]>
Date:   2015-02-21T17:59:28Z

    SPARK-5841 [CORE] [HOTFIX 2] Memory leak in DiskBlockManager
    
    We continue to see an IllegalStateException in YARN cluster mode. This adds a simple workaround for now.
    
    Author: Nishkam Ravi <[email protected]>
    Author: nishkamravi2 <[email protected]>
    Author: nravi <[email protected]>
    
    Closes #4690 from nishkamravi2/master_nravi and squashes the following 
commits:
    
    d453197 [nishkamravi2] Update NewHadoopRDD.scala
    6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
    0ce2c32 [nishkamravi2] Update HadoopRDD.scala
    f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of 
https://github.com/nishkamravi2/spark into master_nravi
    ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of 
removeShutDownHook. Deletion of semi-redundant occurrences of expensive 
operation inShutDown.
    71d0e17 [Nishkam Ravi] Merge branch 'master' of 
https://github.com/apache/spark into master_nravi
    494d8c0 [nishkamravi2] Update DiskBlockManager.scala
    3c5ddba [nishkamravi2] Update DiskBlockManager.scala
    f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by 
recent changes to BlockManager.stop
    79ea8b4 [Nishkam Ravi] Merge branch 'master' of 
https://github.com/apache/spark into master_nravi
    b446edc [Nishkam Ravi] Merge branch 'master' of 
https://github.com/apache/spark into master_nravi
    5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
    535295a [nishkamravi2] Update TaskSetManager.scala
    3e1b616 [Nishkam Ravi] Modify test for maxResultSize
    9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message 
and add condition to check if maxResultSize > 0)
    5f8f9ed [Nishkam Ravi] Merge branch 'master' of 
https://github.com/apache/spark into master_nravi
    636a9ff [nishkamravi2] Update YarnAllocator.scala
    8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
    35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
    5ac2ec1 [Nishkam Ravi] Remove out
    dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead 
issue
    42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
    362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
    c726bd9 [Nishkam Ravi] Merge branch 'master' of 
https://github.com/apache/spark into master_nravi
    f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
    1cf2d1e [nishkamravi2] Update YarnAllocator.scala
    ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an 
additive constant to a multiplier (redone to resolve merge conflicts)
    2e69f11 [Nishkam Ravi] Merge branch 'master' of 
https://github.com/apache/spark into master_nravi
    efd688a [Nishkam Ravi] Merge branch 'master' of 
https://github.com/apache/spark
    2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int 
value, to be consistent with rest of Spark
    3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
    5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
    eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
    df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, 
Hadoop-10456)
    6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
    5108700 [nravi] Fix in Spark for the Concurrent thread modification issue 
(SPARK-1097, HADOOP-10456)
    681b36f [nravi] Fix for SPARK-1758: failing test 
org.apache.spark.JavaAPISuite.wholeTextFiles
    
    (cherry picked from commit d3cbd38c33e6a2addcf8caa18eeb10036fbfd01b)
    Signed-off-by: Andrew Or <[email protected]>

commit 76e3e652763cf32f8ed7afe81a221bc8284af0ab
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-21T18:01:01Z

    [SPARK-5937][YARN] Fix ClientSuite to set YARN mode, so that the correct class is used in tests.
    
    Without this, SparkHadoopUtil is used by the Client instead of YarnSparkHadoopUtil.
    
    Author: Hari Shreedharan <[email protected]>
    
    Closes #4711 from harishreedharan/SPARK-5937 and squashes the following 
commits:
    
    d154de6 [Hari Shreedharan] Use System.clearProperty() instead of setting 
the value of SPARK_YARN_MODE to empty string.
    f729f70 [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the 
correct class is used in tests.
    
    (cherry picked from commit 7138816abe1060a1e967c4c77c72d5752586d557)
    Signed-off-by: Andrew Or <[email protected]>

commit c5a5c6f618b89d712c13a236388fa67c136691ee
Author: Alexander <[email protected]>
Date:   2015-02-22T08:53:05Z

    [DOCS] Fix typo in API for custom InputFormats based on the “new” 
MapReduce API
    
    This looks like a simple typo: ```SparkContext.newHadoopRDD``` should be ```SparkContext.newAPIHadoopRDD```, per the actual API docs: http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.SparkContext
    
    Author: Alexander <[email protected]>
    
    Closes #4718 from bzz/hadoop-InputFormats-doc-fix and squashes the 
following commits:
    
    680a4c4 [Alexander] Fix typo in docs on custom Hadoop InputFormats
    
    (cherry picked from commit a7f90390251ff62a0e10edf4c2eb876538597791)
    Signed-off-by: Sean Owen <[email protected]>

commit 04d3b328fc3747b39dab79a00e7799fa41857635
Author: Cheng Hao <[email protected]>
Date:   2015-02-22T08:56:30Z

    [DataFrame] [Typo] Fix the typo
    
    Author: Cheng Hao <[email protected]>
    
    Closes #4717 from chenghao-intel/typo1 and squashes the following commits:
    
    858d7b0 [Cheng Hao] update the typo
    
    (cherry picked from commit 275b1bef897d775f1f7743378ca3e09e36160136)
    Signed-off-by: Sean Owen <[email protected]>

commit eed7389cf80d0930da16f77d6ccb39a82fe976c2
Author: Sean Owen <[email protected]>
Date:   2015-02-22T09:09:06Z

    SPARK-5669 [BUILD] Reverse exclusion of JBLAS libs for 1.3
    
    CC mengxr
    
    Author: Sean Owen <[email protected]>
    
    Closes #4715 from srowen/SPARK-5669.3 and squashes the following commits:
    
    b27ffa9 [Sean Owen] Reverse exclusion of JBLAS libs for 1.3

commit 4186dd3dd074a41b3a1d6a4279b683fb355da092
Author: Andrew Or <[email protected]>
Date:   2015-02-22T17:44:52Z

    Revert "[SPARK-4808] Removing minimum number of elements read before spill 
check"
    
    This reverts commit 0382dcc0a94f8e619fd11ec2cc0b18459a690c2b.

commit f172387dd6e89d2f24dd507ce8327fb4a496f957
Author: Makoto Fukuhara <[email protected]>
Date:   2015-02-23T09:24:33Z

    [EXAMPLES] fix typo.
    
    Author: Makoto Fukuhara <[email protected]>
    
    Closes #4724 from fukuo33/fix-typo and squashes the following commits:
    
    8c806b9 [Makoto Fukuhara] fix typo.
    
    (cherry picked from commit 934876741683fc254fed18e7ff630614f78944be)
    Signed-off-by: Sean Owen <[email protected]>

commit 67b7f792908a3dd6b6453249f3ecf65dd51a6ba5
Author: Saisai Shao <[email protected]>
Date:   2015-02-23T11:27:27Z

    [SPARK-5943][Streaming] Update the test to use new API to reduce the warning
    
    Author: Saisai Shao <[email protected]>
    
    Closes #4722 from jerryshao/SPARK-5943 and squashes the following commits:
    
    1b01233 [Saisai Shao] Update the test to use new API to reduce the warning
    
    (cherry picked from commit 757b14b862a1d39c1bad7b321dae1a3ea8338fbb)
    Signed-off-by: Sean Owen <[email protected]>

commit 33b908485ae79944ac0d888b444d528178af5dd0
Author: Jacky Li <[email protected]>
Date:   2015-02-23T16:47:28Z

    [SPARK-5939][MLLib] make FPGrowth example app take parameters
    
    Add parameter parsing to the FPGrowth example app in Scala and Java.
    A sample data file is also added in the data/mllib folder.
    
    Author: Jacky Li <[email protected]>
    
    Closes #4714 from jackylk/parameter and squashes the following commits:
    
    8c478b3 [Jacky Li] fix according to comments
    3bb74f6 [Jacky Li] make FPGrowth example app take parameters
    f0e4d10 [Jacky Li] make FPGrowth example app take parameters
    
    (cherry picked from commit 651a1c019eb911005e234a46cc559d63da352377)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 8355773c3ca04950f90345690690407981ad6a82
Author: Alexander Ulanov <[email protected]>
Date:   2015-02-23T20:09:40Z

    [MLLIB] SPARK-5912 Programming guide for feature selection
    
    Added a description of ChiSqSelector and a few words about feature selection in general. I could add a code example; however, it would not look reasonable in the absence of a feature discretizer or a dataset in the `data` folder that has redundant features.
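
    For readers who want the shape of the API anyway, a minimal hedged sketch (assumes `data: RDD[LabeledPoint]` whose features are already discretized, per the caveat above):

    ```scala
    import org.apache.spark.mllib.feature.ChiSqSelector
    import org.apache.spark.mllib.regression.LabeledPoint

    // Keep the 50 features most predictive of the label (chi-squared test).
    val selector = new ChiSqSelector(numTopFeatures = 50)
    val model = selector.fit(data)
    val filtered = data.map(lp => LabeledPoint(lp.label, model.transform(lp.features)))
    ```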
    
    Author: Alexander Ulanov <[email protected]>
    
    Closes #4709 from avulanov/SPARK-5912 and squashes the following commits:
    
    19a8a4e [Alexander Ulanov] Addressing reviewers comments @jkbradley
    58d9e4d [Alexander Ulanov] Addressing reviewers comments @jkbradley
    eb6b9fe [Alexander Ulanov] Typo
    2921a1d [Alexander Ulanov] ChiSqSelector example of use
    c845350 [Alexander Ulanov] ChiSqSelector docs
    
    (cherry picked from commit 28ccf5ee769a1df019e38985112065c01724fbd9)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit ae9704010d2c7dff523d8d89ded3af27f414f8e6
Author: Joseph K. Bradley <[email protected]>
Date:   2015-02-24T00:15:57Z

    [SPARK-5912] [docs] [mllib] Small fixes to ChiSqSelector docs
    
    Fixes:
    * typo in Scala example
    * Removed comment "usually applied on sparse data" since that is debatable
    * small edits to text for clarity
    
    CC: avulanov  I noticed a typo post-hoc and ended up making a few small 
edits.  Do the changes look OK?
    
    Author: Joseph K. Bradley <[email protected]>
    
    Closes #4732 from jkbradley/chisqselector-docs and squashes the following 
commits:
    
    9656a3b [Joseph K. Bradley] added Java example for ChiSqSelector to guide
    3f3f9f4 [Joseph K. Bradley] small fixes to ChiSqSelector docs
    
    (cherry picked from commit 59536cc87e10e5011560556729dd901280958f43)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 33ccad20ef2a4501ea8a8c5983f88f256c3ed478
Author: Yin Huai <[email protected]>
Date:   2015-02-24T01:16:34Z

    [SPARK-5935][SQL] Accept MapType in the schema provided to a JSON dataset.
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-5935
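
    A hedged sketch of the feature (assumes an existing `SQLContext` named `sqlContext` and an `RDD[String]` of JSON lines named `jsonLines`):

    ```scala
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      // Previously rejected: a MapType field inside a user-provided schema.
      StructField("scores", MapType(StringType, IntegerType), nullable = true)))

    val df = sqlContext.jsonRDD(jsonLines, schema)
    ```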
    
    Author: Yin Huai <[email protected]>
    Author: Yin Huai <[email protected]>
    
    Closes #4710 from yhuai/jsonMapType and squashes the following commits:
    
    3e40390 [Yin Huai] Remove unnecessary changes.
    f8e6267 [Yin Huai] Fix test.
    baa36e3 [Yin Huai] Accept MapType in the schema provided to 
jsonFile/jsonRDD.
    
    (cherry picked from commit 48376bfe9c97bf31279918def6c6615849c88f4d)
    Signed-off-by: Michael Armbrust <[email protected]>

commit 2d7786ed1e008b33e8b171a8f2ea30e19426ba1f
Author: Michael Armbrust <[email protected]>
Date:   2015-02-24T01:34:54Z

    [SPARK-5873][SQL] Allow viewing of partially analyzed plans in 
queryExecution
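
    A hedged sketch of what this enables (`queryExecution` is a developer API; `df` stands for any existing DataFrame):

    ```scala
    val qe = df.queryExecution
    // Printing the plan of a DataFrame whose analysis fails used to throw;
    // now the partially analyzed plan can still be inspected.
    println(qe.logical)  // the raw, possibly unresolved logical plan
    println(qe)          // full breakdown of the query execution stages
    ```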
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #4684 from marmbrus/explainAnalysis and squashes the following 
commits:
    
    afbaa19 [Michael Armbrust] fix python
    d93278c [Michael Armbrust] fix hive
    e5fa0a4 [Michael Armbrust] Merge remote-tracking branch 'origin/master' 
into explainAnalysis
    52119f2 [Michael Armbrust] more tests
    82a5431 [Michael Armbrust] fix tests
    25753d2 [Michael Armbrust] Merge remote-tracking branch 'origin/master' 
into explainAnalysis
    aee1e6a [Michael Armbrust] fix hive
    b23a844 [Michael Armbrust] newline
    de8dc51 [Michael Armbrust] more comments
    acf620a [Michael Armbrust] [SPARK-5873][SQL] Show partially analyzed plans 
in query execution
    
    (cherry picked from commit 1ed57086d402c38d95cda6c3d9d7aea806609bf9)
    Signed-off-by: Michael Armbrust <[email protected]>

commit dd42558504201f153689b02bdaea1e19b28d3c1f
Author: Xiangrui Meng <[email protected]>
Date:   2015-02-24T06:08:44Z

    [SPARK-5958][MLLIB][DOC] update block matrix user guide
    
    * Removed SVD code from examples.
    * Corrected Java API doc link.
    * Updated variable names: `AtransposeA` -> `ata`.
    * Minor changes.
    
    brkyvz
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #4737 from mengxr/update-block-matrix-user-guide and squashes the 
following commits:
    
    70f53ac [Xiangrui Meng] update block matrix user guide
    
    (cherry picked from commit cf2e41653de778dc8db8b03385a053aae1152e19)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 2b562b04324229e43a48803fe9aff65ee4af21e8
Author: Cheng Lian <[email protected]>
Date:   2015-02-24T18:45:38Z

    [SPARK-5968] [SQL] Suppresses ParquetOutputCommitter WARN logs
    
    Please refer to the [JIRA ticket] [1] for the motivation.
    
    [1]: https://issues.apache.org/jira/browse/SPARK-5968
    
    
    Author: Cheng Lian <[email protected]>
    
    Closes #4744 from liancheng/spark-5968 and squashes the following commits:
    
    caac6a8 [Cheng Lian] Suppresses ParquetOutputCommitter WARN logs
    
    (cherry picked from commit 840333133396d443e747f62fce9967f7681fb276)
    Signed-off-by: Michael Armbrust <[email protected]>

commit ba5d60dda770fe1c1f8034149495009ef65749e2
Author: Michael Armbrust <[email protected]>
Date:   2015-02-24T18:49:51Z

    [SPARK-5910][SQL] Support for as in selectExpr
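
    Illustration (hedged; `df` stands for any existing DataFrame):

    ```scala
    // SQL-style aliases now parse inside selectExpr.
    val renamed = df.selectExpr("name as user_name", "age + 1 as age_next_year")
    ```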
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #4736 from marmbrus/asExprs and squashes the following commits:
    
    5ba97e4 [Michael Armbrust] [SPARK-5910][SQL] Support for as in selectExpr
    
    (cherry picked from commit 0a59e45e2f2e6f00ccd5f10c79f629fb796fd8d0)
    Signed-off-by: Michael Armbrust <[email protected]>

commit e46096b1e9f173c9f65425c3980b6f32edb3bf99
Author: Michael Armbrust <[email protected]>
Date:   2015-02-24T18:52:18Z

    [SPARK-5532][SQL] Repartition should not use external rdd representation
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #4738 from marmbrus/udtRepart and squashes the following commits:
    
    c06d7b5 [Michael Armbrust] fix compilation
    91c8829 [Michael Armbrust] [SQL][SPARK-5532] Repartition should not use 
external rdd representation
    
    (cherry picked from commit 201236628a344194f7c20ba8e9afeeaefbe9318c)
    Signed-off-by: Michael Armbrust <[email protected]>

commit 28dd53b1b613ba010dd4402d0744d6ebdd422fb5
Author: Tathagata Das <[email protected]>
Date:   2015-02-24T19:02:47Z

    [Spark-5967] [UI] Correctly clean JobProgressListener.stageIdToActiveJobIds
    
    Patch should be self-explanatory
    pwendell JoshRosen
    
    Author: Tathagata Das <[email protected]>
    
    Closes #4741 from tdas/SPARK-5967 and squashes the following commits:
    
    653b5bb [Tathagata Das] Fixed the fix and added test
    e2de972 [Tathagata Das] Clear stages which have no corresponding active 
jobs.
    
    (cherry picked from commit 64d2c01ff1048de83b9b8efce987b55e457298f9)
    Signed-off-by: Andrew Or <[email protected]>

commit eaf7bf98af71a281e37416f3b469f801df46729e
Author: Andrew Or <[email protected]>
Date:   2015-02-24T19:08:07Z

    [SPARK-5965] Standalone Worker UI displays {{USER_JAR}}
    
    For screenshot see: https://issues.apache.org/jira/browse/SPARK-5965
    This was caused by 20a6013106b56a1a1cc3e8cda092330ffbe77cc3.
    
    Author: Andrew Or <[email protected]>
    
    Closes #4739 from andrewor14/user-jar-blocker and squashes the following 
commits:
    
    23c4a9e [Andrew Or] Use right argument
    
    (cherry picked from commit 6d2caa576fcdc5c848d1472b09c685b3871e220e)
    Signed-off-by: Andrew Or <[email protected]>

commit a4ff445a9f2f34697d99a607b05cbc7322beec18
Author: Xiangrui Meng <[email protected]>
Date:   2015-02-24T19:38:59Z

    [MLLIB] Change x_i to y_i in Variance's user guide
    
    Variance is calculated on labels/responses.
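
    For reference, the impurity in question is the variance of the labels within a node:

    ```latex
    \mathrm{Variance} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2,
    \qquad \mu = \frac{1}{N} \sum_{i=1}^{N} y_i
    ```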
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #4740 from mengxr/patch-1 and squashes the following commits:
    
    673317b [Xiangrui Meng] [MLLIB] Change x_i to y_i in Variance's user guide
    
    (cherry picked from commit 105791e35cee694f3b2ac1e06758650fe44e2c71)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 641423dbffd9333ea8d989d0afa7b78426bd3979
Author: Michael Armbrust <[email protected]>
Date:   2015-02-24T21:39:29Z

    [SPARK-5952][SQL] Lock when using hive metastore client
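
    A hedged sketch of the pattern named in the title (hypothetical names; the metastore client is not thread-safe, so calls into it are serialized on one lock):

    ```scala
    // Hypothetical sketch: funnel every metastore call through a single lock.
    class MetastoreAccess[C](client: C) {
      private val lock = new Object

      def withClient[A](body: C => A): A = lock.synchronized {
        body(client)
      }
    }
    ```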
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #4746 from marmbrus/hiveLock and squashes the following commits:
    
    8b871cf [Michael Armbrust] [SPARK-5952][SQL] Lock when using hive metastore 
client
    
    (cherry picked from commit a2b9137923e0ba328da8fff2fbbfcf2abf50b033)
    Signed-off-by: Michael Armbrust <[email protected]>

commit 91bf0f827efff406b56dd6d9443c3d203f03d7fe
Author: Davies Liu <[email protected]>
Date:   2015-02-24T22:50:00Z

    [SPARK-5973] [PySpark] fix zip with two RDDs with AutoBatchedSerializer
    
    Author: Davies Liu <[email protected]>
    
    Closes #4745 from davies/fix_zip and squashes the following commits:
    
    2124b2c [Davies Liu] Update tests.py
    b5c828f [Davies Liu] increase the number of records
    c1e40fd [Davies Liu] fix zip with two RDDs with AutoBatchedSerializer
    
    (cherry picked from commit da505e59274d1c838653c1109db65ad374e65304)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 17ee2460a385d918e01b64aaf3fdb683b871ac36
Author: Cheng Lian <[email protected]>
Date:   2015-02-25T00:34:55Z

    [SPARK-5751] [SQL] [WIP] Revamped HiveThriftServer2Suite for robustness
    
    **NOTICE** Do NOT merge this, as we're waiting for #3881 to be merged.
    
    `HiveThriftServer2Suite` has been notorious for its flakiness for a while. This was mostly due to spawning and communicating with external server processes. This PR revamps the test suite for better robustness:
    
    1. Fixes a race condition that occurred while using `tail -f` to check the log file
    
       It's possible that the line we are looking for has already been printed 
into the log file before we start the `tail -f` process. This PR uses `tail -n 
+0 -f` to ensure all lines are checked.
    
    2. Retries up to 3 times if the server fails to start
    
       In most cases, the server fails to start because of a port conflict. This PR no longer asks the system to choose an available TCP port, but instead tries a random port and retries up to 3 times if the server fails to start.
    
    3. A server instance is reused among all test cases within a single suite
    
       The original `HiveThriftServer2Suite` is split into two test suites, `HiveThriftBinaryServerSuite` and `HiveThriftHttpServerSuite`. Each suite starts a `HiveThriftServer2` instance and reuses it for all of its test cases.
    
    **TODO**
    
    - [ ] Starts the Thrift server in foreground once #3881 is merged (adding 
`--foreground` flag to `spark-daemon.sh`)
    
    
    Author: Cheng Lian <[email protected]>
    
    Closes #4720 from liancheng/revamp-thrift-server-tests and squashes the 
following commits:
    
    d6c80eb [Cheng Lian] Relaxes server startup timeout
    6f14eb1 [Cheng Lian] Revamped HiveThriftServer2Suite for robustness
    
    (cherry picked from commit f816e73902b8ca28e24bf1f79a70533f75f239db)
    Signed-off-by: Cheng Lian <[email protected]>

----

