GitHub user lowryact opened a pull request:

    https://github.com/apache/spark/pull/3451

    Branch 1.0

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3451.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3451
    
----
commit 16e3910a0512cd53ad0c9c71ef20a3ee0f10c34f
Author: Matei Zaharia <ma...@databricks.com>
Date:   2014-06-06T06:01:48Z

    SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys
    
    The current implementation reads one key with the next hash code as it 
finishes reading the keys with the current hash code, which may cause it to 
miss some matches of the next key. This can cause operations like join to give 
the wrong result when reduce tasks spill to disk and there are hash collisions, 
as values won't be matched together. This PR fixes it by not reading in that 
next key, using a peeking iterator instead.
    
    Author: Matei Zaharia <ma...@databricks.com>
    
    Closes #986 from mateiz/spark-2043 and squashes the following commits:
    
    0959514 [Matei Zaharia] Added unit test for having many hash collisions
    892debb [Matei Zaharia] SPARK-2043: don't read a key with the next hash 
code in ExternalAppendOnlyMap, instead use a buffered iterator to only read 
values with the current hash code.
    
    (cherry picked from commit b45c13e7d798f97b92f1a6329528191b8d779c4f)
    Signed-off-by: Matei Zaharia <ma...@databricks.com>
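
The buffered-iterator approach described in the commit above can be sketched in isolation. `groupByHash` and the sorted `(hash, (key, value))` stream below are illustrative stand-ins, not ExternalAppendOnlyMap's actual internals; the point is that `BufferedIterator.head` inspects the next element without consuming it, so draining one hash group never swallows the first entry of the next:

```scala
object PeekingGroupDemo {
  // Entries are sorted by hash code; colliding keys share a hash.
  // Returns one Seq of (key, value) pairs per distinct hash code.
  def groupByHash[K, V](sorted: Iterator[(Int, (K, V))]): Iterator[Seq[(K, V)]] = {
    val it = sorted.buffered  // .head peeks without advancing
    new Iterator[Seq[(K, V)]] {
      def hasNext: Boolean = it.hasNext
      def next(): Seq[(K, V)] = {
        val hash = it.head._1
        val group = scala.collection.mutable.ArrayBuffer.empty[(K, V)]
        // Consume only entries with the current hash; peeking at it.head
        // leaves the first entry of the next group untouched.
        while (it.hasNext && it.head._1 == hash) group += it.next()._2
        group.toSeq
      }
    }
  }
}
```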

commit d3717bea951888fe64cc2a0119d23b641b030735
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-06-06T06:20:59Z

    [SPARK-2050][SQL] LIKE, RLIKE and IN in HQL should not be case sensitive.
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #989 from marmbrus/caseSensitiveFuncitons and squashes the following 
commits:
    
    681de54 [Michael Armbrust] LIKE, RLIKE and IN in HQL should not be case 
sensitive.
    
    (cherry picked from commit 41db44c428a10f4453462d002d226798bb8fbdda)
    Signed-off-by: Reynold Xin <r...@apache.org>

commit d7467484ff08a5f9a566d3a7b21bab426ff89127
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-06-06T18:31:37Z

    [SPARK-2050 - 2][SQL] DIV and BETWEEN should not be case sensitive.
    
    Followup: #989
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #994 from marmbrus/caseSensitiveFunctions2 and squashes the 
following commits:
    
    9d9c8ed [Michael Armbrust] Fix DIV and BETWEEN.
    
    (cherry picked from commit 8d210560be8b143e48abfbaca347f383b5aa4798)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit 39cfa9c0be34d4baf9de4eb9f9191c7b406c4d59
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-06-07T21:20:33Z

    [SPARK-1994][SQL] Weird data corruption bug when running Spark SQL on data 
in HDFS
    
    Basically there is a race condition (possibly a scala bug?) when these 
values are recomputed on all of the slaves that results in an incorrect 
projection being generated (possibly because the GUID uniqueness contract is 
broken?).
    
    In general we should probably enforce that all expression planning occurs on 
the driver, as is now occurring here.
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #1004 from marmbrus/fixAggBug and squashes the following commits:
    
    e0c116c [Michael Armbrust] Compute aggregate expression during planning 
instead of lazily on workers.
    
    (cherry picked from commit a6c72ab16e7a3027739ab419819f5222e270838e)
    Signed-off-by: Reynold Xin <r...@apache.org>

commit 3f8450ec67fe84c290d725d4ebfcf9f5a7b0b109
Author: maji2014 <ma...@asiainfo-linkage.com>
Date:   2014-06-08T22:14:27Z

    Update run-example
    
    The old script could only be run from the Spark home directory via 
"bin/run-example"; running it from anywhere else failed with "./run-example: 
line 55: ./bin/spark-submit: No such file or directory". This change fixes that.
    
    Author: maji2014 <ma...@asiainfo-linkage.com>
    
    Closes #1011 from maji2014/master and squashes the following commits:
    
    2cc1af6 [maji2014] Update run-example
    
    Closes #988.
    (cherry picked from commit e9261d0866a610eab29fa332726186b534d1018f)
    
    Signed-off-by: Patrick Wendell <pwend...@gmail.com>

commit 502a8f795551007db8a390c4eb7cfde7ca7742fb
Author: Neville Li <nevi...@spotify.com>
Date:   2014-06-09T06:18:27Z

    [SPARK-2067] use relative path for Spark logo in UI
    
    Author: Neville Li <nevi...@spotify.com>
    
    Closes #1006 from nevillelyh/gh/SPARK-2067 and squashes the following 
commits:
    
    9ee64cf [Neville Li] [SPARK-2067] use relative path for Spark logo in UI
    (cherry picked from commit 15ddbef414d5fd6d4672936ba3c747b5fb7ab52b)
    
    Signed-off-by: Patrick Wendell <pwend...@gmail.com>

commit a5848d325ae0909072800cbb3ea9ad73a3708965
Author: Andrew Ash <and...@andrewash.com>
Date:   2014-06-09T17:21:21Z

    SPARK-1944 Document --verbose in spark-shell -h
    
    https://issues.apache.org/jira/browse/SPARK-1944
    
    Author: Andrew Ash <and...@andrewash.com>
    
    Closes #1020 from ash211/SPARK-1944 and squashes the following commits:
    
    a831c4d [Andrew Ash] SPARK-1944 Document --verbose in spark-shell -h
    
    (cherry picked from commit 35630c86ff0e27862c9d902887eb0a24d25867ae)
    Signed-off-by: Reynold Xin <r...@apache.org>

commit 73cd1f8223a4799fd104fe48ba011315236cf4a8
Author: Daoyuan <daoyuan.w...@intel.com>
Date:   2014-06-09T18:31:36Z

    [SPARK-1495][SQL]add support for left semi join
    
    Just submit another solution for #395
    
    Author: Daoyuan <daoyuan.w...@intel.com>
    Author: Michael Armbrust <mich...@databricks.com>
    Author: Daoyuan Wang <daoyuan.w...@intel.com>
    
    Closes #837 from adrian-wang/left-semi-join-support and squashes the 
following commits:
    
    d39cd12 [Daoyuan Wang] Merge pull request #1 from marmbrus/pr/837
    6713c09 [Michael Armbrust] Better debugging for failed query tests.
    035b73e [Michael Armbrust] Add test for left semi that can't be done with a 
hash join.
    5ec6fa4 [Michael Armbrust] Add left semi to SQL Parser.
    4c726e5 [Daoyuan] improvement according to Michael
    8d4a121 [Daoyuan] add golden files for leftsemijoin
    83a3c8a [Daoyuan] scala style fix
    14cff80 [Daoyuan] add support for left semi join
    
    (cherry picked from commit 0cf600280167a94faec75736223256e8f2e48085)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit 65fa7bcac81fc2a7a6c578775f72929cb201c20a
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-06-09T21:24:19Z

    [SQL] Simple framework for debugging query execution
    
    Only records the number of tuples and the unique dataTypes output right now...
    
    Example:
    ```scala
    scala> import org.apache.spark.sql.execution.debug._
    scala> hql("SELECT value FROM src WHERE key > 10").debug(sparkContext)
    
    Results returned: 489
    == Project [value#1:0] ==
    Tuples output: 489
     value StringType: {java.lang.String}
    == Filter (key#0:1 > 10) ==
    Tuples output: 489
     value StringType: {java.lang.String}
     key IntegerType: {java.lang.Integer}
    == HiveTableScan [value#1,key#0], (MetastoreRelation default, src, None), 
None ==
    Tuples output: 500
     value StringType: {java.lang.String}
     key IntegerType: {java.lang.Integer}
    ```
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #1005 from marmbrus/debug and squashes the following commits:
    
    dcc3ca6 [Michael Armbrust] Add comments.
    c9dded2 [Michael Armbrust] Simple framework for debugging query execution
    
    (cherry picked from commit c6e041d171e3d9882ab15e2bd7a7217dc19647f6)
    Signed-off-by: Reynold Xin <r...@apache.org>

commit 5a79ba13ea75838fe53d99ca5aa289d81a58cdb3
Author: Zongheng Yang <zonghen...@gmail.com>
Date:   2014-06-09T23:47:44Z

    [SPARK-1704][SQL] Fully support EXPLAIN commands as SchemaRDD.
    
    This PR attempts to resolve 
[SPARK-1704](https://issues.apache.org/jira/browse/SPARK-1704) by introducing a 
physical plan for EXPLAIN commands, which just prints out the debug string 
(containing various SparkSQL's plans) of the corresponding QueryExecution for 
the actual query.
    
    Author: Zongheng Yang <zonghen...@gmail.com>
    
    Closes #1003 from concretevitamin/explain-cmd and squashes the following 
commits:
    
    5b7911f [Zongheng Yang] Add a regression test.
    1bfa379 [Zongheng Yang] Modify output().
    719ada9 [Zongheng Yang] Override otherCopyArgs for ExplainCommandPhysical.
    4318fd7 [Zongheng Yang] Make all output one Row.
    439c6ab [Zongheng Yang] Minor cleanups.
    408f574 [Zongheng Yang] SPARK-1704: Add CommandStrategy and 
ExplainCommandPhysical.
    
    (cherry picked from commit a9ec033c8cf489898cc47e2043bd9e86b7df1ff8)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit d5da81cdd1c330b125282a39bbca040fbb6c7dda
Author: Zongheng Yang <zonghen...@gmail.com>
Date:   2014-06-10T07:49:09Z

    [SPARK-1508][SQL] Add SQLConf to SQLContext.
    
    This PR (1) introduces a new class SQLConf that stores key-value properties 
for a SQLContext and (2) cleans up the semantics of various forms of SET commands.
    
    The SQLConf class unlocks user-controllable optimization opportunities; for 
example, a user can now override the number of partitions used during an 
Exchange. A SQLConf can be accessed and modified programmatically through its 
getters and setters. It can also be modified through SET commands executed by 
`sql()` or `hql()`. Note that users now have the ability to change a particular 
property for different queries inside the same Spark job, unlike settings 
configured in SparkConf.
    
    For SET commands: "SET" will return all properties currently set in a 
SQLConf, "SET key" will return the key-value pair (if set) or an undefined 
message, and "SET key=value" will call the setter on SQLConf, and if a 
HiveContext is used, it will be executed in Hive as well.
    
    Author: Zongheng Yang <zonghen...@gmail.com>
    
    Closes #956 from concretevitamin/sqlconf and squashes the following commits:
    
    4968c11 [Zongheng Yang] Very minor cleanup.
    d74dde5 [Zongheng Yang] Remove the redundant mkQueryExecution() method.
    c129b86 [Zongheng Yang] Merge remote-tracking branch 'upstream/master' into 
sqlconf
    26c40eb [Zongheng Yang] Make SQLConf a trait and have SQLContext mix it in.
    dd19666 [Zongheng Yang] Update a comment.
    baa5d29 [Zongheng Yang] Remove default param for shuffle partitions 
accessor.
    5f7e6d8 [Zongheng Yang] Add default num partitions.
    22d9ed7 [Zongheng Yang] Fix output() of Set physical. Add SQLConf param 
accessor method.
    e9856c4 [Zongheng Yang] Use java.util.Collections.synchronizedMap on a Java 
HashMap.
    88dd0c8 [Zongheng Yang] Remove redundant SET Keyword.
    271f0b1 [Zongheng Yang] Minor change.
    f8983d1 [Zongheng Yang] Minor changes per review comments.
    1ce8a5e [Zongheng Yang] Invoke runSqlHive() in SQLConf#get for the 
HiveContext case.
    b766af9 [Zongheng Yang] Remove a test.
    d52e1bd [Zongheng Yang] De-hardcode number of shuffle partitions for 
BasicOperators (read from SQLConf).
    555599c [Zongheng Yang] Bullet-proof (relatively) parsing SET per review 
comment.
    c2067e8 [Zongheng Yang] Mark SQLContext transient and put it in a second 
param list.
    2ea8cdc [Zongheng Yang] Wrap long line.
    41d7f09 [Zongheng Yang] Fix imports.
    13279e6 [Zongheng Yang] Refactor the logic of eagerly processing SET 
commands.
    b14b83e [Zongheng Yang] In a HiveContext, make SQLConf a subset of HiveConf.
    6983180 [Zongheng Yang] Move a SET test to SQLQuerySuite and make it 
complete.
    5b67985 [Zongheng Yang] New line at EOF.
    c651797 [Zongheng Yang] Add commands.scala.
    efd82db [Zongheng Yang] Clean up semantics of several cases of SET.
    c1017c2 [Zongheng Yang] WIP in changing SetCommand to take two Options (for 
different semantics of SETs).
    0f00d86 [Zongheng Yang] Add a test for singleton set command in SQL.
    41acd75 [Zongheng Yang] Add a test for hql() in HiveQuerySuite.
    2276929 [Zongheng Yang] Fix default hive result for set commands in 
HiveComparisonTest.
    3b0c71b [Zongheng Yang] Remove Parser for set commands. A few other fixes.
    d0c4578 [Zongheng Yang] Tmux typo.
    0ecea46 [Zongheng Yang] Changes for HiveQl and HiveContext.
    ce22d80 [Zongheng Yang] Fix parsing issues.
    cb722c1 [Zongheng Yang] Finish up SQLConf patch.
    4ebf362 [Zongheng Yang] First cut at SQLConf inside SQLContext.
    
    (cherry picked from commit 08ed9ad81397b71206c4dc903bfb94b6105691ed)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>
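
The three SET forms described in the message can be sketched with a toy stand-alone interpreter (`SetCommandDemo` and its output format are assumptions for illustration, not the actual SQLConf code):

```scala
object SetCommandDemo {
  private val settings = scala.collection.mutable.LinkedHashMap.empty[String, String]

  // Returns the lines the command would print.
  def run(cmd: String): Seq[String] = cmd.trim.stripPrefix("SET").trim match {
    case "" =>                          // "SET": dump all properties currently set
      settings.map { case (k, v) => s"$k=$v" }.toSeq
    case kv if kv.contains("=") =>      // "SET key=value": invoke the setter
      val Array(k, v) = kv.split("=", 2)
      settings(k.trim) = v.trim
      Seq(s"${k.trim}=${v.trim}")
    case k =>                           // "SET key": the pair, or an undefined message
      Seq(settings.get(k).map(v => s"$k=$v").getOrElse(s"$k is undefined"))
  }
}
```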

commit 89caa40e360573288cbc4275c02f6394d081c129
Author: Cheng Lian <lian.cs....@gmail.com>
Date:   2014-06-10T08:14:44Z

    Moved hiveOperators.scala to the right package folder
    
    The package is `org.apache.spark.sql.hive.execution`, while the file was 
placed under `sql/hive/src/main/scala/org/apache/spark/sql/hive/`.
    
    Author: Cheng Lian <lian.cs....@gmail.com>
    
    Closes #1029 from liancheng/moveHiveOperators and squashes the following 
commits:
    
    d632eb8 [Cheng Lian] Moved hiveOperators.scala to the right package folder
    
    (cherry picked from commit a9a461c594fd20e46947e318095df60bddb67559)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit 35894af8cc66056352a1e4c7ebff1b6ecb12b7b9
Author: witgo <wi...@qq.com>
Date:   2014-06-10T15:34:57Z

    [SPARK-1978] In some cases, spark-yarn does not automatically restart the 
failed container
    
    Author: witgo <wi...@qq.com>
    
    Closes #921 from witgo/allocateExecutors and squashes the following commits:
    
    bc3aa66 [witgo] review commit
    8800eba [witgo] Merge branch 'master' of https://github.com/apache/spark 
into allocateExecutors
    32ac7af [witgo] review commit
    056b8c7 [witgo] Merge branch 'master' of https://github.com/apache/spark 
into allocateExecutors
    04c6f7e [witgo] Merge branch 'master' into allocateExecutors
    aff827c [witgo] review commit
    5c376e0 [witgo] Merge branch 'master' of https://github.com/apache/spark 
into allocateExecutors
    1faf4f4 [witgo] Merge branch 'master' into allocateExecutors
    3c464bd [witgo] add time limit to allocateExecutors
    e00b656 [witgo] In some cases, yarn does not automatically restart the 
container

commit 1d9b7651e42a33dd27d8c7a470f12b7cb6c14385
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2014-06-10T20:13:17Z

    HOTFIX: Fix Python tests on Jenkins.
    
    Author: Patrick Wendell <pwend...@gmail.com>
    
    Closes #1036 from pwendell/jenkins-test and squashes the following commits:
    
    9c99856 [Patrick Wendell] Better output during tests
    71e7b74 [Patrick Wendell] Removing incorrect python path
    74984db [Patrick Wendell] HOTFIX: Allow PySpark tests to run on Jenkins.
    (cherry picked from commit fb499be1ac935b6f91046ec8ff23ac1267c82342)
    
    Signed-off-by: Patrick Wendell <pwend...@gmail.com>

commit 5bc186dc7dc6057ff10a4da84ab054ea32772bf3
Author: Ankur Dave <ankurd...@gmail.com>
Date:   2014-06-10T20:15:06Z

    HOTFIX: Increase time limit for Bagel test
    
    The test was timing out on some slow EC2 workers.
    
    Author: Ankur Dave <ankurd...@gmail.com>
    
    Closes #1037 from ankurdave/bagel-test-time-limit and squashes the 
following commits:
    
    67fd487 [Ankur Dave] Increase time limit for Bagel test
    (cherry picked from commit 55a0e87ee4655106d5e0ed799b11e77f68a17dbb)
    
    Signed-off-by: Patrick Wendell <pwend...@gmail.com>

commit f397ffbdf3348359231999115d287e5de760736c
Author: Cheng Hao <hao.ch...@intel.com>
Date:   2014-06-10T19:59:52Z

    [SPARK-2076][SQL] Pushdown the join filter & predication for outer join
    
    As the rule described in 
https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior, we can 
optimize the SQL Join by pushing down the Join predicate and Where predicate.
    
    Author: Cheng Hao <hao.ch...@intel.com>
    
    Closes #1015 from chenghao-intel/join_predicate_push_down and squashes the 
following commits:
    
    10feff9 [Cheng Hao] fix bug of changing the join type in 
PredicatePushDownThroughJoin
    44c6700 [Cheng Hao] Add logical to support pushdown the join filter
    0bce426 [Cheng Hao] Pushdown the join filter & predicate for outer join
    
    (cherry picked from commit db0c038a66cb228bcb62a5607cd0ed013d0f9f20)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit 86c4a79dc515df05641a0a25a184491c92ab0ab5
Author: egraldlo <egral...@gmail.com>
Date:   2014-06-10T21:07:55Z

    [SQL] Add average overflow test case from #978
    
    By @egraldlo.
    
    Author: egraldlo <egral...@gmail.com>
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #1033 from marmbrus/pr/978 and squashes the following commits:
    
    e228c5e [Michael Armbrust] Remove "test".
    762aeaf [Michael Armbrust] Remove unneeded rule. More descriptive name for 
test table.
    d414cd7 [egraldlo] formatting issues
    1153f75 [egraldlo] do best to avoid overflowing in function avg().
    
    (cherry picked from commit 1abbde0e89131ad95e793ac1834c392db46b448e)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit ac8c27bdffc22d01afc049a64648237fdc607e66
Author: joyyoj <suns...@gmail.com>
Date:   2014-06-11T00:26:17Z

    [SPARK-1998] SparkFlumeEvent with body bigger than 1020 bytes are not re...
    
    A Flume event sent to Spark will fail if the body is too large and numHeaders 
is greater than zero.
    
    Author: joyyoj <suns...@gmail.com>
    
    Closes #951 from joyyoj/master and squashes the following commits:
    
    f4660c5 [joyyoj] [SPARK-1998] SparkFlumeEvent with body bigger than 1020 
bytes are not read properly
    (cherry picked from commit 29660443077619ee854025b8d0d3d64181724054)
    
    Signed-off-by: Patrick Wendell <pwend...@gmail.com>

commit 2cdce7cf35ffe48810920978b6f55be8a456e844
Author: Zongheng Yang <zonghen...@gmail.com>
Date:   2014-06-11T04:59:01Z

    HOTFIX: clear() configs in SQLConf-related unit tests.
    
    Thanks goes to @liancheng, who pointed out that `sql/test-only 
*.SQLConfSuite *.SQLQuerySuite` passed but `sql/test-only *.SQLQuerySuite 
*.SQLConfSuite` failed. The reason is that some tests use the same test keys 
and without clear()'ing, they get carried over to other tests. This hotfix 
simply adds some `clear()` calls.
    
    This problem was not evident on Jenkins before, probably because 
`parallelExecution` is not set to `false` for `sqlCoreSettings`.
    
    Author: Zongheng Yang <zonghen...@gmail.com>
    
    Closes #1040 from concretevitamin/sqlconf-tests and squashes the following 
commits:
    
    6d14ceb [Zongheng Yang] HOTFIX: clear() confs in SQLConf related unit tests.
    
    (cherry picked from commit 601032f5bfe2dcdc240bfcc553f401e6facbf5ec)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit 6d15e9f7cbe9dffe8695519fc5cb6baa59f75776
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-06-11T06:13:48Z

    [SPARK-2093] [SQL] NullPropagation should use exact type value.
    
    `NullPropagation` should use the exact type value when transforming `Count` or 
`Sum`.
    
    Author: Takuya UESHIN <ues...@happy-camper.st>
    
    Closes #1034 from ueshin/issues/SPARK-2093 and squashes the following 
commits:
    
    65b6ff1 [Takuya UESHIN] Modify the literal value of the result of 
transformation from Sum to long value.
    830c20b [Takuya UESHIN] Add Cast to the result of transformation from Count.
    9314806 [Takuya UESHIN] Fix NullPropagation to use exact type value.
    
    (cherry picked from commit 0402bd77ec786d1fa6cfd7f9cc3aa97c7ab16fd8)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit 65ed7793db7a3d97aa244c372ac9a756acfa9447
Author: Cheng Lian <lian.cs....@gmail.com>
Date:   2014-06-11T07:06:50Z

    [SPARK-1968][SQL] SQL/HiveQL command for caching/uncaching tables
    
    JIRA issue: [SPARK-1968](https://issues.apache.org/jira/browse/SPARK-1968)
    
    This PR added support for SQL/HiveQL command for caching/uncaching tables:
    
    ```
    scala> sql("CACHE TABLE src")
    ...
    res0: org.apache.spark.sql.SchemaRDD =
    SchemaRDD[0] at RDD at SchemaRDD.scala:98
    == Query Plan ==
    CacheCommandPhysical src, true
    
    scala> table("src")
    ...
    res1: org.apache.spark.sql.SchemaRDD =
    SchemaRDD[3] at RDD at SchemaRDD.scala:98
    == Query Plan ==
    InMemoryColumnarTableScan [key#0,value#1], (HiveTableScan [key#0,value#1], 
(MetastoreRelation default, src, None), None), false
    
    scala> isCached("src")
    res2: Boolean = true
    
    scala> sql("CACHE TABLE src")
    ...
    res3: org.apache.spark.sql.SchemaRDD =
    SchemaRDD[4] at RDD at SchemaRDD.scala:98
    == Query Plan ==
    CacheCommandPhysical src, false
    
    scala> table("src")
    ...
    res4: org.apache.spark.sql.SchemaRDD =
    SchemaRDD[11] at RDD at SchemaRDD.scala:98
    == Query Plan ==
    HiveTableScan [key#2,value#3], (MetastoreRelation default, src, None), None
    
    scala> isCached("src")
    res5: Boolean = false
    ```
    
    Things also work for `hql`.
    
    Author: Cheng Lian <lian.cs....@gmail.com>
    
    Closes #1038 from liancheng/sqlCacheTable and squashes the following 
commits:
    
    ecb7194 [Cheng Lian] Trimmed the SQL string before parsing special commands
    6f4ce42 [Cheng Lian] Moved logical command classes to a separate file
    3458a24 [Cheng Lian] Added comment for public API
    f0ffacc [Cheng Lian] Added isCached() predicate
    15ec6d2 [Cheng Lian] Added "(UN)CACHE TABLE" SQL/HiveQL statements
    
    (cherry picked from commit 0266a0c8a70e0fbaeb0df63031f7a750ffc31a80)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit 54ff00547c89c135c540d754daf9e19e24d92f67
Author: Qiuzhuang.Lian <qiuzhuang.l...@gmail.com>
Date:   2014-06-11T07:36:06Z

    SPARK-2107: FilterPushdownSuite doesn't need Junit jar.
    
    Author: Qiuzhuang.Lian <qiuzhuang.l...@gmail.com>
    
    Closes #1046 from Qiuzhuang/master and squashes the following commits:
    
    0a9921a [Qiuzhuang.Lian] SPARK-2107: FilterPushdownSuite doesn't need Junit 
jar.

commit 9ef076510e832a4cd56d692937b597e573187416
Author: Prashant Sharma <prashan...@imaginea.com>
Date:   2014-06-11T17:49:34Z

    [SPARK-2108] Mark SparkContext methods that return block information as 
developer APIs
    
    Author: Prashant Sharma <prashan...@imaginea.com>
    
    Closes #1047 from ScrapCodes/SPARK-2108/mark-as-dev-api and squashes the 
following commits:
    
    073ee34 [Prashant Sharma] [SPARK-2108] Mark SparkContext methods that 
return block information as developer API's
    (cherry picked from commit e508f599f88baaa31a3498fb0bdbafdbc303119e)
    
    Signed-off-by: Patrick Wendell <pwend...@gmail.com>

commit 684a93a7263a79c612275ba36e06f6438162ff28
Author: Lars Albertsson <la...@spotify.com>
Date:   2014-06-11T17:54:42Z

    SPARK-2113: awaitTermination() after stop() will hang in Spark Streaming
    
    Author: Lars Albertsson <la...@spotify.com>
    
    Closes #1001 from lallea/contextwaiter_stopped and squashes the following 
commits:
    
    93cd314 [Lars Albertsson] Mend StreamingContext stop() followed by 
awaitTermination().
    (cherry picked from commit 4d5c12aa1c54c49377a4bafe3bcc4993d5e1a552)
    
    Signed-off-by: Patrick Wendell <pwend...@gmail.com>

commit cc004488d49e5fc431cb7bd3907faacca43d4a9e
Author: Sameer Agarwal <sam...@databricks.com>
Date:   2014-06-11T19:01:04Z

    [SPARK-2042] Prevent unnecessary shuffle triggered by take()
    
    This PR implements `take()` on a `SchemaRDD` by inserting a logical limit 
that is followed by a `collect()`. This is also accompanied by adding a 
catalyst optimizer rule for collapsing adjacent limits. Doing so prevents an 
unnecessary shuffle that is sometimes triggered by `take()`.
    
    Author: Sameer Agarwal <sam...@databricks.com>
    
    Closes #1048 from sameeragarwal/master and squashes the following commits:
    
    3eeb848 [Sameer Agarwal] Fixing Tests
    1b76ff1 [Sameer Agarwal] Deprecating limit(limitExpr: Expression) in v1.1.0
    b723ac4 [Sameer Agarwal] Added limit folding tests
    a0ff7c4 [Sameer Agarwal] Adding catalyst rule to fold two consecutive limits
    8d42d03 [Sameer Agarwal] Implement trigger() as limit() followed by 
collect()
    
    (cherry picked from commit 4107cce58c41160a0dc20339621eacdf8a8b1191)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>
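
The limit-collapsing rule can be illustrated on a toy plan tree; `Plan`, `Limit`, and `Scan` below are made-up stand-ins for Catalyst's operators, not the real classes. Two adjacent limits fold into a single limit of the smaller value:

```scala
sealed trait Plan
case class Scan(table: String) extends Plan
case class Limit(n: Int, child: Plan) extends Plan

object CollapseLimits {
  // Repeatedly fold Limit(a, Limit(b, child)) into Limit(min(a, b), child).
  def apply(plan: Plan): Plan = plan match {
    case Limit(outer, Limit(inner, child)) => apply(Limit(math.min(outer, inner), child))
    case other => other
  }
}
```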

commit 597c7efdcfc6879478b606d27fdb1a8d42372e1a
Author: Daoyuan <daoyuan.w...@intel.com>
Date:   2014-06-11T19:08:28Z

    [SQL] Code Cleanup: Left Semi Hash Join
    
    Some improvements for PR #837: add another case to the white list and use 
`filter` to build the result iterator.
    
    Author: Daoyuan <daoyuan.w...@intel.com>
    
    Closes #1049 from adrian-wang/clean-LeftSemiJoinHash and squashes the 
following commits:
    
    b314d5a [Daoyuan] change hashSet name
    27579a9 [Daoyuan] add semijoin to white list and use filter to create new 
iterator in LeftSemiJoinBNL
    
    Signed-off-by: Michael Armbrust <mich...@databricks.com>
    (cherry picked from commit ce6deb1e5b4cd40c97730fcf5dc89cb2f624bce2)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit 81049eb5432b1738256df12df41cad3c9994ae03
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2014-06-11T22:54:41Z

    HOTFIX: PySpark tests should be order insensitive.
    
    This has been messing up the SQL PySpark tests on Jenkins.
    
    Author: Patrick Wendell <pwend...@gmail.com>
    
    Closes #1054 from pwendell/pyspark and squashes the following commits:
    
    1eb5487 [Patrick Wendell] False change
    06f062d [Patrick Wendell] HOTFIX: PySpark tests should be order insensitive

commit e3955643d6f838146e8b2e0463b27612d8e48d02
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-06-12T00:58:35Z

    [SPARK-2052] [SQL] Add optimization for CaseConversionExpression's.
    
    Add optimization for `CaseConversionExpression`'s.
    
    Author: Takuya UESHIN <ues...@happy-camper.st>
    
    Closes #990 from ueshin/issues/SPARK-2052 and squashes the following 
commits:
    
    2568666 [Takuya UESHIN] Move some rules back.
    dde7ede [Takuya UESHIN] Add tests to check if ConstantFolding can handle 
null literals and remove the unneeded rules from NullPropagation.
    c4eea67 [Takuya UESHIN] Fix toString methods.
    23e2363 [Takuya UESHIN] Make CaseConversionExpressions foldable if the 
child is foldable.
    0ff7568 [Takuya UESHIN] Add tests for collapsing case statements.
    3977d80 [Takuya UESHIN] Add optimization for CaseConversionExpression's.
    
    (cherry picked from commit 9a2448daf984d5bb550dfe0d9e28cbb80ef5cb51)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit 358e7e51cc736223d36071b44b7ff853635fc6e7
Author: Doris Xin <doris.s....@gmail.com>
Date:   2014-06-12T19:53:07Z

    [SPARK-2088] fix NPE in toString
    
    After deserialization, the transient field creationSiteInfo does not get 
backfilled with the default value, but the toString method, which is invoked by 
the serializer, expects the field to always be non-null. An NPE is thrown when 
toString is called by the serializer when creationSiteInfo is null.
    
    Author: Doris Xin <doris.s....@gmail.com>
    
    Closes #1028 from dorx/toStringNPE and squashes the following commits:
    
    f20021e [Doris Xin] unit test for toString after deserialization
    6f0a586 [Doris Xin] Merge branch 'master' into toStringNPE
    f47fecf [Doris Xin] Merge branch 'master' into toStringNPE
    76199c6 [Doris Xin] [SPARK-2088] fix NPE in toString
    
    (cherry picked from commit 83c226d454722d5dea186d48070fb98652d0dafb)
    Signed-off-by: Xiangrui Meng <m...@databricks.com>
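
The failure mode can be reproduced with a minimal sketch; `Node` and its `creationSite` field are hypothetical stand-ins for the real class and its `creationSiteInfo` field. Java serialization skips `@transient` fields, so they read back as `null`, and `toString` must guard against that:

```scala
import java.io._

class Node(@transient val creationSite: String) extends Serializable {
  // Guard against null: after deserialization the transient field is not
  // backfilled, so using it unchecked would throw an NPE here.
  override def toString: String =
    s"Node(${Option(creationSite).getOrElse("<unknown>")})"
}

object RoundTrip {
  // Serialize and deserialize an object in memory.
  def copy(obj: AnyRef): AnyRef = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)).readObject()
  }
}
```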

commit 3962abaf93217eced5856d28ad6dc02f8b653e98
Author: Thomas Graves <tgra...@apache.org>
Date:   2014-06-12T21:28:00Z

    [SPARK-2080] Yarn: report HS URL in client mode, correct user in cluster 
mode.
    
    Yarn client mode was not setting the app's tracking URL to the
    History Server's URL when configured by the user. Now client mode
    behaves the same as cluster mode.
    
    In SparkContext.scala, the "user.name" system property had precedence
    over the SPARK_USER environment variable. This means that SPARK_USER
    was never used, since "user.name" is always set by the JVM. In Yarn
    cluster mode, this means the application always reported itself as
    being run by user "yarn" (or whatever user was running the Yarn NM).
    One could argue that the correct fix would be to use UGI.getCurrentUser()
    here, but at least for Yarn that will match what SPARK_USER is set
    to.
    
    Author: Marcelo Vanzin <van...@cloudera.com>
    
    This patch had conflicts when merged, resolved by
    Committer: Thomas Graves <tgra...@apache.org>
    
    Closes #1002 from vanzin/yarn-client-url and squashes the following commits:
    
    4046e04 [Marcelo Vanzin] Set HS link in yarn-alpha also.
    4c692d9 [Marcelo Vanzin] Yarn: report HS URL in client mode, correct user 
in cluster mode.

----


