GitHub user shankervalipireddy opened a pull request:
https://github.com/apache/spark/pull/5021
[SPARK-1301][WebUI] Add UI elements to collapse "Aggregated Metrics by
Executor" pane on stage page
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-1.3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5021.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5021
----
commit efffc2e428b1e867a586749685da90875f6bcfc4
Author: Daoyuan Wang <[email protected]>
Date: 2015-02-13T21:46:50Z
[SPARK-5642] [SQL] Apply column pruning on unused aggregation fields
select k from (select key k, max(value) v from src group by k) t
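A hedged sketch of the effect (assumes a `SQLContext` named `sqlContext` with the `src` table registered; the pruning rule itself lives in the Catalyst optimizer, not user code):
```scala
// Only `k` survives the outer projection, so after pruning the optimizer
// can drop the unused aggregate max(value) and avoid scanning `value`.
val df = sqlContext.sql(
  "select k from (select key k, max(value) v from src group by k) t")
df.explain(true)  // the optimized plan should no longer compute max(value)
```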
Author: Daoyuan Wang <[email protected]>
Author: Michael Armbrust <[email protected]>
Closes #4415 from adrian-wang/groupprune and squashes the following commits:
5d2d8a3 [Daoyuan Wang] address Michael's comments
61f8ef7 [Daoyuan Wang] add a unit test
80ddcc6 [Daoyuan Wang] keep project
b69d385 [Daoyuan Wang] add a prune rule for grouping set
(cherry picked from commit 2cbb3e433ae334d5c318f05b987af314c854fbcc)
Signed-off-by: Michael Armbrust <[email protected]>
commit d9d0250fc5dfe529bebd4f67f945f4d7c3fc4106
Author: Yin Huai <[email protected]>
Date: 2015-02-13T21:51:06Z
[SPARK-5789][SQL]Throw a better error message if JsonRDD.parseJson
encounters unrecoverable parsing errors.
Author: Yin Huai <[email protected]>
Closes #4582 from yhuai/jsonErrorMessage and squashes the following commits:
152dbd4 [Yin Huai] Update error message.
1466256 [Yin Huai] Throw a better error message when a JSON object in the
input dataset spans multiple records (lines for files or strings for an RDD of
strings).
(cherry picked from commit 2e0c084528409e1c565e6945521a33c0835ebbee)
Signed-off-by: Michael Armbrust <[email protected]>
commit 965876328d037f2a817f8c6bf5df0b3071abb43a
Author: Xiangrui Meng <[email protected]>
Date: 2015-02-13T23:09:27Z
[SPARK-5806] re-organize sections in mllib-clustering.md
Put example code close to the algorithm description.
Author: Xiangrui Meng <[email protected]>
Closes #4598 from mengxr/SPARK-5806 and squashes the following commits:
a137872 [Xiangrui Meng] re-organize sections in mllib-clustering.md
(cherry picked from commit cc56c8729a76af85aa6eb5d2f99787cca5e5b38f)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 356b798b3878bac1f89304e0be0f698f9eed6ec0
Author: Xiangrui Meng <[email protected]>
Date: 2015-02-14T00:43:49Z
[SPARK-5803][MLLIB] use ArrayBuilder to build primitive arrays
because ArrayBuffer is not specialized.
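For context, a minimal sketch of the difference using plain Scala collections:
```scala
import scala.collection.mutable.ArrayBuilder

// ArrayBuilder writes into an unboxed Array[Double] directly, while
// ArrayBuffer[Double] would box every element because it is not @specialized.
val builder = ArrayBuilder.make[Double]
builder += 1.0
builder += 2.5
val values: Array[Double] = builder.result()
```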
Author: Xiangrui Meng <[email protected]>
Closes #4594 from mengxr/SPARK-5803 and squashes the following commits:
1261bd5 [Xiangrui Meng] merge master
a4ea872 [Xiangrui Meng] use ArrayBuilder to build primitive arrays
(cherry picked from commit d50a91d529b0913364b483c511397d4af308a435)
Signed-off-by: Xiangrui Meng <[email protected]>
commit fccd38d2e08fb3502440a942a6958af5aada539b
Author: Xiangrui Meng <[email protected]>
Date: 2015-02-14T00:45:59Z
[SPARK-5730][ML] add doc groups to spark.ml components
This PR adds three groups to the ScalaDoc: `param`, `setParam`, and
`getParam`. Params will show up in the generated Scala API doc as the top
group. Setters/getters will be at the bottom.
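Roughly, the Scaladoc group tags look like this (the class and param below are illustrative, not from the PR):
```scala
// Illustrative only: how @group tags sort members into ScalaDoc groups.
class MyEstimator {
  /**
   * Regularization parameter.
   * @group param
   */
  var regParam: Double = 0.0

  /** @group setParam */
  def setRegParam(value: Double): this.type = { regParam = value; this }

  /** @group getParam */
  def getRegParam: Double = regParam
}
```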
Preview: (screenshot omitted)
Author: Xiangrui Meng <[email protected]>
Closes #4600 from mengxr/SPARK-5730 and squashes the following commits:
febed9a [Xiangrui Meng] add doc groups to spark.ml components
(cherry picked from commit 4f4c6d5a5db04a56906bacdc85d7e5589b6edada)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 152147f5f884ae4eea3873f01719e6ab9bc7afd2
Author: Josh Rosen <[email protected]>
Date: 2015-02-14T01:45:31Z
[SPARK-5227] [SPARK-5679] Disable FileSystem cache in
WholeTextFileRecordReaderSuite
This patch fixes two difficult-to-reproduce Jenkins test failures in
InputOutputMetricsSuite (SPARK-5227 and SPARK-5679). The problem was that
WholeTextFileRecordReaderSuite modifies the `fs.local.block.size` Hadoop
configuration and this change was affecting subsequent test suites due to
Hadoop's caching of FileSystem instances (see HADOOP-8490 for more details).
The fix implemented here is to disable FileSystem caching in
WholeTextFileRecordReaderSuite.
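For readers unfamiliar with the knob involved, a hedged sketch of disabling the cache for the local scheme (property name follows Hadoop's `fs.<scheme>.impl.disable.cache` convention per HADOOP-8490; the suite's actual code may differ):
```scala
import org.apache.hadoop.conf.Configuration

// Disable FileSystem caching for the local "file" scheme so that a modified
// fs.local.block.size cannot leak into later suites via a cached instance.
val conf = new Configuration()
conf.setBoolean("fs.file.impl.disable.cache", true)
```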
Author: Josh Rosen <[email protected]>
Closes #4599 from JoshRosen/inputoutputsuite-fix and squashes the following
commits:
47dc447 [Josh Rosen] [SPARK-5227] [SPARK-5679] Disable FileSystem cache in
WholeTextFileRecordReaderSuite
(cherry picked from commit d06d5ee9b33505774ef1e5becc01b47492f1a2dc)
Signed-off-by: Patrick Wendell <[email protected]>
commit db5747921a648c3f7cf1de6dba70b82584afd097
Author: Sean Owen <[email protected]>
Date: 2015-02-14T04:12:52Z
SPARK-3290 [GRAPHX] No unpersist calls in SVDPlusPlus
This just unpersist()s each RDD in this code that was cache()ed.
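The pattern, as a minimal sketch (assumes a SparkContext named `sc`):
```scala
// Cache an RDD while it is reused, then explicitly release its storage.
val rdd = sc.parallelize(1 to 1000).cache()
val total = rdd.sum()  // materializes the cached partitions
rdd.unpersist()        // free the storage once it is no longer reused
```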
Author: Sean Owen <[email protected]>
Closes #4234 from srowen/SPARK-3290 and squashes the following commits:
66c1e11 [Sean Owen] unpersist() each RDD that was cache()ed
(cherry picked from commit 0ce4e430a81532dc317136f968f28742e087d840)
Signed-off-by: Ankur Dave <[email protected]>
commit ba91bf5f4f048a721d97eb5779957ec39b15319f
Author: Reynold Xin <[email protected]>
Date: 2015-02-14T07:03:22Z
[SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
- The old implicit would convert RDDs directly to DataFrames, and that
added too many methods.
- toDataFrame -> toDF
- Dsl -> functions
- implicits moved into SQLContext.implicits
- addColumn -> withColumn
- renameColumn -> withColumnRenamed
Python changes:
- toDataFrame -> toDF
- Dsl -> functions package
- addColumn -> withColumn
- renameColumn -> withColumnRenamed
- add toDF functions to RDD on SQLContext init
- add flatMap to DataFrame
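A short sketch of the renamed Scala API (assumes a SparkContext named `sc`; the case class and column names are illustrative):
```scala
// Spark 1.3-style API after this change.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._  // implicits now live on the SQLContext instance

case class Person(name: String, age: Int)
val people = sc.parallelize(Seq(Person("alice", 30))).toDF() // was toDataFrame
val older = people.withColumn("agePlusOne", people("age") + 1) // was addColumn
val renamed = older.withColumnRenamed("agePlusOne", "age2") // was renameColumn
```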
Author: Reynold Xin <[email protected]>
Author: Davies Liu <[email protected]>
Closes #4556 from rxin/SPARK-5752 and squashes the following commits:
5ef9910 [Reynold Xin] More fix
61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into
SPARK-5752
ff5832c [Reynold Xin] Fix python
749c675 [Reynold Xin] count(*) fixes.
5806df0 [Reynold Xin] Fix build break again.
d941f3d [Reynold Xin] Fixed explode compilation break.
fe1267a [Davies Liu] flatMap
c4afb8e [Reynold Xin] style
d9de47f [Davies Liu] add comment
b783994 [Davies Liu] add comment for toDF
e2154e5 [Davies Liu] schema() -> schema
3a1004f [Davies Liu] Dsl -> functions, toDF()
fb256af [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits
moved into SQLContext.implicits - addColumn -> withColumn - renameColumn ->
withColumnRenamed
0dd74eb [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs
directly to DataFrames
97dd47c [Davies Liu] fix mistake
6168f74 [Davies Liu] fix test
1fc0199 [Davies Liu] fix test
a075cd5 [Davies Liu] clean up, toPandas
663d314 [Davies Liu] add test for agg('*')
9e214d5 [Reynold Xin] count(*) fixes.
1ed7136 [Reynold Xin] Fix build break again.
921b2e3 [Reynold Xin] Fixed explode compilation break.
14698d4 [Davies Liu] flatMap
ba3e12d [Reynold Xin] style
d08c92d [Davies Liu] add comment
5c8b524 [Davies Liu] add comment for toDF
a4e5e66 [Davies Liu] schema() -> schema
d377fc9 [Davies Liu] Dsl -> functions, toDF()
6b3086c [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits
moved into SQLContext.implicits - addColumn -> withColumn - renameColumn ->
withColumnRenamed
807e8b1 [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs
directly to DataFrames
(cherry picked from commit e98dfe627c5d0201464cdd0f363f391ea84c389a)
Signed-off-by: Reynold Xin <[email protected]>
commit e99e170c7bff95a102b3bf00cc31bfa81951d0cf
Author: gasparms <[email protected]>
Date: 2015-02-14T20:10:29Z
[SPARK-5800] Streaming Docs. Change linked files according to the selected
language
Currently, after the updateStateByKey explanation, the Spark Streaming
Programming Guide links to stateful_network_wordcount.py and notes "For the
complete Scala code ..." regardless of which language tab is selected, which is
inconsistent. I've changed the guide so each tab links to its pertinent example
file. The JavaStatefulNetworkWordCount.java example did not exist, so I added
it in this commit.
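For reference, the updateStateByKey pattern that section documents, as a minimal Scala sketch (mirrors the stateful word count example; `wordDstream` is assumed from the guide):
```scala
// Running count per key: merge newly arrived values into the previous state.
def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
  Some(runningCount.getOrElse(0) + newValues.sum)

// wordDstream: DStream[(String, Int)] built earlier in the guide's example.
val stateDstream = wordDstream.updateStateByKey[Int](updateFunction _)
```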
Author: gasparms <[email protected]>
Closes #4589 from gasparms/feature/streaming-guide and squashes the
following commits:
7f37f89 [gasparms] More style changes
ec202b0 [gasparms] Follow spark style guide
f527328 [gasparms] Improve example to look like scala example
4d8785c [gasparms] Remove throw exception
e92e6b8 [gasparms] Fix incoherence
92db405 [gasparms] Fix Streaming Programming Guide. Change files according to
the selected language
commit 1945fcfd9ecbe84e9af7f35ee1d6ba06ac06d8e3
Author: Sean Owen <[email protected]>
Date: 2015-02-14T20:12:29Z
Revise formatting of previous commit
f80e2629bb74bc62960c61ff313f7e7802d61319
commit f87f3b755817aa239ae2efa718f7c1f4569d84bd
Author: gli <[email protected]>
Date: 2015-02-14T20:43:27Z
SPARK-5822 [BUILD] cannot import src/main/scala & src/test/scala into
eclipse as source folder
When importing the whole project into Eclipse as a Maven project, I found
that src/main/scala & src/test/scala could not be set as source folders by
default, so this adds an "add-source" goal to scala-maven-plugin to make that
work.
Author: gli <[email protected]>
Closes #4531 from ligangty/addsource and squashes the following commits:
4e4db4c [gli] [IDE] cannot import src/main/scala & src/test/scala into
eclipse as source folder
(cherry picked from commit ed5f4bb7cb2c934b818d1e8b8b4e6a0056119c80)
Signed-off-by: Sean Owen <[email protected]>
commit 9c1c70d8cc8cf3afedecbc8868b3765c15bd493e
Author: Takeshi Yamamuro <[email protected]>
Date: 2015-02-15T14:42:20Z
[SPARK-5827][SQL] Add missing import in the example of SqlContext
If one tries the example via copy & paste, an exception is thrown.
Author: Takeshi Yamamuro <[email protected]>
Closes #4615 from maropu/AddMissingImportInSqlContext and squashes the
following commits:
ab21b66 [Takeshi Yamamuro] Add missing import in the example of SqlContext
(cherry picked from commit c771e475c449fe07cf45f37bdca2ba6ce9600bfc)
Signed-off-by: Sean Owen <[email protected]>
commit 70ebad4d972101dc2f920ac014cd2359b99a50f9
Author: Reynold Xin <[email protected]>
Date: 2015-02-13T20:43:53Z
[HOTFIX] Ignore DirectKafkaStreamSuite.
commit d96e188c7a2b52cff32814f8e0596f030c14ad21
Author: martinzapletal <[email protected]>
Date: 2015-02-15T17:10:03Z
[MLLIB][SPARK-5502] User guide for isotonic regression
User guide for isotonic regression added to docs/mllib-regression.md
including code examples for Scala and Java.
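A hedged Scala sketch of the documented API (the data is made up; the guide's examples are more complete, and `sc` is an assumed SparkContext):
```scala
import org.apache.spark.mllib.regression.IsotonicRegression

// Training tuples are (label, feature, weight).
val training = sc.parallelize(
  Seq((1.0, 1.0, 1.0), (2.0, 2.0, 1.0), (3.0, 3.0, 1.0)))

val model = new IsotonicRegression().setIsotonic(true).run(training)
model.predict(2.5)  // interpolates between learned boundaries
```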
Author: martinzapletal <[email protected]>
Closes #4536 from zapletal-martin/SPARK-5502 and squashes the following
commits:
67fe773 [martinzapletal] SPARK-5502 reworded model prediction rules to use
more general language rather than the code/implementation specific terms
80bd4c3 [martinzapletal] SPARK-5502 created docs page for isotonic
regression, added links to the page, updated data and examples
7d8136e [martinzapletal] SPARK-5502 Added documentation for Isotonic
regression including examples for Scala and Java
504b5c3 [martinzapletal] SPARK-5502 Added documentation for Isotonic
regression including examples for Scala and Java
(cherry picked from commit 61eb12674b90143388a01c22bf51cb7d02ab0447)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 4e099d757fc1bc4266f7849db6da0e996bf917be
Author: Sean Owen <[email protected]>
Date: 2015-02-15T17:15:48Z
SPARK-5669 [BUILD] Spark assembly includes incompatibly licensed
libgfortran, libgcc code via JBLAS
Exclude libgfortran, libgcc bundled by JBLAS for Windows. This much is
simple, and solves the essential license issue. But the more important question
is whether MLlib works on Windows then.
Author: Sean Owen <[email protected]>
Closes #4453 from srowen/SPARK-5669 and squashes the following commits:
734dd86 [Sean Owen] Exclude libgfortran, libgcc bundled by JBLAS, affecting
Windows / OS X / Linux 32-bit (not Linux 64-bit)
(cherry picked from commit 836577b382695558f5c97d94ee725d0156ebfad2)
Signed-off-by: Xiangrui Meng <[email protected]>
commit d71099133b64a4b9e9ab430cf1b314ee7deaf08d
Author: Xiangrui Meng <[email protected]>
Date: 2015-02-16T04:29:26Z
[SPARK-5769] Set params in constructors and in setParams in Python ML
pipelines
This PR allows Python users to set params in constructors and in setParams,
where we use the `keyword_only` decorator to force keyword arguments. The
trade-off is discussed in the design doc of SPARK-4586.
Generated doc: (link omitted)
CC: davies rxin
Author: Xiangrui Meng <[email protected]>
Closes #4564 from mengxr/py-pipeline-kw and squashes the following commits:
fedf720 [Xiangrui Meng] use toDF
d565f2c [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into
py-pipeline-kw
cbc15d3 [Xiangrui Meng] fix style
5032097 [Xiangrui Meng] update pipeline signature
950774e [Xiangrui Meng] simplify keyword_only and update
constructor/setParams signatures
fdde5fc [Xiangrui Meng] fix style
c9384b8 [Xiangrui Meng] fix sphinx doc
8e59180 [Xiangrui Meng] add setParams and make constructors take params,
where we force keyword args
(cherry picked from commit cd4a15366244657c4b7936abe5054754534366f2)
Signed-off-by: Xiangrui Meng <[email protected]>
commit db3c539f20e17e327b2f284bf6fbb3f1abd7fe64
Author: Sean Owen <[email protected]>
Date: 2015-02-16T04:41:27Z
SPARK-5815 [MLLIB] Deprecate SVDPlusPlus APIs that expose DoubleMatrix from
JBLAS
Deprecate SVDPlusPlus.run and introduce SVDPlusPlus.runSVDPlusPlus with a
return type that doesn't include DoubleMatrix
CC mengxr
Author: Sean Owen <[email protected]>
Closes #4614 from srowen/SPARK-5815 and squashes the following commits:
288cb05 [Sean Owen] Clarify deprecation plans in scaladoc
497458e [Sean Owen] Deprecate SVDPlusPlus.run and introduce
SVDPlusPlus.runSVDPlusPlus with return type that doesn't include DoubleMatrix
(cherry picked from commit acf2558dc92901c342262c35eebb95f2a9b7a9ae)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 9cf7d7088d245b9b41ec78295cd2d6e3e395793d
Author: Peter Rudenko <[email protected]>
Date: 2015-02-16T04:51:32Z
[Ml] SPARK-5796 Don't transform data on the last estimator in a Pipeline
If a stage is the last estimator in a Pipeline, there's no need to transform
the data, since there is no next stage to consume it.
Author: Peter Rudenko <[email protected]>
Closes #4590 from petro-rudenko/patch-1 and squashes the following commits:
d13ec33 [Peter Rudenko] [Ml] SPARK-5796 Don't transform data on a last
estimator in Pipeline
(cherry picked from commit c78a12c4cc4d4312c4ee1069d3b218882d32d678)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 0d932058ed95c2b65dc308fd523cfea6d9b29b16
Author: Peter Rudenko <[email protected]>
Date: 2015-02-16T08:07:23Z
[Ml] SPARK-5804 Explicitly manage cache in Crossvalidator k-fold loop
On a big dataset, explicitly unpersisting the training and validation folds
allows more data to be loaded into memory in the next loop iteration. On my
environment (single node, 8 GB worker RAM, 2 GB dataset file, 3 folds for cross
validation), this saved more than 5 minutes.
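The caching pattern described, as a simplified sketch (`dataset` is an assumed RDD; the real k-fold loop differs):
```scala
// Inside each fold: persist each split while it is in use, release it after,
// so the next fold's splits can fit in memory.
val training = dataset.sample(withReplacement = false, fraction = 0.66).cache()
val validation = dataset.subtract(training).cache()
// ... fit the estimator on `training`, evaluate on `validation` ...
training.unpersist()
validation.unpersist()
```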
Author: Peter Rudenko <[email protected]>
Closes #4595 from petro-rudenko/patch-2 and squashes the following commits:
66a7cfb [Peter Rudenko] Move validationDataset cache to declaration
c5f3265 [Peter Rudenko] [Ml] SPARK-5804 Explicitly manage cache in
Crossvalidator k-fold loop
(cherry picked from commit d51d6ba1547ae75ac76c9e6d8ea99e937eb7d09f)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 066301c65075bce515770d8e70294b3b2f588b96
Author: Cheng Lian <[email protected]>
Date: 2015-02-16T09:33:37Z
[Minor] [SQL] Renames stringRddToDataFrame to stringRddToDataFrameHolder
for consistency
Author: Cheng Lian <[email protected]>
Closes #4613 from liancheng/df-implicit-rename and squashes the following
commits:
db8bdd3 [Cheng Lian] Renames stringRddToDataFrame to
stringRddToDataFrameHolder for consistency
(cherry picked from commit 199a9e80275ac70582ea32f0f2f5a0a15b168785)
Signed-off-by: Cheng Lian <[email protected]>
commit 78f7edb85be5a397c0d1a2f3fd26aa83675cc0b1
Author: Cheng Lian <[email protected]>
Date: 2015-02-16T09:38:31Z
[SPARK-4553] [SPARK-5767] [SQL] Wires Parquet data source with the newly
introduced write support for data source API
This PR migrates the Parquet data source to the new data source write
support API. Now users can also overwrite and append to existing tables. Note
that inserting into partitioned tables is not supported yet.
When the Parquet data source is enabled, insertion into Hive Metastore Parquet
tables is also fulfilled by the Parquet data source. This is done by the newly
introduced `HiveMetastoreCatalog.ParquetConversions` rule, which is a "proper"
implementation of the original hacky `HiveStrategies.ParquetConversion`. The
latter is still preserved, and can be removed together with the old Parquet
support in the future.
TODO:
- [x] Update outdated comments in `newParquet.scala`.
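As a rough usage sketch of the new write path (assuming the Spark 1.3-era `DataFrame.save(path, source, mode)` signature; `df` and the path are illustrative):
```scala
import org.apache.spark.sql.SaveMode

// Append a DataFrame to existing Parquet data through the data source API,
// or replace it entirely.
df.save("/data/events", "parquet", SaveMode.Append)
df.save("/data/events", "parquet", SaveMode.Overwrite)
```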
Author: Cheng Lian <[email protected]>
Closes #4563 from liancheng/parquet-refining and squashes the following
commits:
fa98d27 [Cheng Lian] Fixes test cases which should disable the Parquet data
source
2476e82 [Cheng Lian] Fixes compilation error introduced during rebasing
a83d290 [Cheng Lian] Passes Hive Metastore partitioning information to
ParquetRelation2
(cherry picked from commit 3ce58cf9c0ffe8b867ca79b404fe3fa291cf0e56)
Signed-off-by: Cheng Lian <[email protected]>
commit 0165e9d1324e24571c702b32d8d76edca8808887
Author: Liang-Chi Hsieh <[email protected]>
Date: 2015-02-16T18:06:11Z
[SPARK-5799][SQL] Compute aggregation function on specified numeric columns
Compute aggregation function on specified numeric columns. For example:
```scala
val df = Seq(("a", 1, 0, "b"), ("b", 2, 4, "c"), ("a", 2, 3, "d"))
  .toDataFrame("key", "value1", "value2", "rest")
df.groupBy("key").min("value2")
```
Author: Liang-Chi Hsieh <[email protected]>
Closes #4592 from viirya/specific_cols_agg and squashes the following
commits:
9446896 [Liang-Chi Hsieh] For comments.
314c4cd [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master'
into specific_cols_agg
353fad7 [Liang-Chi Hsieh] For python unit tests.
54ed0c4 [Liang-Chi Hsieh] Address comments.
b079e6b [Liang-Chi Hsieh] Remove duplicate codes.
55100fb [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master'
into specific_cols_agg
880c2ac [Liang-Chi Hsieh] Fix Python style checks.
4c63a01 [Liang-Chi Hsieh] Fix pyspark.
b1a24fc [Liang-Chi Hsieh] Address comments.
2592f29 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master'
into specific_cols_agg
27069c3 [Liang-Chi Hsieh] Combine functions and add varargs annotation.
371a3f7 [Liang-Chi Hsieh] Compute aggregation function on specified numeric
columns.
(cherry picked from commit 5c78be7a515fc2fc92cda0517318e7b5d85762f4)
Signed-off-by: Reynold Xin <[email protected]>
commit fef2267cd4299de412a50b18cfd5e97ea7e7d851
Author: Sean Owen <[email protected]>
Date: 2015-02-16T19:32:31Z
SPARK-5795 [STREAMING] api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may
not be friendly to Java
Revise the JavaPairDStream API declaration on the saveAs Hadoop methods so
they can be called directly as intended.
CC tdas for review
Author: Sean Owen <[email protected]>
Closes #4608 from srowen/SPARK-5795 and squashes the following commits:
36f1ead [Sean Owen] Add code that shows compile problem and fix
036bd27 [Sean Owen] Revise JavaPairDStream API declaration on saveAs Hadoop
methods, to allow it to be called directly as intended.
(cherry picked from commit 8e25373ce72061d3b6a353259ec627606afa4a5f)
Signed-off-by: Sean Owen <[email protected]>
commit 1a8895560f668faed33e99bcb88cafefd64fef03
Author: Cheng Hao <[email protected]>
Date: 2015-02-16T20:21:08Z
[SQL] [Minor] Update the SpecificMutableRow.copy
When profiling Join / Aggregate queries via VisualVM, I noticed lots of
`SpecificMutableRow` objects being created, as well as `MutableValue`
instances. `SpecificMutableRow` is mostly used in data source implementations,
but its `copy` method can be called multiple times in upper modules (e.g. in
Join / aggregation), so creating duplicate instances should be avoided.
Author: Cheng Hao <[email protected]>
Closes #4619 from chenghao-intel/specific_mutable_row and squashes the
following commits:
9300d23 [Cheng Hao] update the SpecificMutableRow.copy
(cherry picked from commit cc552e042896350e21eec9b78593de25006ecc70)
Signed-off-by: Michael Armbrust <[email protected]>
commit c2eaaea9f9f77662a4c9405b2796aa6bd362466e
Author: Daoyuan Wang <[email protected]>
Date: 2015-02-16T20:31:36Z
[SPARK-5824] [SQL] add null format in ctas and set default col comment to
null
Author: Daoyuan Wang <[email protected]>
Closes #4609 from adrian-wang/ctas and squashes the following commits:
0a75d5a [Daoyuan Wang] reorder import
93d1863 [Daoyuan Wang] add null format in ctas and set default col comment
to null
(cherry picked from commit 275a0c08134dea1896eab73a8e017256900fb1db)
Signed-off-by: Michael Armbrust <[email protected]>
commit 63fa123f1c2113caea74a7cf9a7293f256441dc7
Author: Michael Armbrust <[email protected]>
Date: 2015-02-16T20:32:56Z
[SQL] Initial support for reporting location of error in sql string
Author: Michael Armbrust <[email protected]>
Closes #4587 from marmbrus/position and squashes the following commits:
0810052 [Michael Armbrust] fix tests
395c019 [Michael Armbrust] Merge remote-tracking branch 'marmbrus/position'
into position
e155dce [Michael Armbrust] more errors
f3efa51 [Michael Armbrust] Update AnalysisException.scala
d45ff60 [Michael Armbrust] [SQL] Initial support for reporting location of
error in sql string
(cherry picked from commit 104b2c45805ce0a9c86e2823f402de6e9f0aee81)
Signed-off-by: Michael Armbrust <[email protected]>
commit 0368494c502c33c05f806d106ff2042acad91cee
Author: OopsOutOfMemory <[email protected]>
Date: 2015-02-16T20:34:09Z
[SQL] Add fetched row count in SparkSQLCLIDriver
before this change:
```scala
Time taken: 0.619 seconds
```
after this change:
```scala
Time taken: 0.619 seconds, Fetched: 4 row(s)
```
Author: OopsOutOfMemory <[email protected]>
Closes #4604 from OopsOutOfMemory/rowcount and squashes the following
commits:
7252dea [OopsOutOfMemory] add fetched row count
(cherry picked from commit b4d7c7032d755de42951f92d9535287ef6230b9b)
Signed-off-by: Michael Armbrust <[email protected]>
commit 363a9a7d5ad682f828288f792a836c2c0b5e2f89
Author: Cheng Lian <[email protected]>
Date: 2015-02-16T20:48:55Z
[SPARK-5296] [SQL] Add more filter types for data sources API
This PR adds the following filter types for data sources API:
- `IsNull`
- `IsNotNull`
- `Not`
- `And`
- `Or`
The code that converts Catalyst predicate expressions to data source filters
is very similar to the filter conversion logic in `ParquetFilters`, which
converts Catalyst predicates to Parquet filter predicates. This way we can
support nested AND/OR/NOT predicates without changing the current `BaseScan`
type hierarchy.
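A sketch of the kind of predicate a data source can now receive (constructor shapes assumed from the list above; package `org.apache.spark.sql.sources`):
```scala
import org.apache.spark.sql.sources._

// NOT (a IS NULL) AND (b IS NULL OR c IS NOT NULL)
val pushed: Filter = And(
  Not(IsNull("a")),
  Or(IsNull("b"), IsNotNull("c")))
```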
Author: Cheng Lian <[email protected]>
This patch had conflicts when merged, resolved by
Committer: Michael Armbrust <[email protected]>
Closes #4623 from liancheng/more-fiters and squashes the following commits:
1b296f4 [Cheng Lian] Add more filter types for data sources API
commit 864d77e0d23b974943a1875b7372de05b3595bd5
Author: Cheng Lian <[email protected]>
Date: 2015-02-16T20:52:05Z
[SPARK-5833] [SQL] Adds REFRESH TABLE command
Lifts `HiveMetastoreCatalog.refreshTable` to `Catalog`. Adds a `RefreshTable`
command to refresh (possibly cached) metadata in external data source tables.
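Usage, as a minimal sketch (the table name is illustrative; `sqlContext` is an assumed SQLContext):
```scala
// Re-reads possibly cached metadata for an external data source table.
sqlContext.sql("REFRESH TABLE logs")
```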
Author: Cheng Lian <[email protected]>
Closes #4624 from liancheng/refresh-table and squashes the following
commits:
8d1aa4c [Cheng Lian] Adds REFRESH TABLE command
(cherry picked from commit c51ab37faddf4ede23243058dfb388e74a192552)
Signed-off-by: Michael Armbrust <[email protected]>
commit dd977dfed4303825fd2d5da036fcfd53820aefd8
Author: Matt Whelan <[email protected]>
Date: 2015-02-16T22:54:32Z
SPARK-5841: remove DiskBlockManager shutdown hook on stop
After a call to stop, the shutdown hook is redundant, and causes a
memory leak.
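The general JVM pattern involved, as a hedged sketch (not the actual DiskBlockManager code):
```scala
// Register a cleanup hook at startup.
val hook = new Thread(new Runnable {
  override def run(): Unit = { /* delete temporary block files */ }
})
Runtime.getRuntime.addShutdownHook(hook)

// In stop(): cleanup already happened, so deregister the hook rather than
// letting the JVM retain a reference to it (and everything it captures).
Runtime.getRuntime.removeShutdownHook(hook)
```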
Author: Matt Whelan <[email protected]>
Closes #4627 from MattWhelan/SPARK-5841 and squashes the following commits:
d5f5c7f [Matt Whelan] SPARK-5841: remove DiskBlockManager shutdown hook on
stop
(cherry picked from commit bb05982dd25e008fb01684dff1f95d03e7271721)
Signed-off-by: Sean Owen <[email protected]>
----