GitHub user sujan121 opened a pull request:
https://github.com/apache/spark/pull/14810
Branch 1.6
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise,
remove this)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-1.6
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14810.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14810
----
commit 7482c7b5aba5b649510bbb8886bbf2b44f86f543
Author: Shixiong Zhu <[email protected]>
Date: 2016-01-18T23:38:03Z
[SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume
integration doc
This PR adds instructions for getting the Flume assembly jar for Python users
on the Flume integration page, similar to the Kafka doc.
Author: Shixiong Zhu <[email protected]>
Closes #10746 from zsxwing/flume-doc.
(cherry picked from commit a973f483f6b819ed4ecac27ff5c064ea13a8dd71)
Signed-off-by: Tathagata Das <[email protected]>
commit d43704d7fc6a5e9da4968b1dafa8d4b1c341ee8d
Author: Shixiong Zhu <[email protected]>
Date: 2016-01-19T00:50:05Z
[SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis
integration doc
This PR adds instructions for getting the Kinesis assembly jar for Python
users on the Kinesis integration page, similar to the Kafka doc.
Author: Shixiong Zhu <[email protected]>
Closes #10822 from zsxwing/kinesis-doc.
(cherry picked from commit 721845c1b64fd6e3b911bd77c94e01dc4e5fd102)
Signed-off-by: Tathagata Das <[email protected]>
commit 68265ac23e20305474daef14bbcf874308ca8f5a
Author: Wenchen Fan <[email protected]>
Date: 2016-01-19T05:20:19Z
[SPARK-12841][SQL][BRANCH-1.6] fix cast in filter
In SPARK-10743 we wrapped casts with `UnresolvedAlias` to give `Cast` a better
alias if possible. However, for cases like filter, the `UnresolvedAlias` can't
be resolved, and we don't actually need a better alias there. This PR moves
the cast-wrapping logic to `Column.named` so that we only do it when we need
an alias name.
backport https://github.com/apache/spark/pull/10781 to 1.6
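For context, a minimal PySpark sketch of the failing shape might look like the
following; the data and column names are illustrative, not taken from the PR:
```python
# Hedged reproduction sketch (assumes an existing SQLContext `sqlContext`).
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])

# A cast used inside a filter condition needs no alias; before this fix the
# alias wrapping could leave the filter condition unresolved.
df.filter(df["id"].cast("string") == "1").show()
```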
Author: Wenchen Fan <[email protected]>
Closes #10819 from cloud-fan/bug.
commit 30f55e5232d85fd070892444367d2bb386dfce13
Author: proflin <[email protected]>
Date: 2016-01-19T08:15:43Z
[SQL][MINOR] Fix a mismatched comment so it matches the code in
interface.scala
Author: proflin <[email protected]>
Closes #10824 from proflin/master.
(cherry picked from commit c00744e60f77edb238aff1e30b450dca65451e91)
Signed-off-by: Reynold Xin <[email protected]>
commit 962e618ec159f8cd26543f42b2ce484fd5a5d8c5
Author: Wojciech Jurczyk <[email protected]>
Date: 2016-01-19T09:36:45Z
[MLLIB] Fix CholeskyDecomposition assertion's message
Change the assertion's message so it's consistent with the code. The old
message says that the invoked method was lapack.dports, whereas in fact it was
the lapack.dppsv method.
Author: Wojciech Jurczyk <[email protected]>
Closes #10818 from wjur/wjur/rename_error_message.
(cherry picked from commit ebd9ce0f1f55f7d2d3bd3b92c4b0a495c51ac6fd)
Signed-off-by: Sean Owen <[email protected]>
commit 40fa21856aded0e8b0852cdc2d8f8bc577891908
Author: Josh Rosen <[email protected]>
Date: 2016-01-21T00:10:28Z
[SPARK-12921] Use SparkHadoopUtil reflection in
SpecificParquetRecordReaderBase
It looks like there's one place left in the codebase,
SpecificParquetRecordReaderBase, where we didn't use SparkHadoopUtil's
reflective access to TaskAttemptContext methods, which could create problems
when using a single Spark artifact with both Hadoop 1.x and 2.x.
Author: Josh Rosen <[email protected]>
Closes #10843 from JoshRosen/SPARK-12921.
commit b5d7dbeb3110a11716f6642829f4ea14868ccc8a
Author: Liang-Chi Hsieh <[email protected]>
Date: 2016-01-22T02:55:28Z
[SPARK-12747][SQL] Use correct type name for Postgres JDBC's real array
https://issues.apache.org/jira/browse/SPARK-12747
The Postgres JDBC driver uses "FLOAT4" or "FLOAT8", not "real".
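As a hedged illustration (connection details and table are placeholders, not
from the PR), reading a Postgres table with a real[] column might look like:
```python
# Illustrative only: URL, table, and column are assumptions.
df = sqlContext.read.jdbc(
    url="jdbc:postgresql://localhost:5432/testdb",
    table="measurements")  # assumed to contain a real[] column

# Before the fix, Spark named the array element type "real", which the
# Postgres JDBC driver rejects; it expects FLOAT4 (or FLOAT8 for double[]).
df.printSchema()
```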
Author: Liang-Chi Hsieh <[email protected]>
Closes #10695 from viirya/fix-postgres-jdbc.
(cherry picked from commit 55c7dd031b8a58976922e469626469aa4aff1391)
Signed-off-by: Reynold Xin <[email protected]>
commit dca238af7ef39e0d1951b72819f12092eae1964a
Author: Alex Bozarth <[email protected]>
Date: 2016-01-23T11:19:58Z
[SPARK-12859][STREAMING][WEB UI] Names of input streams with receivers
don't fit in Streaming page
Added CSS style to force names of input streams with receivers to wrap
Author: Alex Bozarth <[email protected]>
Closes #10873 from ajbozarth/spark12859.
(cherry picked from commit 358a33bbff549826b2336c317afc7274bdd30fdb)
Signed-off-by: Kousuke Saruta <[email protected]>
commit e8ae242f925ab747aa5a7bba581da66195e31110
Author: Mortada Mehyar <[email protected]>
Date: 2016-01-23T11:36:33Z
[SPARK-12760][DOCS] invalid lambda expression in python example for local vs
cluster
srowen Thanks for the PR at https://github.com/apache/spark/pull/10866! Sorry
it took me a while.
This is related to https://github.com/apache/spark/pull/10866; basically, the
assignment in the lambda expression in the Python example is invalid:
```
In [1]: data = [1, 2, 3, 4, 5]
In [2]: counter = 0
In [3]: rdd = sc.parallelize(data)
In [4]: rdd.foreach(lambda x: counter += x)
File "<ipython-input-4-fcb86c182bad>", line 1
rdd.foreach(lambda x: counter += x)
^
SyntaxError: invalid syntax
```
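For reference, a working variant would use an Accumulator rather than
assignment inside the lambda; a minimal sketch:
```python
# Accumulators are the supported way to aggregate into a driver-side value.
counter = sc.accumulator(0)
rdd = sc.parallelize([1, 2, 3, 4, 5])

# add() is a method call (an expression), so it is legal inside a lambda,
# unlike the statement `counter += x`.
rdd.foreach(lambda x: counter.add(x))
print(counter.value)  # 15, read on the driver
```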
Author: Mortada Mehyar <[email protected]>
Closes #10867 from mortada/doc_python_fix.
(cherry picked from commit 56f57f894eafeda48ce118eec16ecb88dbd1b9dc)
Signed-off-by: Sean Owen <[email protected]>
commit f13a3d1f73d01bf167f3736b66222b1cb8f7a01b
Author: Sean Owen <[email protected]>
Date: 2016-01-23T11:45:12Z
[SPARK-12760][DOCS] inaccurate description for difference between local vs
cluster mode in closure handling
Clarify that modifying a driver-local variable won't have the desired effect
in cluster modes, and may or may not work as intended in local mode.
Author: Sean Owen <[email protected]>
Closes #10866 from srowen/SPARK-12760.
(cherry picked from commit aca2a0165405b9eba27ac5e4739e36a618b96676)
Signed-off-by: Sean Owen <[email protected]>
commit f913f7ea080bc90bd967724e583f42b0a48075d9
Author: Jeff Zhang <[email protected]>
Date: 2016-01-24T20:29:26Z
[SPARK-12120][PYSPARK] Improve exception message when failing to initialize
HiveContext in PySpark
davies Mind reviewing?
This is the error message after this PR:
```
15/12/03 16:59:53 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
/Users/jzhang/github/spark/python/pyspark/sql/context.py:689: UserWarning: You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
  warnings.warn("You must build Spark with Hive. "
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 663, in read
    return DataFrameReader(self)
  File "/Users/jzhang/github/spark/python/pyspark/sql/readwriter.py", line 56, in __init__
    self._jreader = sqlContext._ssql_ctx.read()
  File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 692, in _ssql_ctx
    raise e
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: java.net.ConnectException: Call From jzhangMBPr.local/127.0.0.1 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
    at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
    at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
    at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
    at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
    at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
    at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
    at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
    at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
```
Author: Jeff Zhang <[email protected]>
Closes #10126 from zjffdu/SPARK-12120.
(cherry picked from commit e789b1d2c1eab6187f54424ed92697ca200c3101)
Signed-off-by: Josh Rosen <[email protected]>
commit 88614dd0f9f25ec2045940b030d757079913ac26
Author: Cheng Lian <[email protected]>
Date: 2016-01-25T03:40:34Z
[SPARK-12624][PYSPARK] Checks row length when converting Java arrays to
Python rows
When the actual row length doesn't conform to the specified schema field
length, we should give a better error message instead of throwing an
unintuitive `ArrayIndexOutOfBoundsException`.
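A hedged sketch of the kind of mismatch this guards against (field names and
data are illustrative):
```python
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

schema = StructType([
    StructField("a", IntegerType()),
    StructField("b", StringType()),
])

# Three values against a two-field schema: after this change the mismatch
# surfaces as a descriptive error rather than an index-out-of-bounds deep
# in the Java-to-Python row conversion.
df = sqlContext.createDataFrame([(1, "x", "extra")], schema)
df.collect()
```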
Author: Cheng Lian <[email protected]>
Closes #10886 from liancheng/spark-12624.
(cherry picked from commit 3327fd28170b549516fee1972dc6f4c32541591b)
Signed-off-by: Yin Huai <[email protected]>
commit 88114d3d87f41827ffa9f683edce5e85fdb724ff
Author: Andy Grove <[email protected]>
Date: 2016-01-25T09:22:10Z
[SPARK-12932][JAVA API] improved error message for java type inference
failure
Author: Andy Grove <[email protected]>
Closes #10865 from andygrove/SPARK-12932.
(cherry picked from commit d8e480521e362bc6bc5d8ebcea9b2d50f72a71b9)
Signed-off-by: Sean Owen <[email protected]>
commit b40e58cf251c22c6b0ba383cc7e67ef6b07d8ec5
Author: Michael Allman <[email protected]>
Date: 2016-01-25T09:51:41Z
[SPARK-12755][CORE] Stop the event logger before the DAG scheduler
Stop the event logger before the DAG scheduler to avoid a race condition where
the standalone master attempts to build the app's history UI before the event
log is stopped.
This contribution is my original work, and I license this work to the Spark
project under the project's open source license.
Author: Michael Allman <[email protected]>
Closes #10700 from mallman/stop_event_logger_first.
(cherry picked from commit 4ee8191e57cb823a23ceca17908af86e70354554)
Signed-off-by: Sean Owen <[email protected]>
commit 572bc399952bae322ed6909290996b103688fd3a
Author: Liang-Chi Hsieh <[email protected]>
Date: 2016-01-26T11:36:00Z
[SPARK-12961][CORE] Prevent snappy-java memory leak
JIRA: https://issues.apache.org/jira/browse/SPARK-12961
To prevent a memory leak in snappy-java, call the method once and cache the
result. Once the library releases a new version, we can remove this workaround.
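The fix follows a call-once-and-cache pattern; sketched here in Python with a
hypothetical stand-in for the native call (the actual change is in Spark's
Scala code):
```python
def native_snappy_version():
    # Hypothetical stand-in for the leaky native call in snappy-java.
    return "1.1.2"

class SnappyVersionHolder(object):
    """Call-once-and-cache: invoke the leaky method a single time, reuse it."""
    _cached = None

    @classmethod
    def version(cls):
        if cls._cached is None:
            cls._cached = native_snappy_version()
        return cls._cached
```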
JoshRosen
Author: Liang-Chi Hsieh <[email protected]>
Closes #10875 from viirya/prevent-snappy-memory-leak.
(cherry picked from commit 5936bf9fa85ccf7f0216145356140161c2801682)
Signed-off-by: Sean Owen <[email protected]>
commit f0c98a60f0b4982dc8e29b4a5d213fd8ce4abaf2
Author: Sameer Agarwal <[email protected]>
Date: 2016-01-26T15:50:37Z
[SPARK-12682][SQL] Add support for (optionally) not storing tables in hive
metadata format
This PR adds a new table option (`skip_hive_metadata`) that allows the user to
skip storing the table metadata in Hive-compatible format. While this could be
useful in general, the specific use case for this change is that Hive doesn't
handle wide schemas well (see
https://issues.apache.org/jira/browse/SPARK-12682 and
https://issues.apache.org/jira/browse/SPARK-6024), which in turn prevents such
tables from being queried in Spark SQL.
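A hedged sketch of how the option might be supplied when creating a data
source table (the path and table name are placeholders, and the exact plumbing
in 1.6 may differ from this):
```python
# Option name is from the PR; everything else here is illustrative.
sqlContext.sql("""
    CREATE TABLE wide_table
    USING parquet
    OPTIONS (path '/tmp/wide_table', skip_hive_metadata 'true')
""")
```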
Author: Sameer Agarwal <[email protected]>
Closes #10826 from sameeragarwal/skip-hive-metadata.
(cherry picked from commit 08c781ca672820be9ba32838bbe40d2643c4bde4)
Signed-off-by: Yin Huai <[email protected]>
commit 6ce3dd940def9257982d556cd3adf307fc2fe8a4
Author: Yin Huai <[email protected]>
Date: 2016-01-26T16:34:10Z
[SPARK-12682][SQL][HOT-FIX] Fix test compilation
Author: Yin Huai <[email protected]>
Closes #10925 from yhuai/branch-1.6-hot-fix.
commit 85518eda459a48c72a629b4cb9994fc753f72a58
Author: Holden Karau <[email protected]>
Date: 2016-01-04T01:04:35Z
[SPARK-12611][SQL][PYSPARK][TESTS] Fix test_infer_schema_to_local
Previously (when the PR was first created), not specifying b= explicitly was
fine and was treated as a default null; this change makes the test explicit
about b being None.
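A hedged sketch of the adjustment (the row fields are illustrative, not the
test's actual data):
```python
from pyspark.sql import Row

# Previously the test relied on an omitted field defaulting to null;
# now the None is spelled out explicitly.
row = Row(a=1, b=None)
```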
Author: Holden Karau <[email protected]>
Closes #10564 from holdenk/SPARK-12611-fix-test-infer-schema-local.
(cherry picked from commit 13dab9c3862cc454094cd9ba7b4504a2d095028f)
Signed-off-by: Yin Huai <[email protected]>
commit 17d1071ce8945d056da145f64797d1d10529afc1
Author: Xusen Yin <[email protected]>
Date: 2016-01-27T08:32:52Z
[SPARK-12834][ML][PYTHON][BACKPORT] Change ser/de of JavaArray and JavaList
Backport of SPARK-12834 for branch-1.6
Original PR: https://github.com/apache/spark/pull/10772
Original commit message:
We use `SerDe.dumps()` to serialize `JavaArray` and `JavaList` in
`PythonMLLibAPI`, then deserialize them with `PickleSerializer` on the Python
side. However, there is no need to transform them in such an inefficient way.
Instead, we can use type conversion to convert them, e.g. `list(JavaArray)` or
`list(JavaList)`. What's more, there is an issue with Ser/De of Scala Array,
as noted in https://issues.apache.org/jira/browse/SPARK-12780
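Conceptually, the change replaces a pickle round-trip with a direct
conversion; a comment-level sketch (the names are from the description above,
the surrounding plumbing is assumed):
```python
# Before: serialize on the JVM side, deserialize on the Python side.
#   bytes_ = SerDe.dumps(java_array)            # JVM-side pickling
#   values = PickleSerializer().loads(bytes_)   # Python-side unpickling
#
# After: py4j JavaArray/JavaList are iterable, so plain conversion suffices.
#   values = list(java_array)
```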
Author: Xusen Yin <[email protected]>
Closes #10941 from jkbradley/yinxusen-SPARK-12834-1.6.
commit 96e32db5cbd1ef32f65206357bfb8d9f70a06d0a
Author: Jason Lee <[email protected]>
Date: 2016-01-27T17:55:10Z
[SPARK-10847][SQL][PYSPARK] Pyspark - DataFrame - Optional Metadata with
`None` triggers cryptic failure
The error message is now changed from "Do not support type class
scala.Tuple2." to "Do not support type class org.json4s.JsonAST$JNull$" to be
more informative about what is not supported. Also, StructType metadata now
handles JNull correctly, i.e., {'a': None}. test_metadata_null is added to
tests.py to show that the fix works.
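For illustration, the previously failing shape (a metadata dict containing
None) might be constructed like this; the field name and metadata key are
illustrative:
```python
from pyspark.sql.types import StructType, StructField, StringType

# A None inside field metadata used to surface as the cryptic
# "Do not support type class scala.Tuple2." error; it is now handled as JNull.
field = StructField("a", StringType(), metadata={"comment": None})
schema = StructType([field])
```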
Author: Jason Lee <[email protected]>
Closes #8969 from jasoncl/SPARK-10847.
(cherry picked from commit edd473751b59b55fa3daede5ed7bc19ea8bd7170)
Signed-off-by: Yin Huai <[email protected]>
commit 84dab7260e9a33586ad4002cd826a5ae7c8c4141
Author: Shixiong Zhu <[email protected]>
Date: 2016-01-29T21:53:11Z
[SPARK-13082][PYSPARK] Backport the fix of 'read.json(rdd)' in #10559 to
branch-1.6
SPARK-13082 was actually fixed by #10559. However, that is a big PR and was
not backported to 1.6. This PR backports just the fix for 'read.json(rdd)' to
branch-1.6.
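The fixed call shape, as a minimal sketch:
```python
# read.json accepts an RDD of JSON strings; this is the path the backport fixes.
rdd = sc.parallelize(['{"a": 1}', '{"a": 2}'])
df = sqlContext.read.json(rdd)
df.show()
```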
Author: Shixiong Zhu <[email protected]>
Closes #10988 from zsxwing/json-rdd.
commit bb01cbe9b2c0f64eef34f6a59b5bf7be55c73012
Author: Andrew Or <[email protected]>
Date: 2016-01-30T02:00:49Z
[SPARK-13088] Fix DAG viz in latest version of chrome
Apparently Chrome removed `SVGElement.prototype.getTransformToElement`,
which is used by our JS library dagre-d3 when creating edges. The real diff can
be found here:
https://github.com/andrewor14/dagre-d3/commit/7d6c0002e4c74b82a02c5917876576f71e215590,
which is taken from the fix in the main repo:
https://github.com/cpettitt/dagre-d3/commit/1ef067f1c6ad2e0980f6f0ca471bce998784b7b2
Upstream issue: https://github.com/cpettitt/dagre-d3/issues/202
Author: Andrew Or <[email protected]>
Closes #10986 from andrewor14/fix-dag-viz.
(cherry picked from commit 70e69fc4dd619654f5d24b8b84f6a94f7705c59b)
Signed-off-by: Andrew Or <[email protected]>
commit ddb9633043e82fb2a34c7e0e29b487f635c3c744
Author: Kevin Yu <[email protected]>
Date: 2015-12-28T19:58:33Z
[SPARK-12231][SQL] create a combineFilters' projection when we call
buildPartitionedTableScan
Hello Michael & All:
We had some issues submitting the new code in the other PR (#10299), so we
closed that PR and opened this one with the fix.
The reason for the previous failure is that the projection for the scan, when
there is a filter that is not pushed down (the "left-over" filter), could
differ, in elements or ordering, from the original projection.
With this change, the approach to solving the problem is: insert a new Project
if the "left-over" filter is nonempty and the original projection is not empty
and the projection for the scan has more than one element, which could
otherwise cause a different ordering in the projection.
We add three test cases to cover the previously failing cases.
Author: Kevin Yu <[email protected]>
Closes #10388 from kevinyu98/spark-12231.
(cherry picked from commit fd50df413fbb3b7528cdff311cc040a6212340b9)
Signed-off-by: Cheng Lian <[email protected]>
commit 9a5b25d0f8543e24b4d00497399790930c01246f
Author: gatorsmile <[email protected]>
Date: 2016-02-01T19:22:02Z
[SPARK-12989][SQL] Delaying Alias Cleanup after ExtractWindowExpressions
JIRA: https://issues.apache.org/jira/browse/SPARK-12989
In the rule `ExtractWindowExpressions`, we simply replace an alias with the
corresponding attribute. However, this causes an issue exposed by the
following case:
```scala
val data = Seq(("a", "b", "c", 3), ("c", "b", "a", 3))
  .toDF("A", "B", "C", "num")
  .withColumn("Data", struct("A", "B", "C"))
  .drop("A").drop("B").drop("C")

val winSpec = Window.partitionBy("Data.A", "Data.B").orderBy($"num".desc)
data.select($"*", max("num").over(winSpec) as "max").explain(true)
```
In this case, both `Data.A` and `Data.B` are aliases in
`WindowSpecDefinition`. If we replace these alias expressions with their alias
names, we can no longer tell what they are, since they will not be put in
`missingExpr` either.
Author: gatorsmile <[email protected]>
Author: xiaoli <[email protected]>
Author: Xiao Li <[email protected]>
Closes #10963 from gatorsmile/seletStarAfterColDrop.
(cherry picked from commit 33c8a490f7f64320c53530a57bd8d34916e3607c)
Signed-off-by: Michael Armbrust <[email protected]>
commit 215d5d8845b6e52d75522e1c0766d324d11e4d42
Author: Takeshi YAMAMURO <[email protected]>
Date: 2016-02-01T20:02:06Z
[DOCS] Fix the jar location of datanucleus in sql-programming-guide.md
It seems to me that `lib` is better, because the `datanucleus` jars are
located in `lib` for release builds.
Author: Takeshi YAMAMURO <[email protected]>
Closes #10901 from maropu/DocFix.
(cherry picked from commit da9146c91a33577ff81378ca7e7c38a4b1917876)
Signed-off-by: Michael Armbrust <[email protected]>
commit 70fcbf68e412f6549ba6c2db86f7ef4518d05fe1
Author: Takeshi YAMAMURO <[email protected]>
Date: 2016-02-01T20:13:17Z
[SPARK-11780][SQL] Add catalyst type aliases backwards compatibility
Retargets #10635 at branch-1.6.
Author: Takeshi YAMAMURO <[email protected]>
Closes #10915 from maropu/pr9935-v3.
commit bd8efba8f2131d951829020b4c68309a174859cf
Author: Michael Armbrust <[email protected]>
Date: 2016-02-02T08:51:07Z
[SPARK-13087][SQL] Fix group by function for sort based aggregation
It is not valid to call `toAttribute` on a `NamedExpression` unless we know
for sure that the child produced that `NamedExpression`. The current code
worked fine when the grouping expressions were simple, but when they were a
derived value, it blew up at execution time.
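A hedged sketch of a derived grouping expression of the kind that triggered
the failure (data and column names are illustrative):
```python
from pyspark.sql import functions as F

# Grouping by a computed value rather than a plain column; with sort-based
# aggregation this previously failed at execution time.
df = sqlContext.createDataFrame([("aa", 1), ("b", 2)], ["name", "num"])
df.groupBy(F.length(df["name"])).count().show()
```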
Author: Michael Armbrust <[email protected]>
Closes #11011 from marmbrus/groupByFunction.
commit 99594b213c941cd3ffa3a034f007e44efebdb545
Author: Michael Armbrust <[email protected]>
Date: 2016-02-02T18:15:40Z
[SPARK-13094][SQL] Add encoders for seq/array of primitives
Author: Michael Armbrust <[email protected]>
Closes #11014 from marmbrus/seqEncoders.
(cherry picked from commit 29d92181d0c49988c387d34e4a71b1afe02c29e2)
Signed-off-by: Michael Armbrust <[email protected]>
commit 9a3d1bd09cdf4a7c2992525c203d4dac764fddb8
Author: Xusen Yin <[email protected]>
Date: 2016-02-02T18:21:21Z
[SPARK-12780][ML][PYTHON][BACKPORT] Inconsistency returning value of ML
python models' properties
Backport of [SPARK-12780] for branch-1.6
Original PR for master: https://github.com/apache/spark/pull/10724
This fixes StringIndexerModel.labels in pyspark.
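The usage this backport makes consistent, as a sketch (the input DataFrame is
assumed to exist):
```python
from pyspark.ml.feature import StringIndexer

indexer = StringIndexer(inputCol="category", outputCol="categoryIndex")
model = indexer.fit(df)  # df assumed to contain a string "category" column
print(model.labels)      # now returns a plain Python list
```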
Author: Xusen Yin <[email protected]>
Closes #10950 from jkbradley/yinxusen-spark-12780-backport.
commit 53f518a6e2791cc4967793b6cc0d4a68d579cb33
Author: Narine Kokhlikyan <[email protected]>
Date: 2016-01-22T18:35:02Z
[SPARK-12629][SPARKR] Fixes for DataFrame saveAsTable method
I've tried to solve some of the issues mentioned in:
https://issues.apache.org/jira/browse/SPARK-12629
Please let me know what you think.
Thanks!
Author: Narine Kokhlikyan <[email protected]>
Closes #10580 from NarineK/sparkrSavaAsRable.
(cherry picked from commit 8a88e121283472c26e70563a4e04c109e9b183b3)
Signed-off-by: Shivaram Venkataraman <[email protected]>
----