[ANNOUNCE] Announcing Apache Spark 2.3.4

2019-09-09 Thread Kazuaki Ishizaki
We are happy to announce the availability of Spark 2.3.4!

Spark 2.3.4 is a maintenance release containing stability fixes. This
release is based on the branch-2.3 maintenance branch of Spark. We strongly
recommend that all 2.3.x users upgrade to this stable release.

To download Spark 2.3.4, head over to the download page:
http://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-2-3-4.html

We would like to acknowledge all community members for contributing to this
release. This release would not have been possible without you.

Kazuaki Ishizaki



Problem upgrading from 2.3.1 to 2.4.3 with Gradle

2019-09-09 Thread Nathan Kronenfeld
Hi, Spark community.

We are trying to upgrade our application from Spark 2.3.1 to 2.4.3, and
came across a weird problem.

We are using Gradle for dependency management.

Spark depends on twitter-chill, which depends on kryo-shaded. All our
dependencies on kryo-shaded come in through the Twitter chill artifacts, and
all of them request version 4.0.2.

Gradle, however, in its infinite wisdom, pulls in version 3.0.3 instead, and
that won't even compile.

dependencyInsight returns the following:

com.esotericsoftware:kryo-shaded:3.0.3 (selected by rule)
   variant "runtime" [
      org.gradle.status = release (not requested)
      Requested attributes not found in the selected variant:
         org.gradle.usage = java-api
   ]

com.esotericsoftware:kryo-shaded:4.0.2 -> 3.0.3
+--- com.twitter:chill-java:0.9.3
|    +--- org.apache.spark:spark-core_2.11:2.4.3
|    |    +--- compileClasspath
|    |    +--- org.apache.spark:spark-mllib_2.11:2.4.3
|    |    |    \--- compileClasspath
|    |    +--- org.apache.spark:spark-sql_2.11:2.4.3
|    |    |    +--- compileClasspath
|    |    |    \--- org.apache.spark:spark-mllib_2.11:2.4.3 (*)
|    |    +--- org.apache.spark:spark-catalyst_2.11:2.4.3
|    |    |    \--- org.apache.spark:spark-sql_2.11:2.4.3 (*)
|    |    +--- org.apache.spark:spark-streaming_2.11:2.4.3
|    |    |    \--- org.apache.spark:spark-mllib_2.11:2.4.3 (*)
|    |    \--- org.apache.spark:spark-graphx_2.11:2.4.3
|    |         \--- org.apache.spark:spark-mllib_2.11:2.4.3 (*)
|    \--- com.twitter:chill_2.11:0.9.3
|         +--- org.apache.spark:spark-core_2.11:2.4.3 (*)
|         \--- org.apache.spark:spark-unsafe_2.11:2.4.3
|              +--- org.apache.spark:spark-catalyst_2.11:2.4.3 (*)
|              \--- org.apache.spark:spark-core_2.11:2.4.3 (*)
\--- com.twitter:chill_2.11:0.9.3 (*)
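
(For reference, output like this comes from a dependencyInsight run along the
lines of the following; adjust --configuration to whichever configuration you
are actually resolving:)

./gradlew dependencyInsight --dependency kryo-shaded --configuration compileClasspath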

I presume this means that Gradle can't find the property/value
org.gradle.usage=java-api in kryo-shaded version 4.0.2, but it can in 3.0.3?

Does anyone know why this might occur? I see no reference to
org.gradle.usage in either our build files or Spark's, so (assuming I even
understand the problem correctly) I have no idea where this requirement is
coming from.

We can work around the problem by setting the kryo-shaded version
explicitly, but that means we would have to keep re-pinning it every time we
upgrade in the future, which is not ideal.
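
(For illustration, the kind of explicit pin I mean - a minimal sketch,
assuming the Groovy DSL, with the coordinates taken from the dependencyInsight
output above:)

// build.gradle: force the kryo-shaded version that the chill 0.9.3 artifacts request
configurations.all {
    resolutionStrategy {
        force 'com.esotericsoftware:kryo-shaded:4.0.2'
    }
}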

I realize this is likely (though not certainly) a Gradle problem rather than
a Spark problem, but I'm hoping someone else here has encountered it before?

Thanks in advance,
-Nathan Kronenfeld


Re: read binary files (for stream reader) / spark 2.3

2019-09-09 Thread Peter Liu
Hello experts,

I have one additional question: how can I read binary files into a stream
reader object? (This is intended for getting data into a Kafka server.)

I looked into the DataStreamReader API (
https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-DataStreamReader.html#option)
and other Google results and didn't find an option for binary files.

Any help would be very much appreciated!
(Thanks again for Ilya's helpful information below - it works fine on the
sparkContext object; a rough sketch of that batch-side read follows.)
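
(A minimal sketch of that kind of read, assuming sc.binaryFiles is the right
entry point here - the path and app name are placeholders, untested:)

import org.apache.spark.input.PortableDataStream
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("binary-read-sketch").getOrCreate()
val sc = spark.sparkContext

// binaryFiles returns an RDD of (filePath, PortableDataStream);
// toArray() materializes each file's bytes (fine for reasonably small files).
val files = sc.binaryFiles("hdfs:///data/binary/*")
val bytes: RDD[(String, Array[Byte])] = files.mapValues(_.toArray())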

Regards,

Peter


On Thu, Sep 5, 2019 at 3:09 PM Ilya Matiach wrote:

> Hi Peter,
>
> You can use the spark.readImages API in spark 2.3 for reading images:
>
>
>
>
> https://databricks.com/blog/2018/12/10/introducing-built-in-image-data-source-in-apache-spark-2-4.html
>
>
> https://blogs.technet.microsoft.com/machinelearning/2018/03/05/image-data-support-in-apache-spark/
>
>
>
>
> https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.ml.image.ImageSchema$
>
>
>
> There’s also a spark package for spark versions older than 2.3:
>
> https://github.com/Microsoft/spark-images
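>
> A minimal call sketch, assuming Spark 2.3 and the ImageSchema API linked
> above (untested; the path is a placeholder):
>
> import org.apache.spark.ml.image.ImageSchema
>
> // Reads the images under a directory into a DataFrame with an "image" struct column
> val images = ImageSchema.readImages("/path/to/images")
> images.printSchema()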
>
>
>
> Thank you, Ilya
>
>
>
>
>
>
>
>
>
> *From:* Peter Liu 
> *Sent:* Thursday, September 5, 2019 2:13 PM
> *To:* dev ; User 
> *Subject:* Re: read image or binary files / spark 2.3
>
>
>
> Hello experts,
>
>
>
> I have a quick question: which API allows me to read image files or binary
> files (for SparkSession.readStream) from a local/Hadoop file system in
> Spark 2.3?
>
>
>
> I have been browsing the following documentation and googling for it, and
> didn't find a good example:
>
>
>
> https://spark.apache.org/docs/2.3.0/streaming-programming-guide.html
> 
>
>
> https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.package
> 
>
>
>
> any hint/help would be very much appreciated!
>
>
>
> thanks!
>
>
>
> Peter
>