Base Docker image caching broken in CI

2023-01-10 Thread Hyukjin Kwon
Hi all,

ghcr.io (GitHub Container Registry) is flaky right now, so we will have to
wait a couple of days and see whether it gets fixed.
See also https://github.com/apache/spark/pull/39490#issuecomment-1378190658
Thanks Yikun for taking a look at this.


[DISCUSS] Deprecate DStream in 3.4

2023-01-10 Thread Jungtaek Lim
Hi dev,

I'd like to propose the deprecation of DStream in Spark 3.4, in favor of
promoting Structured Streaming.
(Sorry for the late proposal. If we don't make the change in 3.4, we will
have to wait another 6 months.)

We have been focusing on Structured Streaming for years (across multiple
major and minor versions), and during that time we haven't made any
improvements to DStream. Furthermore, we recently updated the DStream doc
to explicitly say that DStream is a legacy project.
https://spark.apache.org/docs/latest/streaming-programming-guide.html#note

The basis for the deprecation is that we don't see a particular use case
that only DStream solves. This is a different story from GraphX and MLlib,
as we don't have replacements for those.

The proposal does not mean we will remove the API soon, as the Spark
project has deprecated public APIs before without removing them right
away. I don't intend to propose a target version for removal. The goal is
to guide users away from building new workloads with DStream. We might
want to remove the API in the future, but that would require a new
discussion thread at that time.

What do you think?

Thanks,
Jungtaek Lim (HeartSaVioR)


Need help with merging PR to apache/spark

2023-01-10 Thread Anton Ippolitov
Hi everyone!

I opened a PR (https://github.com/apache/spark/pull/38376) to
fix SPARK-40817. It was approved a while ago but has not been merged. Could
I get some help from someone with sufficient rights to merge this PR, please?

Thanks a lot!
Best,
Anton