Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-10 Thread German Schiavon
OK got it! Thanks! On Tue, 9 Mar 2021 at 21:17, Jungtaek Lim wrote: > That property decides how many log files (log file is created per batch > per type - types are like offsets, commits, etc.) to retain on the > checkpoint. > > Unless you're struggling with a small files problem on

[no subject]

2021-03-10 Thread rahul c
Unsubscribe

Re: Shutdown cleanup of disk-based resources that Spark creates

2021-03-10 Thread Attila Zsolt Piros
Hi Nick! I am not sure you are fixing a problem here. I think what you see as a problem is actually an intended behaviour. Checkpoint data should outlive the unexpected shutdowns. So there is a very important difference between the reference goes out of scope during a normal execution (in this

Re: Apache Spark 3.2 Expectation

2021-03-10 Thread Dongjoon Hyun
Hi, Xiao. This thread started 13 days ago. Since you asked the community about major features or timelines at that time, could you share your roadmap or expectations if you have something in your mind? > Thank you, Dongjoon, for initiating this discussion. Let us keep it open. It might take 1-2

Re: [Spark Streaming] [DISCUSS] Clear metadata method and Generate Batches using same Event Loop

2021-03-10 Thread Karthikeyan Ravi
Bring it up. On Mon, Feb 8, 2021 at 12:19 PM Karthikeyan Ravi wrote: > Hello, > > Our system observed this behaviour of Batches getting delayed for > generation in spark streaming and thereby creating a very big batch and > followed by few zero record batches. I read the code and added logs to

Re: Shutdown cleanup of disk-based resources that Spark creates

2021-03-10 Thread Nicholas Chammas
Checkpoint data is left behind after a normal shutdown, not just an unexpected shutdown. The PR description includes a simple demonstration of this. If the current behavior is truly intended -- which I find difficult to believe given how confusing it

Shutdown cleanup of disk-based resources that Spark creates

2021-03-10 Thread Nicholas Chammas
Hello people, I'm working on a fix for SPARK-33000. Spark does not clean up checkpointed RDDs/DataFrames on shutdown, even if the appropriate configs are set. In the course of developing a fix, another contributor pointed out

Re: Apache Spark 3.2 Expectation

2021-03-10 Thread Xiao Li
Below are some nice-to-have features we can work on in Spark 3.2: Lateral Join support, interval data type, timestamp without time zone, un-nesting arbitrary queries, the returned metrics of DSV2, and error message standardization. Spark 3.2 will

Re: Shutdown cleanup of disk-based resources that Spark creates

2021-03-10 Thread Attila Zsolt Piros
> Checkpoint data is left behind after a normal shutdown, not just an unexpected shutdown. The PR description includes a simple demonstration of this. I think I might have overemphasized the "unexpected" adjective a bit to show you the value in the current behavior. The feature configured with

Re: [build system] github fetches timing out

2021-03-10 Thread shane knapp ☠
...and just like that, overnight the builds started successfully git fetching! On Tue, Mar 9, 2021 at 12:31 PM shane knapp ☠ wrote: > it looks like over the past few days the master/branch builds have been > timing out... this hasn't happened in a few years, and honestly the last > times this

Re: [build system] github fetches timing out

2021-03-10 Thread Liang-Chi Hsieh
-- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: [build system] github fetches timing out

2021-03-10 Thread Liang-Chi Hsieh
Thanks Shane for looking at it! shane knapp ☠ wrote > ...and just like that, overnight the builds started successfully git > fetching! > > -- > Shane Knapp > Computer Guy / Voice of Reason > UC Berkeley EECS Research / RISELab Staff Technical Lead > https://rise.cs.berkeley.edu -- Sent