
Re: Structured Streaming Initial Listing Issue

2025-05-12 Thread Anastasiia Sokhova

Subject: Re: Structured Streaming Initial Listing Issue Hello. AFAIK the problem is not scoped to streaming and can't be mitigated with only maxRed

Re: Structured Streaming Initial Listing Issue

2025-05-12 Thread Andrei L

Hello. AFAIK, the problem is not scoped to streaming and can't be mitigated by raising maxResultSize alone for such input. Spark has to bring all file paths into driver memory, even in the case of streaming ( https://github.com/apache/spark/blob/37028fafc4f9fc873195a88f0840ab69edcf9d2b/sql/core/src/main/scala
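To make the scaling argument concrete, here is a rough lower-bound estimate in plain Python, assuming a hypothetical bucket layout (`s3a://bucket/data/part-NNNNNNN.json`). It counts only the UTF-8 bytes of the path strings, so the memory the driver actually needs for a listing (file sizes, modification times, JVM object overhead) will be higher. The helper `path_bytes_lower_bound` is illustrative, not a Spark API.

```python
from typing import Iterable


def path_bytes_lower_bound(paths: Iterable[str]) -> int:
    """Sum of the UTF-8 byte lengths of the given paths.

    This is only a floor: a real file listing also keeps per-file
    metadata and object overhead on the driver.
    """
    return sum(len(p.encode("utf-8")) for p in paths)


if __name__ == "__main__":
    # Hypothetical layout: one million files with fixed-width names.
    n_files = 1_000_000
    paths = (f"s3a://bucket/data/part-{i:07d}.json" for i in range(n_files))
    total = path_bytes_lower_bound(paths)
    print(f"{n_files:,} paths -> at least {total / 2**20:.1f} MiB of path text")
```

The floor grows linearly with the number of files under the base path, which fits the point above that raising a size limit only moves the threshold rather than removing it.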

Re: Structured Streaming Initial Listing Issue

2025-05-12 Thread 刘唯

That 1073.3 MiB isn't much bigger than spark.driver.maxResultSize; can't you just increase that config to a larger value? / Wei On Wed, 16 Apr 2025 at 03:37, Anastasiia Sokhova wrote: > Dear Spark Community, > I run a Structured Streaming Query to read json files from S3 into an Iceberg table
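For reference, `spark.driver.maxResultSize` defaults to `1g` (1024 MiB), so 1073.3 MiB overshoots it by under 5%. A minimal sketch for picking a new value, assuming a hypothetical 1.5x headroom factor and rounding up to whole GiB; `suggest_max_result_size` is an illustrative helper, not part of Spark. The resulting string could then be passed at session creation, e.g. `SparkSession.builder.config("spark.driver.maxResultSize", "2g")` or `--conf spark.driver.maxResultSize=2g` on `spark-submit`.

```python
import math

# spark.driver.maxResultSize defaults to "1g" in Spark's configuration docs.
DEFAULT_MAX_RESULT_SIZE_MIB = 1024


def suggest_max_result_size(observed_mib: float, headroom: float = 1.5) -> str:
    """Return a Spark size string, in whole GiB, covering observed_mib * headroom.

    The headroom factor is an assumption; tune it to your workload.
    """
    if observed_mib <= 0 or headroom < 1:
        raise ValueError("observed_mib must be > 0 and headroom >= 1")
    gib = math.ceil(observed_mib * headroom / 1024)
    return f"{gib}g"


if __name__ == "__main__":
    observed = 1073.3  # MiB, the figure quoted in this thread
    print(observed > DEFAULT_MAX_RESULT_SIZE_MIB)  # True
    print(suggest_max_result_size(observed))  # 2g
```

Raising the limit only helps while the collected result still fits in driver memory; Spark's docs warn that a high limit can lead to driver out-of-memory errors.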

Structured Streaming Initial Listing Issue

2025-04-16 Thread Anastasiia Sokhova

Dear Spark Community, I run a Structured Streaming Query to read json files from S3 into an Iceberg table. This is my query:

```python
stream_reader = (
    spark_session.readStream.format("json")
    .schema(schema)
    .option("maxFilesPerTrigger", 256_000)
    .option("basePath", f"s3a://tes
```
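The effect of `maxFilesPerTrigger` can be modeled with a small stdlib-only sketch, assuming (as Andrei's reply describes) that the source first lists every file under the base path and only then caps each micro-batch. `plan_triggers` and the paths are hypothetical. Spark's real file source also tracks already-processed files and orders them by modification time (see its `latestFirst` option), which this model leaves out.

```python
from typing import Iterator, List, Sequence


def plan_triggers(listing: Sequence[str], max_files_per_trigger: int) -> Iterator[List[str]]:
    """Yield successive micro-batches of at most max_files_per_trigger files.

    The full listing already exists before any batch is planned, so the cap
    bounds batch size, not the size of the listing held in memory.
    """
    if max_files_per_trigger <= 0:
        raise ValueError("max_files_per_trigger must be positive")
    for start in range(0, len(listing), max_files_per_trigger):
        yield list(listing[start:start + max_files_per_trigger])


if __name__ == "__main__":
    # Hypothetical listing of 600,000 files, capped at 256,000 per trigger.
    listing = [f"s3a://bucket/in/{i}.json" for i in range(600_000)]
    sizes = [len(batch) for batch in plan_triggers(listing, 256_000)]
    print(sizes)  # [256000, 256000, 88000]
```

Note that the whole `listing` list is built before the first batch is produced, mirroring why a per-trigger cap alone does not shrink the initial listing.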