Re: Structured Streaming Initial Listing Issue

2025-05-12 Thread Anastasiia Sokhova
Subject: Re: Structured Streaming Initial Listing Issue Hello. AFAIK the problem is not scoped to streaming and can't be mitigated with only maxRed

Re: Structured Streaming Initial Listing Issue

2025-05-12 Thread Andrei L
Hello. AFAIK the problem is not scoped to streaming and can't be mitigated with only maxResultSize for such input. Spark has to bring all file paths into driver memory even in the case of streaming ( https://github.com/apache/spark/blob/37028fafc4f9fc873195a88f0840ab69edcf9d2b/sql/core/src/main/scala

Re: Structured Streaming Initial Listing Issue

2025-05-12 Thread 刘唯
That 1073.3 MiB isn't much bigger than spark.driver.maxResultSize; can't you just increase that config to a larger number? / Wei Anastasiia Sokhova wrote on Wed, Apr 16, 2025 at 03:37: > Dear Spark Community, > > I run a Structured Streaming Query to read json files from S3 into an Iceberg table
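
For anyone trying the suggestion above, here is a minimal sketch of picking a larger value for spark.driver.maxResultSize from the 1073.3 MiB reported in the thread. The helper name `suggest_max_result_size` and the 2x headroom factor are assumptions made for illustration; they are not from the thread or from Spark itself.

```python
import math


def suggest_max_result_size(observed_mib: float, headroom: float = 2.0) -> str:
    """Return a Spark size string such as '3g'.

    Hypothetical helper: scales the observed result size by `headroom`
    (an assumed safety factor) and rounds up to whole GiB.
    """
    needed_gib = observed_mib * headroom / 1024
    return f"{math.ceil(needed_gib)}g"


# 1073.3 MiB is the size quoted in the thread.
conf = {"spark.driver.maxResultSize": suggest_max_result_size(1073.3)}
print(conf)
```

The resulting value would then be passed when the session is created, for example with `SparkSession.builder.config("spark.driver.maxResultSize", conf["spark.driver.maxResultSize"])` or with `--conf` on `spark-submit`. Driver memory likely needs to grow along with it, and as noted elsewhere in the thread, this alone may not be enough when the initial listing is very large.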

Structured Streaming Initial Listing Issue

2025-04-16 Thread Anastasiia Sokhova
Dear Spark Community, I run a Structured Streaming Query to read json files from S3 into an Iceberg table. This is my query: ```python stream_reader = ( spark_session.readStream.format("json") .schema(schema) .option("maxFilesPerTrigger", 256_000) .option("basePath", f"s3a://tes