Hello,

We are hoping someone can help us understand Spark's behavior in the scenarios below.
Q1. Will running Spark queries fail when the underlying S3 Parquet object changes, with S3A remote file change detection enabled? Is detection 100% reliable?

Our understanding is that S3A supports remote file change detection using ETags, implemented in the S3AInputStream class. The ETag is cached per S3AInputStream instance and used to detect file changes even if the stream is reopened. When a Spark query reads the file through FSDataInputStream, will it reliably detect changes to the object on S3?

Q2. Does Spark read a Parquet file through a single S3AInputStream instance, or can it open multiple S3AInputStream instances for some queries?

Thanks,
Raghav
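For context, this is roughly the configuration we assume enables ETag-based change detection (property names taken from the Hadoop S3A change-detection documentation, passed through Spark's spark.hadoop.* prefix; the values shown are illustrative, not a recommendation):

```properties
# spark-defaults.conf (sketch)
# Detect changes via the object's ETag (S3A default source)
spark.hadoop.fs.s3a.change.detection.source            etag
# "server" asks S3 to reject mismatched reads server-side;
# other documented modes are client, warn, and none
spark.hadoop.fs.s3a.change.detection.mode              server
# Fail if the store does not return the chosen version attribute
spark.hadoop.fs.s3a.change.detection.version.required  true
```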