Hello, I have a question regarding a use case. I have an ETL pipeline using Spark, and it works great. I use CephFS, mounted on all Spark nodes, to store the data. One problem, however, is that bzip2 compression plus the transfer from the source to the Spark storage takes a very long time. I would like to be able to process the file as it is written, in chunks of about 100 MB. Is something like that possible in plain Spark, or do I need Spark Streaming? And if I use Spark Streaming, would that mean my application has to run as a daemon on the Spark nodes?
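To make it concrete, here is a minimal sketch of what I am imagining, assuming Spark Structured Streaming's file source (which, as I understand it, only picks up complete files once they appear in a directory, not files still being written). The paths and the maxFilesPerTrigger value below are placeholders, and I am not certain this is the right API for my case:

    import org.apache.spark.sql.SparkSession

    object IncrementalEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("IncrementalEtl")
          .getOrCreate()

        // Watch the CephFS landing directory; each new file that
        // appears becomes part of the next micro-batch.
        val lines = spark.readStream
          .option("maxFilesPerTrigger", 10) // bound each micro-batch
          .text("/mnt/cephfs/landing")

        // ... the existing ETL transformations would go here ...

        val query = lines.writeStream
          .format("parquet")
          .option("path", "/mnt/cephfs/output")
          .option("checkpointLocation", "/mnt/cephfs/checkpoints/etl")
          .start()

        // Runs continuously, i.e. effectively as a long-lived job.
        query.awaitTermination()
      }
    }

If that awaitTermination() call is the only way to keep consuming new files, that seems to confirm the application would have to stay running, which is the part I am unsure about.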
Thank you for your help and ideas.

Antoine