Hi,
I have a Spark Streaming app (1-minute batches) writing Parquet data to a
partition, e.g.:
val hdfsPath = s"$dbPath/$tableName/year=$year/month=$month/day=$day"

df.write.mode(SaveMode.Append).parquet(hdfsPath)

I wonder: would I lose data if I overwrite this partition from Hive
(for compaction/deduplication) while Spark is appending more data to it every
minute? (The Hive query can take more than 2 minutes.)
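
For context, the compaction/deduplication I have in mind is a static-partition
overwrite of the same day, roughly like the sketch below (the table name
"events", the column list, and the partition values are placeholders):

-- Hypothetical HiveQL sketch of the kind of compaction I mean:
-- rewrite one day's partition with duplicate rows removed.
INSERT OVERWRITE TABLE events PARTITION (year=2016, month=10, day=25)
SELECT DISTINCT col1, col2, col3
FROM events
WHERE year = 2016 AND month = 10 AND day = 25;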

Thanks,
Artur Sukhenko
