This would be helpful for a few use cases. For context, my team works in
the security space, and customers access data through a wrapper around
Spark SQL connected to the Hive metastore.
1. When snapshot (non-partitioned) tables are queried, it’s not clear when
the underlying snapshot was last updated. hav
Sharing an example since a few people asked me off-list:
We have stored the partition details in the read/write nodes of the
physical plan.
These can then be accessed from the plan, e.g. via plan.getInputPartitions or
plan.getOutputPartitions, which internally loops through the nodes in the
plan and collec
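
To illustrate the idea described above (partition details attached to read/write nodes, gathered by walking the plan), here is a minimal, self-contained sketch. It is a toy model only and does not use Spark: the PlanNode class, its fields, the helper functions, and the partition strings are all hypothetical stand-ins, not the actual implementation or any existing Spark API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class PlanNode:
    """Toy stand-in for a physical plan node (hypothetical, not a Spark class)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)
    # Assumption: read nodes record the partitions they scan,
    # and write nodes record the partitions they produce.
    input_partitions: List[str] = field(default_factory=list)
    output_partitions: List[str] = field(default_factory=list)


def walk(node: PlanNode) -> Iterator[PlanNode]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def get_input_partitions(plan: PlanNode) -> List[str]:
    """Collect the partitions recorded on every read node in the plan."""
    return [p for node in walk(plan) for p in node.input_partitions]


def get_output_partitions(plan: PlanNode) -> List[str]:
    """Collect the partitions recorded on every write node in the plan."""
    return [p for node in walk(plan) for p in node.output_partitions]


# Example plan: two scans feed a join, whose result is written out.
scan_a = PlanNode("scan_a", input_partitions=["db.a/ds=2024-01-01"])
scan_b = PlanNode(
    "scan_b",
    input_partitions=["db.b/ds=2024-01-01", "db.b/ds=2024-01-02"],
)
join = PlanNode("join", children=[scan_a, scan_b])
write = PlanNode("write", children=[join], output_partitions=["db.out/ds=2024-01-02"])

print(get_input_partitions(write))
print(get_output_partitions(write))
```

Keeping the partition lists on the nodes themselves means the accessors need nothing beyond a tree traversal, which is presumably why the lookup can be exposed directly on the plan.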
Hello Spark Devs!
We are from Uber's Spark team.
Our ETL jobs use Spark to read from and write to Hive datasets stored in HDFS.
The freshness of the partition written to depends on the freshness of the
data in the input partition(s). We monitor this freshness score, so that
partitions in our criti