Hi Fokko,
Spark fires it off for many other things. It does so for ML pipelines and
it does make information available for data frames.
We use S3 in this case; I just simplified the example. It is important to
know what process took what action. Only Spark knows this, and it does
supply this
Hi Bolke,
I would argue that Spark is not the right level of abstraction for doing
this. I would create a wrapper around the particular filesystem:
http://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html
Therefore you can write a wrapper around the LocalFileSystem if data
Hi,
Apologies upfront if this should have gone to user@, but it seems like a
developer question, so here goes.
We are trying to improve a listener to track lineage across our platform. This
requires tracking where data comes from and where it goes to. E.g.
sc.setLogLevel("INFO");
val data =