First, I would like to thank everyone for the quick responses and fixes on most issues. Great job, everyone!
I noticed that calling cache() on a PTable built with SparkPipeline seems to reuse objects in downstream DoFns. Here is an example that exhibits this behavior: https://gist.github.com/nasokan/531b4ff9bf827d0835ab

I would expect this program to output pairs whose key and value are the same. Instead, it produces pairs whose key and value differ.

I have tested this with a text file input source, and it works as expected. Removing cache() also produces the expected result, so I suspect the issue is specific to the combination of Avro and cache().

Any thoughts on this behavior?

Thank you!
Nithin
