Re: invalidate caching for hadoopFile input?

2015-04-20 Thread ayan guha
You can use rdd.unpersist(). It's documented in the Spark Programming Guide under the "Removing Data" section.

Ayan

On 21 Apr 2015 13:16, Wei Wei vivie...@gmail.com wrote:
> Hey folks, I am trying to load a directory of avro files like this in spark-shell: val data =
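A minimal sketch of the unpersist-then-reload pattern, assuming the databricks spark-avro package (which adds `avroFile` to `SQLContext` via an implicit import) and a Spark 1.x shell; the HDFS path is the one from the original question:

```scala
import com.databricks.spark.avro._  // spark-avro: adds avroFile to SQLContext

// First load: read the directory and cache it.
var data = sqlContext.avroFile("hdfs://path/to/dir/*").cache()
data.count()

// After new files are uploaded, drop the stale cached blocks and
// re-read the directory so the new files are included.
data.unpersist()
data = sqlContext.avroFile("hdfs://path/to/dir/*").cache()
data.count()  // now reflects the newly added files
```

Note that `unpersist()` only removes the cached blocks; the variable still points at the old plan, so you also need to re-evaluate `avroFile` to pick up files added after the first read.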

invalidate caching for hadoopFile input?

2015-04-20 Thread Wei Wei
Hey folks, I am trying to load a directory of avro files like this in spark-shell:

val data = sqlContext.avroFile("hdfs://path/to/dir/*").cache
data.count

This works fine, but when more files are uploaded to that directory, running these two lines again yields the same result. I suspect there is