Hi, all. I wonder how to delete an HDFS file/directory using the Spark API?
There is no direct way of doing it, but you can do something like this:

val hadoopConf = ssc.sparkContext.hadoopConfiguration
val hdfs = org.apache.hadoop.fs.FileSystem.get(hadoopConf)
// Each line of the stream will hold an HDFS location to be deleted.
val tmp_stream = ssc.textFileStream("/akhld/sigmoid/")
tmp_stream.foreachRDD { rdd =>
  rdd.collect().foreach { path =>
    hdfs.delete(new org.apache.hadoop.fs.Path(path), true) // true = recursive
  }
}
You can use the Hadoop Client API to remove files:
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#delete(org.apache.hadoop.fs.Path,
boolean). I don't think Spark has any wrapper around the Hadoop filesystem APIs.
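For reference, calling that delete(Path, recursive) method directly looks roughly like this. A minimal sketch: the path "/tmp/some/dir" is just a placeholder, and in a Spark job you would typically pass sc.hadoopConfiguration instead of a fresh Configuration so the cluster's fs.defaultFS is picked up:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Obtain a FileSystem handle; with a plain Configuration this resolves
// to whatever fs.defaultFS is configured (the local FS by default).
val fs = FileSystem.get(new Configuration())

// delete(path, recursive): recursive = true is required to remove a
// non-empty directory. Returns false if the path does not exist.
val deleted = fs.delete(new Path("/tmp/some/dir"), true)
```

Note that delete returns a Boolean rather than throwing when the path is missing, so it is safe to call without an exists() check first.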
On Thu, Jan 22, 2015 at 12:15 PM, LinQili lin_q...@outlook.com