Re: DFS temporary files?

2008-09-08 Thread Steve Loughran

Owen O'Malley wrote:

Currently there isn't a way to do that. In Hadoop 0.19, there will be a way
to have a clean-up method that runs at the end of the job. See HADOOP-3150.



Another bit of feature creep would be an expires: attribute on files, plus
something that purges expired files every so often. That way, even if a job
dies or the entire cluster is reset, the stuff still gets cleaned up.


Before someone rushes to implement this: I've been burned in the past by 
differences between a cluster's machines and clocks. Even if everything really 
is in sync with NTP, and not configured to talk to an NTP server that the 
production site can't see, you still need to be 100% sure that all your boxes 
are in the same time zone.


-steve


Re: DFS temporary files?

2008-09-05 Thread Owen O'Malley
Currently there isn't a way to do that. In Hadoop 0.19, there will be a way
to have a clean-up method that runs at the end of the job. See HADOOP-3150.
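
A rough sketch of how that hook might be used, assuming the OutputCommitter
API that HADOOP-3150 introduces (check the method names against the released
0.19 javadoc; the /tmp/jruby-libs path is made up):

  import java.io.IOException;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileOutputCommitter;
  import org.apache.hadoop.mapred.JobContext;

  public class TempFileCleanupCommitter extends FileOutputCommitter {
    @Override
    public void cleanupJob(JobContext context) throws IOException {
      super.cleanupJob(context);              // keep the normal output promotion
      FileSystem fs = FileSystem.get(context.getJobConf());
      fs.delete(new Path("/tmp/jruby-libs"), true);   // drop the cached zips
    }
  }

It would be registered on the job with something like
conf.setOutputCommitter(TempFileCleanupCommitter.class).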

-- Owen


DFS temporary files?

2008-09-05 Thread James Moore
I've got some temporary files on DFS that get used by the
DistributedCache mechanism (they're zipped JRuby files).  Once the job
is done, they can be deleted.  Is there a way to tell Hadoop that?
Right now I'm just deleting them myself at the end of the job, but
that code isn't guaranteed to execute.
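
For reference, a minimal sketch of that delete-it-yourself approach using the
current (0.18-era) JobClient API; the cache path is hypothetical. A finally
block covers a failed job, but not a client JVM that dies outright, which is
exactly the gap:

  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class RunWithCacheCleanup {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(RunWithCacheCleanup.class);
      Path cached = new Path("/tmp/jruby-gems.zip");  // zipped JRuby files on DFS
      try {
        JobClient.runJob(conf);                       // blocks until the job finishes
      } finally {
        FileSystem.get(conf).delete(cached, false);   // best-effort cleanup
      }
    }
  }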

-- 
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com