Hello fellow pig users,

I have told pig to use a separate disk for its temp files by setting
PIG_OPTS=-Dhadoop.tmp.dir=/mnt/hadoop-tmp, but it still keeps a lot of its
files in /tmp:

/tmp/temp-1035677529$ find . -type f -exec ls -lh '{}' \;
-rw-r--r-- 1 pig pig 308K 2010-12-16 14:13 ./tmp82247880/.part-00000.crc
-rwxrwxrwx 1 pig pig 39M 2010-12-16 14:13 ./tmp82247880/part-00000
-rw-r--r-- 1 pig pig 8 2010-12-16 14:13 ./tmp-1431528563/.part-00000.crc
-rwxrwxrwx 1 pig pig 0 2010-12-16 14:04 ./tmp-1431528563/part-00000
-rw-r--r-- 1 pig pig 3.0M 2010-12-16 14:01 ./tmp1746442640/.part-00000.crc
-rwxrwxrwx 1 pig pig 381M 2010-12-16 14:01 ./tmp1746442640/part-00000
-rw-r--r-- 1 pig pig 8.8M 2010-12-16 16:05 ./tmp-1936719424/_temporary/_attempt_local_0003_r_000000_0/.part-00000.crc
-rwxrwxrwx 1 pig pig 1.1G 2010-12-16 16:05 ./tmp-1936719424/_temporary/_attempt_local_0003_r_000000_0/part-00000
-rw-r--r-- 1 pig pig 38M 2010-12-16 14:13 ./tmp1280814018/.part-00000.crc
-rwxrwxrwx 1 pig pig 4.8G 2010-12-16 14:13 ./tmp1280814018/part-00000
-rw-r--r-- 1 pig pig 308K 2010-12-16 14:13 ./tmp1738480876/.part-00000.crc
-rwxrwxrwx 1 pig pig 39M 2010-12-16 14:13 ./tmp1738480876/part-00000
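
To get a quick per-directory total instead of the per-file listing, something like the following should do (a sketch; it assumes Pig's local temp directories always match the /tmp/temp-<number> pattern seen in the listing above, and pig_tmp_usage is just a name I made up):

```shell
#!/bin/sh
# Print the disk usage of each Pig temp directory under a root directory.
# Assumption: the directories are named temp-<something>, like /tmp/temp-1035677529.
pig_tmp_usage() {
  root="${1:-/tmp}"            # default to /tmp when no argument is given
  for d in "$root"/temp-*; do
    [ -d "$d" ] || continue    # skip the literal pattern when nothing matches
    du -sh "$d"                # human-readable total for this directory
  done
}
```

Usage: `pig_tmp_usage` (checks /tmp) or `pig_tmp_usage /some/other/dir`.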

I don't know what these files are and my google-fu is too weak to find
anything.

FWIW, the command line I currently use to run pig is:

pig-0.6.0/bin/pig -param input=batch-20101216-130003/* scripts/the_script.pig

I'm looking for a way to make pig put all its files on /mnt/hadoop-tmp.
Preferably, it should be a command-line argument or an environment variable
rather than a tweak to an XML file. Not only would that make my scripts more
transparent, but the XML file I've heard about so far (hadoop-site.xml) also
resides inside the pre-built hadoop jar, and I'd rather avoid cracking it open
to modify its contents. Preferred solution aside, I'm glad for any help!

Thanks in advance,

David

-- 
David Vrensk
Systems developer, ICE House AB
Mobile: +46 703 74 69 00
