According to https://issues.apache.org/jira/browse/PIG-1752 :
"One other note. I didn't include any unit tests with this patch. I don't know how to test it in the unit tests since the distributed cache isn't used in local mode. I've tested it on a cluster. Any thoughts on how to include tests for this in the unit tests are welcome."

getCacheFiles() does not work in local mode. This is problematic. How do I write a UDF that works in both local mode and Hadoop mode?

On Mon, Jan 6, 2014 at 12:08 PM, Russell Jurney wrote:
> Question: in local mode, can the path given to getCacheFiles() be on the
> local filesystem? Or does it have to be on HDFS?
>
> On Mon, Jan 6, 2014 at 11:29 AM, Russell Jurney wrote:
>
>> 1. I've also given it an absolute local path. I don't know what you mean
>> by an absolute cache path. How do I know what that is? The examples use
>> ./link to access the cached file.
>> 2. Because all the examples do so. What paths should I use to access the
>> distributed cache from inside exec()?
>>
>> The exception does say that passwd is missing. But as I read the examples,
>> it should be there.
>>
>> On Monday, January 6, 2014, Serega Sheypak wrote:
>>
>>> Yes, it works. The exception clearly says that ./passwd is missing.
>>> 1. Try giving an absolute path to the file and see if it works. It should.
>>> 2. Then give a relative path. It looks like you are providing the relative
>>> path incorrectly. Why do you put "./" before the filename?
>>>
>>> 2014/1/6 Russell Jurney
>>>
>>> > I have implemented the class below to test the UDF cache, and it fails
>>> > in local mode with the error below. The cache should work in local mode
>>> > as well, right?
>>> >
>>> > ------------
>>> >
>>> > org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: datafu.pig.text.Udfcachetest [./passwd (No such file or directory)]
>>> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:358)
>>> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextString(POUserFunc.java:432)
>>> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:315)
>>> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
>>> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
>>> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
>>> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>>> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>>> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>> > Caused by: java.io.FileNotFoundException: ./passwd (No such file or directory)
>>> >     at java.io.FileInputStream.open(Native Method)
>>> >     at java.io.FileInputStream.<init>(FileInputStream.java:146)
>>> >     at java.io.FileInputStream.<init>(FileInputStream.java:101)
>>> >     at java.io.FileReader.<init>(FileReader.java:58)
>>> >     at datafu.pig.text.Udfcachetest.exec(Udfcachetest.java:22)
>>> >     at datafu.pig.text.Udfcachetest.exec(Udfcachetest.java:19)
>>> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
>>> > -----------------------
>>> >
>>> > package datafu.pig.text;
>>> >
>>> > import org.apache.pig.EvalFunc;
>>> > import org.apache.pig.data.Tuple;
>>> >
>>> > import java.io.BufferedReader;
>>> > import java.io.FileReader;
>>> > import java.io.IOException;
>>> > import java.util.ArrayList;
>>> > import java.util.List;
>>> >
>>> > /**
>>> >  * Created with IntelliJ IDEA.
>>> >  * User: rjurney
>>> >  * Date: 1/5/14
>>> >  * Time: 8:32 PM
>>> >  * To change this template use File | Settings | File Templates.
>>> >  */
>>> > public class Udfcachetest extends EvalFunc<String> {
>>> >
>>> >     public String exec(Tuple input) throws IOException {
>>> >         FileReader fr = new FileReader("./passwd");
>>> >         BufferedReader d = new BufferedReader(fr);
>>> >         return d.readLine();
>>> >     }
>>> >
>>> >     public List<String> getCacheFiles() {
>>> >         List<String> list = new ArrayList<String>(1);
>>> >         list.add("/etc/passwd");
>>> >         return list;
>>> >     }
>>> > }
>>> >
>>> > --
>>> > Russell Jurney twitter.com/rjurney datasyndrome.com
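The FileNotFoundException matches how java.io resolves relative paths. A name such as ./passwd is resolved against the JVM's working directory, which is the user.dir system property. The trace shows the map task running under LocalJobRunner, which runs inside the Pig client's JVM. So ./passwd is presumably looked up in whatever directory Pig was launched from, not in a task directory holding cache links.

The helper below is a hypothetical diagnostic, not part of Pig. Calling it from exec() would log where a relative name actually points and whether anything exists there.

```java
import java.io.File;

/** Hypothetical diagnostic helper (not a Pig API): reports where a relative path points. */
final class PathDiagnostics {

    private PathDiagnostics() {}

    /** Returns e.g. "./passwd -> /some/dir/./passwd (exists: false)". */
    static String describe(String relativePath) {
        File file = new File(relativePath);
        // java.io.File resolves a relative path against the "user.dir" property,
        // i.e. the JVM's working directory, not against any task directory.
        return relativePath + " -> " + file.getAbsolutePath()
                + " (exists: " + file.exists() + ")";
    }
}
```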
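The top question, how to write one UDF that works in both local mode and Hadoop mode, goes unanswered in the thread. One possible workaround is to look for the distributed-cache link first and fall back to opening the original path directly. This is a sketch under assumptions, not Pig's own mechanism. Two points motivate it:

- The distributed-cache example in the Pig UDF documentation appears to register each entry as path#linkname and then open ./linkname. The getCacheFiles() above returns /etc/passwd with no #passwd fragment, so a ./passwd link would not necessarily appear even on a cluster.
- Per PIG-1752, the distributed cache is not used in local mode, so no link is created there at all.

CacheFileResolver is a hypothetical helper, not a Pig or Hadoop API. The local-mode fallback assumes that the part before # exists on the local filesystem.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

/**
 * Hypothetical helper (not part of Pig or Hadoop): resolves a cache entry of
 * the form "path#link" (or a bare "path") to a local file that can be opened.
 */
final class CacheFileResolver {

    private CacheFileResolver() {}

    static File resolve(String cacheEntry) {
        int hash = cacheEntry.indexOf('#');
        String path = hash >= 0 ? cacheEntry.substring(0, hash) : cacheEntry;
        // Without an explicit fragment, guess that the link name is the file name.
        String link = hash >= 0 ? cacheEntry.substring(hash + 1) : new File(path).getName();

        // On a cluster, the distributed cache is expected to create a symlink
        // named `link` in the task's working directory.
        File symlink = new File(link);
        if (symlink.exists()) {
            return symlink;
        }
        // Assumed local-mode fallback: no cache link exists, so open the
        // original path directly from the local filesystem.
        return new File(path);
    }

    /** Reads the first line of the resolved file and always closes the reader. */
    static String readFirstLine(String cacheEntry) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(resolve(cacheEntry)))) {
            return reader.readLine();
        }
    }
}
```

With this helper, Udfcachetest's getCacheFiles() would add "/etc/passwd#passwd", and exec() would return CacheFileResolver.readFirstLine("/etc/passwd#passwd"). The try-with-resources block also closes the reader, which the original exec() leaves open.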
