Hi,

The log says your command returned a non-zero exit code. Does it also return a non-zero code when you invoke it manually, outside of Pig? Otherwise I'm afraid I don't have any other ideas.
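For example, you could run the exact same pipeline in a shell and check its exit status:

  cat myfile.txt | awk 'BEGIN {ORS="|"; RS="\r\n"} {print $0}' > /dev/null
  echo $?

One thing worth noting: exit code 130 normally means the process was interrupted (128 + SIGINT), and it matches the ^C at the start of your ERROR line, so that particular code may just be the result of you killing the hung command rather than awk itself failing.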
Thanks

On Mon, Sep 30, 2013 at 10:37 AM, Anastasis Andronidis <andronat_...@hotmail.com> wrote:

> Hello again,
>
> Any comments on this?
>
> Thanks,
> Anastasis
>
> On 27 Sep 2013, at 5:36 PM, Anastasis Andronidis <andronat_...@hotmail.com> wrote:
>
> > Hello,
> >
> > I am working on a very small project for my university and I have a
> > small cluster with 2 worker nodes and 1 master node. I'm using Pig to
> > do some calculations and I have a question regarding small files.
> >
> > I have a UDF that reads a small input (around 200 KB) and correlates
> > the data from HDFS. My first approach was to upload the small file to
> > HDFS and later, using getCacheFiles(), access it in my UDF.
> >
> > Afterwards, though, I needed to change things in this small file,
> > which meant deleting the file on HDFS, re-uploading it, and re-running
> > Pig. Since in the end I need to change this small file frequently, I
> > wanted to bypass HDFS (because all those read + write + read cycles in
> > Pig are very slow over multiple iterations of my script), so what I
> > did was:
> >
> > === pig script ===
> > %declare MYFILE `cat myfile.txt | awk 'BEGIN {ORS="|"; RS="\r\n"} {print $0}'`
> >
> > .... MyUDF( line, '$MYFILE') .....
> >
> > In the beginning it worked great, but later (when my file grew larger
> > than 100 KB) Pig got stuck and I had to kill it:
> >
> > 2013-09-27 16:14:47,722 [main] INFO org.apache.pig.tools.parameters.PreprocessorContext - Executing command : cat myfile.txt | awk 'BEGIN {ORS="|"; RS="\r\n"} {print $0}'
> > ^C2013-09-27 16:15:28,102 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Error executing shell command: cat myfile.txt | awk 'BEGIN {ORS="|"; RS="\r\n"} {print $0}'. Command exit with exit code of 130
> >
> > (BTW, is this a bug or something? Should it hang like that?)
> >
> > How can I manage small files in cases like this, so that I don't need
> > to re-upload everything to HDFS every time and my iterations get
> > faster?
> >
> > Thanks,
> > Anastasis
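P.S. For the archives: if you go back to the distributed-cache approach, a minimal sketch of what the UDF side can look like is below. The class name matches your MyUDF, but the HDFS path and the toy matching logic are placeholders for your real correlation code:

=== MyUDF.java ===
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class MyUDF extends EvalFunc<String> {

    private List<String> smallFile;   // lazily loaded copy of the cached file

    // Ask Pig to ship this HDFS file to every task via the distributed
    // cache; at runtime it is visible under the symlink name after '#'.
    @Override
    public List<String> getCacheFiles() {
        return Arrays.asList("/user/me/myfile.txt#myfile");   // placeholder path
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        if (smallFile == null) {
            // Read the local cached copy once, via the symlink name.
            smallFile = new ArrayList<String>();
            BufferedReader r = new BufferedReader(new FileReader("./myfile"));
            String line;
            while ((line = r.readLine()) != null) {
                smallFile.add(line);
            }
            r.close();
        }
        String needle = (String) input.get(0);
        // Toy "correlation": return the first cached line containing the input.
        for (String cached : smallFile) {
            if (cached.contains(needle)) {
                return cached;
            }
        }
        return null;
    }
}

With this in place, updating the data only means replacing the one small file on HDFS before re-running the script; Pig ships it to the tasks for you.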