Hi everyone,
I want to read each line of the file test.txt. Each line is an individual
sentence, something like 'Markov chain Monte Carlo (MCMC) methods are an
important algorithmic'. I then do some text processing on each sentence in my
UDF and get back a score for each sentence. I am including only the relevant
snippet of my Python script. Any help with this would be great!


Script snippet:

    params = {'infile': '/data/data.in', 'outfile': '/results/scores/',
              'sentence': '0'}
    i = 0
    f = open('./test.txt')
    for currentline in f:
        print 'Current line is' + currentline
        params["sentence"] = currentline
        params["outfile"] = '/results/scores/' + 'algo.out' + str(i)
        bound = P.bind(params)
        # Run the bound script and keep its status for the check below.
        result = bound.runSingle()
        i = i + 1
        if result.isSuccessful():
            print 'Pig job succeeded'
        else:
            raise RuntimeError('Pig job failed')

This does not work when the first argument of myudf.myfunc is parameterized
with $sentence:

    P = Pig.compile(
        "A = LOAD '/data/data.in' AS (line: chararray); "
        "SCORES = FOREACH A GENERATE myudf.myfunc($sentence, line); "
        "STORE SCORES into '$outfile';")

I get the following error:

Current line is Markov    chain Monte Carlo (MCMC) methods are an   important algorithmic
2014-06-19 17:40:50,371 [main] INFO  org.apache.pig.scripting.BoundScript - Query to run:
A = LOAD '/data/data.in' AS (line: chararray); SCORES = FOREACH A GENERATE  myudf.myfunc(Markov, line); STORE SCORES into '/results/scores/algo.out0';
2014-06-19 17:40:51,100 [main] ERROR org.apache.pig.Main - ERROR 1025:
<line 1, column 110> Invalid field projection. Projected field [Markov] does not exist in schema: line:chararray.
Details at logfile: /apps/software/pig-scripts/pig_1403224834638.log

It works fine when the first argument of myudf.myfunc is hardcoded:

    P = Pig.compile(
        "A = LOAD '/data/data.in' AS (line: chararray); "
        "SCORES = FOREACH A GENERATE myudf.myfunc("
        "'Markov    chain Monte Carlo (MCMC) methods are an important algorithmic', "
        "line); STORE SCORES into '$outfile';")
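
The logged query shows the parameter value pasted in as raw text: only the bare word Markov arrives, and Pig reads it as a field name rather than a string. Since the hardcoded form works, one possible workaround is to build that form in Python for each line and leave only $outfile for bind(). The following is a minimal sketch under stated assumptions, not a verified fix. The names SCRIPT_TEMPLATE, pig_string_literal and build_script are hypothetical. It is also an assumption that Pig accepts backslash-escaped single quotes inside a string literal.

```python
# Query text with the UDF's first argument left as a Python format slot.
# '$outfile' is left untouched so that Pig's bind() can still fill it in.
SCRIPT_TEMPLATE = (
    "A = LOAD '/data/data.in' AS (line: chararray); "
    "SCORES = FOREACH A GENERATE myudf.myfunc(%s, line); "
    "STORE SCORES into '$outfile';"
)


def pig_string_literal(text):
    """Hypothetical helper: wrap text in single quotes for use in Pig Latin.

    Assumes Pig accepts backslash-escaped quotes inside a string literal.
    """
    cleaned = text.rstrip("\r\n")  # drop the newline left by file iteration
    escaped = cleaned.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_script(sentence):
    """Return the query text with the sentence embedded as a quoted literal."""
    return SCRIPT_TEMPLATE % pig_string_literal(sentence)


script = build_script(
    "Markov chain Monte Carlo (MCMC) methods are an important algorithmic\n")
# In the embedding script, this text would be passed to Pig.compile(script)
# inside the loop, followed by bind() with only 'outfile' set.
```

Compiling once per sentence gives up the single precompiled script, but it keeps the sentence out of Pig's parameter handling entirely. A sentence that itself contains a $ might still be picked up by bind(), so that case would need separate handling.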
                                          
