Hi everyone,

I want to read each line of the file test.txt. Each line is a single
sentence, something like 'Markov chain Monte Carlo (MCMC) methods are an
important algorithmic'. I then do some text processing on each sentence in my
UDF, which returns a score for that sentence. I am including only the relevant
snippet of my Python script. Any help with this would be great!

// Script snippet:

    params = {'infile': '/data/data.in', 'outfile': '/results/scores/',
              'sentence': '0'}
    i = 0
    f = open('./test.txt')
    for currentline in f:
        print 'Current line is' + currentline
        params["sentence"] = currentline
        params["outfile"] = '/results/scores/' + 'algo.out' + str(i)
        bound = P.bind(params)
        result = bound.runSingle()  # run the bound script once
        i = i + 1
        if result.isSuccessful():
            print 'Pig job succeeded'
        else:
            raise Exception('Pig job failed')

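One detail I noticed in the loop above: iterating over a file yields each line with its trailing newline, so that newline ends up in the bound parameter as well. Below is a minimal sketch of building the per-sentence parameters with the newline stripped. It is plain Python with no Pig calls, and `build_params` is a hypothetical helper for illustration, not part of my script or of the Pig API.

```python
def build_params(lines, base_params, out_prefix='/results/scores/algo.out'):
    """Yield one parameter dict per non-empty sentence.

    Hypothetical helper: it only prepares the dicts that would later
    be handed to P.bind(); it does not talk to Pig itself.
    """
    index = 0
    for raw in lines:
        sentence = raw.strip()        # drop the trailing newline and outer spaces
        if not sentence:              # skip blank lines
            continue
        params = dict(base_params)    # copy, so the base dict is left untouched
        params['sentence'] = sentence
        params['outfile'] = out_prefix + str(index)
        index += 1
        yield params


base = {'infile': '/data/data.in', 'outfile': '/results/scores/', 'sentence': '0'}
sample_lines = [
    'Markov chain Monte Carlo (MCMC) methods are an important algorithmic\n',
    '\n',
]
all_params = list(build_params(sample_lines, base))
print(all_params[0]['outfile'])
```

Each resulting dict could then be passed to P.bind() in place of the shared params dict.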
Does not work when the first parameter of myudf.myfunc is parameterized as
$sentence:

    P = Pig.compile("A = LOAD '/data/data.in' AS (line: chararray); "
                    "SCORES = FOREACH A GENERATE myudf.myfunc($sentence, line); "
                    "STORE SCORES into '$outfile';")

I get the following error:

    Current line is Markov chain Monte Carlo (MCMC) methods are an important algorithmic
    2014-06-19 17:40:50,371 [main] INFO org.apache.pig.scripting.BoundScript - Query to run:
    A = LOAD '/data/data.in' AS (line: chararray); SCORES = FOREACH A GENERATE myudf.myfunc(Markov, line); STORE SCORES into '/results/scores/algo.out0';
    2014-06-19 17:40:51,100 [main] ERROR org.apache.pig.Main - ERROR 1025: <line 1, column 110> Invalid field projection. Projected field [Markov] does not exist in schema: line:chararray.
    Details at logfile: /apps/software/pig-scripts/pig_1403224834638.log

Works fine when the first parameter of myudf.myfunc is hardcoded:

    P = Pig.compile("A = LOAD '/data/data.in' AS (line: chararray); "
                    "SCORES = FOREACH A GENERATE myudf.myfunc("
                    "'Markov chain Monte Carlo (MCMC) methods are an important algorithmic', "
                    "line); STORE SCORES into '$outfile';")
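
Comparing the two cases, my working guess is that $sentence is substituted as plain text before Pig parses the script, so without quotes the value lands in the query as a bare word, which Pig then reads as a field name. Here is a rough illustration of that idea using Python's `string.Template` as a stand-in. This is an assumption about the mechanism, not Pig's actual substitution code, and it does not reproduce the truncation to the first word (Markov) seen in the log.

```python
from string import Template

sentence = 'Markov chain Monte Carlo (MCMC) methods are an important algorithmic'

# Same GENERATE clause, without and with quotes around the placeholder.
unquoted = Template("GENERATE myudf.myfunc($sentence, line);")
quoted = Template("GENERATE myudf.myfunc('$sentence', line);")

unquoted_query = unquoted.substitute(sentence=sentence)
quoted_query = quoted.substitute(sentence=sentence)

print(unquoted_query)  # the value appears as bare words, not as a string literal
print(quoted_query)    # the value appears inside single quotes
```

If that guess is right, writing '$sentence' (with quotes) in the compiled script might be part of the fix, although values containing spaces or quotes may need additional escaping on the Pig side. I have not verified this.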