Hi Richard, I sincerely regret the inconvenience caused.....
%YAML 1.1 --- VERSION: 1.0.0.1 DATABASE: test_db1 USER: gpadmin DEFINE: - INPUT: #****** This the line which is causing the error ******# NAME: doc TABLE: documents - INPUT: NAME: kw TABLE: keywords - MAP: NAME: doc_map LANGUAGE: python FUNCTION: | i = 0 terms = {} for term in data.lower().split(): i = i + 1 if term in terms: terms[term] += ','+str(i) else: terms[term] = str(i) for term in terms: yield([doc_id, term, terms[term]]) OPTIMIZE: STRICT IMMUTABLE PARAMETERS: - doc_id integer - data text RETURNS: - doc_id integer - term text - positions text - MAP: NAME: kw_map LANGUAGE: python FUNCTION: | i = 0 terms = {} for term in keyword.lower().split(): i = i + 1 if term in terms: terms[term] += ','+str(i) else: terms[term] = str(i) yield([keyword_id, i, term, terms[term]]) OPTIMIZE: STRICT IMMUTABLE PARAMETERS: - keyword_id integer - keyword text RETURNS: - keyword_id integer - nterms integer - term text - positions text - TASK: NAME: doc_prep SOURCE: doc MAP: doc_map - TASK: NAME: kw_prep SOURCE: kw MAP: kw_map - INPUT: NAME: term_join QUERY: | SELECT doc.doc_id, kw.keyword_id, kw.term, kw.nterms, doc.positions as doc_positions, kw.positions as kw_positions FROM doc_prep doc INNER JOIN kw_prep kw ON (doc.term = kw.term) - REDUCE: NAME: term_reducer TRANSITION: term_transition FINALIZE: term_finalizer - TRANSITION: NAME: term_transition LANGUAGE: python PARAMETERS: - state text - term text - nterms integer - doc_positions text - kw_positions text FUNCTION: | if state: kw_split = state.split(':') else: kw_split = [] for i in range(0,nterms): kw_split.append('') for kw_p in kw_positions.split(','): kw_split[int(kw_p)-1] = doc_positions outstate = kw_split[0] for s in kw_split[1:]: outstate = outstate + ':' + s return outstate - FINALIZE: NAME: term_finalizer LANGUAGE: python RETURNS: - count integer MODE: MULTI FUNCTION: | if not state: return 0 kw_split = state.split(':') previous = None for i in range(0,len(kw_split)): isplit = kw_split[i].split(',') if any(map(lambda(x): x == '', isplit)): return 0 adjusted = set(map(lambda(x): int(x)-i, isplit)) if (previous): previous = adjusted.intersection(previous) else: previous = adjusted if previous: return len(previous) return 0 - TASK: NAME: term_match SOURCE: term_join REDUCE: term_reducer - INPUT: NAME: final_output QUERY: | SELECT doc.*, kw.*, tm.count FROM documents doc, keywords kw, term_match tm WHERE doc.doc_id = tm.doc_id AND kw.keyword_id = tm.keyword_id AND tm.count > 0 EXECUTE: - RUN: SOURCE: final_output TARGET: STDOUT I have learnt that unnecessary TABs can the cause of this, so trying to overcome that, hopefully the problem will subside then.... Regards, Suvankar Roy Richard Huxton <d...@archonet.com> 08/03/2009 02:55 PM To Suvankar Roy <suvankar....@tcs.com> cc pgsql-performance@postgresql.org Subject Re: [PERFORM] Greenplum MapReduce Suvankar Roy wrote: > Hi all, > > Has anybody worked on Greenplum MapReduce programming ? > > I am facing a problem while trying to execute the below Greenplum > Mapreduce program written in YAML (in blue). The other poster suggested contacting Greenplum and I can only agree. > The error is thrown in the 7th line as: > Error: YAML syntax error - found character that cannot start any token > while scanning for the next token, at line 7 (in red) There is no red, particularly if viewing messages as plain text (which most people do on mailing lists). Consider indicating a line some other way next time (commonly below the line you put something like "this is line 7 ^^^^^") The most common problem I get with YAML files though is when a tab is accidentally inserted instead of spaces at the start of a line. -- Richard Huxton Archonet Ltd ForwardSourceID:NT000058E2 =====-----=====-----===== Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you