So it seems the problem is isolated to the REGEX_EXTRACT_ALL function.

If I load the data with count declared as a DOUBLE:

raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS
(name:CHARARRAY,host:CHARARRAY,count:DOUBLE);

everything is fine.

If I instead load count as a CHARARRAY and run it through REGEX_EXTRACT_ALL:

raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS
(name:CHARARRAY,host:CHARARRAY,count:CHARARRAY);
regexExtract = FOREACH raw GENERATE name, host, REGEX_EXTRACT_ALL(count,
'(\\d+)') AS regex;

it fails.
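
A minimal Java sketch of what I suspect is going on, assuming the regex function hands back its captured groups as String objects and the declared schema type does not convert them (that is my assumption, not something I have confirmed in the Pig source). It reproduces the same kind of failure as the UDF error quoted below, "java.lang.String cannot be cast to java.lang.Number", and shows an explicit conversion that avoids it. The class and method names (RegexFieldCast, castDirectly, toDouble, sum) are made up for illustration and are not part of Pig.

```java
import java.util.Arrays;
import java.util.List;

public class RegexFieldCast {

    // Treats the field as a Number without checking its runtime type.
    // Throws ClassCastException when the object is really a String.
    public static double castDirectly(Object field) {
        return ((Number) field).doubleValue();
    }

    // Converts explicitly: numbers pass through, anything else is parsed
    // from its string form (NumberFormatException if it is not numeric).
    public static double toDouble(Object field) {
        if (field instanceof Number) {
            return ((Number) field).doubleValue();
        }
        return Double.parseDouble(field.toString().trim());
    }

    // Sums a mix of String and Number fields using the explicit conversion.
    public static double sum(List<Object> fields) {
        double total = 0.0;
        for (Object field : fields) {
            total += toDouble(field);
        }
        return total;
    }

    public static void main(String[] args) {
        // A value captured by a regex group is a String, e.g. "0".
        Object captured = "0";
        try {
            castDirectly(captured);
        } catch (ClassCastException e) {
            System.out.println("direct cast failed for a String field");
        }
        System.out.println(sum(Arrays.<Object>asList("0", "3", 4.5)));
    }
}
```

If that assumption holds, an explicit conversion of the captured field before the sum would be the thing to try in the script as well.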

On Thu, Nov 3, 2011 at 11:32 AM, Cameron Gandevia <[email protected]> wrote:

> Hi,
> I am experiencing the following issue in part of my Pig script.
>
> data = FOREACH metricLogLine GENERATE host, REGEX_EXTRACT_ALL(body,
> '.*gr.perf.metrics.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)') AS regex;
> eventCountData = FILTER data BY regex is not null;
> eventCountData = FOREACH data GENERATE host, FLATTEN(regex) AS
> (name:CHARARRAY, count:DOUBLE);
>
> If I describe the data I get:
> eventCountData: {host: chararray,name: chararray,count: double}
>
> I then perform
>
> eventNameGroupPerHost = GROUP eventCountData BY (name,host);
>
> and I get
> eventNameGroupPerHost: {group: (name: chararray,host:
> chararray),eventCountData: {host: chararray,name: chararray,count: double}}
>
>
> overviewEventsPerHost = FOREACH eventNameGroupPerHost GENERATE
>     group.host,
>     group.name,
>     SUM(eventCountData.count);
>
>
> and I get
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 2103:
> Problem while computing sum of doubles.
> at org.apache.pig.builtin.DoubleSum.sum(DoubleSum.java:147)
> at org.apache.pig.builtin.DoubleSum.exec(DoubleSum.java:46)
> at org.apache.pig.builtin.DoubleSum.exec(DoubleSum.java:41)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:310)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:357)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:131)
>
>
> We then tried running it through our own UDF to see what was happening,
> and we found:
>
> java.lang.String cannot be cast to java.lang.Number, tuple:(0), field:0
>
> It seems as though Pig thinks zero is a string. Is this normal or a bug?
>
> Thanks
>



-- 
Thanks

Cameron Gandevia
