Note that:
FOREACH data GENERATE host, FLATTEN(regex) AS (name:CHARARRAY,
count:DOUBLE);
does not convert count to DOUBLE automatically; you need to do the cast
yourself. What "describe" tells you is a lie. It is a known bug, see
https://issues.apache.org/jira/browse/PIG-2315
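
A minimal sketch of the explicit cast, reusing the relation names from the
quoted script below. The alias "flat" and the field name "count_str" are
made up for illustration, and I have not run this against your data:

```
-- Declare both flattened fields as chararray, which is what
-- REGEX_EXTRACT_ALL actually produces.
flat = FOREACH data GENERATE host,
       FLATTEN(regex) AS (name:chararray, count_str:chararray);

-- Cast explicitly so SUM receives real doubles.
eventCountData = FOREACH flat GENERATE host, name,
                 (double)count_str AS count;
```

With the cast in its own FOREACH, the double in the "describe" output
should match the values that SUM sees at runtime.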

Daniel

On Thu, Nov 3, 2011 at 11:55 AM, Andrea Leistra
<[email protected]> wrote:

> REGEX_EXTRACT_ALL returns a tuple of chararrays.  If what you're getting
> back is a numerical value you will need to cast it as such before you can
> do math with it.
>
> -----Original Message-----
> From: Cameron Gandevia [mailto:[email protected]]
> Sent: Thursday, November 03, 2011 2:51 PM
> To: [email protected]
> Subject: Re: Exception in SUM command
>
> So it seems like this problem is isolated in the REGEX function.
>
> If I load the data using
>
> raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS
> (name:CHARARRAY,host:CHARARRAY,count:DOUBLE);
>
> everything is fine.
>
> If I load it
>
> raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS
> (name:CHARARRAY,host:CHARARRAY,count:CHARARRAY);
> regexExtract = FOREACH raw GENERATE name, host, REGEX_EXTRACT_ALL(count,
> '(\\d+)') AS regex;
>
> it fails
>
> On Thu, Nov 3, 2011 at 11:32 AM, Cameron Gandevia <[email protected]>
> wrote:
>
> > Hi
> > I am experiencing the following issues in part of my pig script.
> >
> > data = FOREACH metricLogLine GENERATE host, REGEX_EXTRACT_ALL(body,
> > '.*gr.perf.metrics.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)') AS
> > regex;
> > eventCountData = FILTER data BY regex is not null;
> > eventCountData = FOREACH data GENERATE host, FLATTEN(regex) AS
> > (name:CHARARRAY, count:DOUBLE);
> >
> > If I describe the data I get.
> > eventCountData: {host: chararray,name: chararray,count: double}
> >
> > I then perform
> >
> > eventNameGroupPerHost = GROUP eventCountData BY (name,host);
> >
> > and I get
> > eventNameGroupPerHost: {group: (name: chararray,host:
> > chararray),eventCountData: {host: chararray,name: chararray,count:
> > double}}
> >
> >
> > overviewEventsPerHost = FOREACH eventNameGroupPerHost GENERATE
> >     group.host,
> >     group.name,
> >     SUM(eventCountData.count);
> >
> > and I get
> >
> > org.apache.pig.backend.executionengine.ExecException: ERROR 2103:
> > Problem while computing sum of doubles.
> > at org.apache.pig.builtin.DoubleSum.sum(DoubleSum.java:147)
> > at org.apache.pig.builtin.DoubleSum.exec(DoubleSum.java:46)
> > at org.apache.pig.builtin.DoubleSum.exec(DoubleSum.java:41)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:310)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:357)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:131)
> >
> > We then ran it through our own UDF to see what was happening and
> > found:
> >
> > java.lang.String cannot be cast to java.lang.Number, tuple:(0),
> > field:0
> >
> > It seems as though Pig thinks zero is a string. Is this normal or a bug?
> >
> > Thanks
> >
>
>
>
> --
> Thanks
>
> Cameron Gandevia
