Note that

FOREACH data GENERATE host, FLATTEN(regex) AS (name:CHARARRAY, count:DOUBLE);

does not convert count to a double automatically; you need to do the cast yourself. The schema that "describe" reports (count: double) is wrong: at runtime the field still holds the chararray that REGEX_EXTRACT_ALL produced, which is why DoubleSum fails. This is a known bug, see https://issues.apache.org/jira/browse/PIG-2315
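To illustrate the failure mode outside Pig, here is a rough Python analogy. It is not Pig's implementation, and the helper names (extract_metric, sum_counts, sum_without_cast) and the sample log lines are invented for the example. The regex groups come back as strings, the same way REGEX_EXTRACT_ALL returns chararrays, and summing them only works after an explicit conversion. The sketch also assumes whole-string matching for the extraction.

```python
import re
from collections import defaultdict

# The pattern from the Pig script. Pig string literals need '\\s', '\\d';
# a Python raw string expresses the same regex with single backslashes.
PATTERN = re.compile(r".*gr.perf.metrics.Count\s*\-\s*([A-Za-z\.]+)\s*(\d+)")

# Invented sample input (hypothetical): (host, log body) pairs.
SAMPLE_ROWS = [
    ("web1", "INFO gr.perf.metrics.Count - page.views 3"),
    ("web1", "INFO gr.perf.metrics.Count - page.views 0"),
    ("web2", "INFO gr.perf.metrics.Count - page.views 5"),
    ("web2", "DEBUG something unrelated"),
]


def extract_metric(body):
    """Rough stand-in for REGEX_EXTRACT_ALL: a tuple of matched groups
    (all strings), or None when the body does not match.
    Assumes whole-string matching."""
    m = PATTERN.fullmatch(body)
    return m.groups() if m else None


def sum_without_cast(values):
    """Sums the extracted strings as if they were doubles.
    Raises TypeError, much like DoubleSum's String-to-Number cast failure."""
    return sum(values, 0.0)


def sum_counts(rows):
    """Drops non-matches, groups by (name, host) and sums count,
    converting each count to float explicitly first."""
    totals = defaultdict(float)
    for host, body in rows:
        groups = extract_metric(body)
        if groups is None:  # like FILTER ... BY regex is not null
            continue
        name, count = groups  # count is a str such as "0", not a number
        totals[(name, host)] += float(count)  # the explicit cast
    return dict(totals)
```

In Pig the analogous step is an explicit cast before SUM, for example declaring count as chararray in the AS clause and applying (double)count in a following FOREACH; treat that exact form as a suggestion rather than a verified workaround.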
Daniel

On Thu, Nov 3, 2011 at 11:55 AM, Andrea Leistra <[email protected]> wrote:

> REGEX_EXTRACT_ALL returns a chararray. If what you're getting back is a
> numerical value you will need to cast it as such before you can do math
> with it.
>
> -----Original Message-----
> From: Cameron Gandevia [mailto:[email protected]]
> Sent: Thursday, November 03, 2011 2:51 PM
> To: [email protected]
> Subject: Re: Exception in SUM command
>
> So it seems like this problem is isolated in the REGEX function.
>
> If I load the data using
>
> raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS
>     (name:CHARARRAY,host:CHARARRAY,count:DOUBLE);
>
> everything is fine.
>
> If I load it
>
> raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS
>     (name:CHARARRAY,host:CHARARRAY,count:CHARARRAY);
> regexExtract = FOREACH raw GENERATE name, host, REGEX_EXTRACT_ALL(count,
>     '(\\d+)') AS regex;
>
> it fails
>
> On Thu, Nov 3, 2011 at 11:32 AM, Cameron Gandevia <[email protected]> wrote:
>
> > Hi
> > I am experiencing the following issues in part of my pig script.
> >
> > data = FOREACH metricLogLine GENERATE host, REGEX_EXTRACT_ALL(body,
> >     '.*gr.perf.metrics.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)') AS regex;
> > eventCountData = FILTER data BY regex is not null;
> > eventCountData = FOREACH data GENERATE host, FLATTEN(regex) AS
> >     (name:CHARARRAY, count:DOUBLE);
> >
> > If I describe the data I get.
> >
> > eventCountData: {host: chararray,name: chararray,count: double}
> >
> > I then perform
> >
> > eventNameGroupPerHost = GROUP eventCountData BY (name,host);
> >
> > and I get
> >
> > eventNameGroupPerHost: {group: (name: chararray,host: chararray),eventCountData: {host: chararray,name: chararray,count: double}}
> >
> > overviewEventsPerHost = FOREACH eventNameGroupPerHost GENERATE
> >     group.host,
> >     group.name,
> >     SUM(eventCountData.count);
> >
> > and I get
> >
> > org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem while computing sum of doubles.
> >     at org.apache.pig.builtin.DoubleSum.sum(DoubleSum.java:147)
> >     at org.apache.pig.builtin.DoubleSum.exec(DoubleSum.java:46)
> >     at org.apache.pig.builtin.DoubleSum.exec(DoubleSum.java:41)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:310)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:357)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:131)
> >
> > we then tried running it through our own UDF to see what was happening
> > and we found.
> >
> > java.lang.String cannot be cast to java.lang.Number, tuple:(0), field:0
> >
> > It seems as though pig thinks zero is a string. Is this normal or a bug?
> >
> > Thanks
> >
>
> --
> Thanks
>
> Cameron Gandevia
>
