REGEX_EXTRACT_ALL returns a tuple whose fields are all chararrays. If what you're getting back is a numerical value, you will need to cast it explicitly before you can do math with it.
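For example, here is a minimal, untested sketch of that cast applied to Cameron's script. It assumes a hypothetical tab-separated input at /path/to/metrics.log with host and body fields, standing in for metricLogLine, which the quoted script does not define. Going by the ClassCastException in the thread, declaring count:DOUBLE in the AS clause after FLATTEN only stated the schema and did not convert the underlying strings. So the sketch flattens into chararray fields and then applies an explicit (double) cast. It also runs the FLATTEN step over the filtered relation. In the original, eventCountData = FOREACH data ... reads from data, so the null filter has no effect.

```pig
-- Hypothetical input standing in for metricLogLine (path and schema are assumptions).
metricLogLine = LOAD '/path/to/metrics.log' USING PigStorage('\t') AS (host:chararray, body:chararray);

data = FOREACH metricLogLine GENERATE host,
    REGEX_EXTRACT_ALL(body, '.*gr.perf.metrics.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)') AS regex;

-- Drop lines the pattern did not match (the original script filters on null the same way).
matched = FILTER data BY regex IS NOT NULL;

-- Every field of the tuple from REGEX_EXTRACT_ALL is a chararray, so declare it as one...
flat = FOREACH matched GENERATE host, FLATTEN(regex) AS (name:chararray, count_str:chararray);

-- ...and convert it with an explicit cast so SUM receives doubles.
eventCountData = FOREACH flat GENERATE host, name, (double)count_str AS count;

eventNameGroupPerHost = GROUP eventCountData BY (name, host);

overviewEventsPerHost = FOREACH eventNameGroupPerHost GENERATE
    group.host,
    group.name,
    SUM(eventCountData.count);

DUMP overviewEventsPerHost;
```

This follows the same approach as the first LOAD in Cameron's follow-up, which works because PigStorage converts the field to a double when it is declared as one at load time. The explicit cast does that conversion after the regex step instead.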
-----Original Message-----
From: Cameron Gandevia [mailto:[email protected]]
Sent: Thursday, November 03, 2011 2:51 PM
To: [email protected]
Subject: Re: Exception in SUM command

So it seems like this problem is isolated in the REGEX function. If I load the data using

raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS (name:CHARARRAY,host:CHARARRAY,count:DOUBLE);

everything is fine. If I load it

raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS (name:CHARARRAY,host:CHARARRAY,count:CHARARRAY);
regexExtract = FOREACH raw GENERATE name, host, REGEX_EXTRACT_ALL(count, '(\\d+)') AS regex;

it fails.

On Thu, Nov 3, 2011 at 11:32 AM, Cameron Gandevia <[email protected]> wrote:
> Hi
> I am experiencing the following issues in part of my pig script.
>
> data = FOREACH metricLogLine GENERATE host, REGEX_EXTRACT_ALL(body,
> '.*gr.perf.metrics.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)') AS regex;
> eventCountData = FILTER data BY regex is not null;
> eventCountData = FOREACH data GENERATE host, FLATTEN(regex) AS (name:CHARARRAY, count:DOUBLE);
>
> If I describe the data I get:
> eventCountData: {host: chararray,name: chararray,count: double}
>
> I then perform
>
> eventNameGroupPerHost = GROUP eventCountData BY (name,host);
>
> and I get
> eventNameGroupPerHost: {group: (name: chararray,host: chararray),eventCountData: {host: chararray,name: chararray,count: double}}
>
> overviewEventsPerHost = FOREACH eventNameGroupPerHost GENERATE
>     group.host,
>     group.name,
>     SUM(eventCountData.count);
>
> and I get
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem while computing sum of doubles.
>     at org.apache.pig.builtin.DoubleSum.sum(DoubleSum.java:147)
>     at org.apache.pig.builtin.DoubleSum.exec(DoubleSum.java:46)
>     at org.apache.pig.builtin.DoubleSum.exec(DoubleSum.java:41)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:310)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:357)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:131)
>
> We then tried running it through our own UDF to see what was happening and we found:
>
> java.lang.String cannot be cast to java.lang.Number, tuple:(0), field:0
>
> It seems as though pig thinks zero is a string. Is this normal or a bug?
>
> Thanks
>

--
Thanks
Cameron Gandevia
