Charles,
If you are iterating through a relation, you don't need to refer to it
in the statement.

meaning:

C = FILTER B BY valid(B.url);

should be

C = FILTER B BY valid(url);

(inside the FILTER you are already working with the rows of B, not with
the relation B itself, so you refer to the fields directly).
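
Applied to the first session in your mail, the same rule would also cover the GROUP key (B.cIp there is another reference to the whole relation). A rough sketch reusing your own aliases and UDF names; I have not run it against your data:

```pig
DEFINE EdgeLoader msc.pig.EdgeLoader();
DEFINE valid msc.pig.IsValidUrl();
raw = LOAD '../inputTestes/wpc_sample.gz' USING EdgeLoader;
B = FOREACH raw GENERATE cIp, url;
-- inside FILTER, fields of the current row are referenced directly
C = FILTER B BY valid(url);
-- same for the GROUP key: cIp, not B.cIp
D = GROUP C BY cIp;
-- after GROUP, C is the bag of rows in each group, so C.url is a normal bag projection
urls_ok = FOREACH D GENERATE COUNT(C.url);
```

Note that C.url in the last line is fine: D has a field named C (the bag), so that is a projection on a bag and not a scalar lookup.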

The error you are getting comes from a new feature that lets you
treat a relation as a scalar and use that scalar value
transparently when iterating over another relation, e.g.:

total = foreach (group stuff all) generate COUNT($1) as cnt;
-- cast to double, otherwise long / long is integer division
percent = foreach (group stuff by type) generate (double) COUNT($1) / total.cnt;

Here, I am using the "total" relation as a single-row relation,
essentially promising Pig that total.cnt is only a single value.
In your case B has many rows, so B.url is not a single value, and Pig
fails with "Scalar has more than one row in the output".
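
The same reasoning should apply to the nested FOREACH in your second session: inside the inner FILTER you iterate over the rows of the bag B, so the field is just url. A sketch under that assumption, untested, with your aliases kept as they were:

```pig
E = GROUP B ALL;
edge_breakdown = FOREACH E {
    dist_cIps = DISTINCT B.cIp;
    dist_sIps = DISTINCT B.sIp;
    -- url, not B.url: the filter already sees each row of the bag B
    urls_ok = FILTER B BY valid(url);
    GENERATE COUNT(dist_cIps), COUNT(dist_sIps), COUNT(urls_ok.url),
             COUNT(B.url), SUM(B.scBytes);
}
```

The B.cIp and B.sIp projections stay as they are, since there B is the bag field of E rather than a relation being read as a scalar.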

D

On Thu, Feb 10, 2011 at 5:42 PM, Charles Gonçalves <[email protected]> wrote:
> I'm just trying to do a breakdown of all my logs, but every time I use an
> operation like:
> FILTER alias BY some_udf(alias);
> I get a problem.
>
> First I got: ERROR 0: Scalar has more than one row in the output.
>
> cfgmc@phoebe:~/workspace-java/MscPigScripts/scripts (121) 23:11:16
> scripts:> pig -x local
> grunt> REGISTER
> /home/speed/cfgmc/workspace-java/MscPigScripts/jar/MscPigUtils.jar
> grunt>
> grunt> -- Functions Definitions
> grunt> DEFINE EdgeLoader msc.pig.EdgeLoader();
> grunt> DEFINE valid msc.pig.IsValidUrl();
> grunt> raw = LOAD '../inputTestes/wpc_sample.gz' using EdgeLoader;
> grunt> Describe raw
> raw: {ts: long,timeTaken: int,cIp: chararray,fSize: long,sIp:
> chararray,sPort: int,scStatus: chararray,scBytes: long,csMethod:
> chararray,url: chararray,rsDuration: int,rsBytes: int,referrer:
> chararray,ua: chararray,edgeId: chararray}
> grunt> B = FOREACH raw GENERATE cIp,url ;
> grunt> describe B;
> B: {cIp: chararray,url: chararray}
> grunt> *C = FILTER B BY valid(B.url);*
> grunt> describe C;
> C: {cIp: chararray,url: chararray}
> grunt> D = GROUP C BY B.cIp;
> grunt> describe D;
> D: {group: chararray,C: {cIp: chararray,url: chararray}}
> grunt> urls_ok = FOREACH D GENERATE COUNT(C.url);
> grunt> describe urls_ok;
> urls_ok: {long}
> grunt> dump urls_ok;
>
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has
> more than one row in the output. 1st : (187.113.41.93,
> http://webcast.sambatech.com.br/000482/account/8/3/ed92827f3e722bfbbabf89aa4adb0068/ER7_FA_3009_CARRASCONANYDIF_470kbps_2010-09-30.mp4),
> 2nd :(186.213.248.23,
> http://webcast.sambatech.com.br/000482/account/8/3/thumbnail/media/ea41d211f4e277821cb3e9fd392a51cf/R7_CH_TINAROMA_EMAILR7FAZENDA_470kbps_2010-09-140.03426408348605037.jpg
> )
>  at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:89)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
>  at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:325)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:169)
>  at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:212)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:289)
>  at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>  at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPreCombinerLocalRearrange.getNext(POPreCombinerLocalRearrange.java:127)
>  at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
>  at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>  at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>
> Then I got :
>
> grunt> REGISTER
> /home/speed/cfgmc/workspace-java/MscPigScripts/jar/MscPigUtils.jar
> grunt> DEFINE EdgeLoader msc.pig.EdgeLoader();
> grunt> DEFINE valid msc.pig.IsValidUrl();
> grunt> raw = LOAD '../inputTestes/wpc_sample.gz' using EdgeLoader;
> grunt> B = FOREACH raw GENERATE cIp, sIp, sPort, scStatus, csMethod,
> scBytes, url ;
> grunt> describe B;
> B: {cIp: chararray,sIp: chararray,sPort: int,scStatus: chararray,csMethod:
> chararray,scBytes: long,url: chararray}
> grunt> E = GROUP B ALL ;
> grunt> describe E;
> E: {group: chararray,B: {cIp: chararray,sIp: chararray,sPort: int,scStatus:
> chararray,csMethod: chararray,scBytes: long,url: chararray}}
>
> grunt> edge_breakdown = FOREACH E {
>>> dist_cIps = DISTINCT B.cIp;
>>> dist_sIps = DISTINCT B.sIp;
>>> *urls_ok = FILTER B BY valid(B.url);*
>>> GENERATE COUNT(dist_cIps),COUNT(dist_sIps) ,COUNT(urls_ok.url),
> COUNT(B.url), SUM(B.scBytes);
>>> }
> grunt> DESC
>
> DESC       DESCRIBE
> grunt> DESCRIBE edge_breakdown;
> 2011-02-10 23:36:35,274 [main] INFO
>  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
> with processName=JobTracker, sessionId= - already initialized
> 2011-02-10 23:36:35,301 [main] ERROR org.apache.pig.impl.plan.OperatorPlan -
> Attempt to connect operator urls_ok: Filter 1-196 which is not in the plan.
> 2011-02-10 23:36:35,302 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 2219: Unable to process scalar in the plan
> Details at logfile:
> /home/speed/cfgmc/workspace-java/MscPigScripts/scripts/pig_1297388063472.log
> grunt>
>
> The log file says:
>
> Pig Stack Trace
> ---------------
> ERROR 2219: Unable to process scalar in the plan
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1001: Unable to
> describe schema for alias edge_breakdown
>  at org.apache.pig.PigServer.dumpSchema(PigServer.java:653)
> at
> org.apache.pig.tools.grunt.GruntParser.processDescribe(GruntParser.java:236)
>  at
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:315)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>  at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>  at org.apache.pig.Main.run(Main.java:465)
> at org.apache.pig.Main.main(Main.java:107)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2219:
> Unable to process scalar in the plan
> at org.apache.pig.PigServer.mergeScalars(PigServer.java:1299)
>  at org.apache.pig.PigServer.compileLp(PigServer.java:1304)
> at org.apache.pig.PigServer.compileLp(PigServer.java:1241)
>  at org.apache.pig.PigServer.dumpSchema(PigServer.java:639)
> ... 7 more
> Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to
> connect operator urls_ok: Filter 1-196 which is not in the plan.
>  at org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:409)
> at
> org.apache.pig.impl.plan.OperatorPlan.createSoftLink(OperatorPlan.java:210)
>  at org.apache.pig.PigServer.mergeScalars(PigServer.java:1294)
> ... 10 more
> ================================================================================
>
> If I run the last script without the FILTER inside the inner foreach, it
> works perfectly. The UDF also works fine in other contexts.
>
> Guys, seriously, what am I missing here?
> I have been stuck on this issue all day!
>
>
> --
> *Charles Ferreira Gonçalves *
> http://homepages.dcc.ufmg.br/~charles/
> UFMG - ICEx - Dcc
> Cel.: 55 31 87741485
> Tel.:  55 31 34741485
> Lab.: 55 31 34095840
>
