Skanda <skanda.ganapathy@...> writes:

> Hi All,
>
> I have a use case to get N distinct URLs based on the number of
> hits and their latest timestamp. Please find below the snippet of the Pig
> script that I have written to do this.
>
> prunedUrlData = FOREACH urlPatternData GENERATE (url_pattern is null ? url : url_pattern) AS url,
>     domid, urlkey, urllen, puid, nwid, lmd, rc, punam, nwnam, ispub,
>     com.xxx.GetDomainStorageLimit(nwid) AS domainlimit;
>
> group_by_Domain_Url = GROUP prunedUrlData BY domid;
>
> rankedUrlByDomain = FOREACH group_by_Domain_Url
> {
>     distinct_url = DISTINCT prunedUrlData;
>     url_rank_dom = ORDER distinct_url BY lmd DESC, rc DESC;
>     url_domain_limit = LIMIT url_rank_dom domainlimit;
>     GENERATE FLATTEN(url_domain_limit);
> };
>
> The only problem I have now is the domainlimit variable that I'm
> passing to the LIMIT statement at runtime. I'm getting the following
> exception:
>
> java.lang.RuntimeException: Unable to evaluate Limit expression: NULL
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNext(POLimit.java:97)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:432)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:583)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:465)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:433)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>
> If I use a constant for the LIMIT, it works fine. I printed
> "group_by_Domain_Url" to check whether I'm getting the domainlimit, and I'm
> able to see a value.
>
> But when I apply it to LIMIT, it says "Unable to evaluate Limit
> expression: NULL". Where am I going wrong?
>
> Regards,
> Skanda
I'm having the exact same issue. Have you been able to solve it? I'm using Pig 0.11. Please help. Thanks.
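For reference, here is a sketch of the result the nested FOREACH is meant to produce, written outside Pig. The Python below only illustrates the intended per-domain logic: DISTINCT, then ORDER BY lmd DESC, rc DESC, then keep the first `domainlimit` rows of each domid group. It assumes each record is a dict with hypothetical keys `domid`, `url`, `lmd`, `rc` and `domainlimit`, and that every row in a domain carries the same limit. It does not reproduce Pig's execution and is not a fix for the "Unable to evaluate Limit expression: NULL" exception.

```python
from itertools import groupby
from operator import itemgetter


def top_urls_per_domain(records):
    """Illustrative only: per domid, drop duplicate rows, sort by lmd then rc
    (both descending) and keep the first `domainlimit` rows of the group."""
    result = []
    # GROUP ... BY domid (groupby needs its input sorted on the grouping key)
    by_domain = sorted(records, key=itemgetter("domid"))
    for _domid, group in groupby(by_domain, key=itemgetter("domid")):
        rows = list(group)
        # DISTINCT: dicts are unhashable, so key each row by its sorted items
        unique = list({tuple(sorted(r.items())): r for r in rows}.values())
        # ORDER ... BY lmd DESC, rc DESC
        unique.sort(key=lambda r: (r["lmd"], r["rc"]), reverse=True)
        # LIMIT with a per-group value; assumes all rows of a domain share it
        limit = rows[0]["domainlimit"]
        result.extend(unique[:limit])
    return result


# Hypothetical sample data: two domains with limits 2 and 1
records = [
    {"domid": 1, "url": "a", "lmd": 10, "rc": 5, "domainlimit": 2},
    {"domid": 1, "url": "b", "lmd": 20, "rc": 1, "domainlimit": 2},
    {"domid": 1, "url": "b", "lmd": 20, "rc": 1, "domainlimit": 2},  # duplicate row
    {"domid": 1, "url": "c", "lmd": 10, "rc": 9, "domainlimit": 2},
    {"domid": 2, "url": "d", "lmd": 5, "rc": 0, "domainlimit": 1},
    {"domid": 2, "url": "e", "lmd": 7, "rc": 0, "domainlimit": 1},
]
print([r["url"] for r in top_urls_per_domain(records)])  # ['b', 'c', 'e']
```

Sorting on the tuple `(lmd, rc)` with `reverse=True` orders by lmd descending and breaks ties on rc descending, which mirrors the ORDER clause in the script.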