Hi,

I have some code that looks like this:

top_hits = foreach regrouped {
    result = TOP(1, 6, projected_joined_albums); -- field 6 (0-based) = score
    generate flatten(result);
};

I'm not too keen on the TOP syntax because it's opaque and you need
the comment there to explain what's going on.

I've seen the same thing achieved like so, in a more transparent way,
and in fact I've used this in other cases myself:

top_hits = foreach regrouped {
    sorted = order projected_joined_albums by score desc;
    result = limit sorted 1;
    generate flatten(result);
};

However, although the first form works for me, the second dies with
the following error:

java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.Tuple
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:138)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:291)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:355)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:433)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:401)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:381)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
(etc.)

Is there a reason why it would fail in this case? I can't work out
what the error means; it would be nice if it reported *which* Tuple
was failing the cast.

regrouped has the following schema:

{group: (artistid: int,country: int,week: chararray),
 projected_joined_albums: {
   joined_albums_2::joined_albums_1::flattened_albums::key: (artistid: int,country: int,week: chararray),
   joined_albums_2::joined_albums_1::flattened_albums::timestamp: long,
   joined_albums_2::joined_albums_1::flattened_albums::albumid: int,
   track_counts::numtracks: long,
   joined_albums_2::reach::reach: int,
   joined_albums_2::joined_albums_1::album_titles::title_len: long,
   score: long}}

That's a bit complex, so I tried extracting the individual fields with
a foreach ... generate beforehand, which gives this schema:

{group: (artistid: int,country: int,week: chararray),
 projected_joined_albums: {
   key: (artistid: int,country: int,week: chararray),
   timestamp: long,
   albumid: int,
   numtracks: long,
   reach: int,
   title_len: long,
   score: long}}

It didn't affect the error, though.

Thanks for any suggestions,

Andrew.

-- 

http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
