I think I might have found a way to transform the map directly into a bag. Inside the HBaseStorage() Load Function I have set the HBase scan batch to 1, so every call to scan.next() now returns one column instead of all columns. See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html
  setBatch(int batch): Set the maximum number of values to return for each call to next()

I think this will work. Any idea whether this approach has disadvantages?

regards

2013/9/13 John <[email protected]>
> hi,
>
> the join key is in the bag, that's the problem. The Load Function returns
> only one element, $0, and that is the map. In the next step this map is
> transformed into a bag with the UDF "MapToBagUDF". For example, the load
> function returns ([col1,col2,col3]), and this map inside the tuple is
> transformed to:
>
> (col1)
> (col2)
> (col3)
>
> Maybe there is a way to transform the map directly into a bag in the load
> function? The problem I see is that the getNext() method in the LoadFunc
> has to return a Tuple, not a Bag. :/
>
> 2013/9/13 Pradeep Gollakota <[email protected]>
>
>> Since your join key is not in the Bag, can you do your join first and then
>> execute your UDF?
>>
>> On Fri, Sep 13, 2013 at 10:04 AM, John <[email protected]>
>> wrote:
>>
>> > Okay, I think I have found the problem here:
>> > http://pig.apache.org/docs/r0.11.1/perf.html#merge-joins ... there it
>> > is written:
>> >
>> > There may be filter statements and foreach statements between the sorted
>> > data source and the join statement. The foreach statement should meet the
>> > following conditions:
>> >
>> > - There should be no UDFs in the foreach statement.
>> > - The foreach statement should not change the position of the join keys.
>> > - There should be no transformation on the join keys which will change
>> >   the sort order.
>> >
>> > I have to use a UDF to transform the Map into a Bag ... any workaround
>> > idea?
>> >
>> > thanks
>> >
>> > 2013/9/13 John <[email protected]>
>> >
>> > > Hi,
>> > >
>> > > I am trying to use a merge join for 2 bags. Here is my pig code:
>> > > http://pastebin.com/Y9b2UtNk .
>> > >
>> > > But I got this error:
>> > >
>> > > Caused by:
>> > > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
>> > > ERROR 1103: Merge join/Cogroup only supports Filter, Foreach, Ascending
>> > > Sort, or Load as its predecessors. Found
>> > >
>> > > I think the reason is that there is no sort function or something like
>> > > this. But the bags are definitely sorted. How can I do the merge join?
>> > >
>> > > thanks
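A minimal, self-contained Java model of the setBatch(1) idea from the top of this message, runnable without HBase or Pig. `BatchedScanModel`, `PartialResult` and `scan` are hypothetical names, not HBase API; a real loader would call `Scan.setBatch(1)` and iterate a `ResultScanner`. The model assumes cells come back sorted by row key and then by column qualifier, which is how HBase returns them. It shows the main thing to watch for: with a batch limit, one HBase row arrives as several consecutive partial results that all carry the same row key, so a loader must not assume that one result equals one complete row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, self-contained model of HBase scan batching; this is NOT the
 * HBase client API. Real code would call Scan.setBatch(1) and iterate a
 * ResultScanner. A table is modelled as rowKey -> (qualifier -> value), both
 * sorted, which mirrors the order in which HBase returns cells.
 */
public class BatchedScanModel {

    /** One partial result: a row key plus at most `batch` qualifier/value pairs. */
    public static final class PartialResult {
        public final String rowKey;
        public final List<Map.Entry<String, String>> cells;

        PartialResult(String rowKey, List<Map.Entry<String, String>> cells) {
            this.rowKey = rowKey;
            this.cells = cells;
        }
    }

    /** Splits every row into chunks of at most `batch` cells, keeping sort order. */
    public static List<PartialResult> scan(TreeMap<String, TreeMap<String, String>> table, int batch) {
        if (batch < 1) {
            throw new IllegalArgumentException("batch must be >= 1");
        }
        List<PartialResult> results = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row : table.entrySet()) {
            List<Map.Entry<String, String>> chunk = new ArrayList<>();
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                chunk.add(Map.entry(cell.getKey(), cell.getValue()));
                if (chunk.size() == batch) {
                    // The row key is repeated on every partial result of the same row.
                    results.add(new PartialResult(row.getKey(), chunk));
                    chunk = new ArrayList<>();
                }
            }
            if (!chunk.isEmpty()) {
                results.add(new PartialResult(row.getKey(), chunk));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        TreeMap<String, String> columns = new TreeMap<>();
        columns.put("col1", "v1");
        columns.put("col2", "v2");
        columns.put("col3", "v3");
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
        table.put("row1", columns);

        // With batch = 1, each "next()" carries exactly one column of row1.
        for (PartialResult r : scan(table, 1)) {
            System.out.println(r.rowKey + " " + r.cells);
        }
    }
}
```

With batch = 1 the three columns of row1 come back as three separate results, each holding one column, which is the one-tuple-per-column shape the MapToBagUDF currently produces. More results per row also means more objects and, for an unchanged scanner caching setting, possibly more round trips to the region server, so adjusting `Scan.setCaching` alongside it may be worth measuring rather than assuming.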
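Pradeep's suggestion (join first, then run the UDF) can be sketched the same way. The Java below is a hypothetical model, not Pig code: `MergeJoinThenExpand`, `Row` and `joinThenExpand` are illustrative names, both inputs are assumed to be sorted ascending by a key that lives outside the map, and keys are assumed unique on each side to keep it short. The merge step compares keys in a single forward pass, which is why it needs sorted inputs and nothing in between that could reorder them; the map-to-bag expansion runs only on joined rows, after the key has been used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model (not Pig code): a merge join over two inputs that are
 * already sorted by key, with the map-to-bag style expansion applied only to
 * the joined rows. Assumes keys are unique within each input.
 */
public class MergeJoinThenExpand {

    /** Left-side record: a join key plus a map of columns. */
    static final class Row {
        final String key;
        final Map<String, String> columns;

        Row(String key, Map<String, String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    /**
     * Merge join: one forward pass over both key-sorted inputs. Right-side
     * records are {key, value}. Each match is expanded into one
     * {key, column, rightValue} record per map entry, in column order.
     */
    static List<String[]> joinThenExpand(List<Row> left, List<String[]> right) {
        List<String[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            Row l = left.get(i);
            String[] r = right.get(j);
            int cmp = l.key.compareTo(r[0]);
            if (cmp < 0) {
                i++; // left key is smaller: it has no partner on the right
            } else if (cmp > 0) {
                j++; // right key is smaller: it has no partner on the left
            } else {
                // Keys match: only now expand the map, after the join used the key.
                for (String column : new TreeMap<>(l.columns).keySet()) {
                    out.add(new String[] {l.key, column, r[1]});
                }
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = List.of(
                new Row("a", Map.of("col1", "x", "col2", "y")),
                new Row("c", Map.of("col3", "z")));
        List<String[]> right = List.of(
                new String[] {"a", "r1"},
                new String[] {"b", "r2"},
                new String[] {"c", "r3"});
        // Prints a,col1,r1 then a,col2,r1 then c,col3,r3 (key "b" has no left match).
        for (String[] t : joinThenExpand(left, right)) {
            System.out.println(String.join(",", t));
        }
    }
}
```

As John points out, this only helps when the join key is outside the map; if the key is one of the map entries, the expansion has to happen before the join, which is what the setBatch(1) approach tries to move into the loader.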
