Wouldn't this slow down your data retrieval? One column in each call instead of a batch?
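
To put rough numbers on that concern, here is a minimal sketch in plain Java. It does not touch HBase at all; it is only a back-of-the-envelope cost model. It assumes that setBatch caps the number of values per Result (as the Javadoc quoted below says) and, as a further assumption, that scanner caching (Scan.setCaching) caps how many Result objects are fetched per round trip. The class name, method names, and table shape are made up for illustration.

```java
// Rough cost model, NOT HBase client code. ASSUMPTIONS: "batch" caps the
// cells carried by one Result, and "caching" caps the Results fetched per
// round trip to the region server.
final class ScanBatchCost {

    /** Results needed for one row: ceil(columns / batch); batch <= 0 means "whole row". */
    static long resultsPerRow(long columns, int batch) {
        if (columns <= 0) return 0;   // an empty row yields no Result
        if (batch <= 0) return 1;     // no batching: one Result holds the full row
        return (columns + batch - 1) / batch;
    }

    /** Total next() calls that return a Result across all rows. */
    static long nextCalls(long rows, long columns, int batch) {
        return rows * resultsPerRow(columns, batch);
    }

    /** Approximate round trips: ceil(nextCalls / caching), caching clamped to at least 1. */
    static long roundTrips(long rows, long columns, int batch, int caching) {
        long calls = nextCalls(rows, columns, batch);
        int perTrip = Math.max(1, caching);
        return (calls + perTrip - 1) / perTrip;
    }

    public static void main(String[] args) {
        long rows = 1_000, columns = 50; // hypothetical table shape
        System.out.println("whole rows, caching 100: " + roundTrips(rows, columns, 0, 100));
        System.out.println("batch 1, caching 1:      " + roundTrips(rows, columns, 1, 1));
        System.out.println("batch 1, caching 100:    " + roundTrips(rows, columns, 1, 100));
    }
}
```

If the caching assumption holds, a batch of 1 mainly multiplies the number of next() calls and Result objects, while the number of round trips depends on the caching value. It would be worth checking the caching setting that HBaseStorage actually uses before relying on this.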

Regards,
Shahab

On Fri, Sep 13, 2013 at 2:34 PM, John <[email protected]> wrote:

> I think I might have found a way to transform it directly into a bag.
> Inside the HBaseStorage() load function I have set the HBase scan batch to
> 1, so I get one column for every scan.next() instead of all columns. See
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html
>
> setBatch(int batch)
> Set the maximum number of values to return for each call to next()
>
> I think this will work. Any idea whether this approach has disadvantages?
>
> regards
>
>
> 2013/9/13 John <[email protected]>
>
> > hi,
> >
> > the join key is in the bag, that's the problem. The load function returns
> > only one element, $0, and that is the map. This map is transformed in the
> > next step with the UDF "MapToBagUDF" into a bag. For example, the load
> > function returns this: ([col1,col2,col3]), then this map inside the tuple
> > is transformed to:
> >
> > (col1)
> > (col2)
> > (col3)
> >
> > Maybe there is a way to transform the map directly into a bag in the
> > load function? The problem I see is that the next() method in the
> > LoadFunc has to return a Tuple and not a Bag. :/
> >
> >
> > 2013/9/13 Pradeep Gollakota <[email protected]>
> >
> >> Since your join key is not in the Bag, can you do your join first and
> >> then execute your UDF?
> >>
> >>
> >> On Fri, Sep 13, 2013 at 10:04 AM, John <[email protected]> wrote:
> >>
> >> > Okay, I think I have found the problem here:
> >> > http://pig.apache.org/docs/r0.11.1/perf.html#merge-joins ... where it
> >> > is written:
> >> >
> >> > There may be filter statements and foreach statements between the
> >> > sorted data source and the join statement. The foreach statement
> >> > should meet the following conditions:
> >> >
> >> > - There should be no UDFs in the foreach statement.
> >> > - The foreach statement should not change the position of the join
> >> >   keys.
> >> > - There should be no transformation on the join keys which will
> >> >   change the sort order.
> >> >
> >> > I have to use a UDF to transform the Map into a Bag ... any idea for
> >> > a workaround?
> >> >
> >> > thanks
> >> >
> >> >
> >> > 2013/9/13 John <[email protected]>
> >> >
> >> > > Hi,
> >> > >
> >> > > I am trying to use a merge join for 2 bags. Here is my pig code:
> >> > > http://pastebin.com/Y9b2UtNk .
> >> > >
> >> > > But I got this error:
> >> > >
> >> > > Caused by:
> >> > > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
> >> > > ERROR 1103: Merge join/Cogroup only supports Filter, Foreach,
> >> > > Ascending Sort, or Load as its predecessors. Found
> >> > >
> >> > > I think the reason is that there is no sort function or something
> >> > > like this. But the bags are definitely sorted. How can I do the
> >> > > merge join?
> >> > >
> >> > > thanks
