Unfortunately I've realised that boundscript.describe doesn't return a string. It returns void but prints to stdout. This means I have to go through a rather painful process of launching a separate Python process that calls boundscript.describe and then capturing the stdout of that process in order to obtain the schema. I don't know why it doesn't return a string. Maybe there is an easier way I am missing here. If people have any ideas for a more elegant solution, I would be happy to develop it and contribute the code.
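For reference, the subprocess route described above looks roughly like the sketch below. It is only an illustration under some assumptions: the child command here is a stand-in that prints a made-up schema line (the real child would be whatever command runs the embedded-Pig script that calls boundscript.describe), the helper names capture_stdout and field_positions are hypothetical, and the parser assumes a flat schema printed as `alias: {name: type, ...}` with no nested tuples, bags or maps.

```python
import re
import subprocess
import sys


def capture_stdout(cmd):
    """Run cmd in a child process and return everything it printed to stdout."""
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return result.stdout


def field_positions(describe_output):
    """Map field names to column numbers.

    Assumes a flat schema of the form 'alias: {name: type, name: type}'.
    Nested tuples, bags and maps are not handled.
    """
    match = re.search(r"\{(.*)\}", describe_output)
    if match is None:
        raise ValueError("no schema found in: %r" % describe_output)
    names = [part.split(":")[0].strip() for part in match.group(1).split(",")]
    return {name: index for index, name in enumerate(names)}


# Stand-in child process: it only prints an example schema line.
# The field names and types below are invented for illustration.
child = [
    sys.executable,
    "-c",
    "print('A: {user_id: long, country: chararray, clicks: int}')",
]

schema_line = capture_stdout(child)
positions = field_positions(schema_line)
```

The resulting dictionary could then be passed into the UDF as a parameter, so the UDF looks up a column by name (positions["country"]) instead of hard coding its number.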
Martin

On 15 November 2012 20:20, Jonathan Coveney <[email protected]> wrote:

> Martin,
>
> That is a reasonable workaround. Even in java UDF's, you can't directly
> access fields by name. Tuples are indexed only by numbers. Using the Schema
> is how I would do it.
>
>
> 2012/11/14 Martin Goodson <[email protected]>
>
> > Sorry to reply to my question post but I've found a workaround that I
> > thought I should put here:
> >
> > use embedded pig
> > access the schema with boundscript.describe().
> > input the schema as a parameter into the udf call.
> >
> > Thanks
> > Martin
> >
> > On 14 November 2012 16:17, Martin Goodson <[email protected]>
> > wrote:
> >
> > > I normally deal with very large tuples with many fields. Its a pain to
> > > deal with these in python udfs since I can't figure out a way to input
> > > schemas into the udf. I have to hard code the column number in the UDFs,
> > > which is a maintenance nightmare.
> > >
> > > It seems that java UDFs receive the full tuple in their exec methods so
> > > that the correct fields can be identified, whereas python UDFs only receive
> > > lists objects (with field names stripped). Is there any way to get the
> > > behaviour of python UDFs to conform to the java behaviour?
> > >
> > > Thanks for any ideas
> > > Martin
