Hi Markus,

That's correct -- the loaders get instantiated both on the client side (to let you do any setup you need) and on the MR side (to actually do the loading). You can do a couple of things to get your properties over to the MR side:
1) Add your file to the "tmpfiles" property of the jobconf that gets passed in via setLocation. This can be error-prone, since two of your loaders with different properties might be processed in the same MR job (for a join, for example).

2) Serialize your properties straight into the UDFContext, namespaced using the signature you get via setUDFContextSignature, and deserialize them on the backend.

D

On Mon, Apr 23, 2012 at 7:09 AM, Markus Resch <[email protected]> wrote:
> Hey Folks,
>
> We've created our own LOAD function by extending the default
> AvroStorage (basically we're passing a set of paths to glob through
> the AvroStorage).
>
> Our algorithm needs some basic configuration, which we read out of a
> .properties file located right beside the pig script. The algorithm
> works great, and according to the output directly after starting the
> pig script everything is fine. But after the job has run for a while,
> we get an error message saying it can't find the properties file.
> We're assuming the load function gets started on each data node, and
> we don't have that config there. Is that assumption true? And if so,
> is there a way to work around this issue?
>
> Thanks
>
> Markus
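For what it's worth, option 2 usually looks something like the sketch below. This is an untested illustration, not a definitive implementation: the class name, the "my.loader.config" key, the properties filename, and the loadLocalProperties() helper are made up for the example, and it assumes Pig (LoadFunc/UDFContext) and Hadoop on the classpath.

```java
// Sketch of option 2: stash frontend-only config in the UDFContext so it
// travels to the backend inside the serialized job context. Class name,
// property key, and loadLocalProperties() are placeholders.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.impl.util.UDFContext;

public class MyAvroLoader extends AvroStorage { // your existing subclass

    private String signature;

    @Override
    public void setUDFContextSignature(String signature) {
        // Pig calls this on both the front end and the back end; the
        // signature uniquely identifies this loader instance in the script,
        // so it is safe to use as a namespace.
        this.signature = signature;
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        super.setLocation(location, job);
        // Front end: read the local .properties file and store its contents
        // in the UDFContext, namespaced by our signature so two loaders in
        // the same MR job don't clobber each other.
        Properties udfProps = UDFContext.getUDFContext()
                .getUDFProperties(getClass(), new String[] { signature });
        if (udfProps.getProperty("my.loader.config") == null) {
            udfProps.setProperty("my.loader.config", loadLocalProperties());
        }
    }

    // Hypothetical helper: however you currently read the file that sits
    // next to the pig script (filename is an assumption).
    private String loadLocalProperties() throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get("myloader.properties"));
        return new String(raw, StandardCharsets.UTF_8);
    }

    // On the backend (e.g. in prepareToRead or getNext), read it back the
    // same way:
    //   Properties udfProps = UDFContext.getUDFContext()
    //           .getUDFProperties(getClass(), new String[] { signature });
    //   String config = udfProps.getProperty("my.loader.config");
}
```

The key point is that getUDFProperties is called with the same class and signature on both sides; Pig serializes the UDFContext into the job and restores it on the task nodes, so anything you put in those Properties on the front end is readable on the backend.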
