Oh thanks for the reply, Jason. That was my suspicion too.

The UDF in our case is not a function per se, in the pure mathematical
sense of the word 'function': it doesn't just take in a value and give
out another value. It has side effects, which produce the input for
another MapReduce job. The point of doing it this way is that we wanted
to take advantage of the parallelism afforded by running it as a
MapReduce job via Hive, as the processing is fairly compute intensive.
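
For context, the UDF overrides configure() roughly along these lines
(heavily simplified; the class, field, and attribute names below are
made up for illustration):

import org.apache.hadoop.hive.ql.exec.MapredContext;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class MyConfiguredUDF extends GenericUDF {

  // value read from the job configuration; illustrative name
  private String customSetting;

  @Override
  public void configure(MapredContext ctx) {
    // Only called when the UDF runs inside a real M/R task;
    // ctx.getJobConf() exposes the job configuration, including
    // our custom xy.abc.* attributes.
    customSetting = ctx.getJobConf().get("xy.abc.something");
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments)
      throws UDFArgumentException {
    return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // customSetting stays null if configure() was never invoked
    return customSetting;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "my_configured_udf()";
  }
}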

Is there a way to force the query to run as a MapReduce job? I think
setting hive.fetch.task.conversion to minimal might help; is there
anything else that can be done?
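
Concretely, I was thinking of something along these lines, though I
haven't tested it on our setup yet, so treat it as a sketch:

  -- keep simple queries from being converted into a local fetch task
  set hive.fetch.task.conversion=minimal;
  -- and, per your second point, stop deterministic UDF calls from
  -- being constant-folded
  set hive.optimize.constant.propagation=false;

Not sure those are the right knobs, though.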

Thanks a ton.

On Tue, Aug 25, 2015 at 2:36 PM, Jason Dere <jd...@hortonworks.com> wrote:

> There might be a few cases where a UDF is executed locally and not as
> part of a Map/Reduce job:
>
>  - Hive might choose not to run a M/R task for your query (see
> hive.fetch.task.conversion)
>
>  - If the UDF is deterministic and has deterministic inputs, Hive might
> decide to run the UDF once to get the value and use constant folding to
> replace calls of that UDF with the value from the one UDF call (see
> hive.optimize.constant.propagation)
>
>
> Taking a look at the explain plan for your query might confirm this. In
> those cases the UDF would not run within an M/R task and configure() would
> not be called.
>
>
>
> ------------------------------
> *From:* Rahul Sharma <kippy....@gmail.com>
> *Sent:* Tuesday, August 25, 2015 11:32 AM
> *To:* user@hive.apache.org
> *Subject:* UDF Configure method not getting called
>
> Hi Guys,
>
> We have a UDF which extends GenericUDF and does some configuration within
> the public void configure(MapredContext ctx) method.
>
> The MapredContext passed to the configure method gives access to the Hive
> configuration via JobConf, which contains custom attributes of the form
> xy.abc.something. Reading these values is required for the semantics of
> the UDF.
>
> Everything works fine up to Hive 0.13; however, with Hive 0.14 (or 1.0)
> the configure method of the UDF is never called by the runtime, and hence
> the UDF cannot configure itself dynamically.
>
> Is this the intended behavior? If so, what is the new way to read the
> MapReduce job's configuration within the UDF?
>
> I would be grateful for any help.
>
