I am running a custom application; however, the DAG is built similarly to the
one Hive would create for the TPC-DS query. I use TezClient to submit these
DAGs.

How can I use shared objects explicitly?

I understand that the object registry provides a key-value interface. But if
I want to put intermediate data (say, the output of mappers for small jobs)
into the shared object registry, how should I do that?
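A minimal sketch of how a processor might use that key-value interface, in the get-or-compute style Hive uses for the small side of a broadcast join. It assumes the registry offers cacheForVertex/cacheForDAG/cacheForSession/get/delete, which is my reading of Tez's org.apache.tez.runtime.api.ObjectRegistry; check the Javadoc for your Tez version. SketchObjectRegistry and RegistryUsage are hypothetical stand-ins so the code runs without Tez, and endVertex()/endDag() only simulate the framework evicting entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical in-memory stand-in shaped like Tez's ObjectRegistry.
// Not the real class: in Tez the registry is obtained from the task context
// and its contents live only in the JVM of one (possibly reused) container.
class SketchObjectRegistry {
    enum Scope { VERTEX, DAG, SESSION }

    private final Map<String, Object> values = new HashMap<>();
    private final Map<String, Scope> scopes = new HashMap<>();

    private Object put(Scope scope, String key, Object value) {
        scopes.put(key, scope);
        return values.put(key, value); // previous value, or null
    }

    Object cacheForVertex(String key, Object value) { return put(Scope.VERTEX, key, value); }
    Object cacheForDAG(String key, Object value) { return put(Scope.DAG, key, value); }
    Object cacheForSession(String key, Object value) { return put(Scope.SESSION, key, value); }

    Object get(String key) { return values.get(key); }

    boolean delete(String key) {
        scopes.remove(key);
        return values.remove(key) != null;
    }

    // Simulated lifecycle hooks: the framework, not the application, would do this.
    void endVertex() { evict(Scope.VERTEX); }

    void endDag() {
        evict(Scope.VERTEX);
        evict(Scope.DAG);
    }

    private void evict(Scope scope) {
        scopes.entrySet().removeIf(e -> {
            if (e.getValue() != scope) return false;
            values.remove(e.getKey());
            return true;
        });
    }
}

final class RegistryUsage {
    private RegistryUsage() {}

    // Get-or-compute: reuse an object that an earlier task in this JVM cached,
    // otherwise build it once and cache it for the rest of the DAG.
    @SuppressWarnings("unchecked")
    static <T> T getOrCompute(SketchObjectRegistry registry, String key, Supplier<T> loader) {
        Object cached = registry.get(key);
        if (cached != null) {
            return (T) cached;
        }
        T value = loader.get();
        registry.cacheForDAG(key, value);
        return value;
    }
}
```

Since the real registry lives in one container's memory, a later task would see the cached object only if it runs in the same (reused) container, and DAG-scoped entries are gone once the DAG finishes; only session-scoped entries could outlive a DAG. As far as this sketch goes, it holds in-memory objects, not shuffled mapper output.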

Raajay


On Tue, Dec 1, 2015 at 12:47 PM, Bikas Saha <[email protected]> wrote:

> The object registry is an opt-in feature that Tez provides to applications
> (e.g. Hive and Pig). If an application chooses to use it, it can do some
> user-land caching across tasks/vertices/DAGs with it. For example, Hive
> caches the smaller broadcast side of a broadcast join in the shared object
> registry.
>
> The object registry is not an automatic data caching or input caching
> mechanism.
>
> What application/job are you running? Hive, Pig, or custom? Unless the
> application (like Hive) uses object caching for a cross-DAG scenario
> (which AFAIK Hive does not), you will not see any difference. If it's
> custom, then you will have to explicitly use the object registry in a way
> that makes sense for your app.
>
>
> -----Original Message-----
> From: Raajay [mailto:[email protected]]
> Sent: Tuesday, December 1, 2015 10:36 AM
> To: [email protected]
> Subject: Shared object registry
>
> How do I use the shared object registry effectively?
>
> I created a Tez client in session mode and submitted the same DAG twice,
> one after the other.
>
>
> However, I did not see a noticeable difference in their run times. The
> query was TPC-DS query 3.
>
> I had enabled container reuse in tez-site.xml. Are there other configs I
> need to set correctly to use shared objects?
>
> - Raajay
>
>
>
