I did not write my own processor. I just reuse the Tez Work created by Hive, so the processors are classes defined by Hive, such as HiveMap and HiveJoin.
So if I understand the setting correctly, only by modifying these processors can I take advantage of the Shared Object Registry.

Thanks a lot!

Raajay

On Tue, Dec 1, 2015 at 3:39 PM, Bikas Saha <[email protected]> wrote:

> To be clear, you have written your own processor that runs in your DAG
> vertices? Your processor runs your custom code for processing input data.
>
> If yes, then the following applies.
>
> You will get access to the registry from your context object.
>
> You can use cacheForVertex() to cache for the lifetime of the vertex,
> cacheForDAG() to cache for the lifetime of the DAG, and cacheForSession()
> to cache for the lifetime of a session (which runs multiple DAGs). As for
> the key and value parameters: the key is any unique string used to look up
> the value. The value is any Java object (say a map or a list). For
> performance you would want to cache the object in a form that can be
> immediately used without any conversion.
>
> There is a toy example of the usage in the Tez source code in
> BroadcastAndOneToOneExample.java
>
> The Javadoc for object registry would have more details. Please open a
> jira if the Javadoc is not clear enough.
>
> From: Raajay [mailto:[email protected]]
> Sent: Tuesday, December 1, 2015 11:02 AM
> To: [email protected]
> Subject: Re: Shared object registry
>
> I am running a custom application; however, the DAG is created similar to
> the DAG that Hive would have created for the tpcds query. I use "TezClient"
> to submit these DAGs.
>
> How can I use Shared Objects explicitly?
>
> I understand that Object Registry provides a key-value interface. But then
> if I want to dump intermediate data (say output of mappers for small jobs)
> into the shared object registry, how shall I do that?
>
> Raajay
>
> On Tue, Dec 1, 2015 at 12:47 PM, Bikas Saha <[email protected]> wrote:
>
> Object registry is a user-enabled feature provided by Tez to the
> application (e.g. Hive and Pig). If the application chooses to use this,
> then it can do some user-land caching across tasks/vertices/DAGs using it.
> E.g. Hive caches the smaller broadcast side of a broadcast join in the
> shared object registry.
>
> Object registry is not an automatic data caching or input caching
> mechanism.
>
> What application/job are you running? Hive/Pig/Custom? Unless the
> application (like Hive) has used object caching for a cross-DAG scenario
> (which AFAIK it does not) you will not see any difference. If it's custom
> then you will have to explicitly use object registry in a manner that makes
> sense for your app.
>
> -----Original Message-----
> From: Raajay [mailto:[email protected]]
> Sent: Tuesday, December 1, 2015 10:36 AM
> To: [email protected]
> Subject: Shared object registry
>
> How to effectively use shared object registry?
>
> I created a Tez client as a session, and submitted a DAG twice
> sequentially.
>
> However, I did not see a noticeable difference in their run times. The
> query was tpcds query#3.
>
> I had enabled container reuse in tez-site.xml. Are there other configs I
> need to ensure are set correctly to use shared objects?
>
> - Raajay
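
The caching pattern described in the thread (look up by key, build on a miss, then cache with a chosen lifetime) can be sketched roughly as below. This is a minimal, self-contained illustration and does not use the Tez classes themselves: SketchRegistry, InMemoryRegistry, RegistrySketch, loadSmallTable(), endVertex() and endDag() are all hypothetical names invented for the sketch. Only the three cacheFor*() method names are taken from the thread; the get() lookup and the return types are assumptions. In a real processor the registry would come from the context object, as Bikas notes, and the Javadoc is the authority on the actual signatures.

```java
import java.util.HashMap;
import java.util.Map;

final class RegistrySketch {
    static int loads = 0; // counts how often the expensive build actually runs

    // Hypothetical expensive step, e.g. building the small side of a broadcast join.
    static Map<String, String> loadSmallTable() {
        loads++;
        Map<String, String> table = new HashMap<>();
        table.put("1", "one");
        table.put("2", "two");
        return table;
    }

    // Look up first; build and cache only on a miss. The value is cached in its
    // ready-to-use form (a Map), so a hit needs no conversion.
    @SuppressWarnings("unchecked")
    static Map<String, String> smallTable(SketchRegistry registry, boolean acrossDags) {
        Object cached = registry.get("small-table");
        if (cached != null) {
            return (Map<String, String>) cached;
        }
        Map<String, String> table = loadSmallTable();
        if (acrossDags) {
            registry.cacheForSession("small-table", table);
        } else {
            registry.cacheForDAG("small-table", table);
        }
        return table;
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        smallTable(registry, false); // first task: miss, builds the table
        smallTable(registry, false); // later task in the same DAG: hit
        registry.endDag();           // simulated end of the first DAG
        smallTable(registry, false); // next DAG: DAG-scoped entry is gone, rebuilds
        System.out.println("builds with DAG scope: " + loads);
    }
}

// Hypothetical stand-in for the registry described in the thread (not the Tez interface).
interface SketchRegistry {
    Object cacheForVertex(String key, Object value);
    Object cacheForDAG(String key, Object value);
    Object cacheForSession(String key, Object value);
    Object get(String key);
}

// In-memory implementation that models the three lifetimes with three maps.
final class InMemoryRegistry implements SketchRegistry {
    private final Map<String, Object> vertexScope = new HashMap<>();
    private final Map<String, Object> dagScope = new HashMap<>();
    private final Map<String, Object> sessionScope = new HashMap<>();

    public Object cacheForVertex(String key, Object value) { return vertexScope.put(key, value); }
    public Object cacheForDAG(String key, Object value) { return dagScope.put(key, value); }
    public Object cacheForSession(String key, Object value) { return sessionScope.put(key, value); }

    public Object get(String key) {
        if (vertexScope.containsKey(key)) return vertexScope.get(key);
        if (dagScope.containsKey(key)) return dagScope.get(key);
        return sessionScope.get(key);
    }

    // Simulated lifecycle events: ending a DAG also ends its vertices.
    void endVertex() { vertexScope.clear(); }
    void endDag() { vertexScope.clear(); dagScope.clear(); }
}
```

In this sketch, a DAG-scoped entry does not survive into the second DAG, while passing acrossDags = true (session scope) would let the second DAG reuse the table. That mirrors the point made in the thread: submitting the same DAG twice in a session shares nothing unless the processor code explicitly caches something with a lifetime that spans DAGs.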
