I am running a custom application; however, its DAG is built to resemble the one Hive would have created for the TPC-DS query. I use "TezClient" to submit these DAGs.
How can I use shared objects explicitly? I understand that the Object Registry provides a key-value interface. But if I want to dump intermediate data (say, the output of mappers for small jobs) into the shared object registry, how should I do that?

Raajay

On Tue, Dec 1, 2015 at 12:47 PM, Bikas Saha <[email protected]> wrote:

> Object registry is a user-enabled feature provided by Tez to the
> application (e.g. Hive and Pig). If the application chooses to use this,
> then it can do some user-land caching across tasks/vertices/DAGs using it.
> E.g. Hive caches the smaller broadcast side of a broadcast join in the
> shared object registry.
>
> Object registry is not an automatic data caching or input caching
> mechanism.
>
> What application/job are you running? Hive/Pig/Custom? Unless the
> application (like Hive) has used object caching for a cross-DAG scenario
> (which AFAIK it does not) you will not see any difference. If it's custom
> then you will have to explicitly use object registry in a manner that makes
> sense for your app.
>
> -----Original Message-----
> From: Raajay [mailto:[email protected]]
> Sent: Tuesday, December 1, 2015 10:36 AM
> To: [email protected]
> Subject: Shared object registry
>
> How to effectively use shared object registry?
>
> I created a Tez client as a session, and submitted a DAG twice
> sequentially.
>
> However, I did not see a noticeable difference in their run times. The query
> was TPC-DS query #3.
>
> I had enabled container reuse in tez-site.xml. Are there other configs I
> need to ensure are set correctly to use shared objects?
>
> - Raajay
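
For reference, the explicit, user-driven caching described in the quoted reply could look roughly like the sketch below. It is only an illustration of the "look up by key, build and cache on a miss" pattern, not Tez code: `Registry`, `InMemoryRegistry`, and `getOrLoad` are hypothetical stand-ins so the example runs without a cluster. In a real Tez task the registry would, as far as I understand, be the `ObjectRegistry` obtained from the task context (`getObjectRegistry()`), which offers `get` plus `cacheForVertex`, `cacheForDAG`, and `cacheForSession` to choose how long an entry lives.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class RegistrySketch {

    /** Hypothetical stand-in for a key-value object registry (NOT the Tez class). */
    interface Registry {
        Object get(String key);
        Object cacheForSession(String key, Object value);
    }

    /** In-memory implementation used only to make this sketch runnable. */
    static final class InMemoryRegistry implements Registry {
        private final Map<String, Object> store = new ConcurrentHashMap<>();

        @Override
        public Object get(String key) {
            return store.get(key);
        }

        @Override
        public Object cacheForSession(String key, Object value) {
            return store.put(key, value);
        }
    }

    /** Counts how many times the expensive loader actually ran. */
    static int loads = 0;

    /** Returns the cached object for key, or builds and caches it on a miss. */
    @SuppressWarnings("unchecked")
    static <T> T getOrLoad(Registry registry, String key, Supplier<T> loader) {
        Object cached = registry.get(key);
        if (cached != null) {
            return (T) cached;
        }
        T value = loader.get();
        registry.cacheForSession(key, value);
        return value;
    }

    public static void main(String[] args) {
        Registry registry = new InMemoryRegistry();

        // Hypothetical "small side" of a join, expensive to rebuild per task.
        Supplier<Map<String, String>> loader = () -> {
            loads++;
            Map<String, String> table = new HashMap<>();
            table.put("1", "a");
            return table;
        };

        // Two "tasks" asking for the same key: only the first one pays.
        getOrLoad(registry, "small-side", loader);
        getOrLoad(registry, "small-side", loader);

        System.out.println("loads = " + loads); // prints "loads = 1"
    }
}
```

The application decides what to put under which key and when it is safe to reuse it; nothing is cached unless the task code does so itself, which matches the point made in the reply that the registry is not an automatic caching mechanism.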
