Re: tez "sessions"
yeah. looks related to the timeline-service alright - i think. here's the jstack output.

2017-03-14 11:27:46
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode):

"Attach Listener" #536 daemon prio=9 os_prio=0 tid=0x7f46c00da000 nid=0x2be6 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
    - None

"AMShutdownThread" #517 daemon prio=5 os_prio=0 tid=0x7f46b8059000 nid=0x6cfa runnable [0x7f46a73af000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    - locked <0xfe71a6e0> (a java.io.BufferedInputStream)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:703)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1534)
    - locked <0xfe70bd68> (a sun.net.www.protocol.http.HttpURLConnection)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1439)
    - locked <0xfe70bd68> (a sun.net.www.protocol.http.HttpURLConnection)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:240)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:226)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:162)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:237)
    at com.sun.jersey.api.client.Client.handle(Client.java:648)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:472)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:321)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
    at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
    at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.serviceStop(ATSHistoryLoggingService.java:233)
    - locked <0xcd56b968> (a java.lang.Object)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <0xcd4042a0> (a java.lang.Object)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
    at org.apache.tez.dag.history.HistoryEventHandler.serviceStop(HistoryEventHandler.java:85)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <0xcd0cf878> (a java.lang.Object)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:65)
    at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1938)
    at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:2121)
    - locked <0xccf30038> (a org.apache.tez.dag.app.DAGAppMaster)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <0xccf301d0> (a java.lang.Object)
    at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run(DAGAppMaster.java:952)
    at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
    - None

"ContainerLauncher #31" #210 daemon prio=5 os_prio=0 tid=0x7f46c42c nid=0x1601 waiting on condition [0x7f46a4f9]
Re: tez "sessions"
Thanks Gopal. lemme see what i can do with your insights and report back with my findings.

Cheers,
Stephen.

On Tue, Mar 14, 2017 at 6:19 AM, Gopal Vijayaraghavan wrote:

> > Looking at the doc i thought this config setting would keep those Tez
> > jobs from hanging around (tez.session.am.dag.submit.timeout.secs) but
> > testing proved otherwise. It didn't seem to have any effect.
> >
> > So i ask. How to force off those Tez jobs organically? Or is there
> > perhaps something else i'm missing?
>
> A jstack of the Tez AM would be useful.
>
> My guess is that this is related to ATS.
>
> tez.yarn.ats.event.flush.timeout.millis=-1L;
>
> That is the default, and if ATS is down for whatever reason, Tez queries
> will wait indefinitely to flush all events to ATS.
>
> You can probably set that to 600000L and see if the AMs disappear after 10
> minutes.
>
> Before TEZ-1701, this was set to 3 seconds, which broke the UI when the ATS
> instance was temporarily unavailable.
>
> Cheers,
> Gopal
Re: tez "sessions"
> Looking at the doc i thought this config setting would keep those Tez
> jobs from hanging around (tez.session.am.dag.submit.timeout.secs) but testing
> proved otherwise. It didn't seem to have any effect.

> So i ask. How to force off those Tez jobs organically? Or is there perhaps
> something else i'm missing?

A jstack of the Tez AM would be useful.

My guess is that this is related to ATS.

tez.yarn.ats.event.flush.timeout.millis=-1L;

That is the default, and if ATS is down for whatever reason, Tez queries will wait indefinitely to flush all events to ATS.

You can probably set that to 600000L and see if the AMs disappear after 10 minutes.

Before TEZ-1701, this was set to 3 seconds, which broke the UI when the ATS instance was temporarily unavailable.

Cheers,
Gopal
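For reference, a ten-minute cap on the ATS flush expressed in milliseconds is 600000. The sketch below only does that unit conversion and prints a Hadoop-style property entry for it. It assumes the value is placed in tez-site.xml (or passed as an equivalent job-level override) and written as a plain number without the Java `L` suffix; the class and method names are made up for illustration.

```java
import java.util.concurrent.TimeUnit;

// Illustrative helper, not part of Tez: converts a timeout in minutes to the
// millisecond value expected by tez.yarn.ats.event.flush.timeout.millis and
// renders it as a Hadoop-style <property> entry.
final class AtsFlushTimeout {
    // Property name as quoted in the message above.
    static final String KEY = "tez.yarn.ats.event.flush.timeout.millis";

    // Minutes -> milliseconds, using the JDK's TimeUnit conversion.
    static long minutesToMillis(long minutes) {
        return TimeUnit.MINUTES.toMillis(minutes);
    }

    // Builds the XML fragment; assumption: it goes inside <configuration>
    // in tez-site.xml.
    static String toPropertyXml(String name, long value) {
        return "<property>\n"
             + "  <name>" + name + "</name>\n"
             + "  <value>" + value + "</value>\n"
             + "</property>";
    }

    public static void main(String[] args) {
        long tenMinutes = minutesToMillis(10);
        System.out.println(toPropertyXml(KEY, tenMinutes));
    }
}
```

A value of -1 keeps the wait-forever behaviour described above, so any positive value here trades completeness of the history events in ATS for a bounded AM shutdown time.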
tez "sessions"
hi guys,

this seems to be a familiar subject but i still don't have a handle on it, alas. I'm totally misunderstanding something here.

in our case we use tez and submit many different jobs to our queue called "batch_sql". this works great until the Tez job finishes (100% complete) and, instead of dropping out of the queue, it hangs around for hours, it seems, taking up a slot in our queue and holding onto one container. as you can imagine, given our queue width is 15, after 15 Tez jobs we're log jammed - so we wrote a script which looks for Tez jobs that are 100% complete and performs yarn kill commands on them. So, yeah, certainly not ideal.

Looking at the doc i thought this config setting would keep those Tez jobs from hanging around (tez.session.am.dag.submit.timeout.secs) but testing proved otherwise. It didn't seem to have any effect.

So i ask. How to force off those Tez jobs organically? Or is there perhaps something else i'm missing?

thanks,
Stephen.
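A workaround script of the kind described above could look roughly like the following sketch. It is not the poster's actual script. It assumes `yarn application -list` prints tab-separated columns in the order Application-Id, Application-Name, Application-Type, User, Queue, State, Final-State, Progress, Tracking-URL, and it only builds `yarn application -kill` command strings rather than running them. The class name, method names and sample application IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: picks Tez applications that report 100% progress but are
// still RUNNING in a given queue, based on lines captured from
// `yarn application -list` (column layout is an assumption, see above).
final class IdleTezAppFinder {

    static List<String> finishedTezApps(List<String> listOutput, String queue) {
        List<String> ids = new ArrayList<>();
        for (String line : listOutput) {
            String[] f = line.split("\t");
            if (f.length < 8) {
                continue; // summary line or anything that is not an application row
            }
            String id = f[0].trim();
            boolean isAppRow = id.startsWith("application_"); // skips the header row
            boolean isTez = f[2].trim().equals("TEZ");
            boolean inQueue = f[4].trim().equals(queue);
            boolean running = f[5].trim().equals("RUNNING");
            boolean done = f[7].trim().equals("100%");
            if (isAppRow && isTez && inQueue && running && done) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Command string only; actually running it is left to the caller.
    static String killCommand(String appId) {
        return "yarn application -kill " + appId;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, not real cluster output.
        List<String> sample = Arrays.asList(
            "Application-Id\tApplication-Name\tApplication-Type\tUser\tQueue\tState\tFinal-State\tProgress\tTracking-URL",
            "application_1489000000000_0001\tHIVE-a\tTEZ\tetl\tbatch_sql\tRUNNING\tUNDEFINED\t100%\thttp://host:1234/",
            "application_1489000000000_0002\tHIVE-b\tTEZ\tetl\tbatch_sql\tRUNNING\tUNDEFINED\t40%\thttp://host:1235/");
        for (String id : finishedTezApps(sample, "batch_sql")) {
            System.out.println(killCommand(id));
        }
    }
}
```

This only automates the symptom. If the AM is stuck flushing history events to the timeline service at shutdown, bounding that flush is the more direct fix.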
Re: Tez Sessions
HiveServer2 today maintains Tez sessions (when running with perimeter security, i.e. Ranger/Sentry) and re-uses the session across queries.

Tez AM recovery works for the most part. It will try to recover completed tasks of the last running DAG and complete the ones that did not complete or were still running. It does not handle cases where the committer was in the middle of a commit though, so those DAGs will abort when trying to recover. Given the complexity of recovery, there are probably bugs that we have not discovered yet, but for the most part it does function well.

There are a few issues you should consider when trying to use a single AM:
  - on secure clusters, the delegation token max lifetime is 7 days so you will need to re-cycle apps on a weekly basis.
  - YARN does not clean up data/logs for an app until the app completes so this can add space pressure on the yarn local dirs.

That said, there is some work happening as part of TEZ-3334 to help clean up intermediate data on a regular basis. There have been a couple of other jiras filed recently too to look at being able to clean up data more frequently.

thanks
— Hitesh

> On Oct 20, 2016, at 2:35 PM, Madhusudan Ramanna <m.rama...@ymail.com> wrote:
>
> Ok, no worries. I agree that this single AM model would be very close to a
> mini-job tracker. One of the options we're investigating is having 1 yarn Tez
> AM running all our DAGs. Given this AM already has all the
> resources/containers, we were thinking this could save on the cost of AM and
> container initialization.
>
> We haven't looked into tez recovery either. Durability is one of our big
> concerns as well.
>
> On Thursday, October 20, 2016 12:44 PM, Hitesh Shah <hit...@apache.org> wrote:
>
> Not supported as of now. There are multiple aspects to supporting this
> properly. One of the most important issues to address would be to do proper
> QoS across various DAGs, i.e. what kind of policies would need to be built out
> to run multiple DAGs to completion within a limited amount of resources. The
> model would become close to a mini-jobtracker or a spark-standalone cluster.
>
> Could you provide more details on what you are trying to achieve? We could
> try and provide different viewpoints on trying to get you to a viable
> solution.
>
> — Hitesh
>
> > On Oct 20, 2016, at 10:52 AM, Madhusudan Ramanna <m.rama...@ymail.com>
> > wrote:
> >
> > Hello Folks,
> >
> > http://hortonworks.com/blog/introducing-tez-sessions/
> >
> > From the above post it seems like DAGs can only be executed serially.
> > Could DAGs be executed in parallel on one Tez AM?
> >
> > thanks,
> > Madhu
Re: Tez Sessions
Ok, no worries. I agree that this single AM model would be very close to a mini-job tracker. One of the options we're investigating is having 1 yarn Tez AM running all our DAGs. Given this AM already has all the resources/containers, we were thinking this could save on the cost of AM and container initialization.

We haven't looked into tez recovery either. Durability is one of our big concerns as well.

On Thursday, October 20, 2016 12:44 PM, Hitesh Shah <hit...@apache.org> wrote:

Not supported as of now. There are multiple aspects to supporting this properly. One of the most important issues to address would be to do proper QoS across various DAGs, i.e. what kind of policies would need to be built out to run multiple DAGs to completion within a limited amount of resources. The model would become close to a mini-jobtracker or a spark-standalone cluster.

Could you provide more details on what you are trying to achieve? We could try and provide different viewpoints on trying to get you to a viable solution.

— Hitesh

> On Oct 20, 2016, at 10:52 AM, Madhusudan Ramanna <m.rama...@ymail.com> wrote:
>
> Hello Folks,
>
> http://hortonworks.com/blog/introducing-tez-sessions/
>
> From the above post it seems like DAGs can only be executed serially. Could
> DAGs be executed in parallel on one Tez AM?
>
> thanks,
> Madhu
Re: Tez Sessions
Not supported as of now. There are multiple aspects to supporting this properly. One of the most important issues to address would be to do proper QoS across various DAGs, i.e. what kind of policies would need to be built out to run multiple DAGs to completion within a limited amount of resources. The model would become close to a mini-jobtracker or a spark-standalone cluster.

Could you provide more details on what you are trying to achieve? We could try and provide different viewpoints on trying to get you to a viable solution.

— Hitesh

> On Oct 20, 2016, at 10:52 AM, Madhusudan Ramanna <m.rama...@ymail.com> wrote:
>
> Hello Folks,
>
> http://hortonworks.com/blog/introducing-tez-sessions/
>
> From the above post it seems like DAGs can only be executed serially. Could
> DAGs be executed in parallel on one Tez AM?
>
> thanks,
> Madhu
Tez Sessions
Hello Folks,

http://hortonworks.com/blog/introducing-tez-sessions/

From the above post it seems like DAGs can only be executed serially. Could DAGs be executed in parallel on one Tez AM?

thanks,
Madhu
Re: Enabling Tez sessions on HiveServer2
BCC’ed user@tez. This question belongs to either the hive user list or the Hortonworks user forums.

thanks
— Hitesh

On Dec 2, 2014, at 1:28 PM, Pala M Muthaia mchett...@rocketfuelinc.com wrote:

> Hi,
>
> I am trying to get Tez sessions enabled with HS2. I start the HiveServer2
> instance with the flag -hiveconf hive.execution.engine=tez and then try to
> submit multiple queries one after another, as the same user, to the HS2
> instance. When i check the YARN UI, i find that each query of mine is
> launched as a new YARN application. While the new Tez application is running,
> the old Tez applications are still alive.
>
> This is different from Tez session in Hive CLI, where multiple queries are
> submitted to the same Tez application (if launched within the Tez session
> timeout).
>
> I followed the config instructions at
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_installing_manually_book/content/rpm-chap-tez-configure_hive_for_tez.html
> so far. Is there a separate config flag that i need to turn on for Tez
> sessions on HS2? How should i enable Tez sessions with HiveServer2?
>
> -pala