Re: tez "sessions"

2017-03-14 Thread Stephen Sprague
Yeah, it looks related to the timeline service alright, I think.

Here's the jstack output.


2017-03-14 11:27:46
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode):

"Attach Listener" #536 daemon prio=9 os_prio=0 tid=0x7f46c00da000 nid=0x2be6 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"AMShutdownThread" #517 daemon prio=5 os_prio=0 tid=0x7f46b8059000 nid=0x6cfa runnable [0x7f46a73af000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:150)
	at java.net.SocketInputStream.read(SocketInputStream.java:121)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	- locked <0xfe71a6e0> (a java.io.BufferedInputStream)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:703)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1534)
	- locked <0xfe70bd68> (a sun.net.www.protocol.http.HttpURLConnection)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1439)
	- locked <0xfe70bd68> (a sun.net.www.protocol.http.HttpURLConnection)
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:240)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:226)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:162)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:237)
	at com.sun.jersey.api.client.Client.handle(Client.java:648)
	at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
	at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
	at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:472)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:321)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
	at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
	at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.serviceStop(ATSHistoryLoggingService.java:233)
	- locked <0xcd56b968> (a java.lang.Object)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	- locked <0xcd4042a0> (a java.lang.Object)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
	at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
	at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
	at org.apache.tez.dag.history.HistoryEventHandler.serviceStop(HistoryEventHandler.java:85)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	- locked <0xcd0cf878> (a java.lang.Object)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:65)
	at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1938)
	at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:2121)
	- locked <0xccf30038> (a org.apache.tez.dag.app.DAGAppMaster)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	- locked <0xccf301d0> (a java.lang.Object)
	at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run(DAGAppMaster.java:952)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

"ContainerLauncher #31" #210 daemon prio=5 os_prio=0 tid=0x7f46c42c nid=0x1601 waiting on condition [0x7f46a4f9]
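A dump like this can run to hundreds of threads, so it helps to pull out only the ones whose stacks touch the timeline client. Below is a minimal, hypothetical Java helper (StuckThreads is an illustrative name, not an existing tool). It assumes the standard jstack layout shown above, where each thread block begins with a line that starts with the quoted thread name. In the excerpt above, the only thread whose frames mention TimelineClientImpl is AMShutdownThread, which is sitting in ATSHistoryLoggingService.serviceStop.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of the JDK or Hadoop): lists the threads in a
 * saved jstack dump whose stack frames mention a given marker string.
 * Assumes each thread block starts with a line beginning with '"'.
 */
public class StuckThreads {

    public static List<String> threadsMentioning(String dump, String marker) {
        List<String> hits = new ArrayList<>();
        String current = null;   // name of the thread block being scanned
        boolean matched = false; // did that block mention the marker?
        for (String line : dump.split("\n")) {
            if (line.startsWith("\"")) {
                // A thread header such as "AMShutdownThread" #517 ... opens a new block.
                if (current != null && matched) {
                    hits.add(current);
                }
                int close = line.indexOf('"', 1);
                current = close > 0 ? line.substring(1, close) : line;
                matched = false;
            } else if (current != null && line.contains(marker)) {
                matched = true;
            }
        }
        if (current != null && matched) {
            hits.add(current); // flush the last block
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java StuckThreads <jstack-file> [marker]");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String marker = args.length > 1 ? args[1] : "TimelineClientImpl";
        for (String name : threadsMentioning(dump, marker)) {
            System.out.println(name);
        }
    }
}
```

Typical use would be `jstack <am-pid> > am.jstack` followed by `java StuckThreads am.jstack ATSHistoryLoggingService`.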

Re: tez "sessions"

2017-03-14 Thread Stephen Sprague
Thanks, Gopal. Let me see what I can do with your insights, and I'll report back
with my findings.

Cheers,
Stephen.



Re: tez "sessions"

2017-03-14 Thread Gopal Vijayaraghavan
> Looking at the doc, I thought this config setting (tez.session.am.dag.submit.timeout.secs)
> would keep those Tez jobs from hanging around, but testing proved otherwise.
> It didn't seem to have any effect.
> So I ask: how do I force those Tez jobs off organically? Or is there perhaps 
> something else I'm missing?

A jstack of the Tez AM would be useful.

My guess is that this is related to ATS.

tez.yarn.ats.event.flush.timeout.millis=-1L;

That is the default, and if ATS is down for whatever reason, Tez queries will wait 
indefinitely to flush all events to ATS.

You can probably set that to 60L and see if the AMs disappear after 10 
minutes.

Before TEZ-1701, this was set to 3 seconds, which broke the UI when the ATS 
instance was temporarily unavailable.
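To make the -1L semantics concrete, here is a minimal Java sketch of a flush-on-shutdown loop bounded by a millisecond timeout, where a negative value means "wait forever". It is a hypothetical model, not Tez's ATSHistoryLoggingService code: the class name, the 10 ms retry back-off, the event names, and the Predicate standing in for the ATS post are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/**
 * Hypothetical model (not Tez source) of flushing queued history events on
 * shutdown. timeoutMillis < 0 means "no deadline", like the -1L default.
 */
public class BoundedFlush {

    /** Posts queued events until the queue drains or the deadline passes.
     *  Returns how many events were still unflushed when the loop stopped. */
    public static int flush(Deque<String> pending, long timeoutMillis,
                            Predicate<String> post) {
        long start = System.nanoTime();
        while (!pending.isEmpty()) {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (timeoutMillis >= 0 && elapsedMillis >= timeoutMillis) {
                break; // deadline reached: give up on the remaining events
            }
            if (post.test(pending.peekFirst())) {
                pending.pollFirst(); // delivered; move on to the next event
            } else {
                try {
                    Thread.sleep(10); // back off before retrying a failed post
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return pending.size();
    }

    public static void main(String[] args) {
        Deque<String> events = new ArrayDeque<>();
        events.add("DAG_FINISHED"); // illustrative event names
        events.add("AM_STOPPED");
        // Simulate an unreachable ATS: every post fails.
        int left = flush(events, 100L, e -> false);
        System.out.println("unflushed after 100 ms: " + left);
    }
}
```

With `e -> false` standing in for an unreachable ATS, a finite timeout returns with events left over, while -1L would retry forever, which is consistent with an AM that lingers after its DAG is done. One caveat of this design: the deadline is only checked between posts, so a single post blocked in a socket read (as in the AMShutdownThread stack earlier in this thread) would not be cut short by it.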

Cheers,
Gopal




tez "sessions"

2017-03-14 Thread Stephen Sprague
Hi guys,
This seems to be a familiar subject, but I still don't have a handle on it,
alas. I'm totally misunderstanding something here.

In our case we use Tez and submit many different jobs to our queue called
"batch_sql". This works great until the Tez job finishes (100% complete):
instead of dropping out of the queue, it seems to hang around for hours,
taking up a slot in our queue and holding onto one container.

As you can imagine, given our queue width is 15, after 15 Tez jobs we're
logjammed, so we wrote a script which looks for Tez jobs that are 100% complete
and performs yarn kill commands on them. So, yeah, certainly not ideal.
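The kill script itself isn't shown in the thread. A minimal Java sketch of the same stopgap follows; it only prints `yarn application -kill` commands for review rather than running them. It assumes the tab-separated column layout of `yarn application -list` (Application-Id, Application-Name, Application-Type, User, Queue, State, Final-State, Progress, Tracking-URL), which should be checked against your Hadoop version, and it also accepts a fully qualified queue name such as root.batch_sql. LingeringTezApps is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the "kill finished-but-lingering Tez AMs" stopgap.
 * ASSUMPTION: input follows the tab-separated layout of `yarn application -list`:
 * Id, Name, Type, User, Queue, State, Final-State, Progress, Tracking-URL.
 */
public class LingeringTezApps {

    public static List<String> findKillable(List<String> listOutput, String queue) {
        List<String> ids = new ArrayList<>();
        for (String line : listOutput) {
            String[] f = line.split("\t");
            if (f.length < 8 || !f[0].trim().startsWith("application_")) {
                continue; // skip the summary line, the header row and blanks
            }
            String type = f[2].trim();
            String q = f[4].trim();
            String state = f[5].trim();
            String progress = f[7].trim();
            boolean queueMatches = q.equals(queue) || q.endsWith("." + queue);
            if ("TEZ".equals(type) && queueMatches
                    && "RUNNING".equals(state) && "100%".equals(progress)) {
                ids.add(f[0].trim());
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        String queue = args.length > 0 ? args[0] : "batch_sql";
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        for (String line; (line = in.readLine()) != null; ) {
            lines.add(line);
        }
        for (String id : findKillable(lines, queue)) {
            // Print instead of executing, so a human can review before killing.
            System.out.println("yarn application -kill " + id);
        }
    }
}
```

Usage would be along the lines of `yarn application -list | java LingeringTezApps batch_sql`. Like the original script, this only treats the symptom.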

Looking at the doc, I thought this config setting (tez.session.am.dag.submit.timeout.secs)
would keep those Tez jobs from hanging around, but testing proved otherwise.
It didn't seem to have any effect.

So I ask: how do I force those Tez jobs off organically? Or is there perhaps
something else I'm missing?

thanks,
Stephen.


Re: Tez Sessions

2016-10-20 Thread Hitesh Shah
Today, HiveServer2 maintains Tez sessions (when running with perimeter security,
i.e. Ranger/Sentry) and reuses the sessions across queries.

Tez AM recovery works for the most part. It will try to recover the completed
tasks of the last running DAG and complete the tasks that had not completed or
were still running. It does not, however, handle cases where the committer was
in the middle of a commit, so those DAGs will abort when trying to recover.
Given the complexity of recovery, there are probably bugs that we have not
discovered yet, but for the most part it functions well.
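The recovery behavior described above boils down to a small decision rule. The following is a toy Java model of that rule only; RecoveryModel, TaskState and Plan are hypothetical names, and Tez's actual recovery logic is considerably more involved. A DAG that was mid-commit is aborted; otherwise completed tasks are reused and running or pending tasks are run again.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model (hypothetical names) of the recovery rule described above. */
public class RecoveryModel {

    public enum TaskState { SUCCEEDED, RUNNING, PENDING }

    /** What happens to the DAG that was running when the AM restarted. */
    public static final class Plan {
        public final boolean abort; // true if the DAG cannot be recovered
        public final int reused;    // completed tasks whose results are kept
        public final int rerun;     // tasks that must be (re)executed

        Plan(boolean abort, int reused, int rerun) {
            this.abort = abort;
            this.reused = reused;
            this.rerun = rerun;
        }

        @Override
        public String toString() {
            return abort ? "ABORT" : "reuse " + reused + ", rerun " + rerun;
        }
    }

    public static Plan recover(List<TaskState> tasks, boolean commitInProgress) {
        if (commitInProgress) {
            // A commit interrupted midway is not handled: the DAG aborts.
            return new Plan(true, 0, 0);
        }
        int reused = 0;
        int rerun = 0;
        for (TaskState t : tasks) {
            if (t == TaskState.SUCCEEDED) {
                reused++;
            } else {
                rerun++; // running or not-yet-started tasks run again
            }
        }
        return new Plan(false, reused, rerun);
    }

    public static void main(String[] args) {
        List<TaskState> tasks = Arrays.asList(
                TaskState.SUCCEEDED, TaskState.SUCCEEDED, TaskState.RUNNING, TaskState.PENDING);
        System.out.println(recover(tasks, false)); // reuse 2, rerun 2
        System.out.println(recover(tasks, true));  // ABORT
    }
}
```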

There are a few issues you should consider when trying to use a single AM:
   - On secure clusters, the delegation token max lifetime is 7 days, so you 
will need to recycle apps on a weekly basis.
   - YARN does not clean up data/logs for an app until the app completes, so 
this can add space pressure on the YARN local dirs. That said, there is some 
work happening as part of TEZ-3334 to help clean up intermediate data on a 
regular basis. A couple of other JIRAs have also been filed recently to look 
at cleaning up data more frequently.
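The weekly recycling point is simple date arithmetic. A minimal Java sketch using java.time, assuming the 7-day maximum token lifetime mentioned above and a safety margin of your choosing; TokenRecycle and the margin are illustrative, not a Hadoop API.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative helper (not a Hadoop API) for scheduling long-lived AM restarts. */
public class TokenRecycle {

    // Assumed from the thread: max delegation-token lifetime on secure clusters.
    static final Duration MAX_TOKEN_LIFETIME = Duration.ofDays(7);

    /** Latest time to restart an AM started at amStart, leaving a safety margin. */
    public static Instant recycleBy(Instant amStart, Duration safetyMargin) {
        return amStart.plus(MAX_TOKEN_LIFETIME).minus(safetyMargin);
    }

    /** True once "now" has reached the recycle deadline. */
    public static boolean mustRecycle(Instant amStart, Instant now, Duration safetyMargin) {
        return !now.isBefore(recycleBy(amStart, safetyMargin));
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2016-10-20T00:00:00Z");
        System.out.println(recycleBy(start, Duration.ofHours(12))); // 2016-10-26T12:00:00Z
    }
}
```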

thanks
— Hitesh   





Re: Tez Sessions

2016-10-20 Thread Madhusudan Ramanna
OK, no worries. I agree that this single-AM model would be very close to a 
mini-job tracker. One of the options we're investigating is having one YARN Tez AM 
run all our DAGs. Given that this AM already has all the resources/containers, 
we were thinking it could save on the cost of AM and container 
initialization.
We haven't looked into Tez recovery yet, either. Durability is one of our big 
concerns as well.


Re: Tez Sessions

2016-10-20 Thread Hitesh Shah
Not supported as of now. There are multiple aspects to supporting this 
properly. One of the most important issues to address would be proper QoS 
across various DAGs, i.e. what kind of policies would need to be built out to 
run multiple DAGs to completion within a limited amount of resources. The model 
would become close to a mini-jobtracker or a Spark standalone cluster.
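To illustrate the kind of policy question raised here, consider one possible (hypothetical) answer: a max-min fair, water-filling split of a fixed pool of containers across concurrent DAGs, each with a container demand. This is not something Tez implements; it is a sketch of a single point in the design space that ignores priorities, preemption and data locality.

```java
import java.util.Arrays;

/**
 * Hypothetical multi-DAG policy (not implemented by Tez): a max-min fair,
 * water-filling split of a fixed container pool across concurrent DAGs.
 */
public class FairShare {

    /** demands[i] = containers DAG i could use; returns containers granted. */
    public static int[] allocate(int capacity, int[] demands) {
        int[] alloc = new int[demands.length];
        int remaining = capacity;
        while (remaining > 0) {
            int hungry = 0; // DAGs whose demand is not yet met
            for (int i = 0; i < demands.length; i++) {
                if (alloc[i] < demands[i]) hungry++;
            }
            if (hungry == 0) break; // everyone satisfied; leftover stays idle
            // Equal share of what is left, at least one container per round.
            int share = Math.max(1, remaining / hungry);
            for (int i = 0; i < demands.length && remaining > 0; i++) {
                int want = demands[i] - alloc[i];
                if (want <= 0) continue;
                int give = Math.min(share, want);
                alloc[i] += give;
                remaining -= give;
            }
        }
        return alloc;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(allocate(10, new int[] {2, 8, 8}))); // [2, 4, 4]
    }
}
```

With 10 containers and demands of 2, 8 and 8, the small DAG gets everything it asks for and the two large ones split the rest evenly. When containers run out mid-round, ties go to the lower index, which a real scheduler would probably want to rotate.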

Could you provide more details on what you are trying to achieve? We could try 
to offer different viewpoints to help get you to a viable solution.

— Hitesh




Tez Sessions

2016-10-20 Thread Madhusudan Ramanna
Hello Folks,
http://hortonworks.com/blog/introducing-tez-sessions/
From the above post, it seems like DAGs can only be executed serially. Could 
DAGs be executed in parallel on one Tez AM?

thanks,
Madhu

Re: Enabling Tez sessions on HiveServer2

2014-12-02 Thread Hitesh Shah
BCC’ed user@tez.

This question belongs on either the Hive user list or the Hortonworks user 
forums.

thanks
— Hitesh

On Dec 2, 2014, at 1:28 PM, Pala M Muthaia <mchett...@rocketfuelinc.com> wrote:

> Hi,
> 
> I am trying to get Tez sessions enabled with HS2. I start the HiveServer2 
> instance with the flag -hiveconf hive.execution.engine=tez and then try to 
> submit multiple queries one after another, as the same user, to the HS2 
> instance.
> 
> When I check the YARN UI, I find that each of my queries is launched as a new 
> YARN application. While the new Tez application is running, the old Tez 
> applications are still alive. This is different from Tez sessions in the Hive CLI, 
> where multiple queries are submitted to the same Tez application (if launched 
> within the Tez session timeout).
> 
> So far, I have followed the config instructions at 
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_installing_manually_book/content/rpm-chap-tez-configure_hive_for_tez.html
> 
> Is there a separate config flag that I need to turn on for Tez sessions on 
> HS2? How should I enable Tez sessions with HiveServer2?
> 
> -pala