Re: Tez 0.8.3 on EMR hanging with Hive task

2016-06-15 Thread Hitesh Shah
If log aggregation is not enabled, the next best thing would be to download the 
application master logs from the RM UI for the apps in question. Those would 
provide a good starting point for figuring out what is going on. 

thanks
-- Hitesh
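
As a rough illustration of the suggestion above: besides the RM UI, the ResourceManager REST API returns a link to the application master container logs in the `amContainerLogs` field of its per-application response. The Python sketch below is only an example under stated assumptions: port 8088 is the stock RM web port and may differ on EMR, and the helper names are hypothetical.

```python
import json
import urllib.request


def rm_app_url(rm_host, app_id, port=8088):
    """Build the RM REST URL for one application.

    Port 8088 is the stock default and is an assumption here.
    """
    return f"http://{rm_host}:{port}/ws/v1/cluster/apps/{app_id}"


def am_log_link(app_json):
    """Extract (state, finalStatus, amContainerLogs) from a parsed RM response."""
    app = app_json["app"]
    return app.get("state"), app.get("finalStatus"), app.get("amContainerLogs")


def fetch_am_log_link(rm_host, app_id, port=8088, timeout=10):
    """Query the RM over HTTP and return the same tuple as am_log_link()."""
    req = urllib.request.Request(
        rm_app_url(rm_host, app_id, port),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return am_log_link(json.load(resp))
```

The returned link points at the NodeManager page for the AM container, from which stdout, stderr and syslog can be downloaded while the application is still known to the RM.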





Re: Tez 0.8.3 on EMR hanging with Hive task

2016-06-15 Thread Jose Rozanec
Hello,

Here is an update. It seems we misunderstood something: Hive returned an
error for the query, while the Tez job kept running without reporting
progress. We did not cancel it; it seemed to have hung. After two hours it
was reported as finished on the UI, although it still showed a running state
when listed from YARN for some more time before it finally finished.
We have log aggregation enabled, but after the job finished, we still get
the same message as reported in the previous email.

We will now research why Hive detached from Tez while the job was still
running, and whether we can improve query accept times, since complex
queries are taking a while to start executing.

Thanks,






Re: Tez 0.8.3 on EMR hanging with Hive task

2016-06-15 Thread Jose Rozanec
Hello,

I ran the command and got the following output:
16/06/15 15:07:35 INFO impl.TimelineClientImpl: Timeline service address: http://ip-10-64-23-215.ec2.internal:8188/ws/v1/timeline/
16/06/15 15:07:35 INFO client.RMProxy: Connecting to ResourceManager at ip-10-64-23-215.ec2.internal/10.64.23.215:8032
/var/log/hadoop-yarn/apps/hadoop/logs/application_1465996511770_0001 does not exist.
Log aggregation has not completed or is not enabled.

Are we missing some configuration that would help us get more insight?

Thanks!

Joze.
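
The path in the output above has the shape <remote log dir>/<user>/<suffix>/<application id>, so one quick check is whether that directory exists on the cluster file system at all. Below is a small Python sketch of that reasoning. It assumes that `/var/log/hadoop-yarn/apps` is this cluster's `yarn.nodemanager.remote-app-log-dir` and `logs` its `yarn.nodemanager.remote-app-log-dir-suffix`; both are inferred from the output, not confirmed in the thread. The function names are made up for illustration.

```python
import posixpath

# Exact line printed by the CLI in the output quoted above.
NOT_AGGREGATED = "Log aggregation has not completed or is not enabled."


def aggregated_log_dir(remote_dir, user, suffix, app_id):
    """Directory where aggregated logs are expected for one application.

    Assumes the <remote_dir>/<user>/<suffix>/<app_id> layout seen in the
    error message; other Hadoop versions may lay this out differently.
    """
    return posixpath.join(remote_dir, user, suffix, app_id)


def aggregation_missing(cli_output):
    """True if the captured CLI output reports that no aggregated logs were found."""
    return NOT_AGGREGATED in cli_output
```

The resulting directory can then be listed with `hdfs dfs -ls` to see whether anything was written for the application, and under which user.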



Re: Tez 0.8.3 on EMR hanging with Hive task

2016-06-15 Thread Hitesh Shah
Hello Joze, 

Would it be possible for you to provide the YARN application logs obtained via 
“bin/yarn logs -applicationId ” for both of the cases you have seen? 
Feel free to file JIRAs and attach logs to each of them.

thanks
— Hitesh
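
For reference, a minimal Python sketch of collecting those logs into a file so they can be attached to a JIRA. It assumes the `yarn` launcher is on the PATH (the message uses `bin/yarn` relative to a Hadoop install); the helper names and the output path are hypothetical.

```python
import re
import subprocess

# YARN application ids look like application_<cluster start ms>_<sequence>.
APP_ID_RE = re.compile(r"^application_\d+_\d+$")


def yarn_logs_command(app_id, yarn_bin="yarn"):
    """Build the argument list for `yarn logs -applicationId <id>`."""
    if not APP_ID_RE.match(app_id):
        raise ValueError(f"not a YARN application id: {app_id!r}")
    return [yarn_bin, "logs", "-applicationId", app_id]


def save_yarn_logs(app_id, out_path, yarn_bin="yarn"):
    """Run the command, write its stdout to out_path, return the exit code."""
    with open(out_path, "wb") as out:
        proc = subprocess.run(
            yarn_logs_command(app_id, yarn_bin),
            stdout=out,
            stderr=subprocess.PIPE,
        )
    return proc.returncode
```

Calling `save_yarn_logs("application_1465996511770_0001", "app_0001.log")` once per affected application would give one file per case.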

> On Jun 15, 2016, at 7:38 AM, Jose Rozanec wrote:
> 
> Hello, 
> 
> We are experiencing some issues with Tez 0.8.3 when we issue heavy queries 
> from Hive. It seems some jobs hang on Tez and never return. Those jobs show 
> up in the DAG web UI, but no progress is reported in the UI or in the Hive 
> logs. Any ideas why this could happen? We see this happen with certain 
> memory configurations; without them, the job dies soon (we guess due to OOM).
> 
> Most probably unrelated to this, at some point we also got the following 
> error: "org.apache.tez.dag.api.SessionNotRunning: TezSession has already 
> shutdown. Application x failed 2 times due to AM Container". We are not 
> sure whether it is related to TEZ-2663, which should be fixed from version 
> 0.7.1 onwards.
> 
> Thanks in advance, 
> 
> Joze.